Andrea V. Bajcsy,
Marginal marks on ballots are a ubiquitous problem in voting systems and a common source of dispute in federal and state-level elections. At present, optical mark scanners cannot clearly count marginal marks as either votes or non-votes. We aim to establish quantitative measurements of marginal marks in order to provide an objective classification of ballot-mark types and ultimately improve the algorithms used in mark scanners. Using 800 publicly available scans of manually marked ballots from the 2009 Humboldt County, California election, we established a set of unique image features that distinguish between votes, non-votes, and five marginal mark types (check mark, cross, partially filled, overfilled, and lightly filled). The image features are related to semantic labels through both unsupervised and supervised machine-learning methods. We demonstrate the feasibility of developing an automated, quantifiable set of custom features that improves marginal mark type detection accuracy by 4 to 8 percent, depending on the classification model, compared with off-the-shelf features.
NIST Interagency/Internal Report (NISTIR) -
voting, marginal marks, supervised classification