
Maintaining accuracy while reducing human burden
Our machine learning pipeline reduced the number of new articles needing manual review by over 99 percent. To achieve this, we ingested tens of thousands of articles nightly and ran them through processing steps including deduplication, named entity extraction, and relevance classification. We worked with our client to define the most important accuracy metric (in this case, at least 95 percent of decedents retained) and implemented checks to ensure our output stayed within the acceptable error bound.
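The nightly flow described above can be sketched in a few functions. This is a minimal illustration, not our production code: the hash-based deduplication, the regex entity extractor, and the pluggable `score_fn` classifier are all hypothetical stand-ins for the real components.

```python
import hashlib
import re

def deduplicate(articles):
    """Drop articles whose normalized text has already been seen (hash-based sketch)."""
    seen, unique = set(), []
    for article in articles:
        key = hashlib.sha1(article["text"].lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique

def extract_entities(article):
    """Toy stand-in for named entity extraction: capitalized token pairs as candidate names."""
    return re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", article["text"])

def classify_relevance(article, score_fn, threshold=0.5):
    """Flag an article for human review when the model score crosses the threshold."""
    return score_fn(article["text"]) >= threshold

def nightly_run(articles, score_fn, threshold=0.5):
    """Deduplicate, classify, and annotate the small set of articles needing review."""
    flagged = []
    for article in deduplicate(articles):
        if classify_relevance(article, score_fn, threshold):
            flagged.append(dict(article, entities=extract_entities(article)))
    return flagged
```

In practice the threshold would be tuned against the decedent-retention target, trading a larger review queue for fewer missed cases.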
While we focused most manual review on the reduced set of articles produced by our pipeline to maximize efficiency, we also labeled a one percent random sample of all articles each month and used it to continually validate our model. Periodically, we compared the model against human coders by examining cases where the two strongly disagreed. Finally, we merged our results with lists of decedents from multiple external sources to confirm our false-negative rate, identified the cause of any missing decedents, and adjusted parameters as needed to ensure such mistakes would not be repeated.
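The three validation checks above can be expressed as small helpers. Again a hedged sketch, not our actual tooling: the function names, the 0.4 disagreement margin, and the numeric human labels are illustrative assumptions.

```python
import random

def sample_for_audit(articles, rate=0.01, seed=0):
    """Draw a reproducible random sample (~1 percent) of all articles for monthly labeling."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    return [a for a in articles if rng.random() < rate]

def strong_disagreements(model_scores, human_labels, margin=0.4):
    """Cases where model confidence and the human label (0 or 1) differ sharply."""
    return [
        article_id
        for article_id, score in model_scores.items()
        # unlabeled articles default to the model score, i.e. no disagreement
        if abs(score - human_labels.get(article_id, score)) > margin
    ]

def false_negative_rate(found_decedents, external_decedents):
    """Share of externally confirmed decedents the pipeline missed."""
    external = set(external_decedents)
    if not external:
        return 0.0
    missed = external - set(found_decedents)
    return len(missed) / len(external)
```

Under the 95-percent-retention target, the merged external lists would need to show `false_negative_rate` staying at or below 0.05.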
Our methodology succeeded in standardizing data collection so that the Justice Department no longer needs to rely on voluntary reporting by local law enforcement. The BJS published our technical report, and our method for counting ARDs, as reported in The Guardian, is the most comprehensive official effort so far to accurately record the number of deaths at the hands of American law enforcement and to provide the “national, consistent data” described by the U.S. Attorney General. Numerous media outlets covered the story, including The Guardian, which featured it in 2015 and 2016; fivethirtyeight.com, which included it among the Best Data Stories of 2016; and The Measure of Everyday Life podcast.