The Dirty Data Feeding Predictive Police Algorithms


The ability to predict crimes before they happen has long been a topic of fascination for science-fiction writers and filmmakers. In real life, predictive policing is getting a similar buzz, as dozens of police departments experiment with algorithm-driven programs to help them deploy resources more effectively.

But more attention should be focused on problems with the data that feed predictive algorithms, argues one researcher from the UC Davis School of Law.

“Predictive policing programs can’t be fully understood without an acknowledgment of the role police have in creating its inputs,” writes Elizabeth E. Joh in a paper forthcoming in the William & Mary Bill of Rights Journal. Police aren’t just passive end-users of these data-driven programs; they generate the information that feeds them.

The difference between crime and crime data 

“A closer look at the ‘raw data’ fed to these algorithms reveals some familiar problems,” the study maintains.

Even under the best of circumstances, crime data captures only a portion of the crime that actually occurs in any given place. Before a crime can become a data point, it must first be discovered, investigated, and recorded by police.

See also: Measuring the “Dark Figure” of Crime
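
To see how quickly those filters shrink the picture, consider a minimal sketch; this is an illustration, not anything from Joh’s paper, and every rate below is hypothetical:

    # A minimal sketch: recorded crime as a filtered sample of actual crime.
    # Every rate below is hypothetical, chosen only for illustration.
    import random

    random.seed(42)

    TRUE_INCIDENTS = 1000      # crimes that actually occur
    P_DISCOVERED = 0.60        # share that ever comes to police attention
    P_INVESTIGATED = 0.75      # share of discovered crimes that are pursued
    P_RECORDED = 0.90          # share of investigated crimes that are logged

    recorded = sum(
        1 for _ in range(TRUE_INCIDENTS)
        if random.random() < P_DISCOVERED
        and random.random() < P_INVESTIGATED
        and random.random() < P_RECORDED
    )

    print(f"Actual incidents:   {TRUE_INCIDENTS}")
    print(f"Recorded incidents: {recorded}")  # ~405 on average; the rest is the "dark figure"

Under these made-up rates, nearly six in ten crimes never enter the data that an algorithm will later treat as ground truth.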

Racial bias is only one factor that can influence the way police record crime, as well as the rate at which they record it, writes Joh. Other factors include workplace pressures, contract disputes, funding crises, the seriousness of the offense, “wishes of the complainant, the social distance between the suspect and the complainant, and the respect shown to the police.”

Changes in policy, such as the “broken windows” campaign of the early 1990s, also leave an indelible footprint on crime data.

There is also a concern that the algorithms produce self-fulfilling prophecies: send police to an area where crimes occurred in the past, and chances are they will find something, reinforcing the prediction.
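
A toy model makes the loop concrete; this is an illustration with invented parameters, not any vendor’s product. Two areas have identical true crime rates, patrols are assigned by past recorded counts, and extra patrol boosts detection:

    # A toy feedback loop, not any vendor's algorithm: two areas with the
    # SAME true crime rate, where patrols follow past recorded counts and
    # extra patrol raises the chance each crime is detected and recorded.
    import random

    random.seed(0)

    TRUE_RATE = 50             # actual crimes per period, identical in both areas
    BASE_DETECTION = 0.2       # detection probability without extra patrol
    PATROL_BOOST = 0.4         # added detection probability in the "hot spot"

    recorded = {"A": 12, "B": 10}   # A starts with slightly more recorded crime

    for _ in range(20):
        # the "prediction": send extra patrols wherever past counts are highest
        hotspot = max(recorded, key=recorded.get)
        for area in recorded:
            p = BASE_DETECTION + (PATROL_BOOST if area == hotspot else 0.0)
            recorded[area] += sum(random.random() < p for _ in range(TRUE_RATE))

    print(recorded)  # A's small head start snowballs, though the true rates are equal

Because recorded counts, not underlying rates, drive the next round of deployment, area A’s early lead compounds every period, and the data appear to confirm the prediction.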

As crime-forecasting programs become increasingly commonplace in police departments across the country, the consequences of these data gaps will grow in scale.

“Many of these issues will become even more difficult to isolate and identify as algorithmic decisionmaking becomes integrated into larger data management systems used by the police,” she writes.

The legitimacy of the “black box” algorithms themselves, which remain hidden behind proprietary information laws, is also uncertain. Last year, ProPublica investigated a risk-assessment algorithm created by Northpointe, Inc.; after comparing its risk scores to actual recidivism rates, the reporters found the program to be only “somewhat more accurate than a coin flip.”
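
The coin-flip comparison has a standard statistical reading: the area under the ROC curve (AUC), which equals 0.5 for pure chance. Here is a minimal sketch of the calculation, using fabricated scores and outcomes rather than ProPublica’s actual data:

    # AUC: the probability that a randomly chosen reoffender gets a higher
    # risk score than a randomly chosen non-reoffender (ties count as half).
    # The scores and outcomes below are fabricated for illustration.
    def auc(scores, outcomes):
        pos = [s for s, y in zip(scores, outcomes) if y == 1]
        neg = [s for s, y in zip(scores, outcomes) if y == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    risk_scores = [9, 7, 6, 5, 5, 4, 3, 2, 2, 1]   # 1-10 risk score per defendant
    reoffended  = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]   # observed recidivism

    print(f"AUC: {auc(risk_scores, reoffended):.2f}")  # 0.58 here; 0.50 is a coin flip

An AUC only a little above 0.50, as in this made-up example, is what “somewhat more accurate than a coin flip” amounts to in practice.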

Ultimately, Joh cautions against “the assumption that algorithmic models don’t have subjectivity baked into them because they involve math.”

The same goes for law enforcement’s role in generating crime data: “As long as policing is fundamentally a set of decisions by people about other people, the data fed to the machine will remain a concern.”

A full copy of the report can be obtained here. Readers’ comments are welcome.

2 thoughts on “The Dirty Data Feeding Predictive Police Algorithms”

  1. The army prepares for the last war, so why not have police responding to the last crime? That’s a loaded rhetorical question, in case you missed it.

    There has also been media coverage of the algorithms used in the bail process, which are driven too heavily by personal factors like ZIP code! No wonder we hear news of a recently released parolee committing a murder after being stopped with an illegal handgun in his possession, which should have been reason enough to send him back.

  2. Police data about crime detection is especially untrustworthy in cases with unknown offenders, which are known to ensnare innocent people through coerced confessions or informants’ testimony.
