The Perils of Big Data Policing

Illustration by KamiPhuc via Flickr.

As the criminal justice system comes to rely increasingly on computer analytics, so-called “big data” tools that capture and collate enormous amounts of information are being used in many cities to identify potential lawbreakers. But this form of predictive policing has also been criticized for its lack of transparency and potential for racial bias.

In his new book, The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement, Andrew Guthrie Ferguson, a law professor at the University of the District of Columbia, argues that the criticisms are valid, and should generate a conversation about the use of new technologies before this form of policing becomes institutionalized.

Ferguson, in a recent conversation with TCR’s Julia Pagnamenta, discussed the economic factors that have driven the use of big data in police departments, how it has spread to prosecutors and judges, and why cities should consider holding regular “surveillance summits,” so communities can examine what’s being done in their name.

The Crime Report: In your book, you argue that explaining this technology to the public is only one element of the challenge. Can you elaborate?

Andrew Ferguson: One of the responses to the (growth) of big-data policing is the fear of a lack of transparency. People ask, “What do you mean I am being targeted by an algorithm that I don’t understand and that I can’t see?” But that doesn’t feel right to me. I make the argument that while we should be debating and thinking about this idea of transparency, transparency itself may not be an answer.

If you go forward just a few years, when we’re really going to be talking about machine learning, artificial-intelligence predictive systems are designed (so) you can’t look into them and see how they’re made, because they are constantly learning from the data that’s being inputted, which keeps changing, and so if you went back to look at what they did, it has changed already, because that’s how it’s being taught to learn.

We have to come up with systems of accountability, and that may (involve) a chief of police or the city council, or whoever is funding this, explaining to citizens why they are using this technology, why they think it’s accurate, why they think it’s a good use of taxpayer dollars, and what they’re going to do to audit it, to make sure it is working in the future. I call those moments of public accountability “surveillance summits.” We need to have surveillance summits in every city in America.

(If) you deal with the problem of transparency in that way, it won’t matter if you don’t understand what an algorithm is, and it won’t matter whether you understand what artificial intelligence does. You’ll feel comfortable that there is an open conversation (involving) the citizenry, the technologists, the civil libertarians, and also the police who have to justify why they think this particular technology makes sense for their particular city.

TCR: Big data isn’t singular to the criminal justice system. Our information is being collected every day when we use our credit cards or our smart phones.

Ferguson: In many ways the rise of big data in a consumer space far outpaces where we are in the criminal space. We live in a world where Google knows everything we search for; Amazon knows everything we’ve bought; our smart cars know everywhere we’ve driven; our smart houses reveal when we leave for the day, when we go home. Our digital trails are revealing the patterns and practices of our lives. Big data in a consumer space has recognized that insight and in some way has tried to monetize it. We are the product. Our data is the product. We are being sold using the data trails, and in many ways we’ve bought into that by the convenience of this data.

What began as a database—although certainly not a big database—is the idea of CompStat, of being able to start mapping crimes and understanding crime patterns using crime data. (Former New York Police Commissioner) Bill Bratton built a data-driven policing system. When he went to Los Angeles, he was overseeing the origination of predictive policing as we now know it, and he brought it back again to New York when he took over the NYPD again in his last iteration there. After the stop-and-frisk practice was declared unconstitutional, Bratton said, “don’t worry, we have a new technology that’s going to replace it, called predictive policing.” The NYPD is up there with the LAPD as the most sophisticated local surveillance system that we have in America. They have networked cameras. They are using their own social networking analysis. They’re mapping crimes. They’re connecting with the Manhattan D.A.’s office to prosecute crimes. They are the cutting edge of how big data technology is changing law enforcement.

TCR: Everything we do is being tracked. Do privacy issues even enter the debate in the criminal justice arena?

Ferguson: There is definitely a debate about it. In the book I point out how, as our lives have shifted to social media, and as our virtual selves are playing a role in communicating with other people, so has crime. People who are involved in criminal activity and gangs are posting threats on social media; they are bragging about crimes on social media. Naturally police are watching, building cases. That obviously impacts individual privacy rights.

The Supreme Court [recently] heard a case, Carpenter v. United States, about whether our third-party records, in that case cell phone information—really if you think about our digital lives, everything is a third-party record—can be obtained by police without a warrant or whether police need a probable cause warrant to obtain some of this information. When that case is decided in the next couple of months, it may very well shape our feelings about privacy and some of our expectations of privacy under the Fourth Amendment. What we share with others, we don’t necessarily think we are sharing with police, but without either legislation guiding us or constitutional law guiding us, it’s pretty uncertain whether we can really claim any expectation of privacy in things we are putting out in the world.

TCR: How has big data affected the way prosecutors do their job?

Ferguson: Cy Vance, Jr., and the Manhattan D.A.’s office have been at the forefront of pushing a new type of prosecution, which is part data-driven, where they are looking for the areas that are creating and generating violence; and part intelligence-driven. They call it intelligence-driven prosecution. Intelligence in the sense of how intelligence analysts and the intelligence community might try and figure out whom to target. Their stated goal is to identify the individuals who are the prime drivers (of crime) in a community, and take them out by any means they can.

Andrew Ferguson

They also get community intelligence from gang intelligence detectives, and other community organizers, to try to take out those people, under the theory that if they can incapacitate these folks, crime overall will go down. They have recognized that one way to do that is build a big data infrastructure.

(Like police), they too have partnered with companies like Palantir, which built a data system to track various people. Lots of people get arrested in Manhattan, and if one of their targets should show up, even on a low-level offense like (subway) turnstile jumping, they’ll know about it, and they can get that information to the line D.A. in time because it’s sort of automated. Instead of someone cycling through the system and getting out because it’s a low-level offense, there will be a different request for a bail hold. There will be different sentencing considerations; there will be less of an opportunity to plead guilty, on the theory that they are trying to bring this person out of the society, figuring that if the person is out, they will reduce crime. It’s really been a change in mentality (for) prosecutors.

TCR: Critics of these strategies say they disproportionately affect the city’s more marginalized communities.

Ferguson: I think (you) run the risk of targeting the data you are trying to collect. So if what you care about is low-level “broken windows” policing and that becomes the data you are looking for, it’s obviously going to change policing patterns, and result in lots of poor people being brought into the system who wouldn’t have otherwise if you weren’t targeting them.

It doesn’t have to be that way. You can use data-driven policing to target folks who you think are the most violent and most at risk—and not target others who aren’t in that category. Data is dependent on how you use it, and who uses it, and the choices being made to utilize it. There are many fair criticisms on the way the data-driven system changed what police were doing in New York. It became very much about quantifying arrests, and not about quality of life or policing.

We are able to target people that we think are the most at risk for violence in Chicago and intervene in a way that has similar concerns about who are the people being targeted. They tend to be poor; they tend to be people of color. They tend to live in certain communities. Is that simply reifying the same kinds of profiling by algorithms that we might be concerned about? I think these are real questions and concerns we should be raising. We should be having that debate right now, and (examine) how these new technologies are playing out in our communities.

TCR: You note that these algorithms falsely assess African-American defendants as high risk at almost twice the rate of whites. How do these high-priority target lists affect African American communities?

Ferguson: Take Chicago again. Part of the input of the Chicago “heat list” are arrests, and we know that arrests are discretionary decisions of police. We know that they are impacted by where police go, where they are sent, where the patrols are, where they are looking. You can’t be arrested if no one is looking at what you are doing.

We also know in Chicago, thanks to the Department of Justice Civil Rights Division report in 2017 that there is a real problem of racial bias throughout the Chicago Police Department. It’s pretty systemic. So if you think that some Chicago police have either implicit or explicit biases and if they are using their discretion to make their arrests, and if those arrests become part of your big data system of targeting, you have to worry that that discretion and that bias is going to affect the outputs of whatever system because it’s affecting the inputs.

My concern is that we tend to stop thinking about racial bias when we hear that something is data-driven. It sounds objective. It sounds like it doesn’t have the same concerns; it doesn’t raise the same concerns of ordinary policing. But if your inputs are based on ordinary policing, and that has some problems in some cities, well some of your outputs are going to be based on those and that’s a real problem.

(That is) part of the reason I wrote the book. People talk about this move to data-driven policing as if it is a response to the concerns we saw in Ferguson, and in Staten Island, and in Chicago. Really, the same concerns exist and we need to make sure that we are aware and conscious of them, and are working to overcome them, because they can be overcome.

TCR: You refer to Chicago quite a bit in the book. How have certain of the city’s neighborhoods, and their residents, been affected by these algorithms?

Ferguson: Chicago is at the forefront. They have created what they’ve called the strategic subjects list, also known as the heat list. And that list essentially looks at individuals who they believe are the most at risk of violence, or being the victim of violence, or the perpetrators of violence in society, and this is the algorithm. They look at past arrests for violent crimes, narcotics offenses, and weapons offenses. They look at whether the person was a victim of a violent offense, or a shooting. They look at the age at the last arrest; the younger the age, the higher the score. And they look at the trend line: is this moving forward; are there more events happening, or is it slowing down; is age less a factor?

They take those numbers, punch them into an algorithm, to come up with a rank ordered score from 0 to 500 about who are the most at-risk people, and then they act. If you have a 500-plus score, a Chicago Police detective or administrator shows up at your door, maybe with a social worker, and says: “Look you are on this list. And we know who you are, we know what you are doing, we know that you are at risk, and you’ve got a choice. You can either turn your life around, or if you don’t, we’re going to bring the hammer down. We know who you are and we are warning you.”

They might bring you in as a group, as a sort of call-in session, a scared-straight-session where they do the same thing but in a group setting. It is a measure of social control. It’s a measure of possibly offering social services (if there were money to offer those social services), and it is a recognition that there is a targeting mechanism going on in this town.

Depending on who you ask, it has not worked, or worked. It fluctuates. Shootings have obviously been a terrible problem in Chicago in certain districts where they are using it. Just this last year, they claim shootings have gone down, so they are taking credit for it.

RAND did a study of the early version of the strategic subjects list and said, look, we can’t find any correlation. It seems like it really became shorthand for the virtual most-wanted list of people you want to target anyway. The Chicago Police Department response to that was, well, we changed the algorithm. We think we’ve improved it, and in some ways that’s a fair answer, because that’s what you do with computer modeling. You change it if it’s not working, or if you think you can improve it. Computer models are supposed to keep evolving.

TCR: What do the police mean when they say these algorithms are working? 

Ferguson: If you listen to the folks in Chicago who are defending it, they say: Look, if you want to know the people who are getting shot, they are on our list. Like 80 percent of the people are on our list on a particular weekend, and so we’re not wrong. We know that there are certain lifestyles and certain actions and certain groups that are more likely to be shot.

They tend to be folks who are in certain social groups, or gangs; they tend to have a distrust of police, so if there is a violent action they respond with violence. This sort of reciprocal violence keeps things going forward, and the (police) response is “look, we’re not wrong about the people we are targeting, whether or not we can reduce violence.”

This isn’t an attempt to create some magic black box. There is a theory behind it, and the theory is that there are certain risk factors in life that will cause you to be more at risk of violence than other people, so targeting those people might be an efficient use of police resources in a world of finite resources. The problem is that we just don’t know if the input is all that accurate and we don’t really know if this is the best use of our money and time and energy. It sounds like an idea or a solution when you don’t want to face the real solution.

Maybe, instead of using a “heat list,” we should invest in schools, invest in communities, invest in jobs, and invest in social programs that will change lives. But it takes more money than people are willing to spend. The chief of police of Chicago has one of the most difficult jobs in America. He has to answer the question, what are you going to do about crime? Sometimes (technology) is enough to quiet the critics who want you as chief to do the impossible, which is stop the shootings without the resources.

TCR: In the book you contrast New Orleans’ Palantir system with Chicago’s use of preventive algorithms. How do they differ?

Ferguson: New Orleans had a terrible shooting problem, and the mayor partnered with Palantir to see if they could figure out who are the people who are most at risk for either being involved with violence or being violent themselves. And then to explore whether there are program services that can target them.

The difference in New Orleans, I think, is that, at least initially, in addition to person-based identifications, they also brought in a lot of city data. They started looking at where the crime patterns were. They investigated whether there were institutional players who could be leveraged to stop some of the patterns of violence.

For instance, they’re able to say that when students are let out of school at certain places there tend to be violent fights between different groups. Could we change the environment so when kids get out of school, they are not going to start the fights? Are there places where the lighting system is such that we constantly see crimes, because of course if you want to do a crime, you might want to go where no one can see you… Initially this sort of holistic data collection of city, local, state, and also law enforcement, brought down some of the shootings in New Orleans.

Unfortunately, it’s gone up since I published the book. But data can be used in a more holistic, constructive way, to identify some of the same problems (and) doesn’t have to necessarily require a pure policing response. It might actually be better to invest in social services, invest in fixing up neighborhoods. That might actually have a greater impact on crime reduction than simply putting a police car, or a police officer at a door, or a street corner.

TCR: You propose creating a risk map in the book. Can you talk about this suggested shift from mapping crime to mapping social services?

Ferguson: I write about how, if you separate out the risk-identification innovations from the policing remedy, you might get other uses of predictive analytics. You could have a map of all the people who don’t get enough food in their lives, or all the individuals who’ve missed four days of school for whatever health reasons. You can use the same sort of identification for the social risks of society, for people who are in need.

Right now, we focus our energy on crime and policing because that’s where the funding came from, but it doesn’t have to be that way. Maybe some of the innovations being created by these same companies, and by these same technologies, can be used for different purposes, and maybe with different outcomes.

TCR: You say that we should be viewing violence as a public health problem rather than law enforcement related.

Ferguson: You have to understand why big-data policing has arisen at this time. In part it was a reaction to the recession, and it was a reaction to the fact that police officers, and administrators, and departments all across the country were being gutted.

Police needed a solution. (They could claim) that predictive policing technology is going to help us do more with less. And when you are in that mindset, it’s really hard to then also ask for help with what we know are the real crime drivers: poverty, hopelessness, lack of economic opportunity, lack of education, and an inability to sort of get out of a certain socioeconomic reality. As we’ve moved out of the recession, data (remains) our guiding framework for policing. I think there is an opportunity for chiefs to say, we are at a better place to figure out why crime is happening in this area. We can tell you where it is. We can tell you the people involved, but there is something beneath that, which is the why?

The risk factor is here, and that’s in part because this parking lot has been abandoned for the last decade. If you fixed it up and made it into a nice park, maybe we wouldn’t have killings or robberies there, right? We know there are certain people dealing drugs because all the jobs in this area have gone away. If we could figure out sort of economic opportunity potentials for these same young men, we might not have that same crime problem.

The data can visualize what we all know is there, but does so in a different way that might potentially offer a way forward. We might have partnerships with police and communities that deal with some of the underlying environmental, socioeconomic issues, and the data can lead us to a part-policing, but also a part-social services, civic investment strategy.

TCR: Judges have also had to learn to adapt to this new technology. How has it influenced the outcomes of trials, and decision-making?

Ferguson: I’m a law professor, and I got my start trying to figure out how new technologies like predictive policing and big-data policing would affect trial courts, and Fourth Amendment suppression hearings, and the idea of reasonable suspicion. Reasonable suspicion is the legal standard that is required for police to stop you on the street. They have to have some sort of particularized, articulable information that you are involved in criminal activity. In a small data world, which is the world we had when the law was being created, it was only what the officer saw. (For instance, a suspicious individual hanging around a jewelry store.)

Our laws are created based on that very tangible reality of what officers can see. In a big data world, there is a lot more information. Now that officer might know who that individual outside the jewelry store is. He might know his prior record. The data could tell him who the individual is associating with, and any prior criminal records. That information has nothing to do with what that person is doing at that moment; but it might influence what the police officer thinks is happening. That’s natural. If you now know that an individual in front of you is not just a person walking by the jewelry store, but one of the computer-driven, most-wanted, most at-risk of violence, it might change how you are suspicious of that person.

(In a similar way, big data) might change how a judge is going to evaluate that suspicion and say, well it makes sense. The officer knew from the dashboard that this person had a really high score. The judge knew that the reason for that high score was his gang involvement, and his prior arrests, and convictions for jewelry store robberies. Of course that should factor into his suspicion, and what has just happened is, data, big data, other information, has changed the constitutional analysis of how a police officer is watching an individual do the same thing.

That’s happening right now, as these cases make their way through the courts. And there’s a way that the courts haven’t really thought about what are we going to do with these predictive scores. What do we do if a police officer stops someone in a quote unquote predictive area of burglary? A computer told him to be on the look-out; how do we as a judge, or as a court, evaluate that information in our constitutional determination of whether this officer had enough suspicion? And we just haven’t seen great answers to that. We’ll see how it plays out in the future.

This conversation has been condensed and slightly edited. Julia Pagnamenta is a news intern with The Crime Report. She welcomes readers’ comments.
