On today’s episode, we are excited to welcome Professor Woodrow Hartzog from the Boston University School of Law. Professor Hartzog is a renowned privacy and technology law expert and a prolific writer. His upcoming paper, “Two AI Truths and a Lie,” will soon be published in the Yale Journal of Law and Technology. Additionally, he is well-versed in web scraping, a topic we previously explored here at A Little Privacy, Please!
Professor Hartzog, welcome. Can you refresh our audience’s memory of what web scraping is in the first place?
Web scraping is one of many techniques companies and individuals can use to collect information from the internet using automated means—bots in the background collecting and storing all sorts of information for later use. In very simple terms, that’s what web scraping is.
Companies use it all the time for things like indexing and cataloging data for search engines. It's used for all sorts of information and data collection; some of it is benign, and some is perhaps more objectionable. Companies also use scraping to collect people's photos and biometric information and to power facial recognition systems.
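For readers who want a concrete sense of what that automated collection looks like, here is a minimal, purely illustrative sketch of a scraper. It is not drawn from anything discussed in the interview; the target URL is hypothetical, and real scraping operations run at vastly larger scale.

```python
# Illustrative only: fetch a (hypothetical) public page and record every
# image URL it links to -- the kind of automated collection described above,
# stripped down to its simplest form.
import requests
from bs4 import BeautifulSoup


def scrape_image_urls(page_url: str) -> list[str]:
    # Fetch the page programmatically, the way a bot would.
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()

    # Parse the HTML and pull out the address of every <img> tag.
    soup = BeautifulSoup(response.text, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


if __name__ == "__main__":
    # Hypothetical target; a real scraper would iterate over many pages
    # and store what it finds for later use.
    for url in scrape_image_urls("https://example.com/public-profiles"):
        print(url)
```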
Can you help us understand your objections to the practice of scraping publicly available personal data?
My objection is largely to the idea that just because information is made publicly available on the World Wide Web, it can no longer be private or raise any sort of privacy-related concern. I think that's not the way people live their lives, and it's certainly not in step with most major principles of privacy law.
To begin, just because something is publicly available doesn't mean it's widely known. Sometimes we share information with some people but not others. Sometimes we share information within a certain context. If it's scraped from that context and used in a different one, that violates what I would say is a reasonable expectation of privacy in that information.
The idea that this information is just fair game, I think, is very misguided. Daniel Solove and I argue in the article “The Great Scrape: The Clash Between Scraping and Privacy” that it's anathema to basically every major principle of data protection and privacy law. And so, a reckoning needs to happen to better align the practice of scraping with information privacy law, because right now it sits in complete opposition to a lot of basic, well-established principles.
You’ve expressed the view that consent is a broken regulatory mechanism. Can you explain to the audience what you mean by that?
So much information collection is, of course, justified through the concept of consent. That may have made sense back in the 1960s or 70s, when databases were relatively rare and expensive, and the idea of our information being listed in a database was something worth asking consent for on a per-collection or per-use basis. But now, the idea that consent can meaningfully regulate information collection and use is, I think, just broken, for three reasons.
One, it's overwhelming. We used to be asked only every once in a while whether our information could be used or collected. Now we pick up our phones multiple times per minute, and information is collected and exchanged every time we do. Having to say yes or no time and time again just wears us down; we've all experienced it with every cookie banner and every “I Agree” button. It's overwhelming.
Two, it's illusory. The idea that consent gives anybody meaningful control or agency over personal information is a fiction. It's not as though I can call Google up on a Wednesday and say, “Google, I only want you to collect my geolocation when I'm driving to the pizza place or driving to work, and that's it, no other times. I don't want my information used for x, y, or z, and I want you to send me weekly updates about how it's being handled on my terms.” Of course, none of that is true. We can only click the buttons we're given. We can only adjust the dials and knobs that have been pre-created for us. It really is an illusion of agency, a kind of agency theatre, autonomy theatre, handed to us when we're not really exercising meaningful control, particularly because it's so hard to do threat modeling about the risks that could occur in the future. We're just not equipped to process that sort of risk when we're standing at the cash register or staring at the “purchase now” button. And so, it's illusory.
And finally, it's myopic. The idea that the collective wisdom of billions of individual, self-interested decisions is what's best for the overall collective use of data is, I think, wrong-headed, because I may only worry about what this data is going to do to me; I don't think about how my data is going to be used to train a system that is then used to surveil marginalized communities, people of color, and members of the LGBTQ community, who feel the brunt of that surveillance more significantly and more quickly than I would. And so, the idea that individual, self-motivated decisions alone should determine what happens with our data is, I think, just myopic, because there are bigger social concerns here.
And for those three reasons, I think that consent, as a mechanism of control, is fundamentally broken.
Do you think that criticism of AI is overblown?
Sometimes, people say, “We’ve been worried about technology for a long time. People were worried about television when it first came out. This is all just a moral panic.”
I think there's significant reason to worry about this technology and its affordances, because the framework we've got in place right now, the information privacy law framework, is basically built to normalize any sort of extractive and surveillance behavior.
We become weaker and more vulnerable in our relationships with technology just a little bit at a time. We ask, is this a privacy violation? And everyone looks and says, well, it's creepy, but it's not exactly a legal violation. So we get accustomed to it, and then the next privacy violation happens, and it's creepy again, maybe a little more creepy than the last time. But we've already gotten accustomed to the previously creepy action, so again it doesn't trigger any meaningful threshold, and again we allow it to happen. Over time, it becomes normalized.
And so, our frameworks are such that the law will allow anything humans can be conditioned to tolerate, and right now we are on track to tolerate everything. So I worry about the long-term trajectory of this. I reject the idea that this is a moral panic. People are being injured right now. People are being marginalized. People are being micromanaged into misery in lots of different ways, so the notion that this is just a moral panic is not, I think, borne out even by what's happening right now.