Got Ads?
8/07/2006
  Is AOL's Search Dataset a Privacy Violation?

AOL released a data set of search queries on its Labs site. The immediate reaction in the blogosphere was a massive outcry of "Privacy violation!" Here's famed blogger Mike Arrington on TechCrunch:

AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

I think this is a massive overreaction. Looking at the dataset, it's not at all clear that it can be tied to an individual's IP address, much less their name, address, etc.

I'm surprised at the knee-jerk reaction. I think AOL's research team is doing something useful: sharing data so search can be studied and improved.

Here's a sample of the data:

220 telephone directory ridgeville south carolina 2006-04-16 12:29:29  
220 florida atlantic university 2006-04-16 15:57:58  
220 florida international university 2006-04-20 06:18:32 5 http://hospitality.fiu.edu
220 house plans 2006-04-21 21:37:37  
220 house plans 2006-04-22 04:48:43  
220 house plans 2006-04-22 04:50:16  
220 house plans 2006-04-22 08:58:27  
220 windstorm insurance 2006-04-22 15:33:35 3 http://www.windnetwork.com
220 windstorm insurance 2006-04-22 15:33:35 9 
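For anyone who wants to poke at the rows above, here is a minimal Python sketch that splits each line into fields and groups queries by the numeric ID. The field names (`user_id`, `query`, `timestamp`, `rank`, `url`) are my own labels for what the columns appear to be, not names taken from the post, and the regular expression assumes whitespace-separated lines shaped like the sample.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class SearchRecord(NamedTuple):
    # Field names are assumed labels for the columns seen in the sample.
    user_id: str
    query: str
    timestamp: str
    rank: Optional[int]   # position of the clicked result, if any
    url: Optional[str]    # clicked site, if any


# ID, free-text query, "YYYY-MM-DD HH:MM:SS", then optional rank and URL.
LINE_RE = re.compile(
    r"^(\d+)\s+(.*?)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r"(?:\s+(\d+))?(?:\s+(\S+))?\s*$"
)


def parse_line(line: str) -> Optional[SearchRecord]:
    """Parse one sample-style line; return None if it does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    user_id, query, timestamp, rank, url = m.groups()
    return SearchRecord(user_id, query, timestamp,
                        int(rank) if rank else None, url)


def queries_by_user(lines):
    """Map each ID to the list of queries it issued, in file order."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            grouped[record.user_id].append(record.query)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "220 florida international university 2006-04-20 06:18:32 5 http://hospitality.fiu.edu",
        "220 house plans 2006-04-21 21:37:37  ",
        "220 windstorm insurance 2006-04-22 15:33:35 9 ",
    ]
    print(queries_by_user(sample))
```

Grouping by ID is the "analyze all searches by a single user" step that Arrington describes; whether that list of queries identifies a person is a separate question that the code does not answer.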

I don't see how Arrington's claims make any sense. His notion that the data will be easily analyzed and can "often lead people to easily determine who the user is" is strange. Certainly it's not easy to tie a query sequence to an individual, nor is this anything that a web site owner couldn't already try to do with their own web logs.

Could the data have a social security number in it? Of course it could, but what does that mean? Is that a violation of privacy for someone? How many people type their own social security numbers into web search engines? I'll try to look for them in the dataset and report back.
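A rough way to run that check is to scan the query text for strings shaped like a social security number. The sketch below is only an assumption about how one might do it: it looks for the dashed `NNN-NN-NNNN` form, the function name `find_ssn_like` is made up for illustration, and a match says nothing about whether the number is real or whose it is.

```python
import re

# Matches the dashed NNN-NN-NNNN shape only; undashed 9-digit runs are
# deliberately ignored because they collide with phone and order numbers.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(queries):
    """Return (query, matched_text) pairs for queries with an SSN-shaped token."""
    hits = []
    for query in queries:
        for match in SSN_LIKE.finditer(query):
            hits.append((query, match.group()))
    return hits


if __name__ == "__main__":
    # Made-up queries for illustration, not rows from the dataset.
    example_queries = [
        "house plans",
        "lookup 123-45-6789",
        "events 2006-04-16",
    ]
    for query, token in find_ssn_like(example_queries):
        print(f"{token!r} found in query {query!r}")
```

Dates such as `2006-04-16` do not match because the digit groups are a different length, which keeps the timestamp-like strings in the data from showing up as false positives.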

In the end, I think the people decrying the release of this type of data have somehow over-defined privacy to mean something rather imaginative and nonsensical.

 



