Got Ads?
9/23/2006
  How is ShoeMoney getting keyword lists?

ShoeMoney recently asked "What if you could get a list of every [key]word your competitor has been bidding on?"

ShoeMoney (whose blog I like) demonstrated this in his comments, showing lists of keywords that send traffic to a wide variety of sites. A poster on the SEM2 mailing list wondered how this could be done.

I can think of 4 scenarios: hacking AdWords, scraping Google, toolbar data, and ISP proxy logs.

I seriously doubt that anyone has hacked into AdWords yet, but it is a lucrative target. And it will happen someday. But it's not a business model, nor the type of thing someone would promote.

Scraping Google probably isn't that sustainable on a large scale, but it would work reasonably well for a narrow set of sites (i.e. a small number of competitors). GoogSpy apparently works by scraping.

Toolbars - like the Yahoo toolbar, the Google toolbar, and hundreds of others - sit in your browser and send data to a server about every page you visit. They can provide a huge amount of valuable data about browsing habits, and in aggregate that data could be used to do very sophisticated targeting. Getting the keywords that people typed in, or the keywords that led from an ad to a site, would be a simple task with access to enough toolbar clients.

Finally, ISP proxy logs - which I think are the most likely source of ShoeMoney's data - can be used to capture clickstream data. Hitwise uses ISP logs (along with some toolbar / panel data). Hitwise's logs represent the browsing activity of 10 million users. Hitwise charges about $25k / year for access to its tools.

So it's most likely that ShoeMoney has struck a deal with an ISP. Or at least he's getting web proxy logs somehow.

How could the data come from ISPs?

ISPs use proxies to reduce bandwidth costs. They can cache a large percentage of web page data, and serve the data from the proxy. In any case, they get to record quite a bit of clickstream data from the people accessing the web through their servers.

One of the things they record for each HTTP request is the referrer (the HTTP Referer header) - a string that often contains the query string from a search engine click...
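To make that concrete, here is a minimal sketch of pulling the search string out of a referrer. The function name and the host-to-parameter table are my own illustration; it assumes Google carries the search string in `q` and Yahoo in `p`, and other engines would need their own entries.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping of search-engine host fragments to the query parameter
# that carries the search string. Illustrative only, not exhaustive.
SEARCH_PARAMS = {"google.": "q", "yahoo.": "p"}

def keywords_from_referer(referer):
    """Return the search string embedded in a Referer URL, or None."""
    parts = urlparse(referer)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            # parse_qs decodes "+" and %XX escapes for us
            values = parse_qs(parts.query).get(param)
            if values:
                return values[0].strip().lower()
    return None

print(keywords_from_referer("http://www.google.com/search?q=cheap+running+shoes&hl=en"))
```

A referrer from a page that is not a recognized search engine simply yields `None`, so those log lines can be skipped.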

By parsing their logs, and correlating clicks on ads / SERPs to destination pages, you can get a good idea of the keywords that advertisers are buying to drive traffic to their site, and the search strings people are typing to reach websites.
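A rough sketch of that correlation step might look like the following. The log format here (requested URL, a tab, then the referrer) is hypothetical, as are the function names and the sample hosts; real proxy logs vary a lot. It also assumes the search string sits in a `q` or `p` parameter, and it makes no attempt to tell ad clicks from ordinary search result clicks.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse, parse_qs

def search_terms(referer):
    """Search string from a referrer, assuming it is carried in `q` or `p`."""
    query = parse_qs(urlparse(referer).query)
    for param in ("q", "p"):  # assumed parameter names
        if query.get(param):
            return query[param][0].strip().lower()
    return None

def keywords_by_site(log_lines):
    """Count search strings per destination host.

    Each line is assumed to hold the requested URL and the referrer,
    separated by a tab (a made-up format for this sketch).
    """
    table = defaultdict(Counter)
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed lines
        url, referer = fields
        terms = search_terms(referer)
        host = urlparse(url).netloc.lower()
        if terms and host:
            table[host][terms] += 1
    return table

# Made-up sample lines, for illustration only
sample = [
    "http://shoes.example.com/sale\thttp://www.google.com/search?q=cheap+shoes",
    "http://shoes.example.com/\thttp://search.yahoo.com/search?p=cheap+shoes",
    "http://shoes.example.com/boots\thttp://www.google.com/search?q=winter+boots",
    "http://news.example.org/\thttp://news.example.org/index.html",
]
for host, counts in keywords_by_site(sample).items():
    print(host, counts.most_common())
```

Sorting each site's counter by frequency gives a crude "keywords that send traffic to this site" list, which is roughly the kind of output in question.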

In a similar vein, the AOL dataset that caused a privacy kerfuffle when AOL Research released it could be used to derive this type of mapping between keywords and sites.

Of course, it is pretty hard to ensure that the results from processing millions of log lines are accurate. There are a huge number of variations and limitations in log processing, so it's to be expected that you can't find all the keywords with high accuracy. Furthermore, some keywords that an advertiser buys will never be clicked on at all by users in the clickstream one is processing. So those words won't show up in the final results, obviously.

Google, Yahoo and the other big sites have a TON of data like this. And they could do it incredibly accurately - not just for their customers, but for customers from other search engines. In other words, they "see" a huge number of referers.

I think it'd be interesting and good if Google / Yahoo somehow provided this data more transparently.

 



