How Do They Track You? Let Us Count the Ways
In my article in Monday’s Times, “To Aim Ads, Web Is Keeping Closer Eye on What You Click,” I worked with comScore to develop a new measure for Web companies: how much data they can collect from users.
On the Internet, companies are typically ranked by how many different people visit their sites in a given month. And when Microsoft announced its $41 billion bid for Yahoo, comScore and Nielsen Online promptly put out estimates counting how many people would be in the merged company’s total audience.
But audience size is not everything in the online world. Advertisers increasingly want media companies to find their most likely customers and show their ads only to those people, rather than to the site’s entire audience.
Such targeted advertising requires data, so there’s a good argument to be made that we can spot the companies that will lead the pack in online advertising by looking at the depth of data that large media companies can collect about each of their Web visitors. Here is some more detail about the methodology comScore and I came up with:
The comScore study tallied five types of “data collection events” on the Internet for 15 large media companies. Four of these events are actions that occur on the sites the media companies run: pages displayed, search queries entered, videos played and ads displayed. Each time one of those four things occurs, the user’s computer exchanges information with the server of the company that owns the site or serves the ad.
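The tally described above amounts to counting logged events by company. The sketch below is a minimal illustration under assumed names (the event-type labels, the `Event` record and the sample log are hypothetical); it is not comScore’s actual methodology or data.

```python
from collections import Counter
from dataclasses import dataclass

# The five kinds of "data collection events" in the study. These label
# strings are hypothetical names, not comScore's own terminology.
EVENT_TYPES = {
    "page_view",      # page displayed on the company's own site
    "search_query",   # search query entered on the site
    "video_play",     # video played on the site
    "site_ad",        # ad displayed on the site
    "ad_network_ad",  # ad served elsewhere on the Web by the company's ad network
}

@dataclass(frozen=True)
class Event:
    company: str     # company whose server receives the request
    event_type: str  # expected to be one of EVENT_TYPES

def tally_events(events):
    """Count data collection events per company, skipping unrecognized types."""
    totals = Counter()
    for event in events:
        if event.event_type in EVENT_TYPES:
            totals[event.company] += 1
    return totals

# Hypothetical sample log, not real data.
log = [
    Event("Yahoo", "page_view"),
    Event("Yahoo", "search_query"),
    Event("Yahoo", "site_ad"),
    Event("Google", "ad_network_ad"),
    Event("Google", "search_query"),
]
print(dict(tally_events(log)))  # {'Yahoo': 3, 'Google': 2}
```

Summing the per-company counters over a month of logs would give totals of the kind reported below.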
The fifth area that comScore looked at was ads served on pages anywhere on the Web by advertising networks owned by the media companies. These include text ads provided by Google’s AdSense network, for example, and display ads from AOL’s Advertising.com unit. Ad networks give these companies the ability to note where you are on other Web sites when they serve you an ad. Google, for example, can note that your Internet Protocol address visited Kelley Blue Book if it serves you an AdSense ad there.
So each time one of these five things occurs, it is a “data collection event.” The data that is transferred varies for each. Typically, the Web company receives information about the type of page the user is looking at, the user’s I.P. address (which sometimes offers clues to the user’s location) and, for advertising, the content of the ad. Most Web sites and advertising networks place cookies on users’ browsers, allowing them to recognize the same user each time they interact in the future. Cookies themselves don’t identify users by name, but if users register with a Web site, their identities can be linked to their cookies.
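To make the cookie mechanism concrete, here is a toy model of how a single site might recognize a returning browser and later tie it to a registered identity. It is a simplified illustration, not any company’s actual system; the `CookieTracker` class, the page paths and the name are hypothetical.

```python
import secrets

class CookieTracker:
    """Toy model of cookie-based recognition on one site (hypothetical)."""

    def __init__(self):
        self.history = {}     # cookie ID -> pages requested with that cookie
        self.identities = {}  # cookie ID -> registered name, if any

    def handle_request(self, page, cookie_id=None):
        # A first-time visitor arrives without a cookie, so the site issues a
        # random ID; the browser sends that ID back with later requests.
        if cookie_id is None:
            cookie_id = secrets.token_hex(8)
        self.history.setdefault(cookie_id, []).append(page)
        return cookie_id

    def register(self, cookie_id, name):
        # The cookie carries no name; registration is what links the two.
        self.identities[cookie_id] = name

    def profile(self, cookie_id):
        return {
            "name": self.identities.get(cookie_id),  # None until registration
            "pages": list(self.history.get(cookie_id, [])),
        }

tracker = CookieTracker()
cid = tracker.handle_request("/autos")
tracker.handle_request("/autos/reviews", cookie_id=cid)
print(tracker.profile(cid))  # {'name': None, 'pages': ['/autos', '/autos/reviews']}
tracker.register(cid, "Jane Doe")
print(tracker.profile(cid)["name"])  # Jane Doe
```

Before registration the site knows only a random ID and a browsing history; after registration the same history is attached to a name.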
When all these data collection events are combined for users in the United States in December 2007, Yahoo had the potential to gather data through 400 billion events in the month. Time Warner, which includes AOL, was second, with about 100 billion events. Google was not too far behind with 91 billion.
Interestingly, Microsoft, with 51 billion events in December, is far behind not only the other big Internet companies, but also the News Corporation’s Fox Interactive Media, which owns MySpace.
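The December 2007 totals lend themselves to some quick arithmetic. The figures come from this post (Time Warner’s is approximate); Fox Interactive Media is left out because its total isn’t given here. The combined Yahoo-Microsoft number is a naive sum for illustration, not a comScore estimate.

```python
# December 2007 U.S. totals reported in this post, in billions of events.
events_billions = {
    "Yahoo": 400,
    "Time Warner (incl. AOL)": 100,  # "about 100 billion"
    "Google": 91,
    "Microsoft": 51,
}

# How many times more events Yahoo logged than each company.
for company, total in sorted(events_billions.items(), key=lambda kv: kv[1], reverse=True):
    ratio = events_billions["Yahoo"] / total
    print(f"{company}: {total}B events; Yahoo has {ratio:.1f}x as many")

# Naive combined figure for a Microsoft-Yahoo merger. It simply adds the two
# totals and does not correct for pages where both companies' ad networks
# were counted as potential collectors.
combined = events_billions["Yahoo"] + events_billions["Microsoft"]
print(f"Combined: {combined}B")  # Combined: 451B
```

By this rough measure Yahoo logged about 4.4 times as many events as Google and about 7.8 times as many as Microsoft.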
Below is a view of this data. Here is an image that shows the data behind the graphic, as well as a version of the data that shows the average number of data collection events for each company’s users.
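The per-user view presumably divides each company’s total events by its number of unique users for the month. The post does not spell out the formula, so this is an assumption, and the inputs below are made up rather than comScore figures.

```python
def events_per_user(total_events, unique_users):
    """Average data collection events per user over one measurement period."""
    if unique_users <= 0:
        raise ValueError("unique_users must be positive")
    return total_events / unique_users

# Hypothetical inputs for illustration only; not comScore's figures.
total_events = 1_000_000_000  # events in one month
unique_users = 50_000_000     # distinct U.S. users in that month
print(events_per_user(total_events, unique_users))  # 20.0
```

A per-user average separates depth of data from audience size: a company with a smaller audience can still collect more events about each of its users.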
What is important here is not the precise numbers, but the overall picture that the biggest Internet companies are accumulating many different ways to collect data about users. Many caveats are needed: Not all of this data is useful; not all of it is retained by the companies with access to it; much of it cannot be traced back to individuals.
Moreover, this method often identifies several data collection events on a single Web page. That is because one page can contain search results, video players, and ads from several sources, each of which can send different data in a different direction.
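This hypothetical sketch models one page whose components are loaded from different companies’ servers, so a single page view fans out into several events. The page layout and company assignments are invented for illustration.

```python
from collections import Counter

# A hypothetical page: each component is fetched from a (possibly different)
# company's server, so one page view produces several events.
page_components = [
    ("page_view", "Yahoo"),       # the page itself
    ("search_results", "Yahoo"),  # embedded search results
    ("video_player", "Google"),   # embedded video player (hypothetical)
    ("ad_slot_1", "Google"),      # ad from one network
    ("ad_slot_2", "Microsoft"),   # ad from another network
]

def events_for_page(components):
    """Count how many events one page view generates for each company."""
    return Counter(company for _component, company in components)

counts = events_for_page(page_components)
print(sum(counts.values()))  # 5
print(dict(counts))          # {'Yahoo': 2, 'Google': 2, 'Microsoft': 1}
```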
Another caveat: ComScore’s method of measuring advertising networks has limitations that make it difficult to compare one network to another. For the networks run by Yahoo, Microsoft and AOL, comScore doesn’t count how many ads they actually display, but how many pages their ads could appear on. This substantially overcounts the networks’ data collection because some Web sites have several networks that compete to place ads on their pages. ComScore counts the page views on those pages – without knowing if that network did in fact serve an ad on that page view. So the ad network tallies for these companies represent potential data collection events, rather than definite ones.
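The gap between potential and definite counts can be illustrated with a toy comparison. This is a hypothetical sketch: the page views, the networks eligible on each, and the assumption of one ad slot per page are invented for illustration and are not comScore’s data.

```python
from collections import Counter

# Hypothetical page views on sites where several ad networks compete. Each
# record lists the networks that could fill the page's single ad slot
# (an assumption) and the one that actually did.
page_views = [
    {"eligible": {"Yahoo", "Microsoft", "AOL"}, "served_by": "Yahoo"},
    {"eligible": {"Yahoo", "Microsoft"}, "served_by": "Microsoft"},
    {"eligible": {"Yahoo"}, "served_by": "Yahoo"},
]

def potential_events(views):
    """Credit every eligible network with each page view, served or not."""
    return Counter(net for view in views for net in view["eligible"])

def definite_events(views):
    """Credit only the network that actually served the ad."""
    return Counter(view["served_by"] for view in views)

potential = potential_events(page_views)
definite = definite_events(page_views)
print(sorted(potential.items()))  # [('AOL', 1), ('Microsoft', 2), ('Yahoo', 3)]
print(sorted(definite.items()))   # [('Microsoft', 1), ('Yahoo', 2)]
print(sum(potential.values()), "potential vs", sum(definite.values()), "definite")
```

In this toy example, counting pages where a network could appear doubles the total relative to counting ads actually served.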
For Google, comScore can actually identify when ads from its AdSense network are loaded on a Web page, but this measure could overstate Google’s potential to collect data. That’s because Google may display several short text ads on one page, and comScore counts each of those text ads separately. To compensate in this study, comScore tried to figure out how many pages Google ads are loaded on. It took its count of ads displayed and divided that by 4.17, its estimate of the average number of AdSense ads that appear together on a page.
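The AdSense adjustment is a single division. The 4.17 figure is comScore’s estimate as reported in this post; the function name and the input count below are hypothetical.

```python
# comScore's estimate of the average number of AdSense ads shown together on a page.
ADS_PER_PAGE = 4.17

def estimated_adsense_pages(ads_displayed, ads_per_page=ADS_PER_PAGE):
    """Convert a count of individual AdSense text ads into an estimated page count."""
    return ads_displayed / ads_per_page

# Hypothetical input: 417 million individual text ads counted in a month.
print(round(estimated_adsense_pages(417_000_000)))  # 100000000
```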
ComScore’s December 2007 figures for AOL, moreover, do not include the reach of Tacoda, the behavioral targeting firm AOL just bought.
I do not suggest using the ad network figures to make comparisons between the Internet giants. Instead, you should look at them as potential expansions of these companies’ reach. They do collect significant data from their ad networks – but possibly not as much as suggested by these figures.
These comScore figures – though eye-popping – provide only a minimum level of data collection events. There are other ways these companies obtain data that comScore was unable to capture. The two largest ways left out here are ad-serving data (from the likes of Microsoft’s Atlas and Google’s desired partner DoubleClick) and user-volunteered data. By the latter, I mean the information that users enter when they register for sites or e-mail accounts as well as all the juicy details they post on social networking pages.
Arnie Gullov-Singh, vice president of advertising technology at Fox Interactive Media, the owner of MySpace, likes to call this sort of information “hand-raiser data,” since people choose to type it in.
I hope what I’ve done here will start a conversation. It would be fascinating to see someone try to quantify the aspects of data collection left out of this analysis. Atlas serves 6 billion ads per day, for example, which could be added in.
It is also well worth watching whether most of the data proves lucrative. Perhaps there will be diminishing returns at some point, though Mike Galgon, chief advertising strategist at Microsoft (and co-founder of aQuantive), told me he didn’t think there would be.
Consumers get all kinds of free services and content on the Web because they are shown ads, and media companies are increasingly showing them ads based on data they have collected about them. So, in a sense, consumers “pay” for free content and features like e-mail by letting companies collect this data about them.
When regulators evaluate mergers from a consumer protection standpoint, they consider whether mergers would end up raising the prices that consumers pay for those companies’ products. Since people “pay” with information about themselves on the Internet, rather than with dollars, regulators should consider consumer data when they consider mergers.
If Yahoo is to merge with Microsoft or any other company, the merged company will be an entity that has significantly more data about consumers. Will consumers get more – or better – free services in exchange?