More Long Tail: Everyone Is Still Wrong

Author: Tom Slee
Published: September 25, 2009

Note: This page has been migrated from an earlier version of this site. Links and images may be broken.

Every now and then another study comes out about the long tail and gets discussed in the usual places. More often than not, the result is more confusion, because the source everyone is arguing about (Chris Anderson’s book, The Long Tail) defines the concept in many different ways, depending on what the author feels like talking about at the moment.

The latest is a working paper called “Is Tom Cruise Threatened: Using Netflix Prize Data to Examine the Long Tail of Electronic Commerce”, by Tom Tan and Serguei Netessine of the Wharton Business School at the University of Pennsylvania. It got slashdotted, was written up in The Register, and Chris Anderson has responded to it.

Here is what the paper does. It takes the sample of 100 million DVD ratings that Netflix provided for the recently completed Netflix Prize, and breaks down the trends in ratings from 2000, when Netflix stocked relatively few titles and had relatively few users, to 2005, when it had many more DVD titles and many more users. It then asks whether demand for “hit movies” and “niche movies” increased or decreased over that time, as reflected in the number of ratings in Netflix’s sample data set. Not surprisingly, it finds that when hits are defined as a fixed number of titles (top 10, top 100), demand for hits decreases, and when they are defined as a percentage of the catalog (top 1%, top 10%), demand for hits increases.
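Why is that unsurprising? A toy calculation shows that the two measures can pull in opposite directions on their own. This is not the paper’s method and uses none of the Netflix data; it assumes, purely for illustration, that demand follows a Zipf law (the title ranked r gets demand proportional to 1/r) and compares two hypothetical catalogs whose sizes differ by a factor of 5. The helper names are made up for this sketch.

```python
# Toy model, NOT the paper's method: assumes Zipf-like demand (exponent s)
# over a catalog of n titles ranked by popularity. Catalog sizes are hypothetical.

def zipf_weights(n, s=1.0):
    """Relative demand for titles ranked 1..n, proportional to 1 / rank**s."""
    return [1.0 / rank ** s for rank in range(1, n + 1)]


def top_k_share(n, k, s=1.0):
    """Share of total demand captured by the k most popular titles."""
    weights = zipf_weights(n, s)
    return sum(weights[:k]) / sum(weights)


def top_fraction_share(n, fraction, s=1.0):
    """Share of total demand captured by the top `fraction` of the catalog."""
    k = max(1, round(n * fraction))
    return top_k_share(n, k, s)


# Two hypothetical catalogs, a factor of 5 apart (not Netflix figures).
for label, n in (("smaller catalog", 2_000), ("larger catalog", 10_000)):
    print(
        f"{label}: top 10 titles = {top_k_share(n, 10):.3f}, "
        f"top 10% of titles = {top_fraction_share(n, 0.10):.3f}"
    )
```

Under these assumptions the top 10 titles take a smaller share of demand in the larger catalog, because the same demand is spread over more titles, while the top 10% takes a larger share, because the top 10% of a bigger catalog simply contains more titles. The shape of demand never changes in the model, so neither result says anything about shifting consumer taste.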

The problem is that the Netflix Prize data set, while fascinating to explore, has nothing to say about the long tail by itself. Between 2000 and 2005 the DVD as a format exploded, with many old titles getting put on the new format, and Netflix exploded as the convenience of online movie rental took off. But comparing early Netflix to late Netflix doesn’t tell us anything at all about the evolution of consumer taste in the online world, or about the relative diversity of demand from online and ‘bricks and mortar’ stores, which is supposed to be what this is all about.

To be charitable, the nearest we can get is that it’s a comparison of a restricted set of choices (2000) and a broad set of choices (2005). But the number of available titles increased by a factor of about 5 while the user base increased by a factor of about 50, so there were roughly ten times as many users per available title. Interpreting the results as “the effect on demand of this increase in variety [of titles]”, as Anderson does, is simply seeing what you’d like to see.

If we are going to take Anderson seriously, then we should adopt the standard definition he falls back on whenever the long tail gets challenged:

This is a good moment to remind everyone of the normal definition of “head” and “tail” in entertainment markets such as music. “Head” is the selection available in the largest bricks-and-mortar retailer in the market (that would be Wal-Mart in this case). “Tail” is everything else, most of which is only available online, where there is unlimited shelf space.

It’s a definition that is skewed to guarantee success for his model, and one that is completely uninteresting (as I have posted about ad nauseam), but hey, it’s his definition. And the Netflix data has nothing to say about it.

So when Chris Anderson posts his favourite graph from the data and claims it’s a vindication of the long tail (“Netflix data shows shifting demand down the Long Tail”), it can only be because it looks like the schematic, unlabelled, number-free graphs in his book. It’s cherry-picking the data for the most simplistic of reasons, since the two lines he’s comparing bear no relation to what he talks about elsewhere, but hey, who cares?