Harry Potter and Pareto’s fat tail
In the US, the first Harry Potter book was published by Scholastic in October 1998. In December 1998, Harry Potter entered The New York Times best-seller list, and in August 1999 it reached the top of that list. Book after book, Rowling's creations stayed on top until 2008, and about 450 million copies have been sold. The last book of the series, Harry Potter and the Deathly Hallows, was the fastest-selling book in history, with more than 11 million copies sold on the first day. The books have also been translated into 67 languages, including Latin and Ancient Greek. But such phenomena hardly ever occur, and many are the books condemned to oblivion. Have you ever wondered how many?
Book sales can be described with a power law or Pareto distribution (a statistical distribution is a mathematical function that describes how data are spread over a range of values). Vilfredo Pareto was an Italian economist who in 1906 observed that the distribution of land among landowners followed a power law, so that 20% of the people in Italy owned 80% of the land (now known as the 80/20 rule). He also gathered income data from other countries and historical periods, finding that a similar pattern appeared almost everywhere. Power laws appear constantly in the most unexpected areas of the natural and social sciences: in astronomy (the diameter of the Moon’s craters, the distribution of mass in the Universe), computing (the size of computer components, the number of people visiting a website), geology (the magnitude of earthquakes), demography (the population of cities), business (the size of a company), and of course in the book industry.
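As a rough illustration of how a power law produces an 80/20 split, the sketch below assumes an idealized Pareto (Type I) distribution with tail exponent alpha greater than 1. Under that assumption, the top fraction p of the population holds a share p^(1 - 1/alpha) of the total. The function names are illustrative only and do not come from any library.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under an idealized
    Pareto (Type I) distribution with tail exponent alpha (alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return p ** (1 - 1 / alpha)

def alpha_for_rule(share, p):
    """Tail exponent for which the top fraction p holds the given share."""
    return 1 / (1 - math.log(share) / math.log(p))

# Exponent implied by Pareto's 80/20 observation
alpha_80_20 = alpha_for_rule(0.80, 0.20)
print(round(alpha_80_20, 3))                   # 1.161
print(round(top_share(0.20, alpha_80_20), 3))  # 0.8
```

In this idealized model, the closer the exponent gets to 1, the larger the share taken by the same top 20%.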
The actual form of the distribution varies from country to country, depending on the specific characteristics of the local market. In the US book business, instead of an 80/20 rule, we find a 97/20 rule: 97% of sales come from 20% of authors. US literary nonfiction sales are even more imbalanced, with 0.25% of books accounting for 50% of sales. In Canada, 0.8% of books generated 60% of bookshop revenues. The form of the distribution can also change with time. In Italy, Pareto's homeland, the value of the exponent of the Pareto distribution for book sales in the mid-90s varied between 0.9 and 1.5, depending on the time of the year. The lowest values were found around Christmas. Note that as the value of the exponent decreases, fewer books take a higher proportion of the sales, which means that, when people buy Christmas presents, best-sellers are even more popular than during the rest of the year.
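To see why a lower exponent means more concentrated sales, here is a small simulation sketch. It assumes the exponent quoted above is the tail exponent of a Pareto distribution, and it draws made-up "sales" with Python's standard `random.paretovariate`. The catalog size, the 20% cut-off, and the seed are arbitrary choices for illustration, not real market data.

```python
import random

def top_share_of_sample(alpha, n_titles=100_000, top_fraction=0.20, seed=0):
    """Draw simulated sales for n_titles books from a Pareto distribution with
    tail exponent alpha, and return the share of total sales taken by the
    best-selling top_fraction of titles."""
    rng = random.Random(seed)
    sales = sorted((rng.paretovariate(alpha) for _ in range(n_titles)),
                   reverse=True)
    n_top = int(n_titles * top_fraction)
    return sum(sales[:n_top]) / sum(sales)

# Exponents spanning the range reported for Italian book sales
for alpha in (0.9, 1.2, 1.5):
    print(alpha, round(top_share_of_sample(alpha), 2))
```

With the same seed, the share held by the top 20% of titles grows as the exponent falls from 1.5 to 0.9, which matches the Christmas effect described above.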
One of the characteristics of Pareto distributions is the existence of a fat tail (also called a heavy or long tail), which represents a significant proportion of the population. For traditional bookshops, fat tails are of relatively little interest. Understandably, their limited shelf space is occupied mainly by the most popular books. Internet bookstores such as Amazon, by contrast, can treat the fat tail as a commercial opportunity. Given their almost unlimited storage space, they can offer books that cannot be found in traditional bookstores to billions of potential buyers all over the world. The result is that more than 50% of Amazon’s sales come from rare books. Even if these obscure volumes sell only a few copies every year, there are so many of them that their overall sales exceed those of the best-sellers!
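The arithmetic behind this can be sketched with a simple rank-size (Zipf-like) model, in which the title at sales rank r sells in proportion to 1/r. This model, the shelf size, and the catalog size are all assumptions chosen for illustration. They are not Amazon's figures.

```python
def tail_share(shelf_size, catalog_size, exponent=1.0):
    """Fraction of total sales coming from titles ranked below shelf_size,
    assuming sales at rank r are proportional to r ** -exponent."""
    head = sum(r ** -exponent for r in range(1, shelf_size + 1))
    tail = sum(r ** -exponent
               for r in range(shelf_size + 1, catalog_size + 1))
    return tail / (head + tail)

# Hypothetical numbers: a shop stocking 1,000 titles
# versus an online catalog of 1,000,000 titles.
print(round(tail_share(1_000, 1_000_000), 2))  # 0.48
```

Under these assumed numbers, the 999,000 titles a traditional shop never stocks account for almost half of all sales, and the tail's share keeps growing as the catalog gets larger.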
Another interesting feature of long tails is that books can lurk there, backstage, until the time is ripe for them to bloom. In fact, it took a year and a half for the first book in the Harry Potter series to climb to the top positions of the best-seller lists. And after a best-seller’s golden age has passed, it will move slowly towards the distribution’s long tail, eventually fading or blooming again. Harry Potter books will probably continue to be sold for many years to new generations of readers, somewhere in the middle of Pareto’s fat tail.