Alexa Internet is a web company owned by Amazon. It provides a browser toolbar that offers navigation advice to its users and tracks their browsing behaviour, transmitting the data back to Alexa for analysis of browsing patterns.

Alexa is widely used, especially when attempting to value a website for a potential sale. Some webmasters focus on improving their Alexa ranking rather than their content in the hope of valuing their site higher. This seems like a waste of effort to me, but the practice certainly takes place, but then you already knew that! Between this behaviour and the discrepancies that appear when comparing Alexa rankings with analytics services such as Google's, some concerns over Alexa's accuracy have been raised.

The toolbar that users download tracks their activity for Alexa, and this allows Alexa to show a website's global traffic rank, reach and page views as a proportion of all the other sites Alexa has data about. What Alexa does not do is give precise numbers for any of its site detail categories.


Long Tail

A recent post on the Alexa blog, "What's Going On With My Alexa Rank?", offered an explanation for the dramatic changes in Alexa rank that some smaller sites have experienced. The Long Tail shows how a tiny change in value can dramatically change a site's position on the curve (its rank), depending on its starting position.

A site at the top of the curve will see very minor changes in position even when visitor numbers change by the thousands, as the density of sites at that level is low. From the middle to the lowest point of the curve sits the highest density of websites from the 200 million reported by Netcraft in their July 2010 survey. A small change in visitor numbers, perhaps a spike of 5,000, will dramatically affect a site's relative rank when 10 million other sites share similar traffic stats, leading to a dramatic change in Alexa ranking.
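To illustrate the effect, here is a minimal sketch in Python. The Zipf-style traffic distribution and the visitor figures are invented purely for illustration and are not Alexa's data; it simply shows how the same spike of 5,000 visitors barely moves a site near the head of the curve but moves a site in the tail by thousands of places:

import numpy as np

# Toy model: 10 million sites with Zipf-like daily visitor counts.
# All numbers here are invented to illustrate the Long Tail effect.
n_sites = 10_000_000
traffic = (1_000_000 / np.arange(1, n_sites + 1) ** 0.8).astype(int)

def rank_of(visitors):
    # Rank = 1 + number of sites with strictly more traffic.
    return int(np.sum(traffic > visitors)) + 1

for label, base in [("head-of-curve site", 500_000), ("long-tail site", 300)]:
    before = rank_of(base)
    after = rank_of(base + 5_000)  # the same 5,000-visitor spike for both
    print(f"{label}: rank {before:,} -> {after:,}")

The head site's rank barely changes, while the tail site moves by thousands of places, which matches the behaviour Alexa describes.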


Google Analytics

To gauge the accuracy of Google Analytics before comparing it to Alexa, I carried out a number of comparisons between Analytics data and our own web monitoring with AWStats. Across the three sample sites below I found consistent correlation, within plus or minus 5,000 page views on average for each day in the trailing three-month range.
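As a rough illustration of that check, the sketch below compares two daily page-view exports and reports the average daily difference. The file names and the date,pageviews CSV layout are assumptions for the example, not our actual export format:

import csv

def load_daily_views(path):
    # Read a CSV of "date,pageviews" rows into a {date: page views} dict,
    # skipping any header or non-numeric rows.
    views = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) == 2 and row[1].isdigit():
                views[row[0]] = int(row[1])
    return views

# Hypothetical daily exports from Google Analytics and AWStats for one site.
ga = load_daily_views("analytics_daily.csv")
aw = load_daily_views("awstats_daily.csv")

shared_days = sorted(set(ga) & set(aw))
diffs = [abs(ga[d] - aw[d]) for d in shared_days]
print(f"average daily difference: {sum(diffs) / len(diffs):,.0f} page views")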


A comparison

I wanted to compare large sites we host against their Alexa rankings to test the accuracy for myself; unfortunately I can't give any names!


Large site #1

Comparing Alexa traffic reach with Google Analytics

Over three months Alexa is pretty accurate in tracking traffic patterns for my first sample site. Even the spike is fairly well represented, and the time frame over which it takes place is correct.

Large site #2

Comparing Alexa traffic reach with Google Analytics

Alexa does represent traffic patterns over the course of a week, but the accuracy is certainly lower than for Site #1, and the time frame is inconsistent.

Large site #3

Comparing Alexa traffic reach with Google Analytics

This is the weakest correlation of the three graphs, with only a suggestion of this site’s traffic patterns. The spikes from Alexa for global reach do not seem to follow any spikes in page views on the actual site.


From my small sample the accuracy of Alexa varies greatly. There is no consistency in the accuracy of Alexa's logging, although some ranks and global reach percentages correlate very closely. This certainly suggests that Alexa cannot be relied upon to give an accurate visualisation in every case.


The numbers

As I have access to analytics data for numerous large sites on our cluster, I can attempt to calculate the real number of site visitors that 0.0001% of global traffic in Alexa represents. Adding to my two example sites with Alexa graphs, I will use three more large sites hosted on CatN vClusters. Before attempting to calculate the real visitor value of an Alexa percentage, it is important to check the consistency across a sample of data for each site.


Site 1        Alexa page views (% of global)   Google Analytics page views
July 12th     0.0008                           220,403
July 26th     0.00058                          266,596
August 6th    0.0004                           118,727

Site 2        Alexa page views (% of global)   Google Analytics page views
July 12th     0.0002                           59,628
July 26th     0.0001                           72,821
August 6th    0.0001                           71,239

Site 3        Alexa page views (% of global)   Google Analytics page views
July 12th     0.0001                           45,178
July 26th     0.0001                           37,790
August 6th    0.0001                           27,290

Site 4        Alexa page views (% of global)   Google Analytics page views
July 12th     0.0007                           341,092
July 26th     0.0007                           431,614
August 6th    0.0003                           27,893

Site 5        Alexa page views (% of global)   Google Analytics page views
July 12th     0.0001                           60,716
July 26th     0.0001                           54,384
August 6th    0.00005                          49,529

These sample sites show obvious inconsistencies. For Site 1 on July 12th there were 220,403 page views, giving an Alexa global page views figure of 0.0008%, but on July 26th around 46,000 more page views (266,596) resulted in Alexa giving a lower percentage of global page views, 0.00058%.

Continuing with Site 2, on July 12th 59,628 page views resulted in 0.0002% of global page views, and two weeks later around 13,000 more page views (72,821) gave a lower Alexa percentage, 0.0001%.

Finally, in Site 5's data the July 12th and July 26th figures are consistent: a difference of 6,332 page views (60,716 − 54,384) gives the same Alexa percentage in both cases, 0.0001%. However, on August 6th the site's page views drop to 49,529, approximately 5,000 fewer, yet the Alexa percentage halves to 0.00005%, which is not a proportional representation of the change in page views.

Sites given an Alexa global page views figure of approximately 0.0001% have a wide range of traffic figures, suggesting that the Long Tail effect applies in this range. The lowest traffic I have recorded within the 0.0001% range is Site 3 with 27,290 page views and the highest is Site 2 with 72,821 page views. With a difference of almost 50,000 page views, this part of the Alexa ranking must be densely populated, with large changes in rank resulting from relatively small changes in page views.

However, sites with page views within this range (27,290 to 72,821) do not always have corresponding Alexa global page views percentages. Site 4 on August 6th fits within the 0.0001% range on Google Analytics page view data (27,893 page views), but Alexa gave a global page views percentage of 0.0003%, 0.0002% higher, revealing clear inaccuracies in Alexa's ranking. Because of these inaccuracies I feel it is impossible to calculate the real-world value in page views of a 0.0001% reach from Alexa.
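To see why the calculation breaks down, here is a minimal sketch that takes values from the tables above and works out the global page-view total each sample would imply if the Alexa percentage were exact. If Alexa were consistent these implied totals would broadly agree; instead they vary by roughly an order of magnitude:

# (site, date, Alexa global page views %, Google Analytics page views),
# taken from the tables above.
samples = [
    ("Site 1", "July 12th", 0.0008, 220_403),
    ("Site 1", "July 26th", 0.00058, 266_596),
    ("Site 2", "July 12th", 0.0002, 59_628),
    ("Site 4", "August 6th", 0.0003, 27_893),
    ("Site 5", "August 6th", 0.00005, 49_529),
]

for site, date, alexa_pct, ga_views in samples:
    # If Alexa's percentage were exact: global total = site views / (pct / 100).
    implied_global = ga_views / (alexa_pct / 100)
    print(f"{site}, {date}: implied global total {implied_global:,.0f} page views")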


Users

I wanted to find out how many Internet users actually have the Alexa Toolbar installed, as this could explain the varying ranks from Alexa, and the demographic of those users would clearly affect the amount of data Alexa gathers about certain sites.

To do this we searched our web server logs for occurrences of the "Alexa Toolbar" user-agent string and variations of it. An example user-agent string containing the target words looks like this:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; Alexa Toolbar)
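
A minimal sketch of that kind of search is below; the log path is hypothetical and it assumes the Apache combined log format, where the user-agent is the final double-quoted field on each line:

total = 0
alexa = 0

with open("/var/log/apache2/access.log") as log:
    for line in log:
        total += 1
        parts = line.rsplit('"', 2)
        # parts[1] is the user-agent field in a combined-format log line.
        if len(parts) == 3 and "Alexa Toolbar" in parts[1]:
            alexa += 1

if total:
    print(f"{alexa} of {total} requests ({100 * alexa / total:.3f}%) "
          f"came from the Alexa Toolbar")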


I cannot disclose actual request numbers or website names; however, across sample sites receiving between 1 million and 10 million requests over the period, an average of 0.019% of those requests came from users with the Alexa Toolbar installed.

The sample sites' demographics are wide-ranging, from niche technical sites to property and employment sites, giving a broad spread of sample data. Another point to note is that across these sample sites the highest percentage of requests from Alexa users was 0.026%, still a very low figure, suggesting this level of Alexa requests is likely to apply across other websites.


Is it possible for Alexa to give accurate rankings and percentages of global page views for a website, when many sites are visited by only a tiny number of users who actually have the Alexa Toolbar installed? Clearly Alexa cannot be relied on for accurate ranking information in critical site analysis, and perhaps should only be used as a rough guide, especially for smaller sites subject to the Long Tail effect. What are your experiences of Alexa's accuracy? How far do you trust it?

Joe Gardiner, General Manager

Joe is the General Manager of CatN. He oversees product development, customer engagement and commercial activities. You can find him on Google Plus.