Search Engine Indexing Speed

Tristan Louis has written a couple of articles on the number of hits for various bloggers in three search engines: Google, MSN and Technorati. See http://www.tnl.net/blog/entry/Secrets_of_the_A-list_bloggers:_Technorati_vs._Google and http://www.tnl.net/blog/entry/Technorati_Yahoo_and_Google_Too.

A number of people have pointed out that there are problems with both his methodology and the aim of the experiment itself. Tim Bray says it well: “Almost all the modern engines do a pretty damn good job of getting you something appropriate and useful in the first handful of results. Who cares about the next million?” If you want all the details of what is wrong with this study, see Danny Sullivan's post.

Anyway, I'm interested in search engine comparisons, but right now I'm more interested in how fast things get into the index than in how many million results a query returns, so over the last couple of days I conducted a small experiment.

First, I published a blog post entitled “Agro the Aggregator”. About 12 hours later, I used my Argos search engine library to poll six search engines every half hour for 19 hours with the query “Agro the Aggregator”. I then counted the results by iterating over them all (i.e., each link was counted individually rather than relying on the “result count” returned by the search engines, which can be inaccurate).
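The polling procedure can be sketched roughly as follows. This is only an illustration in Python, not the Argos API: the names count_results, poll, and the engine callables are hypothetical, and each engine is assumed to be a function that takes a query and a page number and returns the list of result links on that page (an empty list once the results run out).

```python
import time
from typing import Callable, Dict, List

# Hypothetical engine interface: (query, page_number) -> links on that page.
SearchPage = Callable[[str, int], List[str]]


def count_results(search_page: SearchPage, query: str, max_pages: int = 100) -> int:
    """Count hits by walking every result page, ignoring any reported total."""
    total = 0
    for page in range(max_pages):  # max_pages guards against endless paging
        links = search_page(query, page)
        if not links:
            break
        total += len(links)
    return total


def poll(
    engines: Dict[str, SearchPage],
    query: str,
    rounds: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, int]]:
    """Query every engine once per round and record the hit counts."""
    history = []
    for i in range(rounds):
        history.append(
            {name: count_results(fn, query) for name, fn in engines.items()}
        )
        if i < rounds - 1:  # no need to wait after the final round
            sleep(interval_seconds)
    return history
```

Under these assumptions, polling every half hour for 19 hours would be something like poll(engines, '"Agro the Aggregator"', rounds=38, interval_seconds=1800), with one entry in engines per search engine.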

Unfortunately I started the experiment too late to catch which engine found a result first, but Blogdigger, Google and Yahoo all had results by the time I started searching.

However, the results do show the following:

  • Google finds the most results, although the count fluctuates. I could not replicate the way those results dropped back to 2 hits using manual search, so it is possible that this is an artifact of using the Google API. In the manual search, Google also correctly identifies a number of these 16 posts as being duplicate content (i.e., my blog post re-aggregated).
  • Blogdigger returned results the quickest of the 3 specialist blog search engines (Blogdigger, Feedster and Technorati), even though Technorati was pinged directly with the blog posting. I suspect this may have something to do with Blogdigger's use of the FeedMesh to find new posts quickly.

Hits for
