Interesting Search Paper

For all you search engine geeks out there: Block-level Link Analysis, a paper on analysing web pages at the block level rather than the page level, for use with algorithms such as PageRank and HITS. (From MS Research, via Doug Cutting on the nutch-dev list)

To quote the abstract:

Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics and hence the web page might not be considered as the atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we proposed two new algorithms, Block Level PageRank and Block Level HITS, whose performances we study extensively using web data.

My understanding is that this is similar to the method used by Google's AdSense program to work out what advertisement to display on each page.
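
To get a feel for the page-to-block and block-to-page idea in the abstract, here is a rough Python sketch. It is my own toy illustration, not the paper's algorithm: the block importance weights, the three-page example, and the function names page_graph and pagerank are all made up, and the paper derives its weights from page layout analysis rather than by hand. Each page spreads its vote across its blocks by importance, each block spreads its share across the pages it links to, and ordinary PageRank then runs on the resulting page graph.

```python
def page_graph(pages, blocks):
    """Build page-to-page weights from block-level links.

    blocks is a list of (owner_page, importance, target_pages) tuples.
    The importance values are hypothetical; they stand in for whatever
    a layout-based segmenter would assign to each block.
    """
    weights = {p: {q: 0.0 for q in pages} for p in pages}
    for owner, importance, targets in blocks:
        if not targets:
            continue  # a block with no links passes on nothing
        share = importance / len(targets)
        for target in targets:
            weights[owner][target] += share
    return weights


def pagerank(weights, pages, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a weighted page graph."""
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out_weight = sum(weights[p].values())
            for q in pages:
                if out_weight > 0:
                    new_rank[q] += damping * rank[p] * weights[p][q] / out_weight
                else:
                    # Page with no outgoing weight: spread its rank evenly.
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Toy web: page A has a main content block linking to B and a small
# footer block linking to C. B and C each link back to A.
pages = ["A", "B", "C"]
blocks = [
    ("A", 0.8, ["B"]),  # main content block (assumed importance)
    ("A", 0.2, ["C"]),  # footer/navigation block (assumed importance)
    ("B", 1.0, ["A"]),
    ("C", 1.0, ["A"]),
]

ranks = pagerank(page_graph(pages, blocks), pages)
for page in pages:
    print(page, round(ranks[page], 3))
```

With page-level PageRank, A's two links would count equally and B and C would end up tied. Here B comes out ahead of C because the link to it sits in the block that was given more weight.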
