Recently Clay Shirky gave a good run-down of the problems the semantic web faces in actually working. I agree with
his basic premise – that the semantic web won't deliver what it promises.
However, I believe that the more reliable metadata we have the better, and I think the web has so much good information
available that computers are beginning to make a good approximation of giving us good answers to pretty much any
question asked in many fields – at least to a level somewhat comparable to humans.
In support of my argument I offer 1 Billion Pages = 1 Million Dollars? Mining the Web to Play “Who Wants to be a Millionaire?”
from Overture Research. The abstract reads:
We exploit the redundancy and volume of information on the web to build a
computerized player for the ABC TV game show
“Who Wants To Be A Millionaire?”. The player consists of a question-answering
module and a decision-making module. The question-answering module utilizes question
transformation techniques, natural language parsing, multiple information retrieval
algorithms, and multiple search engines; results are combined in the spirit of ensemble
learning using an adaptive weighting scheme. Empirically, the system correctly answers
about 75% of questions from the Millionaire CD-ROM, 3rd edition – general-interest
trivia questions often about popular culture and common knowledge. The decision-making
module chooses from allowable actions in the game in order to maximize expected risk-adjusted
winnings, where the estimated probability of answering correctly is a function of past
performance and confidence in correctly answering the current question. When given a six
question head start (i.e., when starting from the $2,000 level), we find that the system
performs about as well on average as humans starting at the beginning.
Our system demonstrates the potential of simple but well-chosen techniques for mining
answers from unstructured information such as the web.
So humans are (only?) six questions better at “Who Wants To Be A Millionaire?” than a computer – without even using
the semantic web. With even imperfect metadata, it's hard to imagine that not getting better over time (IMO, of course).
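
To make the decision-making module from the abstract a bit more concrete, here is a minimal sketch of the "answer or walk away" choice. It is my own illustration, not the paper's method: the power utility function, the `risk_aversion` parameter, the function names and the dollar figures are all assumptions, and it only looks one question ahead (no lifelines, no value for later questions).

```python
def utility(dollars, risk_aversion):
    """Concave power utility (an assumed form, not taken from the paper).

    risk_aversion = 0 is risk-neutral; values closer to 1 are more cautious.
    """
    if not 0.0 <= risk_aversion < 1.0:
        raise ValueError("risk_aversion must be in [0, 1)")
    return dollars ** (1.0 - risk_aversion)


def should_answer(p_correct, walk_away, prize_if_right, floor_if_wrong,
                  risk_aversion=0.5):
    """Return True if answering beats walking away in expected utility.

    p_correct      -- estimated probability the chosen answer is right
    walk_away      -- winnings kept if the player stops now
    prize_if_right -- winnings after a correct answer
    floor_if_wrong -- winnings kept after a wrong answer
    """
    answer = (p_correct * utility(prize_if_right, risk_aversion)
              + (1.0 - p_correct) * utility(floor_if_wrong, risk_aversion))
    stop = utility(walk_away, risk_aversion)
    return answer > stop


# Illustrative numbers only: holding $16,000, playing for $32,000,
# falling back to $1,000 on a wrong answer.
print(should_answer(0.75, 16000, 32000, 1000))                     # confident
print(should_answer(0.50, 16000, 32000, 1000, risk_aversion=0.0))  # coin flip, risk-neutral
print(should_answer(0.50, 16000, 32000, 1000, risk_aversion=0.5))  # coin flip, cautious
```

With these made-up numbers, a coin-flip guess is worth taking for a risk-neutral player but not for a cautious one, which is roughly what "risk-adjusted" means in the abstract.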