
The (s|S)emantic (w|W)eb

“The semantic web is the future of the web and always will be”

Peter Norvig, speaking at YCombinator Startup School

I’m sick of Semantic Web hype from people who don’t understand what they are talking about. In the past I’ve often said <insert Semantic Web rant here> – now it’s time to write it down.

There are two things people mean when they say the “semantic web”. They might mean the W3C vision of the “Semantic Web” (note the capitalization) of intelligent data, usually in the form of RDF but sometimes microformats. Most of the time, people who talk about this aren’t really having a technology discussion but are attempting a religious conversion. I’ve been down that particular road to Damascus, and the bright light turned out to be yet another demonstrator system which worked well on a very limited dataset but couldn’t cope with this thing we call the web.

The other thing people mean by the “semantic web” is the use of algorithms to attempt to extract meaning (semantics) from data. Personally, I think there’s a lot of evidence that this approach works well and can cope with real-world data (from the web or elsewhere). For example, the Google search engine (ignoring Google Base) is primarily an algorithmic way of extracting meaning from data, and it works adequately in many situations. Bayesian filtering of email is another example – while it’s true that email spam remains a huge problem, it’s also true that algorithmic approaches to filtering it have been the best solution we’ve found.
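To make that concrete: the trick behind a Bayesian filter is roughly to count how often each word appears in spam versus legitimate mail, then combine the per-word odds. Here’s a minimal sketch of the idea – not any real filter’s code, and all the names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the naive Bayes idea behind spam filters:
// count how often each word appears in spam vs. ham, then combine
// the per-word log-odds. Illustrative only - real filters do much
// smarter tokenization, smoothing and training.
public class BayesFilterSketch {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private long spamWords = 0, hamWords = 0;

    public void train(String message, boolean isSpam) {
        for (String word : message.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            if (isSpam) { spamCounts.merge(word, 1, Integer::sum); spamWords++; }
            else        { hamCounts.merge(word, 1, Integer::sum);  hamWords++; }
        }
    }

    // Positive score means "more spam-like". Add-one smoothing keeps
    // words we've never seen from blowing up the logarithms.
    public double score(String message) {
        double score = 0.0;
        for (String word : message.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            double pSpam = (spamCounts.getOrDefault(word, 0) + 1.0) / (spamWords + 2.0);
            double pHam  = (hamCounts.getOrDefault(word, 0) + 1.0) / (hamWords + 2.0);
            score += Math.log(pSpam / pHam);
        }
        return score;
    }
}
```

Real filters add per-user training and far better tokenization, but the core is exactly this kind of statistics – meaning extracted from data, with no special markup required.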

The problem with this dual meaning is that many people use it to weasel out of addressing challenges. Typically, the conversation will go something like this:

Semantic Web great, solve world hunger, cure the black plague, bring peace and freedom to the world blah blah blah…

But what about spam?

Semantic Web great, trusted data sources automagically discovered, queries can take advantage of these relationships blah blah blah…

But isn’t that hard?

No, it’s what search engines have to do at the moment. The semantic web (note the case change!) will also extract relationships in the same way.

So… we just have to mark up all our data using a strict format, and then we still have to do the thing that’s hard about writing a search engine now – spam detection.

Yes, but it’s much easier because the data is much better.

Well, it’s sort of easier to parse, and in RDF form it is more self-descriptive (but more complicated), but that only helps if you trust it already.

Well, that’s easy then – you only use it from trusted sources.

Excellent – let’s create another demo system that works well on limited data but can’t cope with this thing called the web.

Look – I don’t think the RDF data model is bad – in fact, I’m just starting a new project where I’m basing my data model on it. But the problem is that people claim that RDF, microformats and other “Semantic Web” technologies will somehow make extracting information from the web easier. That’s true as far as it goes – extracting information will be easier. But the hard problem – working out what is trustworthy and useful – is ignored.
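For anyone who hasn’t looked at it: the RDF model boils down to (subject, predicate, object) triples, and a graph is just a set of them. A toy sketch of that shape – the class names are mine, not from any RDF library:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the RDF data model: everything is a
// (subject, predicate, object) triple, and a graph is a set of
// them. Names are illustrative, not from any real RDF library.
record Triple(String subject, String predicate, String object) {}

class TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Query with nulls as wildcards, e.g. find(null, "dc:creator", null).
    List<Triple> find(String s, String p, String o) {
        return triples.stream()
            .filter(t -> (s == null || t.subject().equals(s))
                      && (p == null || t.predicate().equals(p))
                      && (o == null || t.object().equals(o)))
            .toList();
    }
}
```

That shape is genuinely pleasant to build on, which is why none of my complaints above are about the model itself.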

The Semantic Web needs a tagline – I’d suggest something like:

Semantic Web technologies: talking about trying to solve easy problems since 2001.

RDF could have one, too:

RDF: Static Typing for the web – now with added complexity tax.

So that’s my rant over. One day I promise to write something other than rants here – I’ve actually been studying Java versions of Quicksort quite hard, and I’ve got some interesting observations about micro-optimizations. One day… I promise…

The problem with OpenID is…

The problem with OpenID is branding – people get (very) confused when they get taken off site to login. I’ve watched usability testing of this, and it is truly horrible. Obviously this isn’t unique to OpenID – it applies equally to any federated identity solution (in fact – Shibboleth based federations are even worse than OpenID in this respect).

I think user education will help, but it would be really good to be able to extend OpenID to put a logo on the identity provider’s site, so the user can see they are logging into site “blah” via whatever OpenID provider.
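To be concrete about what I mean, here’s a hypothetical sketch – the “logo” namespace and its fields below are entirely made up, nothing like this exists in the OpenID spec – of a relying party passing its name and logo along with the usual parameters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// A hypothetical sketch of the kind of extension I mean: the relying
// party passes its display name and a logo URL along with the normal
// OpenID parameters, so the provider can show "logging into X".
// The "logo" namespace and its fields are invented for illustration;
// no such extension exists in the OpenID spec.
public class LogoExtensionSketch {
    public static String authRequestUrl(String providerEndpoint,
                                        String returnTo,
                                        String siteName,
                                        String logoUrl) {
        return providerEndpoint
            + "?openid.mode=checkid_setup"
            + "&openid.return_to=" + enc(returnTo)
            + "&openid.ns.logo=" + enc("http://example.org/openid/logo/1.0")
            + "&openid.logo.site_name=" + enc(siteName)
            + "&openid.logo.url=" + enc(logoUrl);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

The identity provider could then show “Logging into site blah” next to the logo, rather than an anonymous-looking login form.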

Dataportability: Did anyone ask the users? – Part 2

I got a bit of feedback on my previous post about dataportability. The general gist was that because you can move your contacts from one email system to another (or export them), data portability must be good.

I’m not sure I agree. I think that joining a new social application and automatically finding existing contacts on that system is functionality that is likely to cause problems for users.

Each social application is a different context, and people use them in different ways. Mid last year I expressed my concerns about this on the Social Network Portability group:

Everyone’s heard the stories of how employers are checking out possible employees on Facebook. This system will not only find them on Facebook, but find their user id on that new Playboy social network for college students (http://www.techcrunch.com/2007/08/22/new-playboy-social-network-built-on-ning/). That’s not a good thing to do to people..

danah boyd wrote about similar issues:

I lost control over my Facebook tonight. Or rather, the context got destroyed. For months, I’ve been ignoring most friend requests. Tonight, I gave up and accepted most of them. I have been facing the precise dilemma that I write about in my articles: what constitutes a “friend”? Where’s the line?

….

I know people generally believe that growth is nothing but candy-coated goodness. And while I hate using myself as an example (cuz I ain’t representative), I do feel the need to point out that context management is still unfun, especially for early adopters, just as it has been on every other social network site. It sucks for teens trying to balance mom and friends. It sucks for college students trying to have a social life and not piss off their profs. It sucks for 20-somethings trying to date and balance their boss’s presence.

Back then I was all over using bloom filters as a way of attempting to preserve people’s privacy. I’ve given that up now – it’s a nice hack but it doesn’t really fix anything.
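For the curious, the idea was roughly this: each user publishes a bloom filter of hashed contact identifiers rather than the contacts themselves, so another site can test for a match without being handed the list. A toy sketch (the sizes and hashing are illustrative only):

```java
import java.util.BitSet;

// A minimal sketch of the bloom-filter idea: publish a filter of
// hashed contact identifiers instead of the identifiers themselves.
// Others can test "is alice@example.com probably in this person's
// contacts?" without receiving the contact list. Sizes and hash
// scheme here are illustrative only.
public class ContactBloomFilter {
    private static final int BITS = 1 << 16;  // 65,536 bits
    private static final int HASHES = 4;
    private final BitSet bits = new BitSet(BITS);

    public void add(String contactId) {
        for (int i = 0; i < HASHES; i++) {
            bits.set(indexFor(contactId, i));
        }
    }

    // True means "probably a contact" (false positives are possible);
    // false means "definitely not a contact".
    public boolean mightContain(String contactId) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(indexFor(contactId, i))) return false;
        }
        return true;
    }

    private static int indexFor(String contactId, int seed) {
        // Derive the k bit positions by salting the hash with the seed.
        int h = (contactId + "#" + seed).hashCode();
        return Math.floorMod(h, BITS);
    }
}
```

The catch – and a big part of why it doesn’t really fix anything – is that anyone can still test guessed identifiers against the filter.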

Moving your email contacts between systems is fine for both parties because it’s the same context – email. Being linked to your boss on LinkedIn and having them automatically find you on a dating site you are both a member of is going to put a lot of users off.

Dataportability: Did anyone ask the users?

If you believe what you read on blogs, then dataportability is all peace and light and yayness. Unfortunately it seems someone forgot to tell the rest of the internets.

It seems that many people who want data portability are either not representative of the general population or have agendas involving trying to get as many people as possible onto their site.

This isn’t just a privacy concern: I continue to think that the difference in context between different social applications is a key constraint that people are missing.

I’m increasingly of the view that moving contacts from one application to another should require both parties to agree that they want to be visible to each other in the new context. It isn’t clear to me that this is a viable process, though.
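The mechanics are easy enough – something like the double opt-in sketched below (invented names, purely illustrative). It’s getting both users to actually go through with it that I have doubts about:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A sketch of the double opt-in I have in mind: a contact link only
// becomes visible in the new application once *both* people have
// agreed to carry it over. Names are invented for illustration.
public class PortabilityConsent {
    // Unordered pair of user ids -> the users who have agreed so far.
    private final Map<String, Set<String>> agreements = new HashMap<>();

    public void agree(String user, String otherUser) {
        agreements.computeIfAbsent(pairKey(user, otherUser), k -> new HashSet<>())
                  .add(user);
    }

    // Only a mutually confirmed link is visible in the new context.
    public boolean visible(String userA, String userB) {
        Set<String> agreed = agreements.get(pairKey(userA, userB));
        return agreed != null && agreed.contains(userA) && agreed.contains(userB);
    }

    private static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }
}
```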

(Something that should be obvious but apparently isn’t: portable APIs for web-based social applications – ie OpenSocial – are extremely good. Moving of personal data – ie data portability – is something I still think is questionable.)

Less than 3 months

Exhibit A:

Facebook will have a huge leak of personal private information. It will turn out to be due to buggy code, which will finally focus some attention on the fact that Facebook’s codebase appears to be really, really bad.

Exhibit B:

The Associated Press reported this afternoon that its reporters were able to use an undisclosed method to access private photos on Facebook, including some from Paris Hilton at the Emmys and others from Facebook founding CEO Mark Zuckerberg’s vacation in November of 2005.

I still think there are going to be worse lapses than this by the end of 2008.

Firefox 3 on Linux

I’ve been using Ubuntu at home on one of my computers for close to a year now. I’ve been pretty happy with it, although Gnome struggled on that machine (a circa 2003 Athlon). Switching to Xfce fixed that, and my one remaining problem was Firefox.

For those who haven’t tried Firefox 2 on Linux, it’s pretty bad. If you leave a JavaScript-heavy site (eg GMail) open, the browser will slowly grind to a halt over the course of a few hours.

I recently upgraded to Firefox 3 (see this video for how to do that), and it’s made a HUGE difference. The one issue I had was that I couldn’t get it to start – I hadn’t realized that the executable was now firefox-3.0 instead of firefox. Makes sense, though.

Why tech predictions are stupid (and a small prediction)

Every year hundreds of tech pundits go and make their predictions for the year – a trend I’m not immune to either. Alan Kay explained the problem with this best: “The best way to predict the future is to invent it”. In a field like computing, it is so easy for a single person to build something new that trying to make predictions is a pointless exercise.

Nonetheless, here’s something that is less of a prediction and more an exercise in deduction and rumor mongering. Sun is planning to launch a direct competitor to Amazon’s EC2 in the near future (not sure when exactly, but 2008 for sure). Note that this is different to the existing Sun Grid product (which will presumably continue).

Shipping software part 2

In shipping software I spoke briefly about me.edu.au (which is still taking a good amount of my time). Recently, though, I’ve been spending a lot of time preparing education.au’s Java-based federated search product (the Distributed Search Manager, now OpenDSM) for release as an open source product. That’s been an interesting experience – the code is pretty old, and was glued together using static references. I had to pull it apart, replace the static references with factories (changing to dependency injection wasn’t realistic, for this release at least) and put it back together. It’s kind of odd working on a project like that – the code almost causes me pain at times, but with a product that is stable and reliable I don’t want to make too many changes just because I don’t like the style.
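The shape of the change was roughly this – all the names are invented, and the real OpenDSM code is of course messier:

```java
import java.util.List;

// The shape of the refactoring, roughly. All names are invented for
// illustration; the real OpenDSM code looks different.
interface Backend {
    List<String> query(String q);
}

// Before: components grabbed a shared static reference directly,
// so nothing could be substituted for testing or configuration.
class StaticBackendHolder {
    static Backend BACKEND; // set once at startup, referenced everywhere
}

class OldSearchManager {
    List<String> search(String query) {
        return StaticBackendHolder.BACKEND.query(query);
    }
}

// After: a factory supplies the dependency. Not full dependency
// injection, but the static coupling is gone and a test can pass
// in a factory that returns a stub backend.
interface BackendFactory {
    Backend create();
}

class SearchManager {
    private final BackendFactory factory;

    SearchManager(BackendFactory factory) {
        this.factory = factory;
    }

    List<String> search(String query) {
        return factory.create().query(query);
    }
}
```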

Some readers of this blog may be interested in OpenDSM, because it allows results from multiple Solr servers (or OpenSearch services) to be federated into a single result set. That’s useful in quite a lot of places.
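Conceptually the federation is simple enough to sketch: fan the query out, then fold the ranked lists back into one. A toy version (nothing like the real OpenDSM API – real merging also has to worry about score normalisation and de-duplication):

```java
import java.util.ArrayList;
import java.util.List;

// A toy sketch of the federation idea, not the OpenDSM API: fan the
// query out to several servers, then round-robin the ranked result
// lists into a single result set.
public class FederatedMergeSketch {
    public static List<String> merge(List<List<String>> rankedLists) {
        List<String> merged = new ArrayList<>();
        int longest = rankedLists.stream().mapToInt(List::size).max().orElse(0);
        for (int rank = 0; rank < longest; rank++) {
            for (List<String> results : rankedLists) {
                if (rank < results.size()) {
                    merged.add(results.get(rank));
                }
            }
        }
        return merged;
    }
}
```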

It’s also the first time I’ve been paid to create open source code as an explicit goal – most of my open source work has been for pragmatic reasons, not as a goal in itself.