Category Archives: random

Fixing Firefox performance and lock-ups on Linux

I’ve been using Ubuntu (8.04 then 8.10) reasonably heavily over the last 12 months or so as my main operating system on two of my home computers (a Dell Mini 9″ and a quad core desktop).

I’ve been pretty happy with it except for the infuriating habit Firefox has of “locking up” periodically. The symptoms include unresponsiveness, screen freezes and even the whole computer being unusable for 30 seconds at a time. The only clue I had was that there seemed to be heavy disk activity while it happened.

After a while bitching and moaning about it I got so annoyed I started looking for a fix.

The first thing I tried was the Chromium Linux nightly builds. Google says “blogging about this isn’t helpful”, so I won’t, except to say that I’ve got pretty high expectations of Chrome on Linux and so far I haven’t had to revise them.

All the same, I wanted to fix Firefox. The next thing I tried was moving Firefox’s cache to a RAM drive. That’s pretty easy – just set it to use a directory under /dev/shm/ as the cache location.

I think that improved the situation marginally, but not enough to call it a fix.

The next thing I tried was to raise a Firefox bug. Somewhat to my surprise that got linked to another bug which was marked as fixed.

The comments on that bug are quite long, but the story is this:

  • Firefox uses SQLite as a database for its history and bookmarks.
  • SQLite, being a database, is very concerned about data integrity, and to implement this it relies on the fsync system call.
  • fsync has performance issues on ext3 filesystems. See for example,

    The problem, in short, is this: the ext3 filesystem, when running in the default data=ordered mode, can exhibit lengthy stalls when some process calls fsync() to flush data to disk. This issue most famously manifested itself as the much-lamented Firefox system-freeze problem, but it goes beyond just Firefox.

  • SQLite has a no-sync mode, which trades reliability for performance.
  • Firefox can use this mode via a config setting.
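The cost of all that syncing is easy to see outside of Firefox or SQLite entirely. Here’s a rough, stdlib-only Java sketch (nothing to do with Firefox’s actual code – the class and method names are mine) that times a batch of small appends with and without an fsync after each write:

```java
import;
import;
import;

public class FsyncCost {
    // Write `writes` small appends to `file`, optionally calling fsync after
    // each one, and return the elapsed time in milliseconds.
    static long timedWrites(File file, int writes, boolean syncEachWrite) throws IOException {
        long start = System.nanoTime();
        FileOutputStream out = new FileOutputStream(file);
        try {
            for (int i = 0; i < writes; i++) {
                out.write("INSERT INTO history ...\n".getBytes());
                if (syncEachWrite) {
                    // force the data to disk now, like SQLite does to guarantee durability
                    out.getFD().sync();
                }
            }
        } finally {
            out.close();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fsync-demo", ".db");
        long unsynced = timedWrites(f, 200, false);
        long synced = timedWrites(f, 200, true);
        System.out.println("no fsync: " + unsynced + " ms, fsync per write: " + synced + " ms");
        f.delete();
    }
}
```

On ext3 in data=ordered mode, each of those syncs can drag unrelated dirty data to disk with it, which is exactly the stall described above.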

So the outcome of all that is this:

Create a new config key “” and set it to the integer 0 to stop Firefox lock-ups on Linux (but be aware that there is some chance a power failure could cause loss of your history and/or bookmarks).

Random MP3 metadata code

I’ve been doing random MP3 metadata work lately. Here’s some code which others might find useful.

Extracting MP3 tags from an MP3 file hosted on a server using HTTP Range requests.

So I was using Apache Tika for various metadata stuff. I wanted to get the song title for a file hosted on a server, but Tika only supports MP3 ID3v1 metadata, which lives at the end of the file. Downloading an entire MP3 just for the title is wasteful, but fortunately HTTP Range requests can help us out.

HttpClient httpClient = new HttpClient();

String address = "http://address of mp3 file here";

// first, a HEAD request to learn the content length and whether ranges are supported
HeadMethod head = new HeadMethod();
head.setURI(new URI(address, true));

Header contentLengthHeader = null;
Header acceptHeader = null;

try {
	httpClient.executeMethod(head);
	contentLengthHeader = head.getResponseHeader("Content-Length");
	acceptHeader = head.getResponseHeader("Accept-Ranges");
} finally {
	head.releaseConnection();
}

if ((contentLengthHeader != null) && (acceptHeader != null) && "bytes".equals(acceptHeader.getValue())) {
	long contentLength = Long.parseLong(contentLengthHeader.getValue());
	long metaDataStartRange = contentLength - 128; // the ID3v1 tag is the last 128 bytes
	if (metaDataStartRange > 0) {
		GetMethod get = new GetMethod();
		get.setURI(new URI(address, true));
		get.addRequestHeader("Range", "bytes=" + metaDataStartRange + "-" + contentLength);
		try {
			httpClient.executeMethod(get);
			Parser parser = new AutoDetectParser();

			Metadata metadata = new Metadata();
			metadata.set(Metadata.RESOURCE_NAME_KEY, address);
			InputStream stream = get.getResponseBodyAsStream();
			try {
				parser.parse(stream, new DefaultHandler(), metadata);
			} finally {
				stream.close();
			}
			System.out.println("Title: " + metadata.get("title"));
			System.out.println("Author: " + metadata.get("Author"));
		} finally {
			get.releaseConnection();
		}
	}
} else {
	System.err.println("Range not supported. Headers were: ");
	for (Header header : head.getResponseHeaders()) {
		System.err.println("\t" + header);
	}
}
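Incidentally, the 128-byte ID3v1 trailer Tika is parsing there has a completely fixed layout, so if you ever want to skip the Tika dependency you can decode it by hand. A minimal sketch (field offsets come from the ID3v1 spec; the class and method names are mine):

```java
import java.nio.charset.StandardCharsets;

public class Id3v1 {
    // Pull a fixed-width, NUL/space padded field out of the 128-byte tag block.
    static String field(byte[] tag, int offset, int length) {
        String s = new String(tag, offset, length, StandardCharsets.ISO_8859_1);
        int end = s.indexOf('\0');
        if (end >= 0) s = s.substring(0, end);
        return s.trim();
    }

    // Returns the title from a 128-byte ID3v1 block, or null if it isn't one.
    // The block starts with the literal "TAG", then 30 bytes of title.
    static String title(byte[] tag) {
        if (tag.length != 128 || !"TAG".equals(field(tag, 0, 3))) return null;
        return field(tag, 3, 30);
    }

    // Bytes 33-62 are the 30-byte artist field.
    static String artist(byte[] tag) {
        if (tag.length != 128 || !"TAG".equals(field(tag, 0, 3))) return null;
        return field(tag, 33, 30);
    }
}
```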

The next thing I needed to do was extract song titles from a Shoutcast stream. Shoutcast streams are kinda-but-not-quite HTTP. Metadata is embedded in the stream itself (not as part of the MP3). That makes the code pretty ugly, but whatever… This code will open a connection, read the metadata and close, so you don’t need to keep downloading gigs of data.

URL url = new URL("");
URLConnection con = url.openConnection();
con.setRequestProperty("Icy-MetaData", "1");

InputStream stream = con.getInputStream();
try {
	String metaIntervalString = null;
	// get the headers
	StringBuilder headers = new StringBuilder();
	int c;
	while ((c = != -1) {
		headers.append((char) c);
		if (headers.length() > 5 && headers.substring(headers.length() - 4).equals("\r\n\r\n")) {
			// end of headers
			break;
		}
	}

	// headers look like this:
	//		ICY 200 OK
	//		icy-notice1: This stream requires Winamp
	//		icy-notice2: Firehose Ultravox/SHOUTcast Relay Server/Linux v2.6.0
	//		icy-name: .977 The 80s Channel
	//		icy-genre: 80s Pop Rock
	//		icy-url: 
	//		content-type: audio/mpeg
	//		icy-pub: 1
	//		icy-metaint: 16384
	//		icy-br: 128
	Pattern p = Pattern.compile("\\r\\n(icy-metaint):\\s*(.*)\\r\\n");
	Matcher m = p.matcher(headers.toString());
	if (m.find()) {
		metaIntervalString =;
	}

	if (metaIntervalString != null) {
		int metaInterval = Integer.parseInt(metaIntervalString.trim());
		if (metaInterval > 0) {
			int b;
			int count = 0;
			int metaDataLength = 4080; // 4080 is the max length
			boolean inData = false;
			StringBuilder metaData = new StringBuilder();
			// after metaInterval bytes of audio, the next byte gives the
			// metadata block length in 16-byte units
			while ((b = != -1) {
				count++;
				if (count == metaInterval + 1) {
					metaDataLength = b * 16;
				}
				if (count > metaInterval + 1 && count < (metaInterval + metaDataLength)) {
					inData = true;
				} else {
					inData = false;
				}
				if (inData && b != 0) {
					metaData.append((char) b);
				}
				if (count > (metaInterval + metaDataLength)) {
					break;
				}
			}
			String metaDataString = metaData.toString();
			System.out.println(metaDataString);
		}
	}
} finally {
	stream.close();
}


Lately I’ve been doing web design (yeah, that actual visual UI stuff, not just AJAX or something) at work, and – remarkably – for the first time since 1997 (yes – 1997!) I’ve enjoyed it.

Generally speaking my design tastes are different – or perhaps I could better say they reflect my unique sense of humour. For example, the original – and best – design featured a colour scheme generated by converting universal constants (the speed of light, e, etc.) to hex values. It was unique, and is still yet to be duplicated (!!).

But doing serious web design led me to dig out an old, old review of the first website I ever built and maintained. This was when the web was young, CSS didn’t really work, Netscape 4 (!) was my browser of choice, and I think I was running a pre-1.0 Slackware Linux install which I’d downloaded onto 12 floppies.

Website review, 1997 (actually, the article is from the May 1998 edition of Adelaide Review but I did the site in ’97)


Website review

The (s|S)emantic (w|W)eb

“The semantic web is the future of the web and always will be”

Peter Norvig, speaking at YCombinator Startup School

I’m sick of Semantic Web hype from people who don’t understand what they are talking about. In the past I’ve often said <insert Semantic Web rant here> – now it’s time to write it down.

There are two things people mean when they say the “semantic web”. They might mean the W3C vision of the “Semantic Web” (note the capitalization) of intelligent data, usually in the form of RDF, but sometimes microformats. Most of the time people who talk about this aren’t really having a technology discussion but are attempting a religious conversion. I’ve been down that particular road to Damascus, and the bright light turned out to be yet another demonstrator system which worked well on a very limited dataset, but couldn’t cope with this thing we call the web.

The other thing people mean by the “semantic web” is the use of algorithms to attempt to extract meaning (semantics) from data. Personally I think there’s a lot of evidence to show that this approach works well and can cope with real world data (from the web or elsewhere). For example, the Google search engine (ignoring Google Base) is primarily an algorithmic way of extracting meaning from data and works adequately in many situations. Bayesian filtering on email is another example – while it’s true that email spam remains a huge problem it’s also true that algorithmic approaches to filtering it have been the best solution we’ve found.
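To make the “algorithms extracting meaning” point concrete: a toy Bayesian filter is only a few lines. Count words seen in known-spam and known-ham messages, then score a new message by the log-likelihood ratio. This is a deliberately minimal sketch of the idea, not a real filter:

```java
import java.util.HashMap;
import java.util.Map;

public class TinyBayes {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamWords = 0, hamWords = 0;

    void trainSpam(String message) {
        for (String w : message.toLowerCase().split("\\s+")) {
            spamCounts.merge(w, 1, Integer::sum);
            spamWords++;
        }
    }

    void trainHam(String message) {
        for (String w : message.toLowerCase().split("\\s+")) {
            hamCounts.merge(w, 1, Integer::sum);
            hamWords++;
        }
    }

    // Sum of per-word log-likelihood ratios with add-one smoothing;
    // positive means "looks more like spam than ham".
    double spamScore(String message) {
        double score = 0;
        for (String w : message.toLowerCase().split("\\s+")) {
            double pSpam = (spamCounts.getOrDefault(w, 0) + 1.0) / (spamWords + 2.0);
            double pHam = (hamCounts.getOrDefault(w, 0) + 1.0) / (hamWords + 2.0);
            score += Math.log(pSpam / pHam);
        }
        return score;
    }
}
```

The interesting thing is that nothing here needs the sender to have marked anything up – the “semantics” are inferred from messy real-world data, which is the whole point.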

The problem with this dual meaning is that many people use it to weasel out of addressing challenges. Typically, the conversation will go something like this:

Semantic Web great, solve world hunger, cure the black plague bring peace and freedom to the world blah blah blah…

But what about spam?

Semantic Web great, trusted data sources automagically discovered, queries can take advantage of these relationships blah blah blah…

But isn’t that hard?

No, it’s what search engines have to do at the moment. The semantic web (note the case change!) will also extract relationships in the same way.

So.. we just have to mark up all our data using a strict format, and then we still have to do the thing that is hard about writing a search engine now – spam detection.

Yes, but it’s much easier because the data is much better.

Well, it’s sort of easier to parse, and in RDF form it is more self descriptive (but more complicated), but that only helps if you trust it already.

Well that’s easy then – you only use it from trusted sources

Excellent – let’s create another demo system that works well on limited data but can’t cope with this thing called the web.

Look – I don’t think the RDF data model is bad – in fact, I’m just starting a new project where I’m basing my data model on it. But the problem is that people claim that RDF, microformats and other “Semantic Web” technologies will somehow make extracting information from the web easier. That’s true insofar as it goes – extracting information will be easier. But the hard problem – working out what is trustworthy and useful – is ignored.
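For anyone wondering what “basing my data model on RDF” amounts to: the core of the model is just (subject, predicate, object) triples, queried with wildcards. A toy sketch (no URIs, literals or serialization – not real RDF, just the shape of it, with names of my own invention):

```java
import java.util.ArrayList;
import java.util.List;

public class Triples {
    static class Triple {
        final String subject, predicate, object;
        Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
    }

    private final List<Triple> store = new ArrayList<>();

    void add(String s, String p, String o) {
        store.add(new Triple(s, p, o));
    }

    // Query with nulls as wildcards, e.g. find(null, "knows", null)
    // returns every "knows" fact regardless of who is involved.
    List<Triple> find(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : store) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Nothing about that structure is hard – which is the point of the rant: the data model was never the difficult part.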

The Semantic Web needs a tagline – I’d suggest something like:

Semantic Web technologies: talking about trying to solve easy problems since 2001.

RDF could have one, too:

RDF: Static Typing for the web – now with added complexity tax.

So that’s my rant over. One day I promise to write something other than rants here – I’ve actually been studying Java versions of Quicksort quite hard, and I’ve got some interesting observations about micro optimizations. One day.. I promise…

Random stuff

It’s Friday afternoon, so here’s some random stuff:

  • We live across the road from a park, and most Saturday mornings some guy rides his bike there to do Yoga. He also brings his pet chicken to the park and lets it run around. (This might be normal behavior in San Francisco or somewhere, but in suburban Adelaide it is kinda odd)
  • Alex is now 2, and doesn’t like sleeping at childcare. Fortunately, they have figured out that letting him sleep with a ladder (yes, a full size, aluminum ladder) will calm him down and get him to sleep.
  • Paul Keating – no matter if you loved him or hated him – had a unique way with words. From yesterday’s Financial Review: “When push came to shove, McGuiness’s journalism did not add up to a row of beans. He held more political, philosophic and economic positions than would have the Karma Sutra had it been a philosophic text”.
  • If you don’t program, and you write about the meaning of programming APIs, then your opinion is moot. This also applies if you try to talk about APIs.
  • The Moth is a cool boat, but has come a long way since my circa-1970 tunnel hulled version. It’s kind of weird that they banned tunnel hulls, but freaking hydrofoils are okay…

Predictions for 2008

So it turns out that it’s 2008 and the thing to do is to make predictions for the next year. Here are my two:

  1. Facebook will have a huge leak of personal private information. It will turn out to be due to buggy code, which will finally focus some attention on the fact that Facebook’s codebase appears to be really, really bad.
  2. Someone will realize that recommendations are the next search. Some company will work out how to do for recommendations what Google did for search: ie, take what is currently an overly commercial medium (eg, Amazon recommendations etc) and turn it into a consumer facing tool which is generally useful. By 2010 what they did will seem obvious, and by 2011 they will be billionaires.

Update – 1 more thing:

OpenSocial will succeed in a big way – not because of support from the big players (Google etc) but because lots of small open source web projects (WordPress, Drupal, Joomla etc) can easily add support and will finally have a standard way of creating cross-platform compatible software.

The Napster (Grokster?) of Facebook

IANAL, but how can Audibie possibly be legal? Since the doctrine of inducement appeared (ref Grokster) I can’t see how the DMCA safe-harbor provisions would save them. Perhaps they are relying on the fact that they don’t host the files themselves – although that didn’t save Grokster or Napster.

It’s interesting to think what Facebook’s liability would be over an application like this. Facebook currently have a copyright policy which passes responsibility for DMCA takedown requests onto the application author. Audibie have posted their takedown procedures, in accordance with the DMCA.

If I were Facebook I’d be pretty worried that might not be enough.