Archive for java

A pragmatic approach to Google AppEngine

I’ve been working on a large (Java) AppEngine project since January 2010. I recently left that job, but the project hasn’t finished and unfortunately I can’t talk about it yet.

During that time I learnt a lot of tricks and techniques for dealing with AppEngine’s idiosyncrasies, which have been useful for building a contextual advertising demo system: Qontex.com (brief synopsis: contextual affiliate ad distribution software. Not too sure what I’m going to do with it, but I had fun building it. The front end container is actually WordPress(!), but the UI is GWT and the backend is AppEngine).

Anyway, it seems useful to share a few things I’ve learnt.

1) Be pragmatic

I think of AppEngine as Amazon S3 plus some intelligence, rather than Amazon EC2 minus features. I find that a lot less frustrating.

If there is something you need that AppEngine doesn’t do well, don’t try and force it. Full Text Search is a great example: it’s horrible to try & get it to work on AppEngine, but installing Solr on a VM somewhere (or using a cloud Solr provider) is trivial.

2) AppEngine is a platform optimized for a specific type of application.

Don’t think of AppEngine as a standard Java application stack in the cloud. From the documentation:

While a request can take as long as 30 seconds to respond, App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn’t will not scale well with App Engine’s infrastructure.

Think about that for a while, and understand it well. Often Java developers are used to building corporate web apps where functionality is slowly built up over time. All too often a single HTTP request will have 4 or 5 database queries in it, and that is regarded as normal. That won’t work in AppEngine.

When you are working with AppEngine you’ll be thinking about performance continually, and differently to how you do with a normal Java application.

3) The datastore is dangerous.

In the development environment it has similar performance characteristics to a traditional database. In production it is slow at best, unpredictable at worst. If you come from an enterprise Java background, think of it as an integration server for a legacy API you are integrating with: data inside it isn’t going to go missing, but you should expect your connection to it to break at any point. You need to isolate your users from it, protect your application from it and consider carefully how to protect your data from outages.

I usually assume that a datastore query is going to take 200ms. Lately it has usually been better than that, but the variation is still a problem: http://code.google.com/status/appengine/detail/datastore/2010/11/23#ae-trust-detail-datastore-query-latency

4) Memcache is useful, but no silver bullet.

Memcache is useful because it has much more predictable performance characteristics than the datastore – and it’s a lot faster too. Generally, it’s pretty safe to rely on Memcache responding in less than 20ms at worst. At the moment its responses are around 5-10ms. See the Memcache status page for details: http://code.google.com/status/appengine/detail/memcache/2010/11/23#ae-trust-detail-memcache-get-latency

A Useful Pattern

One pattern I’ve found useful is to think of user-facing servlets as similar to the UI thread in a GUI application. Blocking should be kept minimal, and anything that’s going to take significant time is done from task queues. This includes anything beyond a single “GET” on the datastore (note that a GET operation is very roughly twice as fast as a datastore query).

For example Qontex has a process that relies on content analysis. I currently do that on-demand rather than attempting to spider the entire internet. The demo “Ad Explorer” front end is written in GWT, and it works like this:

1) Send a request to the analyze URL, passing the name of a callback function (for JSONP callback)

2) The backend checks Memcache for data about the URL. If it isn’t there, it fires an AppEngine task queue request to analyze the URL and returns a JSONP response that contains a status_incomplete flag and a wait_seconds parameter.

3) The GWT client gets the response, and sets a timer to re-request in wait_seconds seconds.

4) Meanwhile, back on the server the task queue task is being processed. That task will load the results into memcache.

5) The client re-requests the analyze URL, and this time Memcache has been loaded so the servlet can build a response with the correct data.
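
The steps above can be sketched in plain Java. This is only an illustration of the flow, not the Qontex code: a ConcurrentHashMap stands in for Memcache, a single-thread executor stands in for the AppEngine task queue, and the class name, the analyze() placeholder and the fixed wait_seconds value of 2 are all made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AnalyzeSketch {
    // Stand-in for Memcache: analysis results keyed by URL (assumption).
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // URLs whose analysis is already queued, so repeated polls don't enqueue duplicates.
    private final Map<String, Boolean> pending = new ConcurrentHashMap<>();
    // Stand-in for the task queue (assumption).
    private final ExecutorService taskQueue = Executors.newSingleThreadExecutor();

    /** Handles one poll and returns the JSONP body without waiting for the analysis. */
    public String handle(String url, String callback) {
        String result = cache.get(url);
        if (result != null) {
            // Step 5: the cache has been loaded, so the real data can be returned.
            // (No JSON escaping here; a real servlet would need it.)
            return callback + "({\"status_incomplete\":false,\"data\":\"" + result + "\"})";
        }
        // Step 2: nothing cached yet, so queue the work (once) and tell the client to retry.
        if (pending.putIfAbsent(url, Boolean.TRUE) == null) {
            taskQueue.submit(() -> {
                // Step 4: the background task loads the result into the cache.
                cache.put(url, analyze(url));
                pending.remove(url);
            });
        }
        return callback + "({\"status_incomplete\":true,\"wait_seconds\":2})";
    }

    // Hypothetical placeholder for the slow content analysis.
    private String analyze(String url) {
        return "keywords-for-" + url;
    }

    /** Drains the queue; only needed for this standalone sketch. */
    public void shutdown() {
        taskQueue.shutdown();
        try {
            taskQueue.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the shape is that handle() never does more than a cache read and an enqueue, so the user-facing request stays short however long the analysis takes.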

I use a similar, but simpler pattern to write to the datastore.

When an ad is served, or when a user clicks an ad I fire a task-queue request to record that, which lets me send a response much quicker. AppStats is great for showing this graphically:

As you can see there it would be sensible to bulk up all those memcache reads into a single read on a composite object. At the same time, the entire servlet responds in 37ms, which isn’t too bad, and some of those memcache calls are conditional – but the point is that AppStats gives great visibility into exactly how your application is performing.

Comments (8)

Solr+Cassandra

I’ve been a big fan of Solr for quite a long time, and have used it extensively at work.

I noticed a few weeks ago that Jake Luciani had managed to get Lucene (which Solr uses) working on Cassandra (Facebook’s highly scalable key-value store).

The next step had an obvious name: Solandra – Solr running on Cassandra.

Basically there wasn’t too much to getting it going in the limited form it is now – a few minor changes to Jake’s Lucandra code, a custom Solr FieldType (exactly why I needed this I’m unsure) and correctly configured solrconfig.xml and schema.xml files.

I haven’t tested updates, so you’ll probably need Jake’s BookmarkDemo to load data in.

My changes to the Lucandra index reader include hard coding (!) the fields returned by getFieldNames(..) to match the Solr schema and the fields added in the demo.

If anyone is interested, the code is available: solandra.zip. You’ll need to be a Java developer to use it, though.

Comments (5)

The AppEngine is forking Java “controversy”

So there has been some noise from Sun about how Google AppEngine is evil because it’s not supporting the complete set of classes in the JRE. I’m sorry Sun – I’m a Java programmer, and I think that argument is shit. I’d much prefer a partial Java implementation with well defined limitations than PHP, or Python or Ruby.

AFAIK, no one has posted a list of classes missing. I can’t be bothered doing that either, but I did manually take a look at package level. Here’s what it looks like GAE/J is missing:

java.applet
java.awt.*
javax.activation
javax.imageio.*
javax.jws.*
javax.management.*
javax.naming.*
javax.net.*
javax.print.*
javax.rmi.*
javax.sound.*
javax.swing.*
javax.tools
javax.xml.bind.*
javax.xml.crypto.*
javax.xml.soap
javax.xml.stream.*
javax.xml.ws
org.ietf.jgss
org.omg.*

From that list, I’d like to see javax.activation, javax.management and the remaining javax.xml.* and maybe javax.tools packages supported. The rest really don’t seem at all relevant to the AppEngine environment.
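
If you want to check a specific class rather than trust a package-level list, a small probe deployed with your app is one way to do it. This is a rough sketch: the ClassProbe name and the candidate classes are just examples, and I’m assuming a blocked class surfaces as a ClassNotFoundException or a LinkageError (such as NoClassDefFoundError) when it is looked up.

```java
public class ClassProbe {
    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, don't run its static initializers.
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example candidates only; substitute whatever your application depends on.
        String[] candidates = {
            "java.util.ArrayList",
            "javax.naming.InitialContext",
            "javax.swing.JFrame"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + isAvailable(name));
        }
    }
}
```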

Comments (4)

Random MP3 metadata code

I’ve been doing random MP3 metadata work lately. Here’s some code which others might find useful.

Extracting MP3 tags from mp3 file hosted on server using HTTP Range queries.

So I was using Apache Tika for various metadata stuff. I wanted to get the song title for a file hosted on a server, but Tika only supports MP3 ID3v1 metadata, which exists at the end of a file. Downloading an entire MP3 just for the title is wasteful, but fortunately HTTP Range queries can help us out.

HttpClient httpClient = new HttpClient();
httpClient.getHttpConnectionManager().getParams().setConnectionTimeout(10000);
httpClient.getHttpConnectionManager().getParams().setSoTimeout(10000);

String address = "http://address of mp3 file here";

HttpMethod method = new HeadMethod();
method.setURI(new URI(address,true));

Header contentLengthHeader = null;
Header acceptHeader = null;

httpClient.executeMethod(method);
try {
	//System.out.println(Arrays.toString(method.getResponseHeaders()));
	contentLengthHeader = method.getResponseHeader("Content-Length");
	acceptHeader = method.getResponseHeader("Accept-Ranges");
} finally {
	method.releaseConnection();
}

if ((contentLengthHeader != null) && (acceptHeader != null) && "bytes".equals(acceptHeader.getValue())) {
	long contentLength = Long.parseLong(contentLengthHeader.getValue());
	long metaDataStartRange = contentLength - 128;
	if (metaDataStartRange > 0) {
		method = new GetMethod();
		method.setURI(new URI(address,true));
		// Byte ranges are inclusive and zero-based, so the last byte is contentLength - 1
		method.addRequestHeader("Range", "bytes=" + metaDataStartRange + "-" + (contentLength - 1));
		System.out.println(Arrays.toString(method.getRequestHeaders()));
		httpClient.executeMethod(method);
		try {
			Parser parser = new AutoDetectParser();

			Metadata metadata = new Metadata();
			metadata.set(Metadata.RESOURCE_NAME_KEY, address);
			InputStream stream = method.getResponseBodyAsStream();
			try {
				parser.parse(stream, new DefaultHandler(), metadata);
			} catch (Exception e) {
				e.printStackTrace();
			} finally {
				stream.close();
			}
			System.out.println(Arrays.toString(metadata.names()));
			System.out.println("Title: " + metadata.get("title"));
			System.out.println("Author: " + metadata.get("Author"));
		} finally {
			method.releaseConnection();
		}
	}
} else {
	System.err.println("Range not supported. Headers were: ");
	System.err.println(Arrays.toString(method.getResponseHeaders()));
}
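
If you don’t want to pull in Tika just for this, the 128-byte ID3v1 block is simple enough to read by hand: the bytes "TAG", then fixed-width title (30 bytes), artist (30), album (30), year (4), comment (30) and a genre byte. Here is a minimal sketch of that, assuming you already have the last 128 bytes of the file in an array; the class and method names are made up, and I’m assuming ISO-8859-1 text, which is what ID3v1 tags normally contain.

```java
import java.nio.charset.StandardCharsets;

public class Id3v1Sketch {
    /** Returns the title from a 128-byte ID3v1 block, or null if the block is not a tag. */
    public static String title(byte[] tail) {
        return isTag(tail) ? field(tail, 3, 30) : null;
    }

    /** Returns the artist from a 128-byte ID3v1 block, or null if the block is not a tag. */
    public static String artist(byte[] tail) {
        return isTag(tail) ? field(tail, 33, 30) : null;
    }

    // An ID3v1 block is exactly 128 bytes long and starts with the ASCII bytes "TAG".
    private static boolean isTag(byte[] tail) {
        return tail != null && tail.length == 128
                && tail[0] == 'T' && tail[1] == 'A' && tail[2] == 'G';
    }

    // Fields are fixed width and padded with NUL bytes or spaces.
    private static String field(byte[] tail, int offset, int length) {
        int end = offset;
        while (end < offset + length && tail[end] != 0) {
            end++;
        }
        return new String(tail, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }
}
```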

The next thing I needed to do was extract song titles from a shoutcast stream. Shoutcast streams are kinda-but-not-quite http. Metadata is embedded in the stream (not as part of the MP3). That makes the code pretty ugly, but whatever… This code will open a connection, read the metadata and close, so you don’t need to keep downloading gigs of data.

URL url = new URL("http://scfire-ntc-aa01.stream.aol.com:80/stream/1074");
URLConnection con = url.openConnection();
con.setRequestProperty("Icy-MetaData", "1");

InputStream stream = con.getInputStream();
try {

	String metaIntervalString = null;
	// get the headers, reading single bytes straight from the stream
	// (a BufferedReader would read ahead and throw off the byte count used below)
	StringBuilder headers = new StringBuilder();
	int c; // int, not char: a char can never equal the -1 end-of-stream marker
	while ((c = stream.read()) != -1) {
		headers.append((char) c);
		if (headers.length() > 5 && (headers.substring((headers.length() - 4), headers.length()).equals("\r\n\r\n"))) {
			// end of headers
			break;
		}
	}

	//System.out.println(headers);
	// headers look like this:
	//		ICY 200 OK
	//		icy-notice1: This stream requires Winamp
	//		icy-notice2: Firehose Ultravox/SHOUTcast Relay Server/Linux v2.6.0
	//		icy-name: .977 The 80s Channel
	//		icy-genre: 80s Pop Rock
	//		icy-url: http://www.977music.com
	//		content-type: audio/mpeg
	//		icy-pub: 1
	//		icy-metaint: 16384
	//		icy-br: 128

	Pattern p = Pattern.compile("\\r\\n(icy-metaint):\\s*(.*)\\r\\n");
	Matcher m = p.matcher(headers.toString());
	if (m.find()) {
		metaIntervalString = m.group(2);
	}

	if (metaIntervalString != null) {
		int metaInterval = Integer.parseInt(metaIntervalString.trim());
		if (metaInterval > 0) {
			int b;
			int count = 0;
			int metaDataLength = 4080; // 4080 is the max length
			boolean inData = false;
			StringBuilder metaData = new StringBuilder();
			while ((b = stream.read()) != -1) {
				count++;
				if (count == metaInterval + 1) {
					// the byte after the audio block is the metadata length, in units of 16 bytes
					metaDataLength = b * 16;
				}
				if (count > metaInterval + 1 && count < (metaInterval + metaDataLength)) {
					inData = true;
				} else {
					inData = false;
				}
				if (inData) {
					if (b != 0) {
						metaData.append((char)b);
					}
				}
				if (count > (metaInterval + metaDataLength)) {
					break;
				}
			}
			String metaDataString = metaData.toString();
			System.out.println(metaDataString);
		}
	}
} finally {
	stream.close();
}
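
The metadata string that comes out of that is not just the title. On the streams I’d expect, it looks something like StreamTitle='Artist - Song';StreamUrl=''; so one more step is needed to pull the title out. A small sketch, assuming that layout (the class and method names are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IcyMetadataSketch {
    // Lazily match everything between StreamTitle=' and the first following ';
    private static final Pattern STREAM_TITLE = Pattern.compile("StreamTitle='(.*?)';");

    /** Returns the StreamTitle value, or null if the metadata block has none. */
    public static String streamTitle(String metaData) {
        if (metaData == null) {
            return null;
        }
        Matcher m = STREAM_TITLE.matcher(metaData);
        return m.find() ? m.group(1) : null;
    }
}
```

Matching up to the first ';' rather than the first quote means titles containing an apostrophe still come through whole.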

Comments (3)

ROME 1.0 Released

I’ve just pushed out ROME 1.0 and ROME Fetcher 1.0.

As they say with open source projects – “it’s done when it’s done”. But nearly 5 years to get to version 1.0 is kind of long.

Comments (4)

ROME 1.0RC2 Release

I’ve just pushed out a release of ROME core, ROME Fetcher and ROME modules.

For those who don’t know, ROME is a (the?) Java library for handling RSS and Atom. Unlike some other libraries it is pretty stable (18 months since the last release) and has a low number of dependencies (one – JDOM – if all you need is parsing).

The announcement, including links, is at https://rome.dev.java.net/servlets/ReadMsg?list=dev&msgNo=2656

The thing I’m most pleased about (and the number one source of complaints about ROME) is that I’ve pushed it to the java.net Maven repository, so now it will be easier to use from Maven. Further details are at http://wiki.java.net/bin/view/Javawsxml/RomeAndMaven2

Comments (2)

Installing Java on RedHat Linux by building your own RPM

It’s pretty easy to install Java on Linux – download the RPM from Sun and install it. Then if you run “java -version” you’ll suddenly discover that it doesn’t really work:

java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)

You can get around that by setting your path and JAVA_HOME, or by only using Java versions that have a matching JPackage RPM and using the alternatives command.

If you want to be able to build your own RPM, here’s how to do it.

 

# Be sure to enable the distro specific repository for your distro below:
# - jpackage-fc for Fedora Core
# - jpackage-rhel for Red Hat Enterprise Linux and derivatives

[jpackage-generic]
name=JPackage (free), generic
mirrorlist=http://www.jpackage.org/mirrorlist.php?dist=generic&type=free&release=1.7
failovermethod=priority
gpgcheck=1
gpgkey=http://www.jpackage.org/jpackage.asc
enabled=1

[jpackage-fc]
name=JPackage (free) for Fedora Core $releasever
mirrorlist=http://www.jpackage.org/mirrorlist.php?dist=fedora-$releasever&type=free&release=1.7
failovermethod=priority
gpgcheck=1
gpgkey=http://www.jpackage.org/jpackage.asc
enabled=0

[jpackage-rhel]
name=JPackage (free) for Red Hat Enterprise Linux $releasever
mirrorlist=http://www.jpackage.org/mirrorlist.php?dist=rhel-$releasever&type=free&release=1.7
failovermethod=priority
gpgcheck=1
gpgkey=http://www.jpackage.org/jpackage.asc
enabled=0

[jpackage-generic-nonfree]
name=JPackage (non-free), generic
mirrorlist=http://www.jpackage.org/jpackage_generic_nonfree_1.7.txt
failovermethod=priority
gpgcheck=1
gpgkey=http://www.jpackage.org/jpackage.asc
enabled=1
  • Become root
  • Copy this file to /etc/yum.repos.d. Edit it, and make sure that enabled=1 is set for the [jpackage-generic-nonfree] section.
  • Make directories required by the RPM process (I suspect you can do this outside the /usr/src directory, though):  
mkdir -p /usr/src/redhat/SOURCES  
mkdir -p /usr/src/redhat/RPMS/i586/
  • Copy the Java installation file you previously downloaded to /usr/src/redhat/SOURCES and make it executable (chmod +x <name of file>)
  • Install the tools you need to build an rpm: yum install yum-utils jpackage-utils rpm-build  (At the moment this seems to fail on 64bit machines because of missing dependencies)
  • cd /usr/src/redhat/SOURCES
  • yumdownloader --source java-1.6.0-sun
  • At the moment, that will download a file called java-1.6.0-sun-1.6.0.10-1jpp.nosrc.rpm
  • Run setarch i586 rpmbuild --rebuild java-1.6.0-sun*nosrc.rpm. At the moment that gives an error message, which it seems can be ignored:
sh: /usr/src/redhat/SOURCES/jdk-6u10-linux-i586.bin: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.6041 (%prep)
RPM build errors:
    user jasonc does not exist - using root
    group jasonc does not exist - using root
    user jasonc does not exist - using root
    group jasonc does not exist - using root
    user jasonc does not exist - using root
    group jasonc does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.6041 (%prep)
  • That previous command extracted an RPM SPEC file into the /usr/src/redhat/SPECS/ directory.
  • Edit /usr/src/redhat/SPECS/java-1.6.0-sun.spec. Find the part that says %define buildver and change the value to the build for the new version of Java
  • Run rpmbuild -ba /usr/src/redhat/SPECS/java-1.6.0-sun.spec. This extracts the JDK installer you previously downloaded and builds a set of RPMs from it.
  • cd /usr/src/redhat/RPMS/i586; ls;

java-1.6.0-sun-1.6.0.11-1jpp.i586.rpm        java-1.6.0-sun-fonts-1.6.0.11-1jpp.i586.rpm
java-1.6.0-sun-alsa-1.6.0.11-1jpp.i586.rpm   java-1.6.0-sun-jdbc-1.6.0.11-1jpp.i586.rpm
java-1.6.0-sun-demo-1.6.0.11-1jpp.i586.rpm   java-1.6.0-sun-plugin-1.6.0.11-1jpp.i586.rpm
java-1.6.0-sun-devel-1.6.0.11-1jpp.i586.rpm  java-1.6.0-sun-src-1.6.0.11-1jpp.i586.rpm
  • You can now install the RPM: rpm -i java-1.6.0-sun-1.6.0.11-1jpp.i586.rpm
  • For me that failed with a missing X dependency: libXtst.so.6 is needed by java-1.6.0-sun-1.6.0.11-1jpp.i586
  • I fixed that with yum -y install libX11-devel libXtst.
  • Use the alternatives command to set the correct version of Java: alternatives --config java
  • Finally: java -version:

java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)
That’s it – you finally have Java working on Linux! You also have an RPM which can be installed on other machines.

Comments (6)

Modify java.library.path at runtime

Linking to native code in Java is always a hassle. JNI isn’t exactly nice, and there are some oddities around classloaders and native libraries which are annoying if you run into them.

One thing I wasn’t aware of was exactly how hard it is to load a library that isn’t already in the directories specified by the java.library.path system property.

Initially, I thought I’d just be able to alter that property and the JVM would pick up the new locations. That turns out not to be the case, as is shown by this (closed) bug report.

However, there is a solution, outlined in this post on the Sun forums, which revolves around altering the private static usr_paths field of the ClassLoader class.

	public static void addDir(String s) throws IOException {
		try {
			// This enables the java.library.path to be modified at runtime
			// From a Sun engineer at http://forums.sun.com/thread.jspa?threadID=707176
			//
			Field field = ClassLoader.class.getDeclaredField("usr_paths");
			field.setAccessible(true);
			String[] paths = (String[])field.get(null);
			for (int i = 0; i < paths.length; i++) {
				if (s.equals(paths[i])) {
					return;
				}
			}
			String[] tmp = new String[paths.length+1];
			System.arraycopy(paths,0,tmp,0,paths.length);
			tmp[paths.length] = s;
			field.set(null,tmp);
			System.setProperty("java.library.path", System.getProperty("java.library.path") + File.pathSeparator + s);
		} catch (IllegalAccessException e) {
			throw new IOException("Failed to get permissions to set library path");
		} catch (NoSuchFieldException e) {
			throw new IOException("Failed to get field handle to set library path");
		}
	}

Obviously, I don’t think that’s portable across JVMs, though.
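
If you know which directory the library lives in, a less hacky option may be to skip java.library.path altogether and load the file by absolute path with System.load. A minimal sketch of that, with a made-up class name; note that any other native libraries your library depends on are still resolved by the operating system’s own loader, so this won’t help with those.

```java
import java.io.File;

public class NativeLoaderSketch {
    /** Builds the absolute path of a native library inside the given directory. */
    public static String libraryPath(File dir, String libName) {
        // mapLibraryName turns "foo" into the platform-specific file name
        // (for example libfoo.so on Linux or foo.dll on Windows).
        return new File(dir, System.mapLibraryName(libName)).getAbsolutePath();
    }

    /** Loads the library by absolute path, without consulting java.library.path. */
    public static void loadFrom(File dir, String libName) {
        System.load(libraryPath(dir, libName));
    }
}
```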

Comments (2)

The problem with OpenID is…

The problem with OpenID is branding – people get (very) confused when they get taken off site to login. I’ve watched usability testing of this, and it is truly horrible. Obviously this isn’t unique to OpenID – it applies equally to any federated identity solution (in fact – Shibboleth based federations are even worse than OpenID in this respect).

I think user education will help, but it would be really good to be able to extend OpenID so that a logo can be shown on the identity provider’s site, letting the user see that they are logging into site “blah” via whatever OpenID provider.

Comments

Why tech predictions are stupid (and a small prediction)

Every year hundreds of tech pundits go and make their predictions for the year – a trend I’m not immune to either. Alan Kay explained the problem with this the best: “The best way to predict the future is to invent it”. In a field like computing it is so easy for a single person to build something new that trying to make predictions is a pointless exercise.

None the less, here’s something that is less of a prediction and more an exercise in deduction and rumor mongering. Sun is planning to launch a direct competitor to Amazon’s EC2 in the near future (not sure when exactly, but 2008 for sure). Note that this is different to the existing Sun Grid product (which will presumably continue).

Comments