I’d been working on a large (Java) AppEngine project since January 2010. I recently left that job, but the project hasn’t finished and unfortunately I can’t talk about it yet.
During that time I learnt a lot of tricks and techniques for dealing with AppEngine’s idiosyncrasies, which have been useful for building a contextual advertising demo system: Qontex.com. Brief synopsis: contextual affiliate ad distribution software. I’m not too sure what I’m going to do with it, but I had fun building it. The front-end container is actually WordPress(!), but the UI is GWT and the backend is AppEngine.
Anyway, it seems useful to share a few things I’ve learnt.
1) Be pragmatic
I think of AppEngine as Amazon S3 plus some intelligence, rather than Amazon EC2 minus features. I find that a lot less frustrating.
If there is something you need that AppEngine doesn’t do well, don’t try to force it. Full text search is a great example: it’s horrible to get working on AppEngine, but installing Solr on a VM somewhere (or using a cloud Solr provider) is trivial.
2) AppEngine is a platform optimized for a specific type of application.
Don’t think of AppEngine as a standard Java application stack in the cloud. From the documentation:
While a request can take as long as 30 seconds to respond, App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn’t will not scale well with App Engine’s infrastructure.
Think about that for a while, and understand it well. Often Java developers are used to building corporate web apps where functionality is slowly built up over time. All too often a single HTTP request will have 4 or 5 database queries in it, and that is regarded as normal. That won’t work in AppEngine.
When you are working with AppEngine you’ll be thinking about performance continually, and differently to how you do with a normal Java application.
3) The datastore is dangerous.
In the development environment it has similar performance characteristics to a traditional database. In production it is slow at best and unpredictable at worst. If you come from an enterprise Java background, think of it as an integration server for a legacy API: the data inside it isn’t going to go missing, but you should expect your connection to it to break at any point. You need to isolate your users from it, protect your application from it, and consider carefully how to protect your data from outages.
I usually assume that a datastore query is going to take 200ms. Lately it has generally been better than that, but the variation is still a problem: http://code.google.com/status/appengine/detail/datastore/2010/11/23#ae-trust-detail-datastore-query-latency
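To give a feel for what “protecting your application from it” looks like in practice, here’s the kind of guard I put around single-entity reads. It’s a minimal sketch: the AdConfig kind, the property name and the fall-back-to-a-default behaviour are all just illustrative, not code from Qontex.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.DatastoreTimeoutException;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class AdConfigLoader {

    private final DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();

    // Fetches a config value by key, treating the datastore like a flaky
    // remote service: a timeout or a missing entity degrades to a default
    // rather than propagating into the user-facing request.
    public String loadSetting(String configId, String defaultValue) {
        Key key = KeyFactory.createKey("AdConfig", configId); // hypothetical kind
        try {
            Entity entity = datastore.get(key);
            Object value = entity.getProperty("value");
            return value != null ? value.toString() : defaultValue;
        } catch (EntityNotFoundException e) {
            return defaultValue; // nothing stored yet - use the default
        } catch (DatastoreTimeoutException e) {
            return defaultValue; // the datastore is having a bad day
        }
    }
}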
4) Memcache is useful, but no silver bullet.
Memcache is useful because it has much more predictable performance characteristics than the datastore – and it’s a lot faster too. Generally, it’s pretty safe to rely on Memcache responding in less than 20ms at worst. At the moment its responses are around 5-10ms. See the Memcache status page for details: http://code.google.com/status/appengine/detail/memcache/2010/11/23#ae-trust-detail-memcache-get-latency
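In practice that means my servlets talk to Memcache first and fall back to the datastore only when they have to. A minimal sketch of the cache side, assuming a simple string key scheme and a one-hour expiry (both arbitrary choices for the example):

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class PageScoreCache {

    private final MemcacheService memcache =
            MemcacheServiceFactory.getMemcacheService();

    // Returns the cached score, or null if Memcache has nothing for this URL
    // (entries can be evicted at any time, so null always has to be handled).
    public Double getScore(String pageUrl) {
        return (Double) memcache.get("score:" + pageUrl); // hypothetical key scheme
    }

    // Caches a score for an hour; a re-put simply refreshes the entry.
    public void putScore(String pageUrl, double score) {
        memcache.put("score:" + pageUrl, score, Expiration.byDeltaSeconds(3600));
    }
}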
A Useful Pattern
One pattern I’ve found useful is to think of user-facing servlets as similar to the UI thread in a GUI application. Blocking should be kept to a minimum, and anything that’s going to take significant time is done from task queues. This includes anything beyond a single “GET” on the datastore (note that a GET operation is very roughly twice as fast as a datastore query).
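To make the GET-versus-query distinction concrete, here’s a rough sketch against the low-level datastore API; the Ad kind, its key naming and the category property are made up for the example:

import java.util.List;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;

public class AdLookup {

    private final DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();

    // Fast path: a single GET by key, cheap enough to do inline in a
    // user-facing servlet.
    public Entity getAd(String adId) throws EntityNotFoundException {
        return datastore.get(KeyFactory.createKey("Ad", adId));
    }

    // Slower path: a query has to walk an index, so anything built on this
    // is a candidate for a task queue rather than the request thread.
    public List<Entity> findAdsByCategory(String category) {
        Query query = new Query("Ad")
                .addFilter("category", Query.FilterOperator.EQUAL, category);
        return datastore.prepare(query)
                .asList(FetchOptions.Builder.withLimit(10));
    }
}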
For example, Qontex has a process that relies on content analysis. I currently do that on demand rather than attempting to spider the entire internet. The demo “Ad Explorer” front end is written in GWT, and it works like this (there’s a sketch of the servlet side after the steps):
1) The client sends a request to the analyze URL, passing the name of a callback function (for the JSONP callback).
2) The backend checks Memcache for data about the URL. If it isn’t there, it fires an AppEngine task queue request to analyze the URL and returns a JSONP response that contains a status_incomplete flag and a wait_seconds parameter.
3) The GWT client gets the response, and sets a timer to re-request in wait_seconds seconds.
4) Meanwhile, back on the server the task queue task is being processed. That task will load the results into memcache.
5) The client re-requests the analyze URL, and this time Memcache has been loaded, so the servlet can build a response with the correct data.
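In servlet form, the user-facing half of that looks roughly like the sketch below. The class name, the “analysis:” key scheme, the /tasks/analyze handler (which does the actual content analysis and loads the result into Memcache) and the JSON shape are all simplified for illustration, and on older SDKs the task queue classes live under com.google.appengine.api.labs.taskqueue instead:

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class AnalyzeServlet extends HttpServlet {

    private final MemcacheService memcache =
            MemcacheServiceFactory.getMemcacheService();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String url = req.getParameter("url");
        String callback = req.getParameter("callback"); // JSONP callback name

        resp.setContentType("text/javascript");

        String analysis = (String) memcache.get("analysis:" + url);
        if (analysis != null) {
            // Step 5: the task has loaded Memcache, so return the real data.
            resp.getWriter().print(callback + "(" + analysis + ");");
            return;
        }

        // Step 2: nothing cached yet - queue the slow analysis work and tell
        // the client to come back shortly. (In practice you'd also guard
        // against enqueuing the same URL twice.)
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/analyze")
                .param("url", url));

        resp.getWriter().print(callback
                + "({\"status_incomplete\": true, \"wait_seconds\": 5});");
    }
}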
I use a similar, but simpler, pattern to write to the datastore.
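Concretely, the “simpler pattern” is just an enqueue: the actual datastore write happens later in a task handler, so the serving path never waits on the datastore. Something along these lines, where the /tasks/record-click handler and the parameter names are made up for the sketch:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class ClickRecorder {

    // Hands the click off to a task queue. The datastore write happens later
    // in the /tasks/record-click handler, so the servlet serving the redirect
    // never blocks on the datastore.
    public void recordClick(String adId, String pageUrl) {
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/record-click") // hypothetical task handler
                .param("adId", adId)
                .param("pageUrl", pageUrl));
    }
}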
When an ad is served, or when a user clicks an ad, I fire a task-queue request to record it, which lets me send the response back much more quickly. AppStats is great for showing this graphically:
As you can see, it would be sensible to bulk up all those memcache reads into a single read on a composite object. At the same time, the entire servlet responds in 37ms, which isn’t too bad, and some of those memcache calls are conditional. But the point is that AppStats gives great visibility into exactly how your application is performing.