All posts by Nick Lothian

Java Web Hosting

I find it amazing how expensive Java based website hosting is. I'm currently paying $20 Canadian/month for my own VM
and Tomcat instance, plus MySQL. That's not too bad, but it is a lot more than the US$5/month I was paying for Perl/PHP/MySQL
hosting previously. I realize that Java Servlet hosting takes more memory than Perl/PHP, but I'm not convinced that alone
justifies the difference. I think it might be a lack of competition in the Servlet hosting market.

Ideally, a hosting provider would have:

  • JDK1.4.2
  • Tomcat
  • MySQL
  • Access to the Apache<->webapp mappings
  • An easy domain management system
  • Decent uptime and storage limits

HotSwap Client Tool

I just discovered Sun's HotSwap Client Tool – a tool for
dynamically updating classes while they are running. This uses the same feature of the 1.4.x VMs that Eclipse and other IDEs
use to hot-swap running code while debugging.

Sun's tool, however, is at least partly targeted at updating deployed applications. It has some nice features:

  • Get the list of classes currently loaded by the VM
  • Find which classes have changed
  • Compare the source code for the versions

It's under a BSD licence, too, so you are free to use it in your own apps. I'd be interested to hear what
people think of it.
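
For the curious, here is a minimal sketch of the underlying mechanism (not Sun's tool itself): the JPDA redefineClasses call that both the HotSwap Client Tool and the IDEs build on. It attaches to a VM started with the debug agent; the class name and .class file path below are placeholders.

    import com.sun.jdi.Bootstrap;
    import com.sun.jdi.ReferenceType;
    import com.sun.jdi.VirtualMachine;
    import com.sun.jdi.connect.AttachingConnector;
    import com.sun.jdi.connect.Connector;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class HotSwapSketch {
        public static void main(String[] args) throws Exception {
            // Attach to a JVM started with the debug agent, e.g.
            // -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000
            AttachingConnector socket = null;
            for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
                if ("dt_socket".equals(c.transport().name())) {
                    socket = c;
                }
            }
            Map<String, Connector.Argument> params = socket.defaultArguments();
            params.get("hostname").setValue("localhost");
            params.get("port").setValue("8000");
            VirtualMachine vm = socket.attach(params);

            // Find the loaded class and swap in freshly compiled bytecode.
            // com.example.Foo and the .class path are placeholders.
            ReferenceType type = vm.classesByName("com.example.Foo").get(0);
            byte[] bytecode = Files.readAllBytes(Paths.get("classes/com/example/Foo.class"));
            Map<ReferenceType, byte[]> redefinitions = new HashMap<ReferenceType, byte[]>();
            redefinitions.put(type, bytecode);
            vm.redefineClasses(redefinitions); // the actual hot swap
            vm.dispose();
        }
    }

Note that, as with the IDE hot-swap feature, only method bodies can be changed this way; redefinitions that add or remove methods or fields will be rejected by the VM.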

jvmstat

As mentioned yesterday, I've been playing with jvmstat. It
allows you to visually watch garbage collection going on in your
application. It's quite fascinating to watch the difference as you try
different garbage collection algorithms. For instance, using the -XX:+AggressiveHeap option makes a huge difference to the amount
of GC time (or at least it did in my benchmark).

It's also very helpful in understanding how the various garbage
collection algorithms work – something I've always glossed over in the
past.

Below are some traces from JBoss being hit pretty hard by an increasing
number of clients. The first image shows it just starting up – note that
there has been little memory reclaimed by the old generation garbage
collector. In the second image you can see the saw-tooth pattern in the old
generation memory usage. It is also interesting to see the copying of objects
between the two survivor spaces.

[jvmstat graph: JBoss shortly after startup]
[jvmstat graph: saw-tooth old generation usage]

The third image shows a run with -XX:+AggressiveHeap.
Note that most of the objects don't even make it out of the Eden pool. My
hypothesis is that this shows that most of my objects are short lived. A lot less time is
spent doing GC, too – compare the occasional spikes in the GC time graph to the almost continual GC in
the previous example.
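
If you want to reproduce this kind of trace without a full JBoss install, a trivial allocation loop is enough. This is just an assumed toy program, but watched under jvmstat you should see the behaviour described above: nearly every object dies in Eden, with only occasional promotions.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class GcDemo {
        public static void main(String[] args) throws InterruptedException {
            List<byte[]> retained = new ArrayList<byte[]>();
            Random random = new Random();
            while (true) {
                byte[] garbage = new byte[8 * 1024]; // short lived: unreachable next iteration
                if (random.nextInt(1000) == 0) {
                    retained.add(garbage); // ~0.1% live long enough to be promoted
                }
                if (retained.size() > 512) {
                    retained.clear(); // eventually reclaimed by the old generation collector
                }
                Thread.sleep(1); // slow the loop enough to watch
            }
        }
    }

Run it once with default settings and once with -XX:+AggressiveHeap and compare the GC time graphs.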

Some useful links:

  • Tuning Garbage Collection with the 1.4.2 Java Virtual Machine
  • Diagnosing a Garbage Collection problem
  • Frequently Asked Questions about Garbage Collection in the Hotspot Java Virtual Machine
ECPerf & SPECJAppServer

At Monday's AJUG-Adelaide meeting we were fortunate to have Sun's
Tom Daly give us a presentation on the ECPerf & SPECJAppServer benchmarks.

It was a very good presentation – reasonably technical, but lots of
interesting anecdotes, too. I'm sure if I was using Oracle on Solaris/Linux
I'd have got even more out of it.

Some of the things I took out of it were:

  • SPECJAppServer2003 (which may be called something different) will
    have a significant webservice component.
  • JDK1.4.2 is much, much faster on Linux & Solaris than JDK1.4.
  • The ability to scale J2EE by running it on more powerful hardware is
    important to Sun. To paraphrase Tom: the massive transactional benchmark
    numbers J2EE can get by running on large servers are important because J2EE
    CAN run on those servers (unlike .NET).
  • CMP in EJB app servers is good because server vendors have gone to
    extreme lengths to optimise its performance.
Intrigued by the potential performance benefits of JDK1.4.2, I spent some
time today doing some benchmarking. In my simple client app->JBoss 3.2 test
I'm finding roughly a 5-10% increase in performance. This is under Windows,
with both the client & JBoss on the same machine, in separate VMs (512 MB of
memory, but lots of stuff running).
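
A test like this amounts to little more than a timing loop. Here is a sketch of the shape such a client might take; the remote call is stubbed out, since the real one would be an EJB invocation specific to the app.

    public class BenchClient {
        public static void main(String[] args) {
            int iterations = 10000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterations; i++) {
                doRemoteCall(); // stand-in for a call on a remote JBoss EJB stub
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("avg ms/call: " + ((double) elapsed / iterations));
        }

        private static void doRemoteCall() {
            // In a real test this would be an invocation on a remote stub
            // obtained from JBoss via JNDI; omitted here.
        }
    }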

I also had a better look at jvmstat,
which is very nice. It does non-intrusive GC monitoring of your Java apps. You don't
need to enable any Java debugging options for it to work, so it doesn't slow
down your app at all while you monitor it.

IIS Connection Limit

Omri Gazitt (whose blog is a
typical Microsoft one – very useful!) today posted how to (partially) get rid
of the dreaded “Access Forbidden: Too many users are connected” error on IIS.

I've run into this error at the most inopportune of times – once we were
demoing our app (running on a notebook, with IIS in front of JRun 4). We
were plugged into the client's network, and they were mucking around with it
while we were demoing. Of course, we started getting bloody “Access Forbidden”
errors, and had to try to explain WTF was going on (and why it wouldn't be a
problem on the real system) to a whole lot of people who were much more
interested in what the app did than in the technology. Very
embarrassing!

Re: Bayesian Filtering: The Spam Fights Back

Charles Miller had an interesting post about
spam he is receiving that is designed to get through Bayesian filtering.

I am of the opinion that Bayesian filtering will eventually be only one of a range of filters which people will need to
deploy against spam. I'm optimistic that combinations of text filtering algorithms (including, but not only,
Bayesian algorithms) can continue to be effective for some time.

I think other filters are needed, though. For instance, in the spam that Charles received, many words (designed to
fool Bayesian filtering) were styled to be invisible. This is an old search-engine spamming technique, but now
Google detects it, and actually uses the stylistic structure of the web page (i.e. its appearance) during its analysis.
I can't think of any reason why mail filters can't do the same thing.
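
As a very rough sketch of the idea (assumed code – a real filter would need a proper HTML/CSS parser plus colour comparisons, not a regex), a mail filter could flag the crudest invisible-text tricks before the Bayesian pass:

    import java.util.regex.Pattern;

    public class InvisibleTextCheck {
        // Catches only the most blatant tricks: inline display:none / visibility:hidden.
        private static final Pattern HIDDEN = Pattern.compile(
                "<[^>]+style\\s*=\\s*\"[^\"]*(display\\s*:\\s*none|visibility\\s*:\\s*hidden)[^\"]*\"[^>]*>",
                Pattern.CASE_INSENSITIVE);

        public static boolean containsHiddenText(String htmlBody) {
            return HIDDEN.matcher(htmlBody).find();
        }
    }

A message that both contains hidden text and scores near the spam threshold would be a much stronger signal than either test alone.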

(Disclaimer: I've written an open source package for Bayesian filtering in Java)

An interesting web application infrastructure issue

Many web applications require cache control for pages, especially if
they involve user logons or time-dependent data.

Usually this is achieved with HTTP headers – something like (in JSP):

    response.setHeader("Cache-Control", "no-cache");
    response.setHeader("Expires", "Fri, 30 May 1980 14:00:41 GMT");

An alternative that usually works well is to require your site to be run
under HTTPS. In theory, this seems ideal, since it provides security as well
as cache control.

However, beware of the impact of things like reverse proxies. Many companies
are installing reverse proxies in front of their web hosting machines
to do request filtering, in order to provide some
protection against SQL injection & XSS attacks on their websites. This is a
really good idea, but there can be some unexpected impacts.

One I didn't expect was the impact on caching. Because the proxy needs to
inspect the request, it decrypts it, then forwards the request as an HTTP
request to the server. Many vendors list this as a feature, because it
offloads some processing requirements to the proxy box instead of the
webserver. The catch comes if you don't have explicit cache control in your
pages AND the reverse proxy is a caching reverse proxy. In this case the
proxy may return the cached content to the user, which is NOT WHAT YOU WANT!

Another complicating factor is that some reverse proxies forward the original HTTP
1.1 requests to the server as HTTP 1.0, and seem to ignore HTTP 1.1 headers
that are returned. This can bite you if you only use the “Cache-Control”
(HTTP 1.1 only) header.

Lesson:
Always provide explicit cache control AND expiry headers, and never rely on
HTTPS to control caching for you.
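
In servlet terms, one way to apply that lesson everywhere at once is a filter. Here is a sketch; the header values shown are one reasonable choice, not the only one.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    public class NoCacheFilter implements Filter {
        public void init(FilterConfig config) {
        }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletResponse response = (HttpServletResponse) res;
            response.setHeader("Cache-Control", "no-cache, no-store"); // HTTP 1.1
            response.setHeader("Pragma", "no-cache");                  // HTTP 1.0
            response.setDateHeader("Expires", 0);                      // a date in the past
            chain.doFilter(req, res);
        }

        public void destroy() {
        }
    }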

DRM HTML via IE – the death of “View Source”?

While downloading assorted Windows patches today, I came across an interesting link:
Rights Management Add-on for Internet Explorer.
I'm a little surprised there hasn't been something of an outcry over this one:

    Document authors, Web site authors, and creators of Web-based applications can deliver protected information by
    restricting permission. This provides protection, not only while the information is in transit, but also after the
    recipient of the information has received it.

I understand that this is useful technology for traditional content providers – but in my view it really goes against the
whole philosophy of the web. Can you imagine if the original Mosaic had this stuff?

From the software politics point of view, I think it is an interesting anti-PDF play from MS. If you can do DRM in HTML,
then you don't need to use encrypted PDF (or eBooks, for that matter).