All posts by Nick Lothian

Classifier4J version 0.3 is now available.

Classifier4J is a Java library that provides an API for automatic
classification of text, including Bayesian classification. Version 0.3 is the first version recommended for general use.

Some of the many improvements include:

  • The ability to train the BayesianClassifier via an ITrainable interface, rather than requiring updates
    to the datasource.
  • Performance and design improvements to the JDBCWordDataSource.
  • Stop Word support.
  • Internal Refactoring, particularly with respect to the WordProbability object (thanks to Pete Leschev).
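
To give a feel for what training through an interface looks like, here is a minimal self-contained sketch. It is not Classifier4J's code: the `Trainable` interface, its method names and the toy word-count classifier are all hypothetical stand-ins, written only to show the idea of teaching a classifier directly instead of updating its datasource by hand.

```java
import java.util.HashMap;
import java.util.Map;

public class TrainableSketch {

    /** Hypothetical training interface (an assumption, not the library's actual API). */
    public interface Trainable {
        void teachMatch(String text);
        void teachNonMatch(String text);
    }

    /** Toy classifier that keeps per-word counts of matching and non-matching text. */
    public static class ToyBayesianClassifier implements Trainable {
        // word -> {times seen in matching text, times seen in non-matching text}
        private final Map<String, int[]> counts = new HashMap<>();

        public void teachMatch(String text) { count(text, 0); }

        public void teachNonMatch(String text) { count(text, 1); }

        private void count(String text, int slot) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                counts.computeIfAbsent(word, k -> new int[2])[slot]++;
            }
        }

        /** Returns a score between 0 and 1; 0.5 means "no evidence either way". */
        public double classify(String text) {
            double pMatch = 1.0;
            double pNonMatch = 1.0;
            for (String word : text.toLowerCase().split("\\W+")) {
                int[] c = counts.get(word);
                if (c == null) continue; // unseen words contribute nothing
                // Per-word probability with add-one smoothing
                double p = (c[0] + 1.0) / (c[0] + c[1] + 2.0);
                pMatch *= p;
                pNonMatch *= (1.0 - p);
            }
            return pMatch / (pMatch + pNonMatch);
        }
    }

    public static void main(String[] args) {
        ToyBayesianClassifier classifier = new ToyBayesianClassifier();
        classifier.teachMatch("java library for text classification");
        classifier.teachNonMatch("cheap pills online");
        System.out.println(classifier.classify("a new java library"));
        System.out.println(classifier.classify("pills"));
    }
}
```

The caller only ever sees `teachMatch` and `teachNonMatch`; where the word counts end up (memory here, a database in a real setup) stays hidden behind the interface.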

Classifier4J is available from http://classifier4j.sourceforge.net/

Classifier4J, NNTP//RSS and Bayesian Blog Classification.

I now have Classifier4J and
nntp//rss working together to do Bayesian classification of RSS feeds.
There are a few things still to work out (performance and usability to name two), but I'm pretty pleased with it, since it
was something I whipped up in a couple of hours. AFAIK it is the first Bayesian/RSS thing that has got far enough to have a screenshot…

(Updated to fix link to image)

More dotNet vs Java

http://weblogs.asp.net/jprismon/posts/9824.aspx raised a couple of valid points that need more than the superficial
comments I posted yesterday.

In particular:

Microsoft has completely committed to .NET. Longhorn's new features are all managed code.

I've done a bit of research about this, and I'm not convinced it is true. While all the new features in
Longhorn (eg the file system, Active Directory enhancements etc) will undoubtedly expose managed code interfaces
I doubt they will themselves be written in .NET. I know the new version of IIS has some of the code moved into the Windows kernel
(I think the correct terminology is “it runs in ring 0”?), and code that is performance optimised is unlikely to be managed.
Note that you wouldn't do it in Java, either, so this isn't a particular weakness of .NET.

Microsoft's most profitable Business Applications are being ported as we speak. BizTalk, Office,
and the OS all have managed serviced components now, and the next version of SQL will have
extremely rich CLR support.

This is true, and is a big deal. Increasingly I suspect we'll see Office's .NET interfaces used from other applications,
kind of like people used to automate Excel & Word from COM. This time it will be easier, and Office will be designed for
doing the kind of batch-processing & workflow which people want. In the Java world OpenOffice exposes some Java interfaces.
I can't comment on how good they are. However, many databases (Oracle, DB2, Sybase ASE & ASA for example) all have
extremely rich Java support. This is pretty mature and looks good in comparison to SQL Server's .NET support.

Interoperability rocks in .NET. Not just platform (mono is doing a great job) but also interop based on the WS-I stack

I don't really understand what is being said here. Interoperability with what? I do a lot of fairly hairy integration work
in my day job and I can speak from experience when I say that 7-bit ASCII works really well in both .NET & Java, but
anything else seems to have edge cases that have issues (mostly in the crappy proprietary libraries we need to use). SOAP
over HTTP is usually okay from both .NET & Java. Overall, I don't see this as a particular win either way. From the
integration point of view J2EE has the JCA spec which is quite nice – unfortunately you need to rely on your vendor to
supply a JCA compliant connector, though.

Java is at best a niche platform. When was the last time you saw any non server/specialized software written in Java?
Of the top ten software packages (Windows, Office, SAP, PeopleSoft, Oracle, SQL, Quicken,
Quickbooks, TaxCut, Microsoft Money) how many of them are actually written in java? 0/10. Microsoft
owns 90% of the CPU market. Microsoft has decided to slip .NET until Longhorn, but it is out there in the
hands of extremely productive developers.

This is a fair point (although SAP, PeopleSoft & Oracle all have significant Java components). How many are written
in .NET, though? (0/10) I'll concede that Office & SQL Server will have significant .NET components in the next release,
but that will really only match what is in Oracle, SAP & PeopleSoft right now.

Reflection, Inspection, Attributes and Events. Simpler in .NET, more powerful in .NET.

Yes to Reflection, Inspection & Attributes. I'd also add the dynamic code generator thing .NET has (whatever that is
called), and delegates. JDK1.5 will close this gap somewhat, though, and most of these features can be emulated in Java
right now. I don't know why you'd say .NET events are better.
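
As one illustration of the "can be emulated" point, here is a rough sketch of a delegate-like object built on plain Java reflection. The `Delegate` class and its methods are hypothetical names made up for this example; it only shows the general idea of binding a target object to a method looked up by name, and makes no claim to match what .NET delegates do under the hood.

```java
import java.lang.reflect.Method;

public class DelegateSketch {

    /** Hypothetical delegate: a target object bound to one of its public methods. */
    public static class Delegate {
        private final Object target;
        private final Method method;

        public Delegate(Object target, String methodName, Class<?>... parameterTypes) {
            this.target = target;
            try {
                // Look the method up once, by name and parameter types
                this.method = target.getClass().getMethod(methodName, parameterTypes);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("No such method: " + methodName, e);
            }
        }

        /** Calls the bound method on the bound target. */
        public Object invoke(Object... args) {
            try {
                return method.invoke(target, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Invocation failed", e);
            }
        }
    }

    public static void main(String[] args) {
        Delegate upper = new Delegate("hello", "toUpperCase");
        System.out.println(upper.invoke());

        Delegate concat = new Delegate("foo", "concat", String.class);
        System.out.println(concat.invoke("bar"));
    }
}
```

It works, but the method name is only checked at run time and every call pays the reflection overhead, which is part of why a built-in language feature is nicer.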

ASP.net is a solid step up from ASP. Separation of presentation and business logic is much more solid,
the rendering pipeline is more powerful, and the security features rock.

Yes, ASP.net is a lot better than ASP. The Java servlet spec compares very well with it, though,
and there are a lot more third party Java tools than for .NET.

Sun fails the Dogfood test. Number of critical applications in Solaris that are or are being ported to Java?
None, ask Sun why that is (not scalable, not fast). How much of Windows is being ported?
The whole Shabang (see Longhorn). I will be happy to re-examine Java seriously for ongoing work when
Sun's rm6 utilities (including the command lines) are written in Java.

True. MS is always pretty good at dogfooding their stuff (except for Visual Source Safe!! What's up with that!!!).
However, I think it is an exaggeration to say MS is writing all of Longhorn in .NET.

Not only that, Sun is now lifting features from .NET, clearly there is some new and cool features here to get
the ever slow sun to actually change their precious language.

I don't think either platform can afford to get into the “you copied this from us” game (cough.. C#… cough…).

Compact Framework. Share code between WinCE devices and your platform. Tie them together via Webservices
with a single click of the mouse.

Java has .NET beaten here. Java is on millions of phones and PDAs right now, and has thousands of applications in use.

Rich clients. Have the interoperability and accessibility of the web without stateless programming
environment and pretty graphics.

Java has Webstart. However, I'd agree that .NET is a better rich client platform.

Integration. Don't want to rewrite all of your company's security? Use Domains and Roles.
Don't want to implement your own message Queue? Already There. How about Transactions, JIT Activation,
automagic threading? Done.

I really don't understand this one. Java is very, very strong in all these areas, with thousands of deployed
applications.

Overall, I'd say on these points it's not a clear win to either platform. The important point is that both
platforms are strong in some areas, and to say that isn't true is just FUD. .NET is a very, very good platform
and you'd be silly to write it off.

Re: Another ignorant discussion on .Net is 'better' than Java

I read http://freeroller.net/comments/Sosume?anchor=another_ignorant_discussion_on_net today, about
http://weblogs.asp.net/jprismon/posts/9824.aspx and I felt the need to comment (nothing like a good .NET vs Java
argument, is there?)

Exactly which comments are ignorant? I'd say I agree with most of his comments.
I might argue the toss on:

  • (4) – I only have superficial .NET knowledge, so I can't argue too hard
  • (14) – I'd say MIDP & J2ME are stronger in the market than the compact framework.
  • (16) – I don't quite follow the argument here. Security: java has a pretty good case here. Message Queue: use JMS.
    Transactions: JTA. JIT Activation: javax.activation.* Threading: java.lang.Thread

but the rest sounds reasonably accurate to me.

Use 1.4 RegExp

Mr strayneuron is
complaining about things that require JDK 1.4, in particular Amazon's Web Services API
which uses RegExp.

If it needs a RegExp, why not use JDK1.4? It's been out for over 18 months, is
stable, faster than 1.3 and has better features. IMHO it is better to use something
in the core API than introduce an external package. I've used ORO fairly extensively,
but I prefer to use 1.4 RegExp when I can, even at work (where I am much more conservative
with choosing mature APIs). It really is a pretty good implementation, and compares
well with anything else out there.

Part of the reason the code he is complaining about is only one line is because
they use 1.4 RegExp. If they had used ORO it would have taken at least 2 statements, plus an import.
One could argue that regular expressions should have been in the language all along,
which I would have to agree with.
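
For anyone who hasn't tried it, here is a small sketch of what the core API gives you. `java.util.regex` and `String.matches` both arrived in JDK 1.4; the pattern itself (a ten-character product code) and the method names are just assumptions for the example, not anything taken from Amazon's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSketch {

    // Hypothetical pattern: ten upper-case letters or digits, as a whole word
    private static final Pattern CODE = Pattern.compile("\\b[A-Z0-9]{10}\\b");

    /** The one-liner: String.matches tests the whole string against the pattern. */
    public static boolean isCode(String s) {
        return s.matches("[A-Z0-9]{10}");
    }

    /** Finds the first code inside a longer piece of text, or returns null. */
    public static String firstCode(String text) {
        Matcher m = CODE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isCode("B00005JKZY"));
        System.out.println(firstCode("see item B00005JKZY today"));
    }
}
```

`String.matches` is fine for a one-off check; if the same expression runs in a loop, compiling the `Pattern` once (as `CODE` does here) avoids recompiling it on every call.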

Flash Development

For assorted reasons, I need to learn Flash programming. I'm not interested in creating little movies or stuff like
that – all I want is to use Flash MX like your average 4GL IDE, and then do some Flash Remoting.

I'm sorry, but the Flash MX IDE sux for developers. Why do I need to care about timelines, and having my buttons
play movies (what's with the obsession with movies, BTW?). I don't care about timelines, or layers or frames per second.

Why can't I put a button on a form, double click on it and see the code it calls? VB, C#.NET, Delphi and PowerBuilder all
work like that – you might think that it was that way for a good reason! But NO! Macromedia now requires me to think like a
designer or something.

The weird thing is that Flash MX seems to have everything an IDE needs – there is a Components pane, a Properties
pane, an Actions pane and a visual form designer. They just don't seem to work like I'd expect.

For example:

  1. I put a button on a form. It appears there, and doesn't do anything weird. Good!
  2. I double click on it, and nothing happens. Not Good. But I look down, and there in the
    Properties pane it says “Click Handler”. Looking Hopeful.
  3. I double-click the “Click Handler” thing, and it enables me to type in the space next to it. Very Hopeful.
  4. I type “testHandler” and press enter. Nothing weird happens, so I presume I have a testHandler function created.
    I'm getting excited now!
  5. I double-click on the testHandler thing, hoping to get taken to where I can enter the code. Nothing happens.
    Hmmmm.
  6. But wait – there is an “Actions” pane! It says “Actions for testButton (PushButton)”. Okay…
  7. I press the little “+” button, and it brings up a menu. I muck around with this for a bit, and I get some code that
    looks like:

    onClipEvent (mouseUp) { testButton.setLabel("The Name"); }

  8. I try running the movie, and pressing the button doesn't do anything. Depressing. I muck around with the
    code a lot, and it still doesn't do anything. Very Depressing.
  9. arhrhrhhhhh!
  10. I quit Flash MX in disgust.

Now obviously Flash can detect a button press and then do something. This means either I'm stupid, or that I'm missing
some important concept that I need to understand to use Flash. I think it's the second option (could well be wrong, though),
and I guess I'll just keep trying until I figure it out. (Don't suggest the help – it doesn't.)

My point is that it shouldn't be this hard! It took me less time to figure out how to do the same thing in C/Motif
than in Flash, and that (a) didn't have an IDE, and (b) was on an AMD586, on Linux, using LessTif instead of Motif.

I think the key to it all is that “testHandler” on the button. If I could figure out where the code for that is
hidden then I think I'd be okay.

AJUG Mailing List

A quick pointer for all you Aussie Java Programmers. Check out
the Australia Java User's Group mailing list: http://groups.yahoo.com/group/ajug/

It's not dependent on AJUG membership or anything (I'm not a member, although I
have been to the occasional AJUG-Adelaide meeting), and in the last couple of months
has had some interesting discussion.

It doesn't compare to the DevelopMentor AdvancedJava list ( http://discuss.develop.com/advanced-java.html )
for depth of discussion, but is less intimidating to post “silly questions” to.

In my prior life as a Delphi programmer, I used to inhabit the Delphi mailing lists (in their many forms), and
I have to say they were a lot better than any Java mailing list. Of course, there aren't as many active Delphi communities
as Java communities, so you'd see the same names popping up all the time. Occasionally I think I recognize one who's
crossed over into Java – for instance I think Glen Stampoultzis
of POI fame used to be an Australian Delphi guy, too.

Speaking of Aussie Java Programmers (or not): Win32 AJP documentation. ahhrggg!

The Blojsom Calendar Plugin

I noticed today that my calendar was only showing the posts that were on
the front page of my blog. After a little investigation I figured out that this was because
I had the blojsom limiter plugin configured before the blojsom calendar plugin
in my plugin.properties file. Hopefully this will get indexed so if anyone
else has the same problem this will help them some.

My html display chain now looks like:

html.blojsom-plugin-chain=calendar-filter, comment, trackback, \
    sendemail, calendar-gui, limiter, simple-search, referer-log
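
Why the order matters can be shown with a toy model of a plugin chain. This is not blojsom's code: the `limiter` and `calendar` functions below are hypothetical stand-ins, assuming only that each plugin receives whatever entries the previous plugin handed on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class PluginOrderSketch {

    /** Hypothetical limiter: passes on only the first 'max' entries. */
    static UnaryOperator<List<String>> limiter(int max) {
        return entries -> new ArrayList<>(entries.subList(0, Math.min(max, entries.size())));
    }

    /** Hypothetical calendar: records every entry it is shown, passes them on unchanged. */
    static UnaryOperator<List<String>> calendar(List<String> seen) {
        return entries -> {
            seen.addAll(entries);
            return entries;
        };
    }

    /** Runs the entries through each plugin in order and reports what the calendar saw. */
    public static List<String> entriesSeenByCalendar(List<String> entries, boolean limiterFirst) {
        List<String> seen = new ArrayList<>();
        List<UnaryOperator<List<String>>> chain = new ArrayList<>();
        if (limiterFirst) {
            chain.add(limiter(2));
            chain.add(calendar(seen));
        } else {
            chain.add(calendar(seen));
            chain.add(limiter(2));
        }
        List<String> current = entries;
        for (UnaryOperator<List<String>> plugin : chain) {
            current = plugin.apply(current);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> posts = Arrays.asList("mon", "tue", "wed", "thu");
        System.out.println(entriesSeenByCalendar(posts, true));  // limiter runs first
        System.out.println(entriesSeenByCalendar(posts, false)); // calendar runs first
    }
}
```

With the limiter first, the calendar only ever hears about the entries that survived the cut, which matches the symptom of a calendar showing just the front-page posts.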

Garbage Collection

I had a discussion with a friend today about Garbage Collection. He has quite
a “messy room” problem – his bedroom is so full of stuff that he couldn't close the
door for a few months, and spent 3 (?) weeks sleeping on the couch because there was too much
stuff on the bed.

We decided that his problem was that his garbage collector isn't doing
minor collections of the new objects eligible for GC, so they just lie around and wait for a major collection.
However, there is so much garbage waiting for that major collection that the
mark-compact collection algorithm used there is continually fighting a losing battle
against the garbage.

We also decided that leaving stuff at a friend's (or parents') house is actually a memory
leak.

How am I going so far, Superman? ;-)

Also, does this code look familiar to anyone else:

public class Superman implements Runnable {
    private boolean doneEnoughWork = false;
    private boolean feelingSmartToday = false;

    public void run() {
        while (!doneEnoughWork) {
            work();
        }
        if (feelingSmartToday) {
            // set the alarm object to interrupt this thread in about
            // 8 hours
            AlarmClock.set(Thread.currentThread());
        } else {
            // BodyClock is an unstable time keeping
            // mechanism and should be treated with caution.
            // It is designed for use on resource constrained VMs
            // such as the PrimativeMan embedded VM and the
            // WeekendVM
            //
            // It may interrupt the thread passed in roughly
            // 8 hours, plus or minus an hour or two.
            BodyClock.pleaseWakeMeUp(Thread.currentThread());
        }
        try {
            // sleep until one of the clocks interrupts this thread
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException e) {
            getUpAndGoToWork();
        }
    }
}

Maven and CVS Conundrum

So I've fixed my CVS/Maven issues. It turns out that you need to specify a CVS repository like:

scm:cvs:pserver:anonymous@cvs.sourceforge.net:/cvsroot:classifier4j

rather than

scm:cvs:pserver:anonymous@cvs.sourceforge.net:/cvsroot/classifier4j

Apparently, this is assumed knowledge or something, or is just a CVSism.
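
The difference is easier to see when the string is pulled apart. The sketch below is only a guess at the shape of the format, assuming the connection string is read as colon-separated fields (scm, provider, method, user@host, repository path, module); the class and method names are made up for the example and this is not Maven's own parsing code.

```java
public class ScmUrlSketch {

    /**
     * Returns the module field of a CVS connection string, or null when the
     * string has no sixth colon-separated field (an assumed layout, see above).
     */
    public static String moduleOf(String connection) {
        String[] parts = connection.split(":");
        if (parts.length < 5 || !"scm".equals(parts[0]) || !"cvs".equals(parts[1])) {
            throw new IllegalArgumentException("Not a CVS scm connection: " + connection);
        }
        // Assumed fields: [0]=scm [1]=cvs [2]=method [3]=user@host [4]=path [5]=module
        return parts.length > 5 ? parts[5] : null;
    }

    public static void main(String[] args) {
        String working = "scm:cvs:pserver:anonymous@cvs.sourceforge.net:/cvsroot:classifier4j";
        String failing = "scm:cvs:pserver:anonymous@cvs.sourceforge.net:/cvsroot/classifier4j";
        System.out.println(moduleOf(working));
        System.out.println(moduleOf(failing));
    }
}
```

Read this way, the version with the slash simply has no module field at all, so anything expecting one has nothing to check out.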

I was basing my setting on the sourceforge documentation:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/classifier4j login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/classifier4j co modulename

The SF documentation is usually
quite good…. Oh well… the joys of different development environments.