I received a nice card from Google today. I know their motto is “Don't be evil”, but I never knew a company
could be so damn nice.
All posts by Nick Lothian
Classifier4J 0.5 is available
Classifier4J is my Bayesian classification tool written in Java.
I have just released version 0.5, the first release since July (yes, I'm slack).
This release has a number of new features, and, perhaps most impressively, it actually has fewer
dependencies than version 0.4. Now there is one less excuse not to use it!
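For anyone new to the idea, the sketch below shows the kind of word-counting Bayesian scoring a classifier like this performs. It is not Classifier4J's API or implementation: `NaiveBayesSketch`, `train` and `probabilityOfMatch` are hypothetical names, and the sketch assumes naive lower-case word tokenisation plus add-one (Laplace) smoothing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-category naive Bayes text scorer; not Classifier4J's API. */
public class NaiveBayesSketch {
    private final Map<String, Integer> matchCounts = new HashMap<>();
    private final Map<String, Integer> nonMatchCounts = new HashMap<>();
    private int matchDocs, nonMatchDocs, matchWords, nonMatchWords;

    private static String[] tokenize(String text) {
        // Naive tokenisation: lower-case, split on anything that isn't a word character.
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    /** Records the words of one training document in the chosen category. */
    public void train(String text, boolean isMatch) {
        Map<String, Integer> counts = isMatch ? matchCounts : nonMatchCounts;
        int words = 0;
        for (String word : tokenize(text)) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
                words++;
            }
        }
        if (isMatch) {
            matchDocs++;
            matchWords += words;
        } else {
            nonMatchDocs++;
            nonMatchWords += words;
        }
    }

    /** Returns P(match | text), assuming words are independent given the category. */
    public double probabilityOfMatch(String text) {
        Set<String> vocabulary = new HashSet<>(matchCounts.keySet());
        vocabulary.addAll(nonMatchCounts.keySet());
        int v = Math.max(vocabulary.size(), 1);
        int docs = matchDocs + nonMatchDocs;

        // Work in log space so long documents don't underflow to zero.
        double logMatch = Math.log((matchDocs + 1.0) / (docs + 2.0));
        double logNonMatch = Math.log((nonMatchDocs + 1.0) / (docs + 2.0));
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue;
            }
            // Add-one smoothing: unseen words get a small, non-zero probability.
            logMatch += Math.log((matchCounts.getOrDefault(word, 0) + 1.0) / (matchWords + v));
            logNonMatch += Math.log((nonMatchCounts.getOrDefault(word, 0) + 1.0) / (nonMatchWords + v));
        }
        // Normalise the two log scores back into a probability.
        double max = Math.max(logMatch, logNonMatch);
        double a = Math.exp(logMatch - max);
        double b = Math.exp(logNonMatch - max);
        return a / (a + b);
    }

    public static void main(String[] args) {
        NaiveBayesSketch classifier = new NaiveBayesSketch();
        classifier.train("cheap pills buy now", true);
        classifier.train("meeting agenda for tuesday", false);
        System.out.println(classifier.probabilityOfMatch("buy cheap now"));   // about 0.889 (8/9)
        System.out.println(classifier.probabilityOfMatch("tuesday meeting")); // about 0.2
    }
}
```

Summing logs rather than multiplying raw probabilities is the usual way to keep long documents from underflowing to zero; the final step just turns the two log scores back into a single probability between 0 and 1.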
Search Engine Bugs
Watching my referers, I suspect someone of doing some fairly deep analysis of the spidering and indexing patterns of
various search engines. How else could you explain referers like http://search.msn.com/results.aspx?ps=ba%3d(0.15)0(.)0…….%26co%3d(0.15)4(0.1)3.200.2.5.10.1.3.%26isURL%3d1%26aq%3dwww%2bbug%2bgoogle%26pn%3d1%26rd%3d0%26&q=www.bug.google&ck_sc=1&ck_af=0
and http://search.msn.com/results.aspx?q=w.w.w.google.ro.&FORM=SMCRB.
I don't have any real idea what these people are hoping to achieve. I'd speculate they are testing the popularity of
various words so they can game systems like AdWords, or perhaps so they can register domain names for popular word combinations.
Anyhow, it's very weird, and I'm interested in any further informed speculation.
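If you want to do the same kind of digging in your own logs, the search terms usually sit in the referer's query string. A minimal sketch, assuming the engine puts the terms in a `q` parameter (as both MSN URLs above do); `RefererTerms.searchTerms` is a hypothetical helper, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Optional;

/** Hypothetical helper: pulls one query parameter out of a referer URL. */
public class RefererTerms {
    public static Optional<String> searchTerms(String referer, String paramName) {
        int start = referer.indexOf('?');
        if (start < 0) {
            return Optional.empty(); // no query string at all
        }
        String query = referer.substring(start + 1);
        int fragment = query.indexOf('#');
        if (fragment >= 0) {
            query = query.substring(0, fragment); // ignore any #fragment
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(paramName)) {
                return Optional.of(decode(eq < 0 ? "" : pair.substring(eq + 1)));
            }
        }
        return Optional.empty();
    }

    private static String decode(String raw) {
        try {
            // Form-style decoding: '+' becomes a space and %xx escapes are unescaped.
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String referer = "http://search.msn.com/results.aspx?q=w.w.w.google.ro.&FORM=SMCRB";
        System.out.println(searchTerms(referer, "q").orElse("(none)")); // prints w.w.w.google.ro.
    }
}
```

Note that `URLDecoder` throws `IllegalArgumentException` on malformed `%` escapes, so code chewing through a whole referer log should catch that rather than let one odd URL stop the run.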
Aspect Oriented Programming Talk
I've uploaded the slides from the AOP talk I gave at AJUG-Adelaide a month ago. They don't
go into a lot of depth, but hopefully they give a good introduction.
The slides can be found here (BTW, if you are
using IE, turn on page transitions – Tools->Internet Options->Advanced Tab->Check “Enable page transitions” for the full PowerPoint-like effect).
Fixing the IE6SP1 HTTPS bug
If you are having trouble similar to the issues described in KB305217,
the symptoms of which are:
When Internet Explorer version 5.5 Service Pack 1 or later tries to POST data,
GET data or set up an HTTPS connection with the connect command,
Internet Explorer generates an error message that indicates that the page could not be displayed.
This problem does not occur in Internet Explorer 5.5.
then turning off keep-alive on your web server may fix the problem. It did for one of our clients.
Some further info can be found in
this newsthread.
Yes, this is a fairly serious bug. No, Microsoft are not distributing a fix.
Why No Java Interface for JVMPI?
I think it would be very useful to have a Java interface to the
Java Virtual Machine Profiler Interface (JVMPI).
I suppose the performance requirements make this difficult. I believe there is some support for profiling and performance
monitoring coming in JDK 1.5 – yet another thing I'd like sooner rather than later!
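For reference, JDK 1.5 (Java 5) did ship the `java.lang.management` API, alongside the native JVMTI that replaces JVMPI. It is not a Java interface to JVMPI, but it does expose some monitoring data to plain Java code. A small sketch, written for a current JDK (the lambda needs Java 8 or later); `ManagementSketch` and its methods are illustrative names only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

/** Illustrative wrapper around the platform MXBeans from java.lang.management. */
public class ManagementSketch {
    /** Heap bytes currently in use, as reported by the platform MemoryMXBean. */
    public static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Number of live threads in this JVM. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /**
     * CPU time in nanoseconds that the current thread spends running task,
     * or -1 if this JVM cannot measure per-thread CPU time.
     */
    public static long cpuNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            task.run();
            return -1;
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true); // measurement can be switched off by default
        }
        long start = threads.getCurrentThreadCpuTime();
        task.run();
        return threads.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("Heap used: " + heapUsedBytes() + " bytes");
        System.out.println("Live threads: " + liveThreads());
        long nanos = cpuNanos(() -> {
            double sum = 0;
            for (int i = 1; i < 5_000_000; i++) {
                sum += Math.sqrt(i); // busy work so there is some CPU time to measure
            }
            System.out.println("Busy-work result: " + sum);
        });
        System.out.println("CPU time: " + nanos + " ns");
    }
}
```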
SOAP WebServices are not slow (!) – Updated
If (by some chance) you should need to connect a .NET client to a Java server-side application, don't believe the
hype about webservices being slow. TEST your application – the results might surprise you. In our case
we tested 3 commercial .NET <-> Java communication tools, as well as Axis as the
webservice implementation. One of the three products performed better than webservices in terms of total speed and
scalability – and its price made it easy to justify throwing a cluster of webservice machines at the problem to
match that performance. I'm not ashamed to admit that result surprised me somewhat!
The lesson: when concerned about performance, TEST, don't rely on your intuition.
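In that spirit, a bare-bones harness for the TEST step might look like the sketch below. It does not describe how our tests were run: `ThroughputHarness.callsPerSecond` is a hypothetical helper, and the `Runnable` you pass in stands for one round-trip through whichever client (an Axis stub or a commercial bridge) you are measuring. Varying `threads` gives a rough feel for scalability as well as raw speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical load-test helper: measures throughput of one client call. */
public class ThroughputHarness {
    /**
     * Runs call warmupCalls times, then callsPerThread times, on each of
     * threads worker threads, and returns timed calls completed per second.
     */
    public static double callsPerSecond(Runnable call, int threads, int warmupCalls, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> warmUp = new ArrayList<>();
            List<Callable<Void>> timed = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                warmUp.add(repeat(call, warmupCalls));
                timed.add(repeat(call, callsPerThread));
            }
            // Warm-up: let JIT compilation, class loading and connection setup settle first.
            waitFor(pool.invokeAll(warmUp));

            long start = System.nanoTime();
            waitFor(pool.invokeAll(timed));
            long elapsed = Math.max(System.nanoTime() - start, 1L);
            return (double) threads * callsPerThread * 1_000_000_000L / elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while benchmarking", e);
        } finally {
            pool.shutdown();
        }
    }

    private static Callable<Void> repeat(Runnable call, int times) {
        return () -> {
            for (int i = 0; i < times; i++) {
                call.run();
            }
            return null;
        };
    }

    /** Rethrows the first failure so a broken client can't masquerade as a fast one. */
    private static void waitFor(List<Future<Void>> futures) throws InterruptedException {
        for (Future<Void> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                throw new IllegalStateException("Benchmarked call failed", e.getCause());
            }
        }
    }
}
```

A call might look like `ThroughputHarness.callsPerSecond(() -> quoteService.getQuote("ABC"), 20, 100, 1000)`, where `quoteService` is a hypothetical stub for the client under test. Running each candidate with the same thread counts gives numbers you can actually compare, rather than intuition.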
Updated: for some reason it seems I lost this entry soon after posting it. I'm not sure why that was.
Computer based reasoning
Recently Clay Shirky gave a good run-down of the problems the semantic web faces in actually working. I agree with
his basic premise – that the semantic web won't deliver what it promises.
However, I believe that the more reliable metadata we have the better, and I think the web has so much good information
available that computers are beginning to give good approximate answers to pretty much any
question in many fields – at least at a level somewhat comparable to humans.
In support of my argument I offer 1 Billion Pages = 1 Million Dollars? Mining the Web to Play “Who Wants to be a Millionaire?”
from Overture Research. The abstract reads:
We exploit the redundancy and volume of information on the web to build a
computerized player for the ABC TV game show
“Who Wants To Be A Millionaire?”. The player consists of a question-answering
module and a decision-making module. The question-answering module utilizes question
transformation techniques, natural language parsing, multiple information retrieval
algorithms, and multiple search engines; results are combined in the spirit of ensemble
learning using an adaptive weighting scheme. Empirically, the system correctly answers
about 75% of questions from the Millionaire CD-ROM, 3rd edition – general-interest
trivia questions often about popular culture and common knowledge. The decision-making
module chooses from allowable actions in the game in order to maximize expected risk-adjusted
winnings, where the estimated probability of answering correctly is a function of past
performance and confidence in correctly answering the current question. When given a six
question head start (i.e., when starting from the $2,000 level), we find that the system
performs about as well on average as humans starting at the beginning.
Our system demonstrates the potential of simple but well-chosen techniques for mining
answers from unstructured information such as the web.
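The abstract doesn't say which adaptive weighting scheme the authors used, so purely as an illustration, here is a weighted-majority-style combiner: each answering module (an "expert") scores the four options, the scores are summed using per-expert weights, and once the correct answer is known every expert whose top pick was wrong has its weight multiplied by a penalty factor. `WeightedEnsemble` and the penalty value are assumptions, not the paper's method.

```java
import java.util.Arrays;

/** Hypothetical weighted-majority-style combiner for several answering modules. */
public class WeightedEnsemble {
    private final double[] weights;
    private final double penalty; // multiplier in (0, 1) applied to experts that were wrong

    public WeightedEnsemble(int experts, double penalty) {
        if (penalty <= 0 || penalty >= 1) {
            throw new IllegalArgumentException("penalty must be in (0, 1)");
        }
        this.weights = new double[experts];
        Arrays.fill(weights, 1.0); // start by trusting every expert equally
        this.penalty = penalty;
    }

    /** scores[e][a] is expert e's confidence in answer a; returns the best combined answer. */
    public int choose(double[][] scores) {
        double[] combined = new double[scores[0].length];
        for (int e = 0; e < weights.length; e++) {
            for (int a = 0; a < combined.length; a++) {
                combined[a] += weights[e] * scores[e][a];
            }
        }
        return argMax(combined);
    }

    /** Once the right answer is known, down-weight every expert whose top pick was wrong. */
    public void update(double[][] scores, int correct) {
        for (int e = 0; e < weights.length; e++) {
            if (argMax(scores[e]) != correct) {
                weights[e] *= penalty;
            }
        }
    }

    public double weight(int expert) {
        return weights[expert];
    }

    private static int argMax(double[] xs) {
        int best = 0;
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] > xs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WeightedEnsemble ensemble = new WeightedEnsemble(2, 0.5);
        double[][] scores = {
            {0.9, 0.1, 0.0, 0.0}, // expert 0 is confident in answer A
            {0.0, 0.6, 0.4, 0.0}, // expert 1 leans towards answer B
        };
        System.out.println("Before: " + ensemble.choose(scores)); // prints 0 (combined 0.9, 0.7, 0.4, 0.0)
        ensemble.update(scores, 1); // suppose B turned out to be correct
        System.out.println("After: " + ensemble.choose(scores)); // prints 1 (combined 0.45, 0.65, 0.4, 0.0)
    }
}
```

Over many questions the weights drift towards the modules that are actually reliable, which is the intuition behind combining several imperfect retrieval methods into one better answerer.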
So humans are (only?) six questions better at “Who Wants To Be A Millionaire?” than a computer – without even using
the semantic web. With even imperfect metadata, it's hard to imagine the computer not getting better over time (IMO, of course).
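The decision-making half of that system is also simple to sketch. This version assumes a one-question lookahead and uses log(1 + winnings) as the "risk adjustment"; the abstract doesn't specify the utility function or lookahead the system actually uses, and `AnswerOrWalk` is a hypothetical name. The probability `p` would come from the question-answering module's confidence.

```java
/** Hypothetical one-step "answer or walk away" rule with a concave utility. */
public class AnswerOrWalk {
    /** Assumed risk-averse utility: log(1 + dollars). */
    public static double utility(double dollars) {
        return Math.log1p(dollars);
    }

    /**
     * @param p       estimated probability of answering correctly
     * @param current winnings kept by walking away now
     * @param ifRight winnings after a correct answer
     * @param ifWrong winnings guaranteed after a wrong answer
     * @return true if answering has higher expected utility than walking away
     */
    public static boolean shouldAnswer(double p, double current, double ifRight, double ifWrong) {
        double attempt = p * utility(ifRight) + (1 - p) * utility(ifWrong);
        return attempt > utility(current);
    }

    public static void main(String[] args) {
        // Confident: answering wins on expected utility.
        System.out.println(shouldAnswer(0.9, 125_000, 250_000, 32_000)); // prints true
        // A coin flip: expected dollars (0.5 * 250,000 + 0.5 * 32,000 = 141,000) beat 125,000,
        // but the concave utility says walk away.
        System.out.println(shouldAnswer(0.5, 125_000, 250_000, 32_000)); // prints false
    }
}
```

Because the utility is concave, a gamble is worth less than its expected dollar value, which makes this rule more cautious than a plain expected-winnings maximiser.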
Sequence Diagram Generation with Spring
I've been doing some mucking around with the AOP part of the Spring Framework. It's quite nice
– fairly similar to Nanning, as
has been noted elsewhere.
Inspired by Crazybob's jAdvise SEQUENCE
tool, I've created a similar utility for Spring-based applications
(actually, I think it would come close to working with anything that
implements the AOPAlliance
interfaces, but I haven't tested that yet). Currently it produces a Gantt chart
of the method timings, and a sequence diagram of their execution.
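The heart of a tool like this is an interceptor that notes each call on the way in and its duration on the way out: the nesting gives you the sequence diagram and the durations give you the Gantt bars. The utility described above sits on Spring's AOP support; to keep this sketch self-contained it substitutes plain JDK dynamic proxies (`java.lang.reflect.Proxy`) instead. `SequenceTracer` is a hypothetical class, it only intercepts calls made through interfaces, and it assumes single-threaded use.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical call recorder built on JDK dynamic proxies (single-threaded use only). */
public class SequenceTracer {
    /** One recorded call: nesting depth, "Interface.method" and elapsed nanoseconds. */
    public static final class Call {
        public final int depth;
        public final String name;
        public long nanos;

        Call(int depth, String name) {
            this.depth = depth;
            this.name = name;
        }
    }

    private final List<Call> calls = new ArrayList<>();
    private int depth = 0;

    /** Returns a proxy for target that records every call made through iface. */
    @SuppressWarnings("unchecked")
    public <T> T trace(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Call call = new Call(depth, iface.getSimpleName() + "." + method.getName());
            calls.add(call); // added on entry, so the list is in call order
            depth++;
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            } finally {
                call.nanos = System.nanoTime() - start; // bar length for a Gantt chart
                depth--;
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public List<Call> calls() {
        return calls;
    }

    /** Indented text rendering: each indent is one level deeper in the call sequence. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        for (Call c : calls) {
            for (int i = 0; i < c.depth; i++) {
                sb.append("  ");
            }
            sb.append(c.name).append(" (").append(c.nanos).append(" ns)\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SequenceTracer tracer = new SequenceTracer();
        Runnable inner = tracer.trace(Runnable.class, () -> { });
        Runnable outer = tracer.trace(Runnable.class, () -> inner.run());
        outer.run();
        System.out.print(tracer.render());
    }
}
```

Running `main` prints two `Runnable.run` lines, the inner call indented beneath the outer one. Turning `calls()` into actual diagrams is a separate rendering step.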
Archive.org carries daily news
Maybe this is old news, but I just noticed that The Internet Archive now
carries weekly (I think?) half-hour summaries of news from around the world (especially the Middle East) in QuickTime format.
The list of the programs
appears to have some gaps, though – unless I'm missing them somewhere.