Ted Leung's post about the text summarisation in MacOS X got me back working on the text summarisation in Classifier4J.
I committed an early cut of the code tonight – it works pretty well, but needs a lot of optimisation.
It lets you specify how many sentences you want in the summary. Here's a summary of Ted's post in two sentences:
John Robb linked to DEVONthink which is a free form information manager for MacOS X. One thing that I noticed while reading the pages is that Mac OS X has a text summarization service built in.
Here it is with three:
John Robb linked to DEVONthink which is a free form information manager for MacOS X. It looks like you just dump all your information in there and turn it's recognizers loose and it sorts it all out for you. One thing that I noticed while reading the pages is that Mac OS X has a text summarization service built in.
And here it is with four:
John Robb linked to DEVONthink which is a free form information manager for MacOS X. It looks like you just dump all your information in there and turn it's recognizers loose and it sorts it all out for you. One thing that I noticed while reading the pages is that Mac OS X has a text summarization service built in. This is a great thing to have as a system service.
Apparently, the MacOS X service comes up with:
It looks like you just dump all your information in there and turn it's recognizers loose and it sorts it all out for you.
One thing that I noticed while reading the pages is that Mac OS X has a text summarization service built in. I've been looking for something like that for a long time.
…It turns out that the Open Text Summarization library being used in AbiWord is now up on SourceForge.
That might be a bit better than the Classifier4J output, but not by much. Mentioning the Open Text Summarization library is useful, but I think Classifier4J's choice of “This is a great thing to have as a system service.” instead of “I've been looking for something like that for a long time.” is better. I also think the Classifier4J summary makes better sense than the OS X one, because its first sentence provides better context. Your mileage may vary, though.
The code for this is available from the Classifier4J CVS archive in the net.sf.classifier4J.summariser (note the spelling!) package. If it doesn't appear to be there, that's just the STOOPID SourceForge CVS backup thing: they run the anonymous CVS access off the backup server, so it takes a day for new commits to get copied over.
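
To give a rough feel for what "give me the N best sentences" can look like in code, here is a minimal sketch. It is not the code in CVS: the class name SummarySketch, the method summarise, the word-frequency scoring and the "ignore words of three letters or fewer" rule are all assumptions made up for illustration. It just shows one simple way to pick a fixed number of sentences and keep them in their original order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not the Classifier4J summariser.
public class SummarySketch {

    /** Returns up to maxSentences sentences from text, in their original order. */
    public static String summarise(String text, int maxSentences) {
        // Naive sentence split: break on whitespace that follows '.', '!' or '?'.
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");

        // Count how often each lower-cased word occurs in the whole text.
        Map<String, Integer> freq = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z0-9']+")) {
            if (word.length() > 3) { // assumption: crude stand-in for a stop-word list
                freq.merge(word, 1, Integer::sum);
            }
        }

        // Score each sentence as the sum of the frequencies of its words.
        int[] scores = new int[sentences.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            for (String word : sentences[i].toLowerCase().split("[^a-z0-9']+")) {
                scores[i] += freq.getOrDefault(word, 0);
            }
            order.add(i);
        }

        // Highest score first; on a tie the earlier sentence wins.
        order.sort((a, b) -> scores[a] != scores[b]
                ? Integer.compare(scores[b], scores[a])
                : Integer.compare(a, b));

        // Keep the top N indices, then emit them in document order.
        int keep = Math.max(0, Math.min(maxSentences, order.size()));
        TreeSet<Integer> chosen = new TreeSet<>(order.subList(0, keep));

        StringBuilder summary = new StringBuilder();
        for (int i : chosen) {
            if (summary.length() > 0) {
                summary.append(' ');
            }
            summary.append(sentences[i]);
        }
        return summary.toString();
    }
}
```

Calling SummarySketch.summarise(post, 2) would return two sentences from the post. Summing raw word counts favours long sentences, so a real implementation would probably want a proper stop-word list and some kind of length normalisation.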