On JavaBlogs

There has been a bit of discussion recently about how JavaBlogs is changing as it grows, including complaints about what some people see as low-quality posts.

Gerard has outlined the four main methods of making a community scale, but I would like to suggest a fifth. I believe that automated text categorisation can increase the size a community can scale to without requiring non-software intervention.

I've done some experimentation with using text analysis algorithms for simple match/non-match categorisation. I believe something as simple as Bayesian classification for blog posts can go some way towards improving the quality of links on the “Hot List”.
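
To give a feel for how little is involved, here is a minimal sketch of a naive Bayes match/non-match classifier with add-one smoothing. It is only an illustration under my own assumptions: the class name `NaiveBayesSketch`, its methods, and the "match"/"non-match" labels are hypothetical, and this is not the Classifier4J API or anything JavaBlogs actually runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative naive Bayes text classifier; all names here are hypothetical. */
class NaiveBayesSketch {
    // Per-label word frequencies, per-label word totals, and per-label post counts.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lower-case the text and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Records one training post under the given label. */
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Log of P(label) times the product of smoothed P(word | label). */
    double logScore(String label, String text) {
        Map<String, Integer> counts =
                wordCounts.getOrDefault(label, new HashMap<>());
        int total = totalWords.getOrDefault(label, 0);
        double score = Math.log((double) docCounts.getOrDefault(label, 0) / totalDocs);
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue;
            }
            // Add-one (Laplace) smoothing so unseen words do not zero the score.
            double p = (counts.getOrDefault(word, 0) + 1.0)
                    / (total + vocabulary.size());
            score += Math.log(p);
        }
        return score;
    }

    /** Returns the highest-scoring trained label, or null if nothing was trained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = logScore(label, text);
            if (best == null || score > bestScore) {
                best = label;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In use, you would call `train("match", ...)` on posts you consider on-topic and `train("non-match", ...)` on the rest, then let `classify` pick the label for each new post. A real filter would need far more training data and probably stop-word handling, but the core really is this small.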

Today's Java.Blogs posts

Ultimately, I think that some of the more advanced text categorisation algorithms might be even more useful. For instance, Google News manages to categorise its stories fairly well, and I believe they do most of that automatically. NewsInEssence categorises news into “clusters” automatically. A quick look on CiteSeer shows plenty of algorithms around, and I'm pretty sure the author of Classifier4J might be interested in implementing at least one.
