All posts by Nick Lothian

Switched to FeedBurner

I've switched my feeds at my BadMagicNumber blog to be published via FeedBurner. There should be no disruption to your normal programming, although I think some aggregators will show a few old items as new.

I switched the feeds over using a simple Servlet Filter. If anyone wants to do the same, here's the code. This works for blojsom, but you might need to modify it slightly for your own setup.


import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FeedBurnerRedirectFilter implements Filter {
	private String redirectURL;

	public void init(FilterConfig config) throws ServletException {
		redirectURL = config.getInitParameter("redirectURL");
	}

	public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
						throws IOException, ServletException {
		// blojsom selects the feed format via the "flavor" request parameter
		String flavor = request.getParameter("flavor");

		if ("atom".equals(flavor) || "rss".equals(flavor)
				|| "rss2".equals(flavor) || "rdf".equals(flavor)) {
			HttpServletRequest httpRequest = (HttpServletRequest) request;
			HttpServletResponse httpResponse = (HttpServletResponse) response;

			// Redirect everyone except FeedBurner itself, which still needs to
			// fetch the original feed. (Requests with no User-Agent at all fall
			// through and are served the local feed.)
			String userAgent = httpRequest.getHeader("User-Agent");
			if (userAgent != null && userAgent.indexOf("FeedBurner") < 0) {
				httpResponse.sendRedirect(redirectURL);
				return;
			}
		}
		chain.doFilter(request, response);
	}

	public void destroy() {
		// nothing to clean up
	}
}
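
To put the filter to work you also need to register it in your webapp's web.xml, passing the FeedBurner address in as the redirectURL init parameter. Here's a minimal sketch; the filter name, URL pattern and feed address are placeholders you'd adapt to your own deployment (the flavor check above means only feed requests actually get redirected):

<filter>
	<filter-name>feedBurnerRedirect</filter-name>
	<filter-class>FeedBurnerRedirectFilter</filter-class>
	<init-param>
		<param-name>redirectURL</param-name>
		<param-value>http://feeds.feedburner.com/YourFeedName</param-value>
	</init-param>
</filter>
<filter-mapping>
	<filter-name>feedBurnerRedirect</filter-name>
	<url-pattern>/*</url-pattern>
</filter-mapping>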

Using XPath on real-world HTML documents

The article on TheServerSide about Web-Harvest reminded me of one of my favourite things: using XPath to extract data from HTML documents.

Obviously, XPath won't normally work on most real-world web pages, because they aren't valid XML. However, the magic of TagSoup gives you a SAX parser that will work on ugly HTML. You can then use XPath against that SAX stream.

Here's the magic invocation to make TagSoup, XOM and XPath all work together (XMLReader and XMLReaderFactory come from org.xml.sax; Builder, Document, Nodes and XPathContext from nu.xom):

// TagSoup's parser copes with HTML that isn't valid XML
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
tagsoup.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
Builder bob = new Builder(tagsoup);
Document doc = bob.build(url);

which then allows you to do things like:

// TagSoup puts elements in the XHTML namespace, so bind an "html" prefix for XPath
XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes table = doc.query("//html:table[@class='forumline']", context);
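
query() returns a XOM Nodes list that you can walk directly. Purely for illustration, reusing the table variable from above (the forumline class is just whatever the target page happens to use):

// Print the text content of each matched table
for (int i = 0; i < table.size(); i++) {
	System.out.println(table.get(i).getValue());
}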

Cool, huh?

When all you have is a hammer

The biggest difference between 1.5.0 and 1.7.1 is that 1.7.1 uses two databases at once. In order to support fulltext matching, a new feature, we use a MyISAM database in MySQL with FULLTEXT keys (see http://dev.mysql.com/doc/refman/4.1/en/fulltext-search.html). However, InnoDB is still faster for JOIN's and offers referential integrity, so as a compromise we run two databases and keep them synchronized with MySQL replication (see http://dev.mysql.com/doc/refman/4.1/en/replication.html).

From the Connotea docs

And I'm back.

Yes, I'm back (and have been for a few weeks).

The wedding went well – and yes, thanks to a last-minute flight to London by my friend, it actually was a real wedding.

I had the worst trip home ever: I left Edinburgh at 9:00 am on Wednesday and arrived home at 7:30 pm on Friday. I am sick of airports, and of airplanes where the seat won't lock in position, so if you have it upright it slowly starts reclining, and if it's reclined it slowly starts going upright.

In unrelated news, I'm now (a) working Monday to Wednesday so Maya can go back to work on Thursdays and Fridays while Alex and I hang out, and (b) technical architect at work. I'm not quite sure how (a) will affect (b), but we'll see….

Away for two weeks

I'm off to the UK for two weeks for my friend's wedding (which is now a “commitment ceremony” due to various immigration issues).

I'll be checking my email, but don't expect well-thought-out responses….

For the interested, here's some stuff to read while I'm away:

Interactive TV: Conference and Best Paper

This paper describes mass personalization, a framework for combining mass media with a highly personalized Web-based experience. We introduce four applications for mass personalization: personalized content layers, ad hoc social communities, real-time popularity ratings and virtual media library services. Using the ambient audio originating from the television, the four applications are available with no more effort than simple television channel surfing. Our audio identification system does not use dedicated interactive TV hardware and does not compromise the user’s privacy. Feasibility tests of the proposed applications are provided both with controlled conversational interference and with “living-room” evaluations.

Detecting Spam Web Pages through Content Analysis

In this paper, we continue our investigations of “web spam”: the injection of artificially-created pages into the web in order to influence the results from search engines, to drive traffic to certain pages for fun or profit. This paper considers some previously-undescribed techniques for automatically detecting spam pages, examines the effectiveness of these techniques in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages (13.8%) in our judged collection of 17,168 pages, while misidentifying 526 spam and non-spam pages (3.1%).

I like this paper, because I used some very similar techniques in my de-spammed version of the Google Blog Search.

Develop AJAX applications in Java (just Java)

Google Web Toolkit (GWT) is a Java development framework that lets you escape the matrix of technologies that make writing AJAX applications so difficult and error prone. With GWT, you can develop and debug AJAX applications in the Java language using the Java development tools of your choice. When you deploy your application to production, the GWT compiler translates your Java application to browser-compliant JavaScript and HTML.

From Google Web Toolkit – product overview

Wow. And it looks like it actually works….
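
To give a feel for the programming model, here's a minimal entry point in the style of the GWT 1.x samples. The class name, button label and alert text are purely illustrative; only the EntryPoint/onModuleLoad wiring and widget classes come from GWT itself:

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.ClickListener;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.Widget;

// The GWT compiler turns this Java into plain JavaScript and HTML
public class HelloAjax implements EntryPoint {
	public void onModuleLoad() {
		Button button = new Button("Click me", new ClickListener() {
			public void onClick(Widget sender) {
				Window.alert("Hello, AJAX");
			}
		});
		RootPanel.get().add(button);
	}
}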

First Greasemonkey script: Mark All As Read in Google Reader

My quest to make Google Reader work the way I want it to continues.

One of the biggest complaints people have with Google Reader is that there is no way to mark all items as read. Having unread items makes people feel under pressure, which (I believe) is one of the things that river-of-news style aggregators are supposed to avoid. Unfortunately, not feeling the pressure of unread items in Google Reader at the moment is something of a Zen meditation exercise.

This Greasemonkey script solves that. To use it, go to the “Edit Subscription” page, then in the “More Actions” box you'll find a “Mark all as read” action. The first time you use it you may need to run it a couple of times to completely clear out your reading list (there doesn't seem to be a good way of finding out if the reading list display is empty or not).

Tested with Firefox 1.5.0.1 and Greasemonkey 0.6.4.

Download the script here.

Screenshot