Category Archives: Uncategorized

JBoss Portal

March 30, 2005 | Uncategorized | Nick Lothian

JBoss Portal 2.0 looks pretty interesting. Now that it is JSR 168 compliant (what were they thinking by not doing that in the initial version?) and has a reasonable feature list, it will be good to see some more competition in the open source portal marketplace. Hopefully the increasing number of players (eXo, Liferay, Jetspeed 2.0 and now JBoss Portal) will increase the quality of the offerings.

Multi threaded programming

March 27, 2005 | Uncategorized | Nick Lothian

More on multi-threaded programming models:

Tim Sweeney resumes:

“You can expect games to take advantage of multi-core pretty thoroughly in late 2006 as games and engines also targeting next-generation consoles start making their way onto the PC.

Writing multithreaded software is very hard; it's about as unnatural to support multithreading in C++ as it was to write object-oriented software in assembly language. The whole industry is starting to do it now, but it's pretty clear that a new programming model is needed if we're going to scale to ever more parallel architectures. I have been doing a lot of R&D along these lines, but it's going slowly.”

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2377&p=4

I've posted previously on multi-threaded programming under Java 1.5. The programming model is pretty powerful and worth investigating even now for Hyper-Threaded CPUs.
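As a rough illustration of that Java 1.5 model, here is a minimal sketch using the `java.util.concurrent` executor classes. The `ParallelSum` class and its `sum` method are made up for this example; only `ExecutorService`, `Executors`, `Callable` and `Future` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not taken from any earlier post.
final class ParallelSum {

    // Sums the integers 1..n by splitting the range across a fixed thread pool.
    // Assumes workers > 0 and n small enough that the int arithmetic cannot overflow.
    static long sum(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int start = 1; start <= n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk - 1);
                // Each chunk becomes a Callable that the pool runs on one of its threads.
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long subtotal = 0;
                        for (int i = from; i <= to; i++) {
                            subtotal += i;
                        }
                        return subtotal;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(100, 4)); // prints 5050
    }
}
```

The caller never creates or joins a thread directly. It hands work to the pool and collects results through `Future`, which is most of what makes the model pleasant to use.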

PDFs in IE

March 25, 2005 | Uncategorized | Nick Lothian

Matt Raible links to a description of IE's handling of PDFs. Having done some work in this area (and worked with someone else who had to figure this all out for himself) I can tell you that it isn't fun.

Buyout of SunGard

March 23, 2005 | Uncategorized | Nick Lothian

In the “something that I normally wouldn't bother writing about but some of my readers might find interesting” department, I'd just like to report that:

SunGard Data Systems Inc. announced today that it is in discussions with a consortium of seven private investment companies seeking to purchase the financial services disaster recovery service provider for more than $10.5 billion.

From http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,100550,00.html

60% of SunGard's revenue comes from “its software and processing business”. Hmmmm….

Open Source Java at last(?)

March 16, 2005 | Uncategorized | Nick Lothian

From Jonathan Schwartz's Weblog today:

ps. stay tuned for news on Java's open source accessibility, too…

I hope that is what it sounds like.

Using a lib directory with Maven

March 14, 2005 | Uncategorized | Nick Lothian

People seem unaware that it is pretty easy to use a lib directory in Maven builds. Many Ant builds use a structure like the one below:


+---lib
+---src
    +---conf
    +---java
    +---test

I find this a useful layout, especially if you have jars that do not exist in the main Maven repository.

In Maven, you might have a dependency declared like this:


<dependency>
	<id>cas</id>
	<artifactId>cas</artifactId>
</dependency>

In your project.properties file you can override the location of each jar like this:


maven.jar.override=on
maven.jar.cas=lib/cas-2.0.12.jar

You need to add an entry for each jar you want to override the location of. I haven't figured out how to make it work for all the alternate types of dependency declarations, though.

Obviously this (like too many things in Maven) is something that should be a lot easier than it is. I think that the project.xml should be able to specify a build-repository that is checked before the local repository, since that would make self-contained builds a lot easier. In the meantime, this is a workaround that I've found useful.

160TB of cache

March 8, 2005 | Uncategorized | Nick Lothian

With such a large MSS system, a large disk cache is still essential to deliver data to jobs in reasonable time. SLAC's 1.3PB is currently backed by 160TB of disk cache.

Lessons Learned from Managing a Petabyte

That would make a decent home media server.

Personalized search using user's local files

March 8, 2005 | Uncategorized | Nick Lothian

Projects on display during a Microsoft Research event yesterday included a method for personalizing Web search results based on the contents of the files on an individual user's computer hard drive.

The project reflects a broader push in the industry to improve the relevance of Web search results by tailoring them to the person doing the searching. But other programs, including Google's personalized search engine, have approached the challenge by having users create profiles to define their preferences.

http://seattlepi.nwsource.com/business/214288_techfest03.html
via Greg Linden

If I wanted to improve personalized search I'd use the user's email (either GMail, Hotmail or Yahoo Mail) to discover their interests. I'd use any organisation the user has done using their mail folders to cluster the search results, and I'd use their contact lists to highlight results from people they know.

Ask.com (who I'd give the prize for “second best search engine after Google” to) don't have an email offering so they couldn't compete in that way. They do, however, have Bloglines, which would allow significant personalization. Yahoo (with MyYahoo) and MSN (with something – is it called MyMSN?) both have blog readers integrated, which would allow discovery of the user's interests as well.

If I were building a new search engine and I wanted to compete on the basis of personalized search the first thing I'd do is make it easy to subscribe to search results. A search subscription is a great indicator of interest! Unfortunately there are some technical difficulties involved in identifying who is subscribed to which search (most client-side aggregators don't share cookies with browsers) but there are ways around this (unique URLs for each subscription, and only logged in users can create subscriptions).
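The unique-URL workaround could look something like the sketch below. Everything in it is hypothetical: the `SearchSubscriptions` class, the `/feeds/` path and the `search.example` host are invented for illustration, and a real service would persist subscriptions rather than keep them in memory.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that ties each feed URL to exactly one user and query.
final class SearchSubscriptions {

    // Who subscribed, and to which search.
    static final class Subscription {
        final String userId;
        final String query;

        Subscription(String userId, String query) {
            this.userId = userId;
            this.query = query;
        }
    }

    private final Map<String, Subscription> byToken =
            new ConcurrentHashMap<String, Subscription>();
    private final String baseUrl;

    SearchSubscriptions(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Meant to be called only for a logged-in user. Returns a feed URL that is
    // unique to this subscription, so no cookie is needed to identify the reader.
    String subscribe(String userId, String query) {
        String token = UUID.randomUUID().toString();
        byToken.put(token, new Subscription(userId, query));
        return baseUrl + "/feeds/" + token;
    }

    // Maps the URL an aggregator requested back to its subscription,
    // or returns null if the token is unknown.
    Subscription resolve(String feedUrl) {
        String token = feedUrl.substring(feedUrl.lastIndexOf('/') + 1);
        return byToken.get(token);
    }

    public static void main(String[] args) {
        SearchSubscriptions subs = new SearchSubscriptions("https://search.example");
        String url = subs.subscribe("alice", "jboss portal");
        System.out.println(url + " belongs to " + subs.resolve(url).userId);
    }
}
```

Because the token is random and issued only after login, every request for that feed, from any aggregator, can be attributed to one user's stated interest.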

PubSub.com is currently the closest thing around to an ultimate database of users' search preferences, though.

Move the code to the data

March 7, 2005 | Uncategorized | Nick Lothian

Kevin Schofield points to some Jim Gray papers, including a new (Jan 2005) one that I hadn't read.

The paper discusses the challenges around working with multi-petabyte scientific datasets. There are some interesting approaches discussed (including Google's MapReduce, of course).

However, I often wonder why this little gem from Dan Creswell never got more attention. He has written extensions to JavaSpaces that allow the code (think queries & processing instructions) to migrate to the data, instead of the code trying to download the data and process it. In a prototype system:

I then ran a test with each version submitting ten objects and then removing them from the queue. The code-uploading version was nearly 7 times as fast and I'm certain that as concurrency increases, the performance gap will get greater still as contention increases

Code Downloading for Improved Performance
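To make the idea concrete, here is a toy sketch of the two styles against an in-memory store. It is not Dan Creswell's code and does not use the JavaSpaces API: `DataStore`, `Task` and the `itemsShipped` counter are all invented for illustration, with the counter standing in for network traffic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a remote space that holds integer entries.
final class DataStore {

    // Hypothetical unit of work that is shipped to the store and run there.
    interface Task<R> {
        R runOn(List<Integer> entries);
    }

    private final List<Integer> entries = new ArrayList<Integer>();
    private int itemsShipped = 0; // crude proxy for bytes on the wire

    void add(int value) {
        entries.add(value);
    }

    // Data-to-code: every entry is copied out to the caller.
    List<Integer> downloadAll() {
        itemsShipped += entries.size();
        return new ArrayList<Integer>(entries);
    }

    // Code-to-data: the task runs next to the entries and only its result travels back.
    <R> R execute(Task<R> task) {
        R result = task.runOn(Collections.unmodifiableList(entries));
        itemsShipped += 1;
        return result;
    }

    int itemsShipped() {
        return itemsShipped;
    }

    public static void main(String[] args) {
        DataStore store = new DataStore();
        for (int i = 1; i <= 10; i++) {
            store.add(i);
        }
        Long total = store.execute(new Task<Long>() {
            public Long runOn(List<Integer> data) {
                long sum = 0;
                for (int v : data) {
                    sum += v;
                }
                return sum;
            }
        });
        System.out.println(total + " after shipping " + store.itemsShipped() + " item(s)");
    }
}
```

In this toy the saving is just ten items versus one, but the gap grows with the size of the dataset, which is the situation the petabyte-scale papers are worried about.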

I'd love to have time to look at this more, but I'd imagine this approach would work well.

Google Desktop Search 1.0