Archive for August, 2004

Speaking at Builder Conference

I'll be at the Builder Conference in Sydney on 5-6 October. I'm speaking on Java Portlets - probably with a fair emphasis on Pluto, but I'll talk about portlet frameworks and various portlet issues, too.

Australia doesn't seem to have many good tech conferences - and even fewer that have any Java content - but this one seems to have a reasonable agenda.

Interesting Search Paper

For all you search engine geeks out there: Block-level Link Analysis, which analyses web pages at the block level rather than the page level and applies the result to algorithms such as PageRank & HITS. (From MS Research, via Doug Cutting on the nutch-dev list.)

To quote the abstract:

Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics and hence the web page might not be considered as the atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we proposed two new algorithms, Block Level PageRank and Block Level HITS, whose performances we study extensively using web data.
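
To make the block idea a bit more concrete, here's a rough Python sketch of my own reading of the abstract, not the paper's actual formulation. Each page is split into blocks with an importance weight (the paper derives these from its vision-based segmentation; here they're simply made-up numbers), each block spreads its weight evenly over the pages it links to, and composing the two gives a weighted page-to-page graph that plain power-iteration PageRank can run over. The function names, the even split and the example pages are all assumptions for illustration.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]


def page_graph_from_blocks(
    page_blocks: Dict[str, Dict[str, float]],
    block_links: Dict[str, List[str]],
) -> Graph:
    """Compose page->block and block->page weights into a page->page graph.

    page_blocks[page][block]: assumed importance of each block within its page.
    block_links[block]: pages the block links to; each target gets an equal share.
    """
    graph: Graph = {page: {} for page in page_blocks}
    for page, blocks in page_blocks.items():
        for block, weight in blocks.items():
            targets = block_links.get(block, [])
            for target in targets:
                graph.setdefault(target, {})  # make sure every target is a node
                share = weight / len(targets)
                graph[page][target] = graph[page].get(target, 0.0) + share
    return graph


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Plain power-iteration PageRank over a weighted graph.

    Assumes every link target is also a key in graph (page_graph_from_blocks
    guarantees this).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            edges = graph[node]
            total = sum(edges.values())
            if total == 0:
                # Dangling page: spread its rank evenly over every page.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
            else:
                # Follow out-links in proportion to their block-derived weights.
                for target, weight in edges.items():
                    new_rank[target] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Page A has a low-weight navigation block and a high-weight content block.
    page_blocks = {
        "A": {"A.nav": 0.2, "A.body": 0.8},
        "B": {"B.body": 1.0},
        "C": {"C.body": 1.0},
    }
    block_links = {
        "A.nav": ["B", "C"],
        "A.body": ["C"],
        "B.body": ["C"],
        "C.body": ["A"],
    }
    graph = page_graph_from_blocks(page_blocks, block_links)
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In the example, A's navigation block carries little weight, so most of A's rank flows along the link from its main content block rather than being split evenly across every link on the page.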

My understanding is that this is similar to the method used by Google's AdSense program to work out which advertisement to display on each page.

HTTP Caching & Cache-Busting

HTTP Caching & Cache-Busting for Content Publishers.

If you've never had to deal with this stuff, you are lucky.
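
For anyone who hasn't: one common cache-busting technique (not necessarily the one the linked article recommends) is to put a hash of a static file's content into its URL, so the URL changes whenever the file does. Caches can then keep those URLs for a long time, while the HTML pages that reference them are revalidated on every request. A minimal Python sketch; the function names, the 8-character SHA-1 prefix and the one-year max-age are illustrative choices of mine.

```python
import hashlib
from typing import Dict


def fingerprint_url(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    "/css/site.css" becomes "/css/site.<hash>.css"; files without an
    extension just get ".<hash>" appended.
    """
    digest = hashlib.sha1(content).hexdigest()[:length]
    head, slash, filename = path.rpartition("/")
    stem, dot, ext = filename.rpartition(".")
    if dot and stem:
        new_name = f"{stem}.{digest}.{ext}"
    else:
        new_name = f"{filename}.{digest}"
    return head + slash + new_name


def cache_headers(fingerprinted: bool) -> Dict[str, str]:
    """Pick a Cache-Control header for a response."""
    if fingerprinted:
        # The URL changes whenever the content does, so caches can keep it for a year.
        return {"Cache-Control": "public, max-age=31536000"}
    # Pages that reference the fingerprinted URLs: caches must revalidate before reuse.
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    css = b"body { color: black }"
    print(fingerprint_url("/css/site.css", css), cache_headers(True))
    print("/index.html", cache_headers(False))
```

The design choice is to never change what a given URL means: a new version of the stylesheet gets a new URL, so there is nothing in any cache to bust.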

The single worst mailing list post ever?

So Hani thinks the portlets list is bad? I've been a member for a while, and I do tend to agree that most of the posts are crap.

However, I think the J2EEPATTERNS-INTEREST list at Sun might be worse. As an example, I'd like to offer this sample of dazzling brilliance:

Hi,

I have a problem associated with Java-DB2. Though it is irrelevant to ask this kind of doubt in this forum. But this is the only forum I have subscribed to. Hence I am putting the question to you all. So please excuse me…

(See http://www.junlu.com/msg/31311.html for the rest of that message.)

I guess it was nice to apologise in advance for the off-topic post, especially since that is more than most people on that list do.

Link Spam & Reverse Page Rank

With the rise of services like Technorati and Feedster which reward sites for linking to other sites (as opposed to Google's PageRank model, which rewards sites for having links to them), I wonder how long it will be before link farms appear with nothing but links to popular blogs, in order to pick up hits from people using Technorati and Feedster to follow the conversations?

I think it's only a matter of time.

Dave on Rome

Dave Johnson has written a good overview of how Rome hangs together. Read it along with the tutorials on the Rome Wiki.

Entity Aggregation in SOA

Even if I hadn't learnt anything else at TechEd, the pointer from Anna Lui to the SOA Challenges: Entity Aggregation paper on MSDN would have made it worthwhile.

I've done integration projects for the past 4 years, and in every single one we've run into the exact problems described in that paper.
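
As a rough illustration of two of the usual problems, here's a minimal Python sketch (my own, not taken from the paper): mapping each system's local key to a single aggregated identity, and deciding which system wins when two systems disagree about the same attribute. The system names, keys, cross-reference table and per-attribute authority rule are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

Attributes = Dict[str, object]

# Hypothetical cross-reference: (source system, local key) -> aggregated entity id.
XREF: Dict[Tuple[str, str], str] = {
    ("crm", "C-1001"): "cust-42",
    ("billing", "88731"): "cust-42",
}

# Hypothetical rule: which system is the authority for each attribute.
AUTHORITY: Dict[str, str] = {"name": "crm", "email": "crm", "balance": "billing"}


def aggregate(
    records: Iterable[Tuple[str, str, Attributes]],
    xref: Dict[Tuple[str, str], str],
    authority: Dict[str, str],
) -> Dict[str, Attributes]:
    """Merge per-system records into one view per aggregated entity."""
    merged: Dict[str, Attributes] = {}
    for system, local_key, attrs in records:
        entity_id = xref.get((system, local_key))
        if entity_id is None:
            # Unknown identity; a real system would queue this for matching.
            continue
        entity = merged.setdefault(entity_id, {})
        for field, value in attrs.items():
            if authority.get(field) == system:
                entity[field] = value  # the authoritative system always wins
            elif field not in entity:
                entity[field] = value  # otherwise only fill gaps
    return merged


if __name__ == "__main__":
    records = [
        ("crm", "C-1001", {"name": "Jane Smith", "email": "jane@example.com"}),
        ("billing", "88731", {"name": "J. Smith", "balance": 120.5}),
    ]
    print(aggregate(records, XREF, AUTHORITY))
```

For fields whose authoritative system actually supplies a value, the result doesn't depend on the order the records arrive in; for fields without one, the first value seen wins, which is the kind of conflict you end up having to resolve explicitly.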

Anna's talk at TechEd was pretty good, too.

TechEd Day 2

I didn't go to many product demos today, but I saw so many SOA talks it got pretty tiring (especially since most presenters hadn't seen the other presentations, and so spent a long time going over things that had been discussed in depth elsewhere).

I did pick up a few tips, though - mostly things that seem obvious in retrospect, but were expressed quite nicely:

  • Reference Data should be versioned, and once versioned it should never change. This means that properly referenced versioned data can be cached without needing to think about cache invalidation.
  • Having a big Web Service which takes various commands is a bad pattern, because it makes it more difficult to guarantee an unchanging service contract.
  • There was a lot of discussion about canonical schemas for data (eg: the correct way to represent a customer or order object) and how to convert between canonical schemas and internal representations. (It's less work overall to convert-on-write, but reader-makes-correct is a valid approach which can mean less data is lost.)
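
The versioned reference data tip is easy to show in code. Here's a minimal Python sketch (the class name and the fetch callback are hypothetical): because a published (key, version) pair never changes, the cache never has to invalidate anything, and a new version is simply a new cache key.

```python
from typing import Callable, Dict, Hashable, Tuple


class VersionedReferenceCache:
    """Cache for immutable, versioned reference data.

    Once a (key, version) pair is published its value never changes, so an
    entry can never go stale and there is no invalidation logic at all.
    """

    def __init__(self, fetch: Callable[[Hashable, int], object]) -> None:
        self._fetch = fetch  # hypothetical loader, e.g. a call to the owning service
        self._entries: Dict[Tuple[Hashable, int], object] = {}

    def get(self, key: Hashable, version: int) -> object:
        cache_key = (key, version)
        if cache_key not in self._entries:
            # Only the first request for this exact version hits the source.
            self._entries[cache_key] = self._fetch(key, version)
        return self._entries[cache_key]


if __name__ == "__main__":
    def fetch(key: Hashable, version: int) -> object:
        print("fetching", key, version)
        return f"{key}@v{version}"

    cache = VersionedReferenceCache(fetch)
    cache.get("country-codes", 3)
    cache.get("country-codes", 3)  # served from the cache
    cache.get("country-codes", 4)  # new version, new key
```

You might still want to evict old versions to bound memory, but that's housekeeping rather than correctness.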

The Tech Ed party is tonight, which should be good.

Thoughts on Visual Studio Team System

I've been thinking a fair bit about VSTS (aka Whitehorse) since yesterday. It's fairly obvious that MS wants to move into the complete development lifecycle market: while VSTS does not include a requirements tracking tool, it includes everything else (issue tracking, source control, load testing, etc.) and is fully extensible.

The obvious target for this move is Rational (which has been a division of IBM for a while). MS kept VSTS pretty quiet before the US TechEd, but it was fairly clear that they had to do something about SourceSafe, so the source control part of it was probably apparent to IBM/Rational before the announcement.

It's interesting to look at some of the Eclipse sub-projects in this light. For instance, the expansion of the Hyades sub-project covers a lot of the testing requirements. Rational has obviously built heavily on this for its own toolset, but a significant proportion of it is available for free.

It seems to me that Microsoft wants to compete with Rational on features and undercut it on price. I think the expanded Eclipse toolset will mean third-party vendors will quickly build tools that come close to matching VSTS on features, but massively undercut it on price.

This competition will be great for users, but I think a lot of high-end tool vendors will struggle. I think in a year's time companies like Mercury & Compuware will need to cut their prices, find new sources of revenue, or both.

TechEd Day 1

So I've been to three and a half sessions plus a keynote on day one at MS Tech Ed.

The first couple of sessions were on the Visual Studio Team System, which is coming out in the next release of VS. It was pretty impressive (although obviously still alpha quality). If you've used the whole Rational tool suite you'd find it fairly similar, but possibly better integrated. The presenter emphasised that it will cost a lot more than VS2003 Architect, though (I suspect it will still be less than Rational).

The guy doing the demo knew the competition, too - he mentioned Eclipse a couple of times.

MS needs to release VS2005 pretty quickly, though - people were still impressed by “Create Method Stub” and the refactoring features, which IDEA & Eclipse users have had for a couple of years. I've been doing C# in VS2003 a bit lately, and I really, really miss refactoring support (and yes, I know about Refactory, but haven't tried it yet).

The third session was on SOA and was pretty interesting. The presenter talked about various strategies for migrating to SOA, and spoke a lot about breaking up applications into smaller parts which could be addressed as services. I found that pretty useful, and would have liked to hear more about it.