Solr + Hibernate

September 24, 2007 | java, random, tech | Nick Lothian

Solr is good software. Hibernate is good software, and with Hibernate Search it uses Lucene for full text search.

It’s possible to configure Solr to use arbitrary Lucene indexes. I think it would be great if someone (else!) would do the work to configure Solr to work with Hibernate Search.

Beware the fast follower

September 23, 2007 | tech | Nick Lothian

Regarding Google To “Out Open” Facebook On November 5 – beware the fast follower.

Facebook has a pretty nice API, but depending on exactly what Google shares it could be possible to build some pretty impressive applications. Imagine knowing how often each Gmail contact was emailed… that would make facebook.friends.areFriends look kind of primitive.

Oh, BTW – I was wrong (or more charitably – misguided) about this stuff. Brad was right – making public data portable is the only safe way to go.

Quick & Dirty Server Monitoring

September 23, 2007 | random, tech | Nick Lothian

Sometimes it’s difficult to set up Nagios for server monitoring. This is what I do instead.

Firstly, for load monitoring:


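#!/bin/bash
# Append a timestamped sample of the 1, 5 and 15 minute load averages to today's log.
# strftime() and systime() are not part of POSIX awk; gawk provides them.

FILENAME="< absolute path >/monitoring/logs/load-$(date +%Y%m%d).txt"

awk '{print strftime("%Y/%m/%d %H:%M:%S", systime()), $1, $2, $3}' /proc/loadavg >> "$FILENAME"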

I run it from cron, and then use another cron script and gnuplot to graph the output.

genloadgraph.sh:



#!/bin/bash
# Usage: genloadgraph.sh [YYYYMMDD]   (defaults to today)
DATE=$1
if [ -z "$DATE" ]; then DATE="$(date +%Y%m%d)"; fi
FILENAME="load-$DATE.txt"
cp "< absolute path >/monitoring/logs/$FILENAME" "< absolute path >/monitoring/load.txt"
gnuplot "< absolute path >/monitoring/loadplot.p"
rm "< absolute path >/monitoring/load.txt"

loadplot.p:


# load.txt columns: 1 = date, 2 = time, 3-5 = 1, 5 and 15 minute load averages.
# The timestamp spans columns 1 and 2, so the load values start at column 3.
set terminal png large size 800,600
set xdata time
set timefmt "%Y/%m/%d %H:%M:%S"
set title "Load"
set format x "%H:%M:%S"
set output '< absolute path >/monitoring/load.png'
plot "< absolute path >/monitoring/load.txt" using 1:3 title '1 min average' with lines, \
     "< absolute path >/monitoring/load.txt" using 1:4 title '5 min average' with lines, \
     "< absolute path >/monitoring/load.txt" using 1:5 title '15 min average' with lines
set output

Gives a graph like this:

It’s possible to do a similar thing for website monitoring:



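#!/bin/bash
# Log how long a fetch of the page takes. bash's time keyword writes to stderr,
# hence the subshell and 2>&1; the "real" line carries the wall-clock time.

FILENAME="< absolute path >/monitoring/logs/nicklothian-$(date +%Y%m%d).txt"
(time wget -q --delete-after http://nicklothian.com/blog/) 2>&1 | awk '/real/ {print strftime("%Y/%m/%d %H:%M:%S", systime()), $2}' >> "$FILENAME"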

Preserving privacy while promoting social network portability

August 24, 2007 | blogging, social networks, tech | Nick Lothian

Brad Fitzpatrick and David Recordon recently wrote an interesting paper, Thoughts on the Social Graph, which gathered quite a lot of attention. They addressed some themes which I’ve been thinking about for quite a while now, and certainly moved the issue on a lot more than the recent Wired article did.

There’s no doubt that Brad & David know what they are talking about, either. Indeed, if Tim O’Reilly invented Web 2.0, then I think it’s not much of an exaggeration to say that Brad wrote the software which powers it.

However, I think their approach to the social network problem is surprising. In particular, I think it’s odd that the people who invented OpenID are proposing a centralized repository for all social networking data.

I believe there are better approaches. I’ve proposed and built a demonstrator for a system using what must be one of the most underappreciated data structures of all time: the Bloom filter. In short, a Bloom filter is a compact data structure which will remember if it has seen a piece of data previously, without remembering the data itself. Obviously, this is useful in the social networking context because you can do things like load up all of a user’s contacts and then make the Bloom filter public. That allows other systems to query the filter to see whether that user knows another user, without exposing the contact list itself to privacy leaks.
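To make the idea concrete, here is a minimal Bloom filter sketch in bash. It is not the code from the demonstrator: the filter size, the number of hashes, the function names and the use of cksum with different salts as the hash functions are all assumptions chosen to keep the example short.

```shell
#!/bin/bash
# Minimal Bloom filter sketch. All names and sizes are illustrative
# assumptions, not taken from the demonstrator.

BLOOM_SIZE=1024          # number of bits in the filter (assumed)
BLOOM_HASHES=3           # number of hash functions (assumed)
declare -a BLOOM_BITS=() # sparse array: an index that is set counts as a 1 bit

# Print the bit positions for an item, one per line. Each "hash function"
# is cksum (a CRC) over the item prefixed with a different salt.
bloom_positions() {
  local item=$1 i crc
  for ((i = 0; i < BLOOM_HASHES; i++)); do
    crc=$(printf '%s:%s' "$i" "$item" | cksum)
    crc=${crc%% *}       # cksum prints "CRC length"; keep only the CRC
    echo $((crc % BLOOM_SIZE))
  done
}

# Remember an item by setting its bits; the item itself is never stored.
bloom_add() {
  local pos
  for pos in $(bloom_positions "$1"); do
    BLOOM_BITS[pos]=1
  done
}

# Succeed if the item may have been added, fail if it definitely was not.
bloom_check() {
  local pos
  for pos in $(bloom_positions "$1"); do
    [ -n "${BLOOM_BITS[pos]:-}" ] || return 1
  done
  return 0
}
```

Calling bloom_add for each contact and then publishing only the bits lets anyone run bloom_check against an address they already hold. A Bloom filter can report false positives (an address that was never added may still match) but never false negatives, so a larger BLOOM_SIZE trades space for fewer wrong matches.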

Incidentally, that demonstrator is my first Facebook app. Writing Facebook apps turns out to be pretty nice, although in this case I wrote it in PHP, which is less enjoyable. Have I ever mentioned that I’m not a huge PHP fan? Perhaps that’s partially because I don’t know PHP at all, but it’s just such a goopy language. Mucking around with Ruby (which I don’t know either) makes you go hmmm.. that’s nice. Even in JavaScript I find myself going hmm… okay.. not quite what I expected, but it kind of makes sense. Doing the same in PHP just makes you go hmmm… – not in a good way, either.

Sleep.. glorious sleep

August 7, 2007 | personal | Nick Lothian

Our boy Alex is 21 months old now. During the first 20 months of his life he slept through 10 times, and we were often up for a couple of hours during the night and/or had to get up well before 6:00am. That was pretty tough, but then he learnt how to climb out of his cot. We had to buy him a bed and suddenly it was taking 2 hours to get him to sleep, and he was still waking up a couple of hours later.

After a week or two of that I gave in and agreed to see the sleep doctor. To my absolute and utter astonishment Alex is now going to sleep without crying and sleeps through the night at least 2 out of every 3 nights. Even better – when he does wake up he goes back to bed himself.

So.. if there are others of you suffering through this.. there is hope!

Recently read books

July 15, 2007 | books | Nick Lothian

I’ve read a bunch of books over the past couple of months. Here are some short reviews.


Guessing is much quicker than debugging

July 7, 2007 | java, tech | Nick Lothian

My previous post I’ve already tried the ‘waving a dead chicken over our servers’ trick attracted a bit of attention, and quite a number of suggestions – thanks to all who contributed. The suggestions seemed to fall into four main categories:

Database tuning.
- This is a good suggestion, and is something we’ve done a fair bit of. In this case it doesn’t really help because the problem wasn’t performance but stability.
Introduce a caching layer
- We’d already done this, twice. We initially used an ehcache caching filter to fix some pretty serious performance problems. We later added some OSCache JSP cache tags in some critical areas in some templates (it was the addition of OSCache which caused the performance boost seen in my post on monitoring performance using the Google Webmaster Tools). As it turned out this combination may have been what caused our problem.
Rewrite everything
- Thanks. Let me know when you get a job in the real world.
Debug the problem
- This is what I figured we’d have to do. It’s something I was attempting to avoid because the issue seemed to be threading related, and we couldn’t reproduce it anywhere except our production environment.

We did have one stroke of good luck. We were able to predict when the site would stop working by monitoring the number of threads Apache was using, and we could use this information to preemptively restart the site. We were able to modify the restart script to generate stack traces for all the JVM’s threads (kill -SIGQUIT <jvm pid>).
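A rough sketch of what such a check could look like, not the actual restart script: the Apache process name, the thread limit and the restart_site command are all assumptions made for illustration. It relies on the Linux procps version of ps, where nlwp is the number of threads in a process.

```shell
#!/bin/bash
# Sketch of a preemptive restart check. The process name, threshold and
# restart command are assumptions for illustration only.

APACHE_NAME="${APACHE_NAME:-httpd}"   # assumed Apache process name
THREAD_LIMIT="${THREAD_LIMIT:-250}"   # assumed warning threshold

# Total thread count across all processes with the given name.
# Prints 0 when no such process is running.
count_threads() {
  ps -C "$1" -o nlwp= | awk '{total += $1} END {print total + 0}'
}

# Succeed when the observed thread count has reached the limit.
over_limit() {
  [ "$1" -ge "$2" ]
}

# Ask the JVM for a thread dump (written to the JVM's standard output),
# then restart the site.
dump_and_restart() {
  local jvm_pid=$1
  kill -QUIT "$jvm_pid"
  sleep 2        # give the JVM a moment to write the dump
  restart_site   # hypothetical command: whatever restarts the site
}
```

From cron this might be wired up as: if over_limit "$(count_threads "$APACHE_NAME")" "$THREAD_LIMIT"; then dump_and_restart "$JVM_PID"; fi, where JVM_PID is looked up however the site's own scripts find it.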

Since it looked like I’d actually have to start debugging this problem I started looking through the stack traces, and I noticed that lots of the threads were in the ehcache filter. Now this wasn’t necessarily a bad thing, since all HTTP requests would pass through it. However, it did make debugging harder, was easy to remove (just comment it out in the web.xml) and did have some potential to be a source of problems – in particular the cache-invalidation part.

So we took a punt and removed the filter and… it fixed the problem. Yay! I’m a genius and all that.

Except…. now the CMS is crashing with a NullPointerException deep in the data persistence layer. There’s also the small problem that I don’t have a clue why that change fixed it. Using the ehcache filter on its own worked fine, and there is no programmatic interaction between the ehcache and oscache code.

There is an alleged fix for the NullPointerException – but we have to take a point release of the CMS, and then patch it with a service pack to get it. Our previous experience with upgrades has been less than confidence inspiring.

In the meantime we have a script watching the site and restarting it when it crashes. It’s kind of like failover, without the over bit.
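A watchdog of that kind can be sketched in a few lines of bash. This is an illustration under assumptions rather than the script in use: the timeout, the function names and the way the restart command is passed in are all made up for the example.

```shell
#!/bin/bash
# Watchdog sketch. The timeout and the restart command are assumptions.

# Succeed if the URL answers within the timeout. wget exits non-zero on
# network failures and on HTTP error responses.
site_is_up() {
  wget -q --timeout=10 --tries=1 --delete-after "$1"
}

# Check the site once; if it is down, run the restart command given as
# the remaining arguments.
check_and_restart() {
  local url=$1
  shift
  if site_is_up "$url"; then
    echo "up"
  else
    echo "down, restarting"
    "$@"
  fi
}

# Example loop (hypothetical URL and restart script):
#   while true; do
#     check_and_restart http://example.com/ /path/to/restart.sh
#     sleep 60
#   done
```

Keeping the check separate from the loop means the same function can be run once a minute from cron instead of as a long-running process.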

“You should just buy Google”

June 29, 2007 | java, tech | Nick Lothian

Me to Unix Admin @ work: Hey – so I’m doing estimates for a proposal which needs between 30 and 250 TB of storage – what do you know about mass storage?

Unix Admin: Hmm.. you should just buy Google….

Okay then! But seriously – Amazon S3 seems the obvious solution, and I’m also looking at OmniDrive. Any other suggestions are welcome. No hardware suggestions, though – I don’t have enough confidence in our operations group to do it in-house (mainly because we don’t have an operations group…)

The future of web development looks a lot like….

June 25, 2007 | java, javascript, tech | Nick Lothian

The future of web development looks a lot like this (Rails on JavaScript on the JVM).


BadMagicNumber

My Blog, Take 4