Now there is a basic SQL database (in JavaScript!) that lets you query and persist data client side. I don't think I need to write much about it – now go and write applications!
VMware Player
From Wubble:
Today, as part of our VMworld 2005 festivities, we announced our VMware Player. This is a freely downloadable tool that, as you might guess, plays virtual machines.
It's a free download, and there are a number of preconfigured VMs available for download.
This is a really good idea for software demos, as can be seen from some of the VMs that are available (eg IBM, Oracle etc). No longer do you need to sacrifice a machine (or a preconfigured VM) and spend a long time getting some demo software working. Now vendors can just redistribute the player and a preconfigured VM. Brilliant!
The Browser Application VM looks like a good idea, too.
Anatomy of a Cross Site Scripting Attack
If you create websites that require any kind of security, hopefully you are familiar with the dangers of cross site scripting attacks. (If not, please let me know so I can stay clear…)
The other day MySpace got taken down by an XSS attack. The interesting thing about it was that (a) it used XMLHttpRequest to get around a multi-phase hash verification test and (b) the author has written about how they did it.
The attack itself is quite smart:
9) Finally we can do a POST! However, when we send the post it never actually adds a friend. Why not? Myspace generates a random hash on a pre-POST page (for example, the “Are you sure you want to add this user as a friend” page). If this hash is not passed along with the POST, the POST is not successful. To get around this, we mimic a browser and send a GET to the page right before adding the user, parse the source for the hash, then perform the POST while passing the hash.
It is a worry though that he spent so long working out how to get it to work and yet after he deployed it:
7 hours later, 8:35 am: You have 74 friends and 221 friend requests.
Woah. I did not expect this much. I'm surprised it even worked.. 200 people have been infected in 8 hours. That means I'll have 600 new friends added every day. Woah.
1 hour later, 9:30 am: You have 74 friends and 480 friend requests.
Oh wait, it's exponential, isn't it. Shit.
Classifier4J used in graphical realtime music programming language
I received a nice note from Olivier Pasquet the other day letting me know that he's used the auto-summarizing features of Classifier4J in Max/MSP, a graphical realtime music programming language. That's one thing I would never have predicted it would be used in!
Spam Blog Crisis
Tim Bray says there is a spam blog emergency occurring right now. I tend to agree. I'd like to see the search terms he is using to get that many splogs, though.
Removing spam blog results from results sorted by time is difficult because you can't rely on PageRank-like algorithms. Email spam filters are probably a better model, although the auto-generated splogs that I suspect Tim is suffering from are hard to detect using Bayesian-type algorithms. OTOH, my de-spammed version of Google's blog search just uses heuristics based on the URL of the item, and it does okay for many searches. Compare my version of a search for “cancer” with the raw version. At the time of writing, my version removes 26 spammy results to get the first 10 non-spammy ones.
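To give a feel for what URL-only heuristics can look like, here is a minimal Python sketch. It is not the code behind my de-spammed search: the rules (lots of hyphens in the host name, a leading run of digits, a short list of suspect top-level domains) and the thresholds are assumptions picked purely for illustration, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules and thresholds, chosen only for illustration.
SUSPECT_TLDS = {"info", "biz"}
MAX_HYPHENS = 2


def looks_spammy(url):
    """Guess whether a result is from a spam blog using nothing but its URL."""
    host = (urlparse(url).hostname or "").lower()
    tld = host.rsplit(".", 1)[-1]

    score = 0
    if host.count("-") > MAX_HYPHENS:
        score += 2  # keyword-stuffed host names tend to be hyphen-heavy
    if tld in SUSPECT_TLDS:
        score += 1
    if re.match(r"\d{3,}", host):
        score += 1  # host starts with a run of digits
    return score >= 2


def first_clean(result_urls, wanted=10):
    """Walk time-ordered results, returning (clean results, number removed)."""
    clean, removed = [], 0
    for url in result_urls:
        if looks_spammy(url):
            removed += 1
            continue
        clean.append(url)
        if len(clean) == wanted:
            break
    return clean, removed
```

Because it never fetches the page, a filter like this is cheap enough to run over every result, but it will obviously miss spam hosted on ordinary-looking domains.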
Re: private feeds
Many aggregators don't handle password-protected feeds well: some don't support it at all, and some do support it (either fully or with the user ID and password in the URL) but aren't very secure. What if you used hard to guess feed URLs? For example:
http://myhost/feeds/[big cryptographically unique ID]
It works with any reader. If it leaks out, others won't be able to access your account (they don't have your real password).
On the down side, if you subscribed to this feed in something like Bloglines, wouldn't Bloglines index it so other users could search it? Of course Bloglines supports embedding the user ID and password in the URL. Does Bloglines index these feeds?
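As a rough sketch of the hard-to-guess URL idea, the Python below generates a random feed URL and compares a presented ID against a stored one. The base URL comes from the example above; the function names and the choice of 32 random bytes are my own assumptions.

```python
import hmac
import secrets

FEED_BASE = "http://myhost/feeds/"  # base URL from the example above


def new_feed_url(base=FEED_BASE):
    """Return a feed URL ending in an unguessable, URL-safe random ID."""
    # 32 random bytes encode to 43 URL-safe characters (an assumed size).
    return base + secrets.token_urlsafe(32)


def tokens_match(presented, stored):
    """Compare IDs in constant time so timing does not leak a partial match."""
    return hmac.compare_digest(presented, stored)
```

The ID acts as a revocable secret: if it leaks, issue a new URL and retire the old one, and the account password is never exposed.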
I started replying in a comment, but it got too long and interesting:
There was an interesting discussion on using this technique on the P2P Hackers & REST Discuss mailing lists (although it was more for conventional webpages rather than just feeds).
I think it has some promise and I've been thinking of using it in one of my projects, but there are some things to be aware of:
1) Referrers. If your feed includes resources from or links to other sites you need to make sure links go through a redirector to strip the referrer headers.
2) Use https (if possible). This will partially solve the referrer problem (although not when reading via an aggregator), and could be used as a sign for the aggregator not to index it.
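The redirector in point 1 could be sketched roughly like this in Python. The redirector address and the "to" parameter are hypothetical, and the regular expression only handles simple double-quoted href attributes, so treat it as an illustration rather than a robust HTML rewriter.

```python
import re
from urllib.parse import urlencode

REDIRECTOR = "http://myhost/redirect"  # hypothetical redirector endpoint

HREF = re.compile(r'href="(https?://[^"]+)"')


def via_redirector(target_url, redirector=REDIRECTOR):
    """Wrap an outbound link so the target sees the redirector as referrer."""
    return redirector + "?" + urlencode({"to": target_url})


def rewrite_links(html):
    """Point every absolute href in a feed item at the redirector."""
    return HREF.sub(
        lambda m: 'href="' + via_redirector(m.group(1)) + '"', html
    )
```

With links rewritten this way, the other site only ever sees the redirector's address in the referrer header, never the secret feed URL.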
I don't think Bloglines does index password protected feeds. That creates an interesting possibility: create a feed that requires HTTP basic authentication, but accepts any combination of usernames and passwords. That will signal to aggregators not to index that feed, but doesn't have the security risks associated with sharing a real username/password.
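Here is a minimal Python sketch of that "accept anything" check, written as a plain function over the Authorization header so it can be dropped into whatever server you use. The realm name and function name are hypothetical, and whether a given aggregator really treats such a feed as private is an assumption you would need to test.

```python
import base64
import binascii

CHALLENGE = {"WWW-Authenticate": 'Basic realm="feed"'}  # hypothetical realm


def check_any_credentials(authorization_header):
    """Return (status, extra_headers) for a request to the feed.

    Any well-formed Basic credentials are accepted; requests without
    them get a 401 challenge, so the feed still looks password protected.
    """
    if not authorization_header:
        return 401, CHALLENGE
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return 401, CHALLENGE
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return 401, CHALLENGE
    if ":" not in decoded:
        return 401, CHALLENGE  # Basic credentials must be "user:password"
    return 200, {}
```

The username and password are never checked against anything, so there is no real credential to leak.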
Google Reader feedback
As part of my continuing quest for the ultimate aggregator I've been using Google Reader a bit – although I haven't replaced Bloglines yet.
Firstly, the good things: I generally like the feel of the “lens” part of the UI. The scrolly headline box thing is nice (although there should be a delay when you stop on an item before it gets marked as read). It's good to see that Google hasn't gone down the whole “sharing labels as tags” route – tags are useful when you primarily tag something for yourself. (The whole “tagging for other people” thing leads to spam [1].) It isn't clear exactly what use the labels have yet, though.
Despite those things, I can't use it as my main aggregator. The one feature I need is a “view new items by author” view. The “Your Subscriptions” page almost has the elements on it – it needs to bold feeds with new items and put a count of the new items next to them.
[1]: I've been meaning to write on that – del.icio.us tags are useful because they benefit from people's selfishness; ie, people want to find something in their own bookmarks. Technorati tags aren't as useful because the only reason to use them is to benefit other people. See the use of HTML meta tags circa 1997 and what happened to them.
Fixing Google Blog Search
I've previously complained about how much spam is included in Google's Blog Search. Generally, though, I think Google does a good job with most of the things they do, and I think that most of the criticism they get is unfair. That made me feel a little uneasy about adding to the criticism and increasing the perception of Google as an evil company.
So what should someone do when they believe they have uncovered a problem? I decided I'd do what I like people to do when they find a problem with some of my software: fix it.
Here's my imperfect attempt:
More spam blog problems
It isn't just Google that is having problems with spam blogs. Two out of the top five most popular links on Bloglines are spam sites linked to by spam blogs. The spam site is an online poker tournament and the links from the linking blogs all use the same text.
I don't think Bloglines' problems are quite as bad as Google's, though. It looks like Bloglines is actively removing spam blogs, whereas Google isn't.
Google Blogsearch: A poor effort
Google recently launched the Google Blogsearch. At first I thought this was a typically good Google product: fast, up-to-date and reliable. Then some spam started to appear in their index and it rapidly became less useful.
Now it is approaching uselessness for searches ordered by date (and what other kind of search would you want for a blog search anyway?)
For instance, compare this search for email howto mac on Google Blogsearch, IceRocket, Feedster, Blogdigger and Technorati.
Right now 7 out of the first 10 results on Google are obvious spam, with no pretence of being anything else. IceRocket and Feedster have a couple of spam items on their first page, while Blogdigger and Technorati seem to have results that at least look spam free. Those results actually make Google look better than it is: it looks like roughly 40 out of the first 50 items Google is showing are spam, and all come from the same couple of sites.
I'm not normally one to criticize Google; I think that the Google search engine itself is a lot better than the competition. I have to wonder, though: with all those highly qualified PhDs, why is it that they can't detect that a site called 1001-the-most-complete-broadband.info, which contains nothing but alphabetically ordered lists of linked keywords, is a spam blog?