- Teaching Smart People How to Learn: http://www.ncsu.edu/park_scholarships/pdf/chris_argyris_learning.pdf
Datacenter Renewable Power Done Right: The whole article is interesting, but I think this gives something away about how Amazon prices AWS: “Amazon Web Services added an interesting innovation where they sell the remaining capacity not fully consumed by this natural flattening of the peak to average. These troughs are sold on a spot market and customers are often able to buy computing at less the amortized cost of the equipment they are using”. I wonder if that means that the normal pricing strategy for AWS is simply to amortize the cost (presumably taking projected usage into account)?
- A Natural Language to SPARQL translator. Apparently it uses regular expressions to parse natural language(!). It can handle “what is the population of Australia”, but fails on “what is the population of Adelaide” (it thinks Adelaide is a country), and fails very badly on “When was Australia founded?” (it thinks Australia is a band!). Based on https://github.com/machinalis/quepy
- CIA Factbook as RDF: http://wifo5-03.informatik.uni-mannheim.de/factbook/page/Botswana
- Raspberry Pi Camera documentation
- Word2Vec. I find this one of the most amazing things I’ve seen for a long, long time. It makes me wonder what exactly it would take to reach Strong AI.
- Calavera – Active-Proxy is a dynamic reverse proxy written in Go, using CoreOS’s etcd for zero-downtime reconfig.
- The top libraries used by 30,000 Java, Ruby & JS projects on GitHub: http://www.takipiblog.com/2013/11/20/we-analyzed-30000-github-projects-here-are-the-top-100-libraries-in-java-js-and-ruby/. “Interesting” to see that 3 out of the top 4 Java libraries are related to logging. What a screw up that is.
- http://www.cedexis.com/country-reports/ Cedexis use JS instrumentation to measure performance from a user’s browser (ie, the best way to do it). They have ISP, CDN & Cloud Hosting metrics.
- BitCoinJ – a Java implementation of Bitcoin. The high level documentation linked from the front page makes interesting reading.
- A course run(? promoted?) by Google Australia on building a startup: http://course.introtostartups.com/course
- Big Data in Education: https://www.coursera.org/course/bigdata-edu This is my first Coursera course – I’m halfway through it now, and I’ve found it good. It’s run by Ryan Baker from Columbia University, who is a pretty good lecturer and active in the forums. I think the name is slightly misleading – we aren’t looking at Big Data at all (and to be fair, he warned us about this in the first lecture) – but the methods are useful. This course is directly related to my work, which makes it easier to stay engaged.
- A new StartupAdelaide website. I built the search, which covers the Facebook group (contact me for an invite) and the Startup Adelaide Reddit board. ElasticSearch is pretty nice, but Solr 4 has some advantages too.
- An interview with a guy working on Google Knowledge Graph and Freebase: http://ultimategerardm.blogspot.com.au/2013/11/wikidata-freebase-interview-with-denny.html. I think it’s likely that Google Now is powered by this.
- Blackbird Dealflow – Observations after our first 8 months. A good survey of the startup scene in Australia.
Inspired by Pete Warden, I’m going to try & do 5 quick links on a semi-periodic basis.
- I recently sold my ChromeReload Chrome extension. It’s dropped from 5 stars to 3 in the reviews, so I guess that didn’t go well.
- The Strava v3 API is out for early access. Apply here, docs here.
- Places to announce side projects: https://news.ycombinator.com/item?id=6488822, via Luke Chadwick
- What I saw at the OpenStack Summit, mostly for this quote:
Traveling to Hong Kong, I expected to see a mixed audience, half composed of my typical enterprise audience and the other half composed of web scale companies, cloud service providers, and growing end user organizations. According to this expectation, I assumed to see a polarized audience of COTS software adopters and risk averse large companies side by side with DIY believers and reckless organizations. It was not the case. What I saw is just the latter category, a world where VMware and many other mainstream vendors don’t have a place.
In further interactions with many people on site, my feeling grew significantly. I talked to organizations that show an attitude to risk more common in early stages startups than massive enterprises. These companies look at massive post-IPO web-scale firms like Google, Facebook, Netflix, and how they are rejecting packaged software in an unprecedented way and how they are building entirely homegrown computing stacks to become more efficient, more scalable, more competitive.
Yep. Software eats the world – including “Enterprise IT”
- Reasonator: Wikidata, rendered nicely. See the Cambridge example. Slowly (SLOWLY!) the semantic web is becoming something. I’d hesitate to say useful, but it possibly isn’t the waste of time and resources it was 5 years ago.
It’s SOPA blackout day today.
At Hacker News, most SOPA related stories have a comment something like “It is time to stop being defensive. The tech sector has lots of money, it should get together to lobby and go on the offensive against the RIAA and MPAA instead”
What no one seems to realize is that the large players in “the tech sector” generally do not have interests which align well with one another.
The biggest US tech companies by market cap are:
Why would Microsoft, IBM and Oracle lobby for a decrease in copyright length (as has been suggested)? Why would Apple, Intel, Cisco and Qualcomm lobby for patent reform when a great deal of their value is in the form of patents?
If I were trying to organize a “tech lobby group” I’d lobby on two things:
- Hands off the Internet. I don’t think this needs explaining, but will be a hard, hard fight. All tech companies should benefit from this.
- Compulsory licensing of streaming video. Currently music streaming is subject to compulsory licensing (which allows Pandora etc to exist), and this arrangement should be extended to video. This would enable a huge number of video-related startups to legally build innovative solutions around copyrighted materials, while still giving revenue to the content owners. Every tech company would benefit either directly or indirectly from this. Also, the MPAA would hate it (even if they should be lobbying for it themselves).
I’ve been working on a large (Java) AppEngine project since January 2010. I recently left that job, but the project hasn’t finished and unfortunately I can’t talk about it yet.
During that time I learnt a lot of tricks and techniques for dealing with AppEngine’s idiosyncrasies, which have been useful for building a contextual advertising demo system: Qontex.com (brief synopsis: contextual affiliate ad distribution software. Not too sure what I’m going to do with it, but I had fun building it. The front end container is actually WordPress(!), but the UI is GWT and the backend is AppEngine).
Anyway, it seems useful to share a few things I’ve learnt.
1) Be pragmatic
I think of AppEngine as Amazon S3 plus some intelligence, rather than Amazon EC2 minus features. I find that a lot less frustrating.
If there is something you need that AppEngine doesn’t do well, don’t try and force it. Full Text Search is a great example: it’s horrible to try & get it to work on AppEngine, but installing Solr on a VM somewhere (or using a cloud Solr provider) is trivial.
2) AppEngine is a platform optimized for a specific type of application.
Don’t think of AppEngine as a standard Java application stack in the cloud. From the documentation:
While a request can take as long as 30 seconds to respond, App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn’t will not scale well with App Engine’s infrastructure.
Think about that for a while, and understand it well. Often Java developers are used to building corporate web apps where functionality is slowly built up over time. All too often a single HTTP request will have 4 or 5 database queries in it, and that is regarded as normal. That won’t work in AppEngine.
When you are working with AppEngine you’ll be thinking about performance continually, and differently from how you would with a normal Java application.
3) The datastore is dangerous.
In the development environment it has similar performance characteristics to a traditional database. In production it is slow at best, unpredictable at worst. If you come from an enterprise Java background, think of it as an integration server for a legacy API you are integrating with: data inside it isn’t going to go missing, but you should expect your connection to it to break at any point. You need to isolate your users from it, protect your application from it and consider carefully how to protect your data from outages.
I usually assume that a datastore query is going to take 200ms. Lately it has usually been better than that, but the variation is still a problem: http://code.google.com/status/appengine/detail/datastore/2010/11/23#ae-trust-detail-datastore-query-latency
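One way to isolate users from that variability is to give every read a hard budget and degrade to a cached copy when the budget is blown. This is a minimal sketch in plain Java, not App Engine API code: the `Callable` stands in for a datastore query, the `ConcurrentHashMap` stands in for Memcache, and the 200ms budget matches the assumption above.

```java
import java.util.Map;
import java.util.concurrent.*;

// Treat every datastore read as a call that may be slow or fail,
// and degrade to a cached value rather than blocking the user.
public class GuardedRead {
    static final ExecutorService pool = Executors.newCachedThreadPool();
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Memcache

    static String read(String key, Callable<String> datastoreQuery) {
        Future<String> f = pool.submit(datastoreQuery);
        try {
            String v = f.get(200, TimeUnit.MILLISECONDS); // the ~200ms budget assumed above
            cache.put(key, v);                            // refresh the cache on success
            return v;
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            f.cancel(true);
            return cache.getOrDefault(key, "unavailable"); // degrade gracefully
        }
    }
}
```

A real version would also distinguish timeouts from hard failures and decide per request whether stale data is acceptable.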
4) Memcache is useful, but no silver bullet.
Memcache is useful because it has much more predictable performance characteristics than the datastore – and it’s a lot faster too. Generally, it’s pretty safe to rely on Memcache responding in less than 20ms at worst. At the moment its responses are around 5-10ms. See the Memcache status page for details: http://code.google.com/status/appengine/detail/memcache/2010/11/23#ae-trust-detail-memcache-get-latency
A Useful Pattern
One pattern I’ve found useful is to think of user-facing servlets as similar to the UI thread in a GUI application. Blocking should be kept minimal, and anything that’s going to take significant time is done from task queues. This includes anything beyond a single “GET” on the datastore (note that a GET operation is very roughly twice as fast as a datastore query)
For example, Qontex has a process that relies on content analysis. I currently do that on-demand rather than attempting to spider the entire internet. The demo “Ad Explorer” front end is written in GWT, and it works like this:
1) Send a request to the analyze URL, passing the name of a callback function (for JSONP callback)
2) The backend checks Memcache for data about the URL. If it isn’t there, it fires an AppEngine task queue request to analyze the URL and returns a JSONP response that contains a status_incomplete flag and a wait_seconds parameter.
3) The GWT client gets the response, and sets a timer to re-request in wait_seconds seconds.
4) Meanwhile, back on the server the task queue task is being processed. That task will load the results into memcache.
5) The client re-requests the analyze URL, and this time Memcache has been loaded so the servlet can build a response with the correct data.
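The five steps above can be sketched in plain Java. Everything here is illustrative: the `ConcurrentHashMap` stands in for Memcache, the single-thread `ExecutorService` stands in for the task queue, and the JSON is hand-rolled for brevity.

```java
import java.util.Map;
import java.util.concurrent.*;

// Fast, non-blocking handler: either return cached results,
// or enqueue the slow analysis and tell the client to poll again.
public class AnalyzeEndpoint {
    static final Map<String, String> memcache = new ConcurrentHashMap<>();
    static final ExecutorService taskQueue = Executors.newSingleThreadExecutor();

    static String handleAnalyze(String url, String callback) {
        String result = memcache.get(url);
        if (result != null) {
            // cache hit: build the complete JSONP response immediately
            return callback + "({\"status\":\"complete\",\"data\":\"" + result + "\"})";
        }
        // cache miss: do the slow work off the request path, ask the client to re-poll
        taskQueue.submit(() -> memcache.put(url, analyze(url)));
        return callback + "({\"status\":\"incomplete\",\"wait_seconds\":2})";
    }

    static String analyze(String url) {
        return "keywords-for-" + url; // placeholder for the real content analysis
    }
}
```

A real version would also record that a task is already in flight, so a client polling again before the analysis finishes doesn’t enqueue duplicate work.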
I use a similar, but simpler pattern to write to the datastore.
When an ad is served, or when a user clicks an ad, I fire a task-queue request to record it, which lets me send a response much more quickly. AppStats is great for showing this graphically:
As you can see, it would be sensible to bulk up all those memcache reads into a single read on a composite object. At the same time, the entire servlet responds in 37ms, which isn’t too bad, and some of those memcache calls are conditional – but the point is that AppStats gives great visibility into exactly how your application is performing.
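The deferred-write pattern reads much the same way. Again this is a sketch with illustrative names, using an `ExecutorService` as a stand-in for the task queue and a counter as a stand-in for the datastore write:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Record ad events off the serving path so the user-facing response returns immediately.
public class AdEventRecorder {
    static final ExecutorService taskQueue = Executors.newSingleThreadExecutor();
    static final AtomicLong clicks = new AtomicLong(); // stand-in for a slow datastore write

    static void onAdClick(String adId) {
        // enqueue the write and return at once; the queued task does the slow work
        taskQueue.submit(() -> clicks.incrementAndGet());
    }
}
```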
I was at a StartUp Club event last night and had a brief discussion with someone who had the thesis that all advertising by companies is evil and therefore will soon (?) be overtaken by personal recommendations from your social circle.
I disagree with that for a number of reasons (eg, your social circle may not be best qualified to make a recommendation etc etc), but during the course of the discussion I was surprised when no one recognized the term Perfect Advertising. A quick bit of Googling today only turned up one decent post, and yet I’m sure this isn’t a concept I’ve invented.
Perfect Advertising is the idea that a person sees no advertising until they need something, and at that point a single advertisement is presented to them that matches their requirements perfectly.
The example last night was jogging shoes. The original argument was that you will get shoe recommendations from your friends. My counterpoint was that it would be easy to get better recommendations by instrumenting your body and taking advice from sports scientists; in a world with perfect advertising you would be presented with a single choice of shoes, in the correct size, that compensated perfectly for your over- or under-pronation. That’s not a recommendation your friends are likely to be qualified to make, but of course perfect advertising would take into account the views of your social circle too (eg: will you be socially ostracized for buying Nike shoes, or laughed at for buying Vibram FiveFingers?)
This might seem a distant goal, but nonetheless it’s an important concept, because it shows the weakness in social advertising systems (the lack of intent) as well as a weakness in search advertising systems (the lack of context).
Is it easier to add context to search advertising or derive intent in social advertising? That’s the $100 billion question (literally), and I don’t have the answer.
I went along to the Mobile Monday meeting on Business Angels in South Australia. Pretty good event, all up.
Nick Foskett gave a pretty good and honest talk about the state of Angel investment in South Australia. A quick summary, from memory only: SA Angels Inc was formed in 2007, but spent quite a while figuring out how they should do investments (solution: don’t form a fund) and who should be involved. They have 17 people who are actively seeking investment opportunities.
Since they formed they have had 41 pitches and made one investment. I can’t remember the exact amount invested (I think it was 400K), but it was made by a number of members who each invested in the region of 25K.
Nick mentioned that a number of the early pitches were under-prepared. People couldn’t give answers on basic financial predictions, and had no idea how much money they wanted or how much of the company they would give up for that amount.
He also mentioned a couple of other reasons why investments were declined:
- People trying to create jobs for themselves.
I’m paraphrasing here, but this was about people who had a reasonable idea for a small business, but not something that would generate enough returns to be an attractive business. To me, this was about developing a scalable business model.
- Nothing more than an idea
Apparently people think investors will give them a million dollars for a good idea. I had thought that had died out ten years ago, but maybe not.
- Lack of defensible advantage
Nick mentioned how easy it is to outsource writing code, and talked about how important it is for your idea to have something that would stop others replicating your success and taking your money. When he was talking about this I thought he meant traditional forms of IP protection (patents, etc), but speaking to him later it was clear that applications with a genuine first-mover advantage and/or network effects were also OK.
Then it was question time. Umm yeah.
The most interesting question was one Michael Kubler asked about expected returns. The short answer: a 5x return over 5 years is too long a wait for most, but many would be happy with 5x in a couple of years.
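To see why the horizon matters so much, compare the annualized returns implied by a 5x multiple over the two timeframes (a quick back-of-the-envelope calculation of mine, not anything said at the event):

```java
// Annualized return implied by a given multiple over a given number of years.
public class Returns {
    static double annualized(double multiple, double years) {
        return Math.pow(multiple, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        System.out.printf("5x over 5 years: %.0f%% p.a.%n", 100 * annualized(5, 5)); // ~38% p.a.
        System.out.printf("5x over 2 years: %.0f%% p.a.%n", 100 * annualized(5, 2)); // ~124% p.a.
    }
}
```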
But the first “question” was something different. Some guy and his business (?) partner decided it would be useful for the room to hear their long rant about how terrible it was no one would invest in them and that people in the US would love them but they wanted to stay here and how terrible it was that the SA Angels group had received 41 pitches and only invested in one etc etc etc.
There is no denying that 1/41 isn’t a great ratio, but Nick had already spent quite a while talking about the reasons for it, and they sounded reasonable to me. I’m sure there are some opportunities they should have taken (and speaking to Nick later he confirmed that), but it seems pointless to blame investors for being overly conservative when people come asking for investment and don’t even have an idea of what kind of revenue they think they will get.
More importantly, the overly aggressive and downright rude way the questioner’s point was presented was offputting.
One exchange went something like this:
Questioner: Why are investors here so conservative when we KNOW we could get money in the States very easily.
Nick: Well if you can get money from the US….
Questioner’s partner (interjecting): Oh.. we could
Nick: then that is something you should pursue.
Of course, this may have been part of the little known “be rude to potential investors and make them hate you before you ask them for money” strategy. Someone should study the success of that method some more.
Anyway, on a better note I was fortunate enough to talk briefly to Nick afterwards, and asked about (very) early stage investment – ie, pre-revenue. He was quite open to that, but rightly pointed out that it is much riskier, so would cost a larger stake in the company.
There is a saying here in Australia: “Don’t throw the baby out with the bathwater”. I’m not sure if it’s common elsewhere, but basically it’s a warning not to discard the good along with the bad.
When Google Buzz was released, people were shocked at the automatic management of contacts. I suspect that people inside Google were quite surprised at this – after all, it was just a logical extension of what they had been doing within Gmail and Gtalk for years.
I also suspect that within Google it isn’t widely known how much people hate this feature. Personally, I’m no privacy freak, but I’m continually annoyed by the fact that I get random people showing up on my GTalk list just because we corresponded about some random open source project 5 years ago or something. I’ve also heard that a lot of Googlers have stopped using public GTalk because too many external people interrupt them. There is a lot I could say about the rudeness of strangers who just want help to diagnose problems etc, but for the moment I’ll just say that software should encourage desired behavior and discourage things you don’t want to happen.
However – I don’t think Google is wrong in thinking that computers could do a better job than humans of managing contacts for them. I’d love the auto-follow-on-buzz (and auto-add-to-GTalk) feature if it did it almost as well as I could do it myself. At the moment I tolerate the feature in GTalk because – while I know I could do better myself – I know from services like Facebook that the continual grouping and pruning of contact lists is a game I don’t have time for.
After the push-back Google got on automatic contact management in Buzz, it would be tempting to just give up on the problem. I don’t think they should do that – instead of pulling away from the problem they should invest in solving it.
Because then maybe I won’t hate my GTalk contact list anymore.
A couple of weeks ago I posted my work on developing a virtual keypad using HTML5 video. That worked surprisingly well, but had an unfortunate requirement: HTML5 doesn’t really support access to webcams.
I’ve developed a Flash shim, which gets access to your webcam and copies frames to data URLs that can be used in HTML DOM images.
This works well, within some limits:
- Chrome (and Safari?) leaks memory when new data URLs are created. (See bug). Apart from that, this technique works on both Chrome & Firefox, but my virtual keypad is Firefox only ATM
- I don’t really know Flash. I downloaded the Flex SDK a few days ago and started hacking, so I’m pretty sure my Flash code sucks.