Labs

Many companies have 'Labs' pages where they present projects they consider cool or innovative. My labs page is a selection of my projects and blog postings that I hope you find interesting.

Automatic copy/move image forgery detection

Images that have been forged by copying and pasting blocks of pixels to cover up part of the picture can be detected using an algorithm that I implemented. Details are available by clicking the accompanying image, which shows a bunch of foliage copied to cover the presence of a military vehicle.
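To give a flavour of how copy/paste detection can work, here is a minimal Python sketch. It is not the algorithm I implemented, only the simplest possible variant: it slides a small window over a grayscale image and reports pairs of positions whose pixel blocks are exactly identical. The function name `find_duplicate_blocks` and its parameters are made up for this illustration, and exact matching is an assumption that breaks as soon as the forged image has been recompressed or retouched.

```python
from collections import defaultdict


def find_duplicate_blocks(image, block=4, min_offset=4):
    """Return pairs of top-left corners whose block x block tiles are identical.

    image is a list of rows of grayscale integers. Exact matching is a
    simplification: a practical detector would compare more robust block
    features so that it survives recompression and small edits.
    """
    height, width = len(image), len(image[0])
    seen = defaultdict(list)  # block contents -> list of (row, col) positions

    # Index every block x block window by its exact pixel contents.
    for r in range(height - block + 1):
        for c in range(width - block + 1):
            key = tuple(tuple(image[r + dr][c:c + block]) for dr in range(block))
            seen[key].append((r, c))

    matches = []
    for positions in seen.values():
        for i, a in enumerate(positions):
            for b in positions[i + 1:]:
                # Skip near neighbours, which match trivially in flat regions.
                if max(abs(a[0] - b[0]), abs(a[1] - b[1])) >= min_offset:
                    matches.append((a, b))
    return matches
```

Each returned pair is a candidate source and destination of a pasted region. Many pairs that share the same offset are a much stronger hint of forgery than a single isolated match.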

POPFile - Automatic email sorting

Back in 2001 I needed a solution to email overload. After some research I created POPFile, a POP3 proxy that was the first widely usable "Bayesian" email sorter (its parents include iFile for exmh and SwiftFile for Lotus Notes). It's still going strong today and now also supports IMAP and other protocols.

Deconstructing Digg

Looking into Digg's public data, I managed to track the growth of Digg's userbase accurately and tie spurts of growth to specific events. I then used similar data to predict where Digg's audience is located.

Building a temperature probe for the OLPC XO-1

The OLPC XO-1 has a handy Measure application that can measure a signal sent in through the microphone socket.

Here I hack Measure to measure temperature and detail the hardware necessary to build the probe.

A SQLite interface in 23 lines of Arc

Paul Graham's Arc language turns out to be pretty succinct. In 23 lines of Arc I managed to hack together a working interface to a SQLite database:

(= db! 'nil)

(def db+ (name (o host "localhost") (o port 49153))
 (let (i o) (connect-socket host port)
   (db> o name)
   (if (db< i) (list i o))))

(def sql ((i o) q)
 (db> o q)
 (if (db< i) (readall i 200)))

(def db- (db)
 (map close db))

(def db> (o s)
 (write s o)
 (writec #\return o)
 (writec #\newline o)
 (flush-socket o))

(def db< (i)
 (= db! (read i))
 (iso db! 200))

Calibrating a machine learning spam filter

One problem often encountered with machine-learning-based spam filters is what to do with the region around 0.5, i.e. how to interpret a message that scores right in the middle between spam and ham. This post shows how calibrating the filter can help.
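As a rough illustration of what checking calibration involves (a sketch under my own assumptions, not the code from the post), the Python below groups messages into score bins and reports the fraction in each bin that really were spam. The name `calibration_table` and its arguments are hypothetical. For a well-calibrated filter, roughly half of the messages scoring near 0.5 should turn out to be spam.

```python
def calibration_table(scores, is_spam, bins=10):
    """Group messages by score and report the observed spam fraction per bin.

    scores  -- filter outputs in the range [0, 1]
    is_spam -- matching booleans giving each message's true label
    Returns a list of (bin_low, bin_high, count, observed_spam_fraction)
    tuples, skipping bins that contain no messages.
    """
    counts = [0] * bins
    spam = [0] * bins
    for score, label in zip(scores, is_spam):
        index = min(int(score * bins), bins - 1)  # a score of 1.0 joins the last bin
        counts[index] += 1
        if label:
            spam[index] += 1

    table = []
    for i in range(bins):
        if counts[i]:
            table.append((i / bins, (i + 1) / bins, counts[i], spam[i] / counts[i]))
    return table
```

If the observed fraction in a bin is far from the scores that fall into it, the raw score is a poor guide to how confident the filter should be, and a message near 0.5 deserves more caution than the number alone suggests.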

Copyright (c) 1999-2008 John Graham-Cumming