Friday, August 31, 2007

Useful sources of messages for testing spam filters

- Enron Corpus

- PU corpus

- SpamAssassin public corpus

- TREC 2005 Public Spam Corpus

- TREC 2006 Spam Track Public Corpora

- 20 Newsgroups


Wednesday, August 29, 2007

Why you don't want to code for a government department

Back in the mists of time, straight after my doctorate, I worked for a UK start-up called Madge Networks, initially maintaining device drivers that implemented the LLC, NetBIOS and IPX/SPX protocols and then writing a TCP/IP stack. Most of this work was done in C and assembler (x86 and TMS380).

When I first joined the company I was sent on an x86 assembly training course run by QA Training. (It rained on the first day and we were locked out, so one of the company big cheeses ran over with QA Training umbrellas; to this day I use that umbrella.)

During the course we were asked to write a simple function in C. I've forgotten what it was, but let's say it was a classic factorial function. I wrote something like:

unsigned int fac( unsigned int i )
{
    if ( i < 2 ) {
        return 1;
    }

    return i * fac( i - 1 );
}

Later we looked at an assembly equivalent of the function, but before that I took a look at the person sitting next to me. His function looked like this:

unsigned int aq456( unsigned int aq457 )
{
    if ( aq457 == 0 )
        return 1;

    if ( aq457 == 1 )
        return 1;

    return aq457 * aq456( aq457 - 1 );
}

So, naturally, I asked him why he used such horrible names for functions and variables.

It turned out that he worked for a government department and all identifiers were allocated and documented before the code was written. Hence, somewhere a document would tell you that aq456 was a factorial function, with excruciating detail of its parameters and return values. You could also discover that aq457 was the parameter for aq456.

He recounted how each project had a unique two letter code (he'd chosen aq as being qa backwards) followed by a sequence number.

He'd chosen aq because all the projects he had worked on were aa, ab, ac, ...; the department had apparently never got past 26 projects.

I wonder why?


Tuesday, August 28, 2007

FoxNwes msitake that amsued me