Monday, May 26, 2008

POPFile v1.0.1 released plus a glimpse of the future

POPFile v1.0.1 was released today; this is the first ever POPFile release that I didn't do. POPFile is now being managed by a core team of developers: Manni Heumann (in Germany), Brian Smith (in the UK), me (in France), Joseph Connors (in the US) and Naoki Iimura (in Japan). A truly international effort. The actual release binaries were built by Brian Smith who, for a long time, has been the installer guru.

This release contains minor feature improvements and a number of bug fixes. Some of the fixes were for annoying bugs that showed up only occasionally, which makes it a worthwhile upgrade.

Since I pulled back from being involved in every detail of POPFile's evolution, the core team has been liberated to work on the project. v1.0.1 is their first release, and it is minor, but much greater things are coming:

1. A native Mac installer

2. A SOHO version of POPFile. Some time ago I did most, but not all, of the work to make a multi-user version of POPFile. That work is being completed by the core team and will allow a single POPFile installation to be shared by multiple users.

Thank you to the POPFile Core Team for this great start to a new chapter in POPFile history.


Monday, May 19, 2008

A post (anti-spam-) retirement note

One of the anti-spam companies I was/am involved with, MailChannels, made an interesting announcement recently about a commercial offering for SpamAssassin. What makes the announcement interesting to me is that Justin Mason (who wrote SpamAssassin) is also an advisor to MailChannels.

The program, Traffic Control 3 for SpamAssassin, is a free download, and for sites that process fewer than 10,000 messages per day there's no charge at all (and no need to go and get a license from MailChannels).

Basically, the new product acts as a front-end to SpamAssassin, traffic-shaping incoming messages so that load is taken off SpamAssassin and the mail server.


Saturday, May 17, 2008

Breaking the Fermilab Code

A story appeared on Slashdot about a mysterious fax received at Fermilab written in an unknown code. The full story is here. I looked at it and immediately noticed a few things:

1. The first part looked like ternary (base 3) with digits 1 (|), 2(||) and 3(|||).

2. The last part looked like binary with digits 1(|) and 2(||)

3. The middle bit looked like either a weird substitution code, or I wondered if it might be machine code.

4. In the last part the digit 2 (||) never occurs twice in a row; perhaps it was actually a separator and the last part is not binary.

The first step was to convert the bars into numbers. Here's a copy of my marked up print out:



The first part has the numbers (or so I thought):

323233331112132
333231322123312
111331132312233
333212123213113
311333313331111
211333323232211
232313331121231
33231312

Noticing this had 113 digits (which is a prime number) I went off on a wild goose chase around primes, and then around the interpretation of this number in hexadecimal as a string in ASCII, Unicode or binary... waste of time.

Then I started thinking about ternary again and wrote down the largest ternary numbers that can be expressed with 1, 2, 3, ... digits:

2 (base 3) = 2 (base 10)
22 (base 3) = 8 (base 10)
222 (base 3) = 26 (base 10)
2222 (base 3) = 80 (base 10)

One of those stood out: with three digits the maximum number is 26 and there are 26 letters in the alphabet! Then the only question was how to map the three digits used in the code (1, 2, 3) to the three ternary digits (0, 1, 2).
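The maxima listed above follow from the fact that the largest n-digit base 3 number has every digit set to 2, which is 3^n - 1. A minimal sketch to check the arithmetic (the sub name max_ternary is invented for this illustration):

```perl
use strict;
use warnings;

# Largest value expressible with $n ternary digits: all digits are 2,
# which equals 3**$n - 1. (max_ternary is a hypothetical helper name.)
sub max_ternary {
    my ($n) = @_;
    return 3**$n - 1;
}

foreach my $n ( 1 .. 4 ) {
    printf "%s (base 3) = %d (base 10)\n", '2' x $n, max_ternary($n);
}
```

With three digits this gives 26, so the values 1 to 26 can stand for A to Z and 0 is left over for something else, such as a space.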

To simplify things I wrote a small Perl program that tries out all the possible mappings and outputs the ternary interpreted as a string (with 001 = A, etc.):

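use strict;
use warnings;

# The coded digits (1, 2, 3) are passed as a single command-line argument.
my $top = $ARGV[0];

# Rename the digits 3, 2, 1 to the placeholders a, b, c so that they can
# be replaced by candidate ternary digits without clashing.
$top =~ tr/321/abc/;

# Split the message into groups of three symbols; a trailing partial
# group is dropped.
my @chunks;

while ( $top =~ s/^([abc]{3})// ) {
    push @chunks, $1;
}

my @digits = ( '0', '1', '2' );

# Try all six assignments of the ternary digits 0, 1, 2 to a, b, c.
foreach my $d0 (@digits) {
    foreach my $d1 (grep {!/$d0/} @digits) {
        foreach my $d2 (grep {!/[$d0$d1]/} @digits) {
            print "($d0$d1$d2) ";
            foreach my $c (@chunks) {
                my $v = 0;
                my $m = 1;
                # Evaluate the group as a base 3 number, least
                # significant digit first.
                foreach my $d (reverse split( //, $c )) {
                    $d =~ s/a/$d0/;
                    $d =~ s/b/$d1/;
                    $d =~ s/c/$d2/;
                    $v += $d * $m;
                    $m *= 3;
                }
                # 1 = A ... 26 = Z; 0 prints as @ (chr 64).
                print chr( 64 + $v );
            }
            print "\n";
        }
    }
}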

With my initial interpretation of the top part of the coded message I got the following output:

(012) CIBYS@KDUGZBSGI@PUOXH@FBZQ@CJQJFBWKAF
(021) FRANK@SHOEMAKER@WOULD@CAMV@FTVTCAPSBC
(102) JDNXUMEISOZNUODMFSGYQMPNZHMJCHCPNTELP
(120) PVLBEMUQGK@LEKVMTGSAIMJL@RMPWRWJLFUNJ
(201) THYLOZGRKUMYOUHZCKENVZWYMDZTFDFWYJGXW
(210) WQXAGZOVES@XGSQZJEKBRZTX@IZWPIPTXCOYT

A ha! The 021 block (which corresponds to the mapping 3 -> 0, 2 -> 2, 1 -> 1) seems to have a partial message: FRANK@SHOEMAKER@WOULD (the program prints @ for 000, so it stands in for a space) and then it's garbage. Going back to the original message I realized that 113 is not divisible by three and that I'd either missed a symbol, or had two too many.

After much fiddling around I discovered that the correct interpretation of the top block is that two of the threes are wrapped from one line to another (there appears to be some indentation in the message that indicates this; take a look at the original, but this could be just random).

323 233 331 112 132
333 231 322 123 312
111 331 132 312 233
333 212 123 213 113
311 333 313 331 113
113 333 232 322 133
231 333 112 123 133
231 312

Rerunning my Perl program output the full message:

(012) CIBYS@KDUGZBSGI@PUOXH@FBXX@JDRK@YURKG
(021) FRANK@SHOEMAKER@WOULD@CALL@THIS@NOISE
(102) JDNXUMEISOZNUODMFSGYQMPNYYMCIVEMXSVEO
(120) PVLBEMUQGK@LEKVMTGSAIMJLAAMWQDUMBGDUK
(201) THYLOZGRKUMYOUHZCKENVZWYNNZFRQGZLKQGU
(210) WQXAGZOVES@XGSQZJEKBRZTXBBZPVHOZAEHOS

So much for the first part. The second part took me off into Z-80, 6502 and 6809 machine code wondering if it was a program and then nowhere. I still don't understand what this part is trying to say.

The third part looked initially like binary, but on closer examination I decided that the 2s (||) were actually separators and that the message should be read as numbers separated by 2s, each number being the count of 1s (|) in a run. That yields:

31211112111312
32213123123331
12213111332312
23333333233123
12313123332311
33223232312312
112
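The run-counting step described above can be sketched in a few lines. This is only an illustration under my own assumptions: the sub name runs_to_digits and the sample input are invented, and the input is taken to be a transcription of the bars as a string of 1s and 2s.

```perl
use strict;
use warnings;

# Treat every 2 (||) as a separator and replace each run of 1s (|) by its
# length. The sub name and the sample input are invented for this sketch.
sub runs_to_digits {
    my ($marks) = @_;
    return join '', map { length $_ } split /2/, $marks;
}

print runs_to_digits('111212112'), "\n";    # prints 312
```

Because the 2s never occur twice in a row, no run between separators is empty; a string that started with a 2 would produce a leading 0 under this sketch.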

(Once again there was a wrapping 'problem' in the message where a run of 8 |s was actually 3 |s then 1 || and 3 more |s.) Using the little Perl program reveals:

(012) GZWXUNGG@YOZAGI@ABKKG@KRLJGGY
(021) EMPLOYEE@NUMBER@BASSE@SIXTEEN
(102) OZTYSBOOMXGZLODMLNEEOMEVACOOX
(120) K@FAGXKKMBS@NKVMNLUUKMUDYWKKB
(201) UMJNKAUUZLEMXUHZXYGGUZGQBFUUL
(210) S@CBELSSZAK@YSQZYXOOSZOHNPSSA

So, the same mapping between digits is used.
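Once the mapping is known there is no need to try all six permutations. The following is a minimal sketch of a decoder that hard-codes 3 -> 0, 2 -> 2, 1 -> 1; the sub name decode_fixed is invented here, and showing 000 as a space rather than @ is a choice made for readability.

```perl
use strict;
use warnings;

# Decode a string of the digits 1, 2, 3 using the fixed mapping
# 3 -> 0, 2 -> 2, 1 -> 1, three symbols per letter. The sub name is
# invented for this sketch; 000 is shown as a space rather than @.
sub decode_fixed {
    my ($coded) = @_;
    ( my $ternary = $coded ) =~ tr/3/0/;    # 1 and 2 map to themselves
    my $text = '';
    while ( $ternary =~ /\G([012]{3})/g ) {
        my ( $hi, $mid, $lo ) = split //, $1;
        my $v = $hi * 9 + $mid * 3 + $lo;
        $text .= $v ? chr( 64 + $v ) : ' ';
    }
    return $text;
}

print decode_fixed('323233331112132'), "\n";    # prints FRANK
```

The sample input is the first line of the top block; feeding in the whole of either block should reproduce the 021 lines above with spaces in place of the @ characters.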

That leaves some final questions:

1. Who is Frank Shoemaker?
2. Why is base spelt incorrectly?
3. Is the extra S in BASSE a reference to the middle section, where three symbols start with S?
4. If #3 is correct, then those three symbols could be interpreted as FC in hexadecimal (base 16), which is 252. Could this be the employee number of the author?
5. Why is the letter A missing from the middle section when all the other hexadecimal digits are there?
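The conversion in question 4 is easy to confirm with Perl's built-in hex function (a one-line check, nothing more):

```perl
use strict;
use warnings;

# hex() is a Perl built-in that converts a hexadecimal string to a number.
my $value = hex('FC');    # 15 * 16 + 12
print "FC (base 16) = $value (base 10)\n";    # prints FC (base 16) = 252 (base 10)
```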


Thursday, May 15, 2008

Which countries have the most beautiful women? (My deeply flawed analysis)

So, I happened upon the Wikipedia page about the Miss World pageant and noticed that it had a list of winners by country. For example, India has won Miss World 5 times. But, of course, India has a very large population so you'd expect it to be able to churn out a few beauties. So, to get a better idea here is a population adjusted list of countries that have won Miss World:

Country              Wins  Pop.        Wins/Pop.                   Normalized
Bermuda              1     66163       0.0000151141876879827       100.00%
Iceland              3     316252      0.00000948610601672084      62.76%
Grenada              1     110000      0.00000909090909090909      60.15%
Guam                 1     173456      0.00000576515081634536      38.14%
Jamaica              3     2651000     0.000001131648434553        7.49%
Trinidad and Tobago  1     1305000     0.000000766283524904215     5.07%
Sweden               3     9182927     0.000000326693221017656     2.16%
Puerto Rico          1     3994259     0.000000250359328225836     1.66%
Austria              2     8316487     0.000000240486157195941     1.59%
Ireland              1     4339000     0.000000230467849734962     1.52%
Finland              1     5308208     0.000000188387493481793     1.25%
Venezuela            5     28199822    0.000000177306083705067     1.17%
Israel               1     7282000     0.000000137324910738808     0.91%
Netherlands          2     16408557    0.000000121887622415548     0.81%
Dominican Republic   1     9760000     0.000000102459016393443     0.68%
Czech Republic       1     10381130    0.0000000963286270377117    0.64%
Australia            2     21290000    0.0000000939408172851104    0.62%
Greece               1     11216708    0.0000000891527175353054    0.59%
Peru                 2     28674757    0.0000000697477575834383    0.46%
UK                   4     60487300    0.0000000661295842267716    0.44%
Argentina            2     40301927    0.0000000496254186555397    0.33%
South Africa         2     43700000    0.000000045766590389016     0.30%
Poland               1     38518241    0.0000000259617255107781    0.17%
France               1     64473140    0.0000000155103350015216    0.10%
Turkey               1     70586256    0.0000000141670639111387    0.09%
Egypt                1     80335036    0.0000000124478689472424    0.08%
Germany              1     82210000    0.0000000121639703199124    0.08%
Russia               1     142008838   0.00000000704181524251329   0.05%
Nigeria              1     148000000   0.00000000675675675675676   0.04%
US                   2     304072000   0.00000000657738956562919   0.04%
Brazil               1     186757608   0.00000000535453420457174   0.04%
India                5     1132446000  0.00000000441522156464856   0.03%
China                1     1321851888  0.000000000756514409124179  0.01%
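The last two columns can be reproduced along these lines. This is a minimal sketch and not necessarily how the table was built: the hash layout and variable names are invented for illustration, and only three rows are copied from the table. Each country's wins are divided by its population, and the result is then expressed as a percentage of the highest rate.

```perl
use strict;
use warnings;

# Wins and populations copied from three rows of the table.
my %data = (
    Bermuda => { wins => 1, population => 66163 },
    Iceland => { wins => 3, population => 316252 },
    India   => { wins => 5, population => 1132446000 },
);

# Wins per head of population for each country.
my %rate = map { ( $_ => $data{$_}{wins} / $data{$_}{population} ) } keys %data;

# Normalize against the highest rate so that the leader scores 100%.
my ($max) = sort { $b <=> $a } values %rate;

foreach my $country ( sort { $rate{$b} <=> $rate{$a} } keys %rate ) {
    printf "%-8s %.2f%%\n", $country, 100 * $rate{$country} / $max;
}
```

For these three rows the sketch prints 100.00%, 62.76% and 0.03%, matching the Normalized column.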

So, far and away, the top three are Bermuda, Iceland and Grenada. Given that Bermuda is the winner, and a tax-haven, and has a sub-tropical climate... Hamilton here I come!


Thursday, May 01, 2008

The Spammers' Compendium finds a new home

Shortly after I announced that I was getting out of anti-spam the folks at Virus Bulletin contacted me about taking over The Spammers' Compendium. I was delighted.

Today the transfer is complete and the new home is here. It will be maintained and updated by Virus Bulletin. Please send submissions to them.
