Friday, January 20, 2006

Deconstructing Sundance with POPFile

The brave folks at Unspam have taken POPFile and years of data surrounding the Sundance Film Festival in an attempt to predict the outcome of the 2006 edition. Specifically, they want to get invited to parties, and they've chosen to geek out on data, bend POPFile to their use (boy, that must have been hard work) and try to predict the hits from the festival.

They claim an 81% accuracy over the past festivals and have a nice web site giving their predictions for this year. The predictions also include a breakdown of how the decisions are made indicating the most important (and worst) words to appear in a review of a movie.

There's also movie metadata like the type of film it was shot on, or who reviewed it. That's a very interesting use of POPFile and if they get it right and are invited to all the cool parties I hope they fulfill my wish and put a good word in for me with Neve Campbell.


Tuesday, January 17, 2006

The WMF SetAbortProc problem is not a backdoor

Microsoft recently fixed a problem wherein a Windows MetaFile (WMF) could contain arbitrary code that would be executed when the WMF was "played". The problem was particularly bad because a WMF could be played automatically in Internet Explorer by referencing it in an IFRAME if the Windows Picture and Fax Viewer was installed and registered (which by default on recent versions of Windows it was), because that program would automatically handle WMFs and play them to display them.

Steve Gibson suggested in a podcast, which then became a big news story, that he was convinced that this functionality was in fact an intentional backdoor inserted by Microsoft for their own purposes (or at least for the purpose of a rogue engineer inside the company).

Microsoft responded to that accusation with a blog posting by Stephen Toulouse in which he gave some more details of the problem and essentially said that Gibson was wrong (without naming him).

I've looked carefully at this and am convinced that Gibson is wrong. This is nothing more than a bug caused by the reimplementation of a legacy API. The same bug appears in WINE and was recently fixed.

Gibson based his backdoor claim on two things (both of which he now admits were incorrect): that a special incorrect value was required in one of the metafile fields to get the backdoor to activate and that the code ran in its own specially created thread. In fact, this bug occurs with correct or incorrect values in the field and uses the same thread.

It's also pretty clear from the code that this occurs because metafiles can contain a special ESCAPE record that allows them to call the Windows API function Escape, which has a function to perform SetAbortProc. If you take a look at the WINE code you can see how this exploit works there and I bet (because it's the same API) that the same or a very similar thing is happening in Windows:

First the file dlls/gdi/metafile.c contains a function called PlayMetaFileRecord with the following signature:

BOOL WINAPI PlayMetaFileRecord( HDC hdc, HANDLETABLE *ht,
METARECORD *mr, UINT handles )

Which is simply WINE's implementation of the same Win32 API (which is documented here: /library/en-us/gdi/metafile_1yec.asp)

The third parameter (mr) is a METARECORD pointer (a METARECORD is just an entry in the metafile and is detailed here: /library/en-us/gdi/metafile_8j1u.asp) and is the all important header with the following definition:

typedef struct tagMETARECORD {
    DWORD rdSize;
    WORD rdFunction;
    WORD rdParm[1];
} METARECORD;

With the rdSize being the size of the record in words, the rdFunction being the function and the rdParm the data (which in the case of an exploit would be executable code). PlayMetaFileRecord handles META_ESCAPE like this:

Escape( hdc, mr->rdParm[0], mr->rdParm[1],
(LPCSTR)&mr->rdParm[2], NULL);

You'll note that the fourth argument (in_data) is a pointer into the metafile's parameter block, i.e. if it were ever called as a function it would execute code in the metafile. Escape is implemented like this (dlls/gdi/driver.c):

INT WINAPI Escape( HDC hdc, INT escape, INT in_count,
LPCSTR in_data, LPVOID out_data )

and the SETABORTPROC is handled with the following code:

return SetAbortProc( hdc, (ABORTPROC)in_data );

So if you have an ESCAPE/SETABORTPROC record in a metafile then under WINE the AbortProc is set to point into the metafile (since in_data corresponds to &mr->rdParm[2]).

So it's quite clear from the WINE implementation that this is a way to set a pointer into the metafile for execution. All it would take is that the metafile's AbortProc is called and arbitrary code could be executed.
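To make that pointer flow concrete, here is a minimal sketch in plain C. It is not WINE or Windows code: MockDC, mock_escape, mock_play_escape_record and abort_proc_points_into are hypothetical names, the record is modelled as a raw array of 16-bit words in the METARECORD layout shown above, and nothing is ever executed. The sketch only records where the callback pointer ends up.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_SETABORTPROC 9   /* escape number for SETABORTPROC */

/* A record as raw 16-bit words: words 0-1 hold rdSize, word 2 holds
   rdFunction, and the rdParm[] words start at word 3. */
enum { RD_PARM_WORD = 3 };

/* Hypothetical stand-in for a device context: it only remembers the
   address that was registered as the AbortProc. */
typedef struct {
    const void *abort_proc;
} MockDC;

/* Mirrors the quoted Escape code: for SETABORTPROC the data pointer
   itself becomes the callback. This sketch never calls it. */
static int mock_escape(MockDC *dc, int escape, int in_count,
                       const void *in_data)
{
    (void)in_count;   /* not consulted for SETABORTPROC */
    if (escape == MOCK_SETABORTPROC) {
        dc->abort_proc = in_data;
        return 1;
    }
    return 0;
}

/* Mirrors the META_ESCAPE case of PlayMetaFileRecord: rdParm[0] is the
   escape number, rdParm[1] the byte count, &rdParm[2] the data. */
static int mock_play_escape_record(MockDC *dc, const uint16_t *record)
{
    const uint16_t *parm = record + RD_PARM_WORD;
    return mock_escape(dc, parm[0], parm[1], &parm[2]);
}

/* Returns 1 if the registered callback address lies inside buf. */
static int abort_proc_points_into(const MockDC *dc, const void *buf,
                                  size_t len)
{
    const unsigned char *p = dc->abort_proc;
    const unsigned char *start = buf;
    return p != NULL && p >= start && p < start + len;
}
```

Playing a record whose rdParm[0] is 9 through mock_play_escape_record leaves abort_proc pointing at the record's own sixth word (rdParm[2]), which is the whole problem in miniature: the callback address is data supplied by the file.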

In WINE at least this looks nothing like an intentional backdoor. It looks more like a bug caused by the fact that Escape is rather powerful and can set a pointer to code.

Now it's possible in WINE (I believe) to force the AbortProc to execute with another ESCAPE record that has NEWFRAME as the function. Again looking at the Escape code you'll see that NEWFRAME is handled like this:

return EndPage( hdc );

EndPage is a standard GDI function (see here for documentation: /library/en-us/gdi/prntspol_0d6b.asp). If you take a look at the implementation in WINE you see the following code (dlls/gdi/printdrv.c):

ABORTPROC abort_proc;
INT ret = 0;
DC *dc = DC_GetDCPtr( hdc );
if(!dc) return SP_ERROR;

if (dc->funcs->pEndPage)
    ret = dc->funcs->pEndPage( dc->physDev );
abort_proc = dc->pAbortProc;
GDI_ReleaseObj( hdc );
if (abort_proc && !abort_proc( hdc, 0 ))
{
    EndDoc( hdc );
    ret = 0;
}
return ret;

Note that this function always calls the AbortProc of the DC if one is set. So I think a metafile with an ESCAPE/SETABORTPROC followed by ESCAPE/NEWFRAME would in WINE cause arbitrary code execution.
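The two-record sequence can be sketched as a small simulation. This is an illustration under stated assumptions, not WINE code: MockDC, mock_end_page and mock_escape are hypothetical names, NEWFRAME is assumed to be escape number 1, and instead of calling the registered address the mock only notes the address it would have jumped to.

```c
#include <stddef.h>

#define MOCK_NEWFRAME     1   /* assumed escape number for NEWFRAME */
#define MOCK_SETABORTPROC 9   /* escape number for SETABORTPROC */

/* Hypothetical stand-in for a device context. */
typedef struct {
    const void *abort_proc;   /* address registered by SETABORTPROC */
    const void *last_jump;    /* address EndPage would have called */
    int jump_count;           /* how many times it would have been called */
} MockDC;

/* Mirrors the shape of the quoted EndPage: if an AbortProc is set it is
   invoked. Here the "call" is only recorded, never performed. */
static int mock_end_page(MockDC *dc)
{
    if (dc->abort_proc != NULL) {
        dc->last_jump = dc->abort_proc;
        dc->jump_count++;
    }
    return 1;
}

/* Mirrors the two Escape cases discussed in the post. */
static int mock_escape(MockDC *dc, int escape, const void *in_data)
{
    switch (escape) {
    case MOCK_SETABORTPROC:
        dc->abort_proc = in_data;   /* data pointer becomes the callback */
        return 1;
    case MOCK_NEWFRAME:
        return mock_end_page(dc);   /* which then reaches the callback */
    default:
        return 0;
    }
}
```

Calling mock_escape with MOCK_SETABORTPROC and a pointer to some payload bytes, then again with MOCK_NEWFRAME, leaves last_jump equal to the payload address. A NEWFRAME on its own, with no AbortProc registered, records nothing.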

Now if you read this article from MSDN you learn the following about the AbortProc:
The SetAbortProc function (and the SETABORTPROC escape) sets up what is known as the AbortProc. This AbortProc function resides in the application; GDI calls it during a print job to inform the application of spooler errors and to allow the application to abort the job when desired. GDI calls the AbortProc function with information about why it is being called; this value is either an error code from the spooler or zero, which indicates that the function is being called simply to allow an abort.

The AbortProc function is called routinely during several steps of the printing process:

* After every write to the printer port when printing directly to the printer (no spooling)
* After every write to a file when printing directly to a file (no spooling)
* After every write to the spooler file when spooling
* Periodically when out of disk space for spooling as a result of other spool jobs
* Before playing every metafile record when GDI is simulating banding
* Occasionally from some older printer drivers

When GDI calls the AbortProc function, the application can continue the print job by returning a nonzero value or abort the print job by returning zero.

In there it mentions that the AbortProc will get called when playing a metafile record. Stephen Toulouse all but said that when he said: "The way this functionality works is by registering the callback to be called after the next metafile record is played.".

So my take is that Gibson is wrong, very wrong. This is no backdoor. It's just a side effect of the ESCAPE/SETABORTPROC handling in a metafile and the fact that the metafile processing is calling the AbortProc for you.

I've used Steve Gibson's WMF_dbg.exe and WinDBG to step through the implementation of PlayMetaFileRecord and especially how it handles the Escape function and it appears to be implemented in exactly the same fashion as the equivalent WINE function. Here's my commented disassembly:

77f493fd 8d4b0a lea ecx,[ebx+0xa]

; ecx now contains &mr->rdParm[2]
; On entry to this code edi is a pointer to the out parameters
; for the Escape and seems always to be null. So the next instruction
; pushes the last parameter of Escape (LPVOID lpvOutData) as NULL.

77f49400 57 push edi

; Pushes the LPCSTR lpvInData parameter from ecx which is pointing to
; &mr->rdParm[2] which is inside the metafile that is being played

77f49401 51 push ecx

; Now load ecx and eax with the size of the input structure and the
; function number for the escape (int cbInput and int nEscape). In the
; case of a SETABORTPROC nEscape/eax is 9 and is taken from
; mr->rdParm[0] and cbInput/ecx is from mr->rdParm[1]. Note that it
; doesn't matter if cbInput is correct or not for SETABORTPROC.

77f49402 0fb74b08 movzx ecx,word ptr [ebx+0x8]
77f49406 0fb7c0 movzx eax,ax
77f49409 51 push ecx
77f4940a 50 push eax

; The last value pushed (Escape's first parameter) is the original DC
; passed to PlayMetaFileRecord

77f4940b ff7508 push dword ptr [ebp+0x8] ss:0023:0006ff0c=2621029e
77f4940e e8113c0000 call GDI32!Escape (77f4d024)
77f49413 e981060000 jmp GDI32!PlayMetaFileRecord+0xd19 (77f49a99)

By varying the metafile function and parameters I've verified that all that's happening in PlayMetaFileRecord is that when it encounters an ESCAPE it does the same thing as WINE and extracts cbInput, nEscape and lpvInData from the METARECORD and calls GDI Escape.
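The offsets in the disassembly line up with the METARECORD layout, assuming ebx points at the start of the record. A small sketch that checks the arithmetic, using fixed-width stand-ins for DWORD and WORD (MetaRecordLayout and rd_parm_offset are hypothetical names used only here):

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order and widths as METARECORD: a 32-bit size followed by
   16-bit function and parameter words. */
typedef struct {
    uint32_t rdSize;
    uint16_t rdFunction;
    uint16_t rdParm[1];
} MetaRecordLayout;

/* Byte offset of rdParm[index] from the start of the record. */
static size_t rd_parm_offset(size_t index)
{
    return offsetof(MetaRecordLayout, rdParm) + index * sizeof(uint16_t);
}
```

With this layout rd_parm_offset(1) is 0x8, matching the word read at [ebx+0x8] for cbInput, and rd_parm_offset(2) is 0xa, matching the lea ecx,[ebx+0xa] that produces lpvInData.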


Sunday, January 15, 2006

Do spammers fear OCR?

Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.

The first spam appears to be a scan of a document that's been skewed slightly. Now this could be a simple and bad scan.

But the second is even more interesting. It appears to be perfectly normal:

Until you look at the fact that this was actually constructed using <DIV> tags for layout and the breaks between the lines are in the middle of words. Here Nick has kindly inserted borders showing that the words are broken horizontally and then put back in the right position:

But is anyone doing OCR, or are spam filters getting good enough that the spammers are being really paranoid about what they are sending?

The funny thing about the second example is that the URL they include is not obfuscated, is clickable and appears in the SURBL :-) So despite the effort to obscure the content a simple check of the spamminess of the URL gets this email canned.
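As a rough sketch of what such a URL check involves: SURBL is queried over DNS by prefixing the domain to a list zone, and this example assumes the combined multi.surbl.org zone. The helper names extract_host and surbl_query_name are hypothetical and spam.example is a made-up domain. The sketch stops at building the name to look up; a real filter would reduce the host to its registered domain first and then perform the DNS query.

```c
#include <stdio.h>
#include <string.h>

/* Copies the host part of a URL into out. Userinfo ("user@host") and
   IPv6 literals are not handled. Returns 0 on success, -1 on failure. */
static int extract_host(const char *url, char *out, size_t outlen)
{
    const char *p = strstr(url, "://");
    size_t n;

    p = p ? p + 3 : url;          /* skip the scheme if there is one */
    n = strcspn(p, "/:?#");       /* host ends at port, path, query or fragment */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Builds the DNS name to query for a domain. The zone name is an
   assumption of this sketch. Returns 0 on success, -1 if out is too small. */
static int surbl_query_name(const char *domain, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s.multi.surbl.org", domain);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For a URL such as http://spam.example/buy?id=1 the two helpers produce spam.example.multi.surbl.org. If a DNS lookup of that name returns an address the domain is listed, and no amount of image trickery in the body helps the spammer.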


Wednesday, January 11, 2006

The disappointing MacBook Pro

Lots of people have complained about the name; others have pointed out that there's no information about battery life; still others are wary of the benchmark numbers.

None of these things worry me at all.

What I'm upset about with this machine is that after having announced that they were switching to Intel, Apple announces a rather boring machine with some odd things missing. I'd really like to switch to an Intel Mac, if I can triple boot Linux, Windows and Mac OS X on it. But...

1. iSight. I have *no* need for a built in iSight camera and it doesn't make me happy that there's no physical cover for the camera so that I *know* that it can't see me.
2. Modem. There's no modem. C'mon Apple! How cheap is a V.92 modem and socket? A lot cheaper than an iSight I'd wager. Not all of us can connect all the time via Ethernet or WiFi. There are lots of hotels where dial up is the only option.
3. Why is the screen resolution lower than the current PowerBook range? OK, it's a bright display and widescreen, but 900 pixels high?
4. Europe vs. US pricing. The high-end machine is US$2,499 or Eur 2,699. 1 Eur is about US$1.20, so the US price is US$2,499 and the European price is about US$3,240. So it really costs an extra US$740 to get the machine to me in Europe?
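The arithmetic behind point 4, as a tiny sketch. The function names are hypothetical and the rate of US$1.20 per euro is the approximate figure used above.

```c
/* Converts a euro amount to US dollars at the given rate. */
static double eur_to_usd(double eur, double usd_per_eur)
{
    return eur * usd_per_eur;
}

/* Extra cost, in US dollars, of buying at the European price. */
static double europe_premium_usd(double us_price_usd, double eu_price_eur,
                                 double usd_per_eur)
{
    return eur_to_usd(eu_price_eur, usd_per_eur) - us_price_usd;
}
```

At that rate eur_to_usd(2699, 1.20) comes to 3,238.80 and europe_premium_usd(2499, 2699, 1.20) to 739.80, which round to the US$3,240 and US$740 quoted in the list.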

I do like the remote, I can imagine using it for presentations. The MagSafe connector is clever. I'm glad that iLife is bundled.

But overall I'm underwhelmed.


Monday, January 09, 2006

Python is the new Tcl; Ruby is the new Perl

Given how bad I am at predictions I'd like to give you the following (which you'd better take with a grain of salt): Python is the new Tcl and Ruby is the new Perl. My prediction for 2006 is that most Perl 5 programmers will decide that Ruby looks cool and learn the language and that Python will be a niche language that a die hard core continues to love and embellish.

I think Ruby's the new Perl because of the confluence of three things: Perl 6, RubyGems and Ruby's own Perliness. Firstly, given how hideous Perl is if you're a Perl person (like me) why learn Perl 6 when you could learn Ruby? Ruby's core language is clean, regular expressions are strong and there are many Perl influences (which Ruby is slowly removing to ensure that the language doesn't become Perl). Secondly, RubyGems will do for Ruby what CPAN did for Perl.

Python's lack of good package management (if you ignore ActiveState's PyPPM and the recently started Cheese Shop) and idiosyncratic language constraints ("hey we liked significant white space in Make so much we used it ourselves") make it look like Tcl to me. And where's the killer app? Arguably dynamic web pages were Perl's killer app, and perhaps Rails is Ruby's. Python has... Zope?

All of which leaves me wondering how to complete "PHP is the new..."

(Naturally LISP is the new LISP).


Thursday, January 05, 2006

How I manage email

Some time ago I received an email asking how I deal with my email. Here are my requirements:

1. I want to download my email to my laptop for offline access
2. When I'm away from my machine I want web based access
3. I don't want to pay for any of this
4. I don't want to download spam
5. I want my non-spam email automatically sorted
6. I want an email address that I control
7. I want access from my laptop to be secure

My solution is a combination of my own domain (which is the one thing I pay for), Google's Gmail service, Mozilla Thunderbird and POPFile.

Mail sent to any address at my domain is forwarded to a Gmail account. Gmail filters out the spam (and a number of hand written filters deal with some other unwanted mail for me). The nice thing here is that I've wasted no bandwidth on the spam or unwanted mail. (Some spam still gets through and POPFile handles that later.)

If I'm away from my machine I can pop into Gmail and look at my waiting mail in the standard Gmail inbox. My laptop runs Mozilla Thunderbird which uses Gmail's POP access to download the filtered mail. But the mail is proxied through POPFile (using POPFile's SSL access for a secure connection to Gmail) which automatically sorts the mail into categories. A small number of simple rules in Thunderbird pick up POPFile's X-Text-Classification and automatically move messages to the appropriate folders.
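The Thunderbird rules are set up in its filter dialog rather than in code, but what they key on is simple: the value of the X-Text-Classification header that POPFile adds. A minimal sketch of pulling that value out of a message's header block follows. header_value and starts_with_ci are hypothetical helpers, the bucket name "work" in the usage note is made up, and folded (continued) header lines are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s begins with prefix, ignoring case. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Finds the header called name in a raw message and copies its value
   into out. Scanning stops at the blank line that ends the headers.
   Returns 0 on success, -1 if the header is missing or out is too small. */
static int header_value(const char *message, const char *name,
                        char *out, size_t outlen)
{
    size_t name_len = strlen(name);
    const char *line = message;

    while (*line != '\0' && *line != '\r' && *line != '\n') {
        size_t line_len = strcspn(line, "\r\n");

        if (line_len > name_len && starts_with_ci(line, name)
            && line[name_len] == ':') {
            const char *value = line + name_len + 1;
            const char *end = line + line_len;
            size_t n;

            while (value < end && (*value == ' ' || *value == '\t'))
                value++;                 /* skip whitespace after the colon */
            n = (size_t)(end - value);
            if (n >= outlen)
                return -1;
            memcpy(out, value, n);
            out[n] = '\0';
            return 0;
        }
        line += line_len;                /* move to the start of the next line */
        if (*line == '\r')
            line++;
        if (*line == '\n')
            line++;
    }
    return -1;
}
```

Given a message whose headers include "X-Text-Classification: work", header_value(message, "X-Text-Classification", bucket, sizeof bucket) fills bucket with "work", and a rule then only has to map that name to a folder.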

The only thing I don't have is POPFile style sorting in the Gmail account. So if you're listening Google and interested, I'd be interested in working with you to add POPFile functionality to Gmail.


Tuesday, January 03, 2006

One letter web shortcuts

I recently noticed that I get to the BBC news web site by typing n followed by down arrow and then Enter in Firefox. Firefox has learnt that n is essentially a shortcut for the BBC news site for me.

I then wondered what each of the 26 letters of the alphabet mapped to in my browser. Here's the complete list:
b - Connecting Flights: Boutique Airlines
g - GNU Make Debugger
i - My life as an investment banker
j - John Graham-Cumming
k - Best-Ever Freeware
l - Le Monde
m - My Yahoo!
n - BBC News
o - Effective Emacs
p - POPFile: VersionTwentyThreeCleanup
r - reddit: what's new online
t - The Onion
u - UPC Database Entry
v - Virus Bulletin
w - Japan current local time
x - Exchange Rates Graph (Euro, American Dollar)
y - An Intuitive Explanation of Bayesian Reasoning