Saturday, August 2, 2008

a picture is worth a thousand lines of code

Okay, this is a seriously geeky rant.

If you don't care, move along. (But you should care.)

So hackers have recently started exploiting web sites by creating "GIFARs". A GIFAR is a file that is both a valid GIF (picture) and a valid JAR (executable Java program). In other words, they upload a "picture" to a web site (like Facebook), and when someone "looks at it", it actually gets executed as a Java program... and does something unexpected. This could be worse on a financial web site... but even on Facebook it could be used for social engineering. (You take over someone's account, then pose as a trusted friend to one of their friends to get private data from them.)
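For the curious, here is why a GIFAR is even possible. A GIF is recognized by the signature at the very start of the file, while a JAR is just a zip archive, and zip readers find the archive's directory by looking at the end of the file. So you can glue one onto the other and both readers are happy. This is a harmless toy, not an exploit: the "GIF" is only its 6-byte signature plus padding, and the archive holds a text file instead of Java classes. make_toy_gifar is a made-up helper name.

```python
import io
import zipfile


def make_toy_gifar(gif_bytes: bytes, member_name: str, member_data: bytes) -> bytes:
    """Append a zip archive (a JAR is a zip) to some GIF bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, member_data)
    return gif_bytes + buf.getvalue()


# Hypothetical stand-in for a real GIF: just the signature plus padding.
fake_gif = b"GIF89a" + b"\x00" * 32
blob = make_toy_gifar(fake_gif, "hello.txt", b"not really Java")

# An image sniffer looks at the front and sees a GIF...
print(blob[:6])                              # b'GIF89a'
# ...while a zip reader looks at the back and sees an archive.
print(zipfile.is_zipfile(io.BytesIO(blob)))  # True
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.read("hello.txt"))              # b'not really Java'
```

Same bytes, two "correct" answers, depending on which end of the file you believe.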

This really takes me back... to about 1990 or so. I argued at that time with various WinWeinies that Windows handled file types incorrectly[1]... or actually not at all. Why in the world would changing a filename change a file's functionality? That just doesn't make sense. If a file is a GIF, isn't it still a GIF if it is named picture.jpg? Or picture.jar?

As far as I know, Unix has always used the actual type of the file[2] and not the file name.[3]

They argued then, and some still argue now, that this is a performance issue. I have to call bullshit on this. First off, it just wasn't that big a performance hit back in 1990 on a Sparc 1. I cannot imagine[4] it's a hit on today's computers. I might also point out that the default behavior of the Windows file manager is to show a little iconic document preview, which means not only reading the entire file, but also rendering it and shrinking it down to icon size. That is surely more processor-intensive than, say, reading the first 2 or 3 bytes of the file. (And if you render it, aren't you already reading the first 2 or 3 bytes of the file?)
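To put a face on "reading the first 2 or 3 bytes", here is a minimal Python sketch of sniffing a file's type by its magic number. The signature table is a tiny hand-picked subset (libmagic's database is far bigger), and sniff_bytes / sniff_file are names I made up for illustration.

```python
from typing import Optional

# A few well-known leading signatures ("magic numbers").
# Illustrative subset only; a real magic database has thousands of entries.
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # zip, and therefore also JAR
]


def sniff_bytes(head: bytes) -> Optional[str]:
    """Guess a type from the first few bytes; None means 'unknown'."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def sniff_file(path: str) -> Optional[str]:
    """Open the file and look at its first 8 bytes; the name is ignored."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))  # longest signature above is 8 bytes
```

On a real GIF, sniff_file("picture.jar") answers image/gif. The file name never enters into it, and the cost is one 8-byte read.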

I would also point out that whether it is "efficient" or not, it is more important to be correct. And whether you are dealing with a file manager (whose entire focus is managing files, and that includes file types) or scary unknown things coming from an untrusted web site source... isn't it better to be correct?

Some will also point out that some file types don't have a simple header. MP3s with ID3v2 tags put a tag block in front of the audio, so the actual MP3 "file type" can be buried deep inside the file, meaning you might have to read well into it. So what? In the case of the exploit, don't you have to read the whole file anyway in order to display/execute it? And I might add that the current Unix file type command (i.e. 'file') can and does correctly identify these files... and even if it didn't, wouldn't a file type of 'unknown' be superior to a file type of 'pwned'?
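And the ID3v2 case isn't that scary either: the tag announces its own length in its 10-byte header, so you can jump over it instead of reading every byte. A rough sketch, assuming the ID3v2 header layout ("ID3", two version bytes, a flags byte, then a 4-byte "syncsafe" size with 7 usable bits per byte) and simply checking for an MPEG audio frame sync right after the tag. Real tools are more forgiving (they scan for the sync and sanity-check the frame header). The function names are hypothetical.

```python
def id3v2_tag_length(data: bytes) -> int:
    """Bytes occupied by a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    # "Syncsafe" size: four bytes, only the low 7 bits of each count.
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)
    footer = 10 if flags & 0x10 else 0  # optional ID3v2.4 footer
    return 10 + size + footer           # size excludes the 10-byte header


def looks_like_mp3(data: bytes) -> bool:
    """Skip any ID3v2 tag, then check for an MPEG frame sync (11 set bits)."""
    start = id3v2_tag_length(data)
    frame = data[start:start + 2]
    return len(frame) == 2 and frame[0] == 0xFF and (frame[1] & 0xE0) == 0xE0
```

A file-based version would read the 10-byte header, seek past the tag, and read 2 more bytes, so even a tag stuffed with album art costs a seek, not a full read.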

For this stupidity to happen, you still have to believe in the old Windows "name is file type" mentality. Or in this case, that the file name and MIME type[5] are the file type (and not the contents of the file itself). In fact, for this to happen, you have to have had this stupid mentality three times.

Probable Attack Vector

Okay, I have not read or heard the details here, so I am winging it, but I am pretty sure the actual vector is something like what I am describing.

  1. Make your exploit code into a GIFAR (or a PNGAR or some other stupid combination of executable and non-executable).
  2. Find a stupid web site.
  3. Upload the GIFAR in a spot that wants a picture. I am guessing they named it picture.jar.
  4. Facebook serves it, sees the picture.jar name, and labels it with the MIME type "application/java-archive".
  5. The browser executes the Java.

Problem with Web Application

First off, this is a bad web app. You are getting a file from a probably anonymous teenager. Might you check its file type? Do you really trust this guy? If the web app was expecting a minimal set of possible file types... let's say GIF, JPG, MPG, PNG... then it should toss anything else on the floor and complain loudly.
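Here is roughly what "toss it on the floor and complain loudly" could look like on the upload side. It's a sketch, assuming the app only wants GIF, JPEG, PNG and MPEG (program stream) files, and that it stores the MIME type it detected rather than whatever name or type the client sent. The zip check at the end is my own extra paranoia, not something from the post: a GIFAR starts with a perfectly good GIF signature, so a header-only check would wave it through. validate_upload and RejectedUpload are hypothetical names.

```python
import io
import zipfile

# Hypothetical whitelist for a picture/video upload form.
ALLOWED_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x00\x00\x01\xba", "video/mpeg"),  # MPEG program stream pack header
]


class RejectedUpload(ValueError):
    """Raised when an upload is not one of the expected types."""


def validate_upload(data: bytes) -> str:
    """Return the MIME type to store and serve with, based only on content.

    The client-supplied file name and Content-Type are deliberately ignored.
    """
    for magic, mime in ALLOWED_SIGNATURES:
        if data.startswith(magic):
            break
    else:
        raise RejectedUpload("not a GIF, JPEG, PNG or MPEG file")
    # Extra paranoia: refuse anything that also parses as a zip/JAR.
    if zipfile.is_zipfile(io.BytesIO(data)):
        raise RejectedUpload("file is also a zip/JAR archive")
    return mime
```

Serving the stored file with the detected type (image/gif for a GIF) is what keeps the browser from treating it as Java later.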

Problem with Web Server

Look, I like Apache. And Apache has built-in support for figuring out a file's type and tagging it with the correct MIME type. If I serve a file named 'picture', it can figure out it is a JPEG and tag it correctly. But if I put a picture out there named 'picture.jar', it trusts the file name[6]. Ick. Sure, I could edit the configuration and make this go away. I understand that. But it should ship safe. It should still be configurable, but if you want it to do stupid things with a file, you should have to ask it to do stupid things with a file. This is what I meant when I said that some Unix apps still don't handle file types correctly... even though the capability is readily available in the OS.
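The difference between "trust the name" and "ship safe" is really just the order of two lookups. As I understand Apache's docs, mod_mime assigns types from file extensions, and mod_mime_magic (the file(1)-style sniffer) is meant as a fallback for files mod_mime can't type. The Python below models both orders with a made-up four-entry extension table and a made-up signature table; it is not Apache's code, and it assumes any zip is a JAR.

```python
import os
from typing import Optional

# Hypothetical stand-in for a server's extension -> type table.
EXTENSION_TYPES = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".jar": "application/java-archive",
}

# Hypothetical stand-in for a magic database.
MAGIC = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/java-archive"),  # assume every zip is a JAR
]

DEFAULT = "application/octet-stream"


def by_magic(head: bytes) -> Optional[str]:
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return None


def by_extension(name: str) -> Optional[str]:
    return EXTENSION_TYPES.get(os.path.splitext(name)[1].lower())


def name_first(name: str, head: bytes) -> str:
    """Trust the extension; sniff the bytes only when the name says nothing."""
    return by_extension(name) or by_magic(head) or DEFAULT


def content_first(name: str, head: bytes) -> str:
    """Ship safe: trust the bytes; fall back to the name only if they're unknown."""
    return by_magic(head) or by_extension(name) or DEFAULT
```

With PNG bytes named picture.jar, name_first answers application/java-archive (the behavior I saw) while content_first answers image/png. For a file named just 'picture', both orders agree, which is why the extensionless case looks fine.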

and yes, a problem with the browser

Even Firefox[7] trusts the MIME type... which is just as stupid as those WinWeinies back in 1990. It's a web site you don't trust. Why trust the type? If it says java-archive and the magic number says GIF89a, toss it.[8] But I tried serving up a PNG named picture.jar, which Apache was more than happy to label as a java-archive... and Firefox was more than happy to (attempt to) treat it as a Java file. Whether IIS and/or IE do this, I don't know. I suspect they do, since the OS has a tendency to always trust file names as file types.
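The browser-side rule ("if it says java-archive and the magic number says GIF89a, toss it") fits in a few lines too. A sketch, assuming a tiny signature table and a client that refuses to hand content to a plugin when it can't confirm the declared type. should_handle is a hypothetical name, not a Firefox API, and as footnote 8 says, this does nothing for a real JAR served as a JAR.

```python
from typing import Optional

# Hypothetical signature table; assumes any zip claiming to be a JAR is one.
MAGIC = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/java-archive"),
]


def sniff(head: bytes) -> Optional[str]:
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return None


def should_handle(declared_mime: str, body: bytes) -> bool:
    """Only act on content whose magic number agrees with its declared type."""
    actual = sniff(body[:8])
    if actual is None:
        # Can't confirm the server's claim, so don't trust it.
        return False
    return actual == declared_mime
```

A GIF that arrives labeled application/java-archive fails the check and never reaches the Java plugin.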

And an easy fix

Okay, the fix isn't so easy unless you are willing to open up the code, but the tool has been there for-freaking-ever. libmagic is your friend. It knows what files are what types. And if we could all agree on exactly how it is implemented[9], then this wouldn't matter. In other words, if a file, by some mystery, matched the signatures of both a GIF and a JAR... and we agreed that the file type was the signature that occurs first in the file, then it would be one or the other, never both. And if we all used some flavor of libmagic... well, we'd know what freaking types things were. And if we were connected to untrusted hosts by some series of tubes... well, then we could be a little less trusting of MIME types and file names.
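Here is the "first signature in the file wins" rule as a Python sketch. It is deliberately naive: it searches the whole file for a few signatures, which libmagic does not do (its magic(5) rules test at specific offsets), and a stray zip header byte sequence inside image data would count as a hit. The table and function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical signature table.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/java-archive"),  # zip local file header
]


def all_matches(data: bytes) -> List[Tuple[int, str]]:
    """Every (offset, type) pair where a known signature first appears."""
    hits = []
    for magic, mime in SIGNATURES:
        pos = data.find(magic)
        if pos != -1:
            hits.append((pos, mime))
    return hits


def first_signature_wins(data: bytes) -> Optional[str]:
    """Resolve a polyglot by the agreed rule: the earliest signature decides."""
    hits = all_matches(data)
    if not hits:
        return None
    return min(hits)[1]  # smallest offset wins
```

A GIFAR matches both a GIF (offset 0) and a JAR (somewhere later), so under this rule it is a GIF and only a GIF, and every program that follows the same rule would agree.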

Notes on my feet

  1. ...and administrative users incorrectly... and time of day incorrectly... and so on

  2. which means I am not really going to do any real checking here. But I can verify that as far back as about 1990, SunOS was doing it correctly. SunOS was BSD-flavored then, which implies BSD was doing it correctly. And with the next major release, it moved to an AT&T-flavored Unix, which implies AT&T was doing it correctly too.

  3. I say "Unix" does it right. That doesn't mean all Unix apps do it right. Facebook is possibly running Unix... hard to tell, since it sits behind a masquerading load balancer. But it runs Apache, which means it might be. More on this below.

  4. Sometimes I use footnotes for no real reason whatsoever. Have you noticed?

  5. The MIME type is just a little bit of descriptive text the server sends in the HTTP Content-Type header to give the browser a heads-up about what it is getting. I am not sure why a browser would trust this.

  6. or at least it does as of version 2.2.8

  7. at least with version 2.0.x

  8. Of course, if the web application was so braindead that it accepted a file whose signature was that of a JAR file, and that JAR file was then served with a JAR MIME type, you're in a pickle... JAR.

  9. My understanding is that the specs are a little vague.
