One of the nice aspects of working on Caja has been the people I’ve had a chance to meet and work with. Their ideas have helped inspire Caja’s design and implementation. Other times, they have inspired the following kind of wackiness:
That’s right - that is the same GIF that is being interpreted as a script and as an image.
There are no content-sniffing tricks going on here, but it was content-sniffing that inspired me to create this particular trick. In the space of a week, chance conversations with Arturo Bejar and Doug Crockford about content-sniffing reminded me of Perl::Visualize, a GIF/Perl polyglot generator I’d written back in 2003. In humanspeak, a polyglot is a person who speaks more than one language. In computerspeak, a polyglot is a snippet of code which is a valid program in two or more languages. Writing polyglots can be a fun mental exercise, and if you have never tried it, I thoroughly recommend it.
The thing that made Perl::Visualize clever is that one of the languages is GIF, as in the picture format. One way of thinking about a GIF viewer is as an interpreter that takes a peculiar programming language as input and, as a result of executing that program, decodes an image. Perl::Visualize takes an arbitrary GIF image and an arbitrary perl program and creates a combined file which can be interpreted both as an image and as a perl program.
This makes some rather nifty tricks possible. For example, you can embed a perl program in a picture of its own control-flow graph. Now you have a program which can explain itself when you throw it at a picture viewer and run itself when you throw it at a perl interpreter. It takes literate programming to its logical conclusion — I call it aesthetic programming and I am sure it will have Knuth laughcrying.
An img tag expects its src attribute to point to content which parses correctly as an image, just as a script tag expects its src attribute to point to content which parses correctly as a script.
Some browsers consider the content-type the server asserts to be authoritative, and if the content fails to parse as that type, it is not rendered. Others ignore the server-asserted type and try to guess (sniff) the type from the content. This sniffing can take the form of heuristics like the suffix of the file name in the URL, the “magic” first couple of bytes of the content, or simply trying different parsers until one fits. The set of parsers tried is sometimes constrained by the particular tag (for instance, content referenced by an img tag is only parsed according to the native image formats supported by the browser). The problem is further exacerbated by plugins like Java and Flash, and by the different types of caches and “file save” features in browsers, which may or may not remember what content-type was asserted by the server.
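To make the “magic bytes” heuristic concrete, here is a minimal sketch in Python. It is not how any particular browser implements sniffing; the names `MAGIC_PREFIXES`, `sniff_by_magic` and `effective_type` are made up for this illustration, and real sniffing algorithms look at far more than a prefix.

```python
from typing import Optional

# Leading "magic" bytes for a few common formats. The table is deliberately
# tiny; it is an illustration, not a complete sniffing table.
MAGIC_PREFIXES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # a plain JAR file starts this way too
]


def sniff_by_magic(body: bytes) -> Optional[str]:
    """Guess a type from the first few bytes, or return None if nothing matches."""
    for prefix, mime in MAGIC_PREFIXES:
        if body.startswith(prefix):
            return mime
    return None


def effective_type(declared: str, body: bytes, trust_server: bool) -> str:
    """Hypothetical policy switch: believe the server, or believe the bytes."""
    if trust_server:
        return declared
    # Fall back to the declared type only when sniffing finds nothing.
    return sniff_by_magic(body) or declared


if __name__ == "__main__":
    body = b"GIF89a" + b"\x01\x00\x01\x00"  # GIF signature plus a 1x1 size field
    print(effective_type("text/plain", body, trust_server=True))   # text/plain
    print(effective_type("text/plain", body, trust_server=False))  # image/gif
```

The same bytes, served with the same header, end up with two different types depending on which policy the client follows. That disagreement is what the rest of this post is about.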
In the programming languages world, this kind of thing would be called duck-typing (if it walks like a duck and quacks like a duck, treat it like a duck).
In web world, this is completely busted.
Browsers perform content-sniffing ostensibly in the interest of usability, so that even badly configured servers can continue to “work”. The problem here is that a browser gives different types of content different amounts of access. If you can fool the browser into thinking one type of content is actually another, you can bypass the restrictions placed on the actual content’s access. For example, an HTML page is allowed to load external images, stylesheets and scripts. In this case the security context these resources execute in is derived from the URL of the page that these resources are embedded in. On the other hand, if the content being loaded is, say, a Flash object or a Java applet, the security context is derived from the URL of the applet object itself. If the browser uses heuristics and gets confused between a Flash object and an image, there are real security implications! It was this type of confusion which was the source of the GIFAR attack.
- we’re able to construct data that can be validly parsed as two or more types;
- browsers sniff to determine content-type; and
- the security context a resource executes in depends on its content-type.
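The first of these points is easy to see for yourself. The sketch below is a harmless Python illustration in the spirit of GIFAR, not a reconstruction of that attack or of the GIF/script trick at the top of this post. It relies on one fact about the two formats: a GIF is recognised from its first bytes, while a ZIP archive (which is what a JAR is) is located from a directory at the end of the file. The names `TINY_GIF`, `gif_dimensions` and `make_zip` are invented for this example, and the archive holds only a text file.

```python
import io
import struct
import zipfile

# A commonly used 42-byte, 1x1 GIF, written out field by field.
TINY_GIF = bytes([
    0x47, 0x49, 0x46, 0x38, 0x39, 0x61,              # "GIF89a" signature
    0x01, 0x00, 0x01, 0x00,                          # width 1, height 1 (little-endian)
    0x80, 0x00, 0x00,                                # flags, background index, aspect ratio
    0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF,              # two-entry global colour table
    0x21, 0xF9, 0x04, 0x01, 0x00, 0x00, 0x00, 0x00,  # graphic control extension
    0x2C, 0x00, 0x00, 0x00, 0x00,                    # image descriptor: left, top
    0x01, 0x00, 0x01, 0x00, 0x00,                    # image descriptor: width, height, flags
    0x02, 0x01, 0x44, 0x00,                          # LZW-compressed pixel data
    0x3B,                                            # trailer
])


def gif_dimensions(data: bytes):
    """Read width and height from the GIF header, or return None if it is not a GIF."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    return struct.unpack_from("<HH", data, 6)


def make_zip(name: str, payload: str) -> bytes:
    """Build a one-file ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


# Concatenate: image first, archive second.
polyglot = TINY_GIF + make_zip("hello.txt", "hi from the archive")

if __name__ == "__main__":
    # Read from the front, it has a GIF header...
    print(gif_dimensions(polyglot))
    # ...and read from the back, it is a ZIP archive.
    with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
        print(zf.namelist(), zf.read("hello.txt"))
```

Neither reader here is being sloppy: each follows its own format's rules and simply never looks at the part of the file the other one cares about.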
(Thanks to Mike “The Human Linter” Samuel for correcting errors in this post)