Two quadrocopters juggling a ball

This is ridiculously awesome: httpv://

Caja at OWASP, Sweden

Last week, I attended OWASP AppSec Research 2010 in Stockholm, Sweden. The conference was well attended, with a mix of people from industry and from academia. There was an especially interesting set of presentations.

Mike Samuel and I spoke about virtualization as an essential security tool, exemplified by our project, Caja, which replaces the same-origin policy in browsers. The same-origin policy is the existing security policy baked into browsers: the authority that code on a page has is decided completely by the domain of the page. This has traditionally worked, more or less, when all of the code and data on a webpage is generated and vetted by the same person or organization. The limits of this security policy become apparent, however, in social networks and on other websites which include code authored by third parties. The browser exposes all sorts of authority ambiently to all code executing on a page, irrespective of how the code came to execute. Users rightly hold the website (as identified by the URL they entered in the browser’s address bar) responsible for their data; however, the browser gives the site no ability to limit what third-party code on the page is able to do.

For social networks, there is a further problem: the security policy needed changes over time as the social network experiments and matures, the value of its data grows and new attacks emerge. Unfortunately, by the time a social networking site has figured out its niche and identified its real threats, the site has acquired a large body of legacy code which must continue to run to avoid annoying its users. The threat model is sufficiently unpredictable that no amount of upfront security design is sufficient for the lifetime of the site.

Virtualization gives sites which anticipate this problem the ability to respond flexibly to changing threats by maintaining the security policy in code the site controls. By requiring third-party code to interact only via exposed APIs, a social network can modify the authority any particular third-party code has simply by changing the implementation of the API. This gives it a place to stand to attenuate the authority the third-party code gets without changing the API that code is written against.
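A minimal sketch of that idea in JavaScript. Every name here (makeFriendsApi, thirdPartyWidget, the exposeEmail policy flag) is hypothetical and not part of Caja; the point is only that the site can tighten the policy behind an unchanged API.

```javascript
// Hypothetical illustration: none of these names are part of Caja.
// The site's private data; third-party code never gets a direct reference to it.
const friends = [
  { name: "Alice", email: "alice@example.com" },
  { name: "Bob", email: "bob@example.com" },
];

// The API object is the only authority handed to third-party code.
// `policy` lives in code the site controls and can change over time.
function makeFriendsApi(policy) {
  return Object.freeze({
    listFriends() {
      return friends.map((f) =>
        policy.exposeEmail ? { name: f.name, email: f.email } : { name: f.name }
      );
    },
  });
}

// Third-party code is written against the API only.
function thirdPartyWidget(api) {
  return api.listFriends().map((f) => f.email || "(hidden)");
}

const earlyApi = makeFriendsApi({ exposeEmail: true });
const laterApi = makeFriendsApi({ exposeEmail: false }); // policy tightened later

console.log(thirdPartyWidget(earlyApi));
console.log(thirdPartyWidget(laterApi));
```

The widget code is identical in both calls; only the implementation behind listFriends changed.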

Virtualization is a general tool which is applicable wherever mutually untrusting code must execute in the face of a changing security policy. For browsers, Caja provides such a virtualization layer for JavaScript, HTML, CSS and the DOM. The browser is a complicated beast and not every layer is completely supported; third-party code can only use those APIs which have been virtualized. That said, as of today this includes almost all of HTML 4.01, CSS 2.1 and JavaScript, and a large number of the commonly used DOM functions, with more being added as they are requested and their security properties understood.

Try it out

Unincorporated Man

I just ordered The Unincorporated Man.  The blurb suggests the story is one in which “every individual is incorporated at birth, and spends many years trying to attain control over his or her own life by getting a majority of his or her own shares.”  The subject matter is provocative with compelling arguments for and against the idea.  I am optimistic that the authors do it justice.

The discussion that led me to order the book leapt fairly quickly from “what’s the price you are willing to accept to give up a freedom” to “if it was allowed, we would descend into slavery to corporations”. However, the premise of the book isn’t as far-fetched as it sounds, and in diluted versions such arrangements are common today. Not only are there companies like Lumninet that offer loans to students in exchange for a portion of their future earnings, many government scholarships offered by countries like Australia, New Zealand and the UK are bonded scholarships: students receive the scholarship in exchange for a promise to work in the country for a fixed period of time. If I must belabor the analogy, it’s a large entity with access to resources that grants money to a person who needs it in exchange for the person giving up their choice to pursue a potentially larger income elsewhere. The government benefits by retaining its skilled labor and the person benefits by having access to money to finish their studies that they otherwise would not have had. Even golden handcuffs are a form of payment in exchange for relinquishing a choice. A golden handcuff is a financial incentive that a company offers its employees to discourage them from leaving, for example, stock which vests in the future. It is a payment by a company in exchange for an employee giving up their choice to leave for another.

Of course there are critical differences between these examples and the choice offered in the book. One is the length of time: a fixed term vs a lifetime. Another is the age at which the individual makes this choice: as a child or as an adult. But if they are important then they should feature in the argument.

JavaScript Puzzle I: Answer

The last puzzle asked what the following JavaScript snippet prints to the console:

j = a = i = 2;
var foo = j
/[a/*]/; foo++; [ /**/ ]

Clearly, since this is a puzzle about parsing, you might guess that incorrect answers probably result from misparsing the snippet. Naively, there are at least three interesting ways this snippet could be parsed as JavaScript. Here are two of them (the third is left as an exercise):

j = a = i = 2;
var foo = j;   // semicolon inserted
/[a/*]/;       // a useless regular expression that matches a, /, or *
foo++;         // increment foo
[ /**/ ];      // a useless array expression containing a block comment

j = a = i = 2;
var foo = j / [ a      // var foo initialized to j divided by an array containing the variable "a"
/*]/; foo++; [ /**/    // block comment that is completely ignored by the interpreter
];                     // the end of the array

The answer to the puzzle depends on which of these occurs. Trying it out in any JavaScript shell, you can tell the answer is 1. The interesting point is of course why it is so. To see why, let us pretend to be a JavaScript parser and see what the parser sees.

The parser scans characters from left to right. It sees the first statement j = a = i = 2;. If we assume that this statement is the first statement in the program, it creates the variables j, a and i and initializes them to 2. The next line is var foo = j. The parser parses the variable declaration for foo, an equal sign for assignment and begins parsing what it expects to be an expression beginning with a variable j. As we saw before, at this point the parser must decide whether the end-of-line character it sees next is the end of the statement (in which case it should insert a semicolon), or the statement is continued on the next line (in which case the end-of-line should be treated as whitespace). To decide which of these it should do, the parser looks at the first character of the next line. “Ah,” it anthropomorphically says to itself, “the first character is ‘/’, which could be the start of a comment, the start of a regular expression or a division operation.” Since the expression the parser is parsing is not a restricted production and an expression can be followed by a division operator, the parser will interpret the ‘/’ as a division operator.

What is expected after a division operator? Well it is an expression! “Ah!”, our enthusiastic parser says to itself, “I see a square bracket - that can be the start of an expression.” The parser consumes the square bracket and the variable a. It then sees a ‘/’ character. Once more this can be either another division operator, the start of a comment or the start of a regular expression. The parser looks ahead one character, sees the ‘*’ and decides it is a comment block. As a result, it can skip processing characters till it sees a closing ‘*/’. This doesn’t happen till close to the end of the line and is followed by the closing square bracket. As a result, what the interpreter will see is a parse of var foo = j / [ a ];

Once we have a parsed version of the program, evaluating the parse tree is relatively straightforward. Division only makes sense if what you are dividing by is a number. JavaScript sometimes helpfully converts non-numbers into numbers when needed, and arithmetic is one such time. The numerical value of an array which contains just one element (like [ a ]) happens to be the numerical value of that element. You can try this out yourself in a shell by evaluating examples like + [ 1 ] and +[1, 2]. In our case, the value of a is its initial value, 2. Thus the interpreter divides the value of j (which is 2) by 2 and assigns the resulting 1 to foo. And the next line prints it out.
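The conversion can be checked directly. The sketch below assumes a plain, non-strict script run in a shell such as Node.js; the console.log calls and the explicit var declaration of j, a and i are additions for illustration. An array is converted to a number by first joining its elements into a string, which is why a one-element array behaves like its element and a two-element array does not.

```javascript
// Unary plus forces the same number conversion that division applies.
console.log(+[1]);      // 1: [1] -> "1" -> 1
console.log(+[1, 2]);   // NaN: [1, 2] -> "1,2" -> NaN
console.log(2 / [2]);   // 1: the puzzle's division with j = a = 2

// The puzzle itself, exactly as posed, followed by a print.
var j, a, i;
j = a = i = 2;
var foo = j
/[a/*]/; foo++; [ /**/ ]
console.log(foo);       // 1
```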

All of this might be fun (or you might think it’s just a bit quirky), but there are a couple of interesting lessons to take away from this puzzle (and from other programming language puzzles you will see here). The simplest and most important lesson is:

If a language is hard to parse, it is hard for users of the language to write correct and secure programs in it.

The set of rules you need to keep in your head when trying to parse a given snippet of JavaScript is large and complicated and required three rambling paragraphs to explain. This matters to a small degree when you are implementing a parser for the language, but it matters a lot when a human programmer is trying to understand their own code or someone else’s. Humans make simplifying assumptions about what a piece of code looks like it is doing, and deduce that that is what it is doing. It was partially for this reason that when you glanced at the original puzzle program, you might have mentally broken down the program into one (or both) of the potential parses without really carrying out a mental step-by-step parse. It is very much like an optical illusion: there are some pictures which at a glance will be seen one way by some people and a different way by others.

A programming language designed to write correct and secure code should NOT exhibit this feature.

JavaScript Puzzles: First in a series…

JavaScript makes the use of semicolons to delimit statements optional. As The Good Parts warns, this is dangerous and is one of the sharp knives in JavaScript. Crockford’s JSLint will scold you properly and Caja will issue static lint warnings if you forget to explicitly use semicolons.

For those of you who are already familiar with this, feel free to cut straight to the puzzle. For the rest of us, there are some great simple examples demonstrating how semicolon insertion is confusing. For instance, what does the following function return?

function universe() {
  return
    42
}

Well, it’s not 42, and the reason is that a semicolon is inserted for you by the parser immediately after the return keyword. The JavaScript grammar specifies a small number of restricted productions which mark where a line may be terminated. The effect of these restricted productions is that when a continue, break, return, or throw keyword is followed by the end of a line, a semicolon is automatically inserted. As a result the function above is parsed as if it consisted of the following two statements:

function universe() {
  return;  // semicolon inserted
    42;     // useless expression
}

Since the return takes no expression, the function returns the special JavaScript value “undefined” and the statement consisting of just the expression 42 is never executed.
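A runnable version of this example, as a sketch. The second function, fixedUniverse, is a hypothetical name added here to show the usual remedy of starting the returned expression on the same line as the return keyword.

```javascript
// The line break after `return` triggers semicolon insertion.
function universe() {
  return
    42
}
console.log(universe()); // undefined

// Hypothetical fix: keep the expression on the same line as `return`.
function fixedUniverse() {
  return 42;
}
console.log(fixedUniverse()); // 42
```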

You might be tempted to dismiss this example because it may appear obvious what should happen. Unfortunately there are two other particularly befuddling sources of confusion when parsing JavaScript (distinguishing the division operator and the start of a comment) which, combined with semicolon insertion, mean that code in one location can affect the parsing of code a long distance away. Any time that happens, the semantics of a snippet of code you are looking at can be affected by seemingly unrelated code elsewhere in the file, which results in bugs, security problems and general sadness.

What does the following snippet of code output:

j = a = i = 2;
var foo = j
/a/i;

How would you expect a parser to parse it?

The first line clearly just initializes three variables. There are two ways the next two lines might be parsed. One way is for a semicolon to be inserted automatically after the second line terminating the statement expression var foo = j like this:

j = a = i = 2;
var foo = j;
/a/i;

In this case, the third line is the regular expression /a/i which matches the character “a”. The “i” is a flag that makes the regular expression matching case-insensitive. Here there’s no string that the regular expression is being applied to and the expression has no side-effect. As a result, you might expect console.log to print 2.

The second way in which this expression could be parsed is for no semicolon to be inserted. In this case, the end of line is treated just like whitespace like this:

j = a = i = 2;
var foo = j /a/i;

Ah - that looks just like a series of divisions. In this case, the value 0.5 would be printed to console.

Which one of these two things really happens? The ES specification which defines the language requires that the latter occur. Loosely speaking the spec suggests as long as you’re not parsing a restricted production (like we were in our first example), no semicolon is inserted if the token after the end of a line might be valid if the end of the line was treated as whitespace.
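Both readings can be tried side by side. This is a sketch for a non-strict script; the second variable, bar, and the console.log calls are additions for illustration.

```javascript
var j, a, i;
j = a = i = 2;

// No semicolon after j: the next line continues the expression as divisions.
var foo = j
/a/i;
console.log(foo); // 0.5, i.e. (2 / 2) / 2

// Terminating the statement explicitly gives the other reading.
var bar = j;
/a/i; // now a regular expression literal with no effect
console.log(bar); // 2
```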

Now you’re ready for something a bit more interesting - notice that the character that starts a line comment or a comment block is the same character used both for division and to introduce regular expressions! If you are writing a parser (or much more commonly a human reading code someone else has written), how do you decide when you see the / character whether it is the start of a comment, a regular expression or a divide operator?

Let’s cast this in the form of a puzzle.


What does the following JavaScript snippet print to the console?

j = a = i = 2;
var foo = j
/[a/*]/; foo++; [ /**/ ]

Answer next week.

WorldNetDaily Misses An Opportunity to Gnash Teeth

According to WorldNetDaily, a mother was horrified to learn the PC version of Hasbro’s The Game of Life does not prevent same-sex marriages from occurring. The article is a lot of tripe but it has some absolute gems of over-reaction:

I had no idea how insidious they were being with pushing the homosexual agenda.

First we can make the mature and grown-up computer scientist observation that the original, physical game (… the Hasbro one - not the original ahem game of life) also does not prevent a child from marrying off two boy pegs or two girl pegs to each other, but somehow, because it is possible to prevent such actions in a computer game, there is suddenly an expectation that these restrictions be implemented. No one complained when Mr Milton-Bradley failed to use differently shaped pegs and peg-holes so that little Susie wouldn’t try to marry Patty Peg to Peggy Peg and end up asking her mum awkward questions. (As Mike pointed out to me - this is the 1860s so even if any such attempt had been made, poor Peggy Peg would probably only fit in the Game of Life kitchen to promote that particular stereotype).

In meatspace, the rules of many games exist only as a tacit agreement between the players …largely because flouting these rules makes the games less interesting. There is nothing physically preventing a chess player from moving chess pieces however he wants, or a soccer player from just picking up a ball and running with it, but the result is not as interesting a game (rugby notwithstanding) nor is it stable.

In a computer game, there is often a place to stand to implement some controls. Sometimes they are implemented, sometimes they are not. Sometimes they can be easily worked around (Did any other Emacsen ever do (push 18 dun-inventory)?) and sometimes slightly less easily (auto-aim in Doom fr’instance). But invariably there is an expectation that the rules of the game that were a convention in meatspace be programmatically enforced when the same game is re-interpreted to run on a computer. Given that my regular job is building a system where mutually suspicious programs are able to inter-operate according to the rules of the game, it’s interesting to note that a feature of programs running on a computer is that they are able to have rules enforced upon them, while a feature of humans interacting in meatspace is that they are able to choose to follow the rules that surround them.

WorldNetDaily did not only miss opportunities to use its incisive, investigative talent to examine the promotion of more than one queen in a game of chess, or the brokeback number of kings that ends a game of checkers. No …WND missed an opportunity that was right under its nose …there is a much more popular computer game, run on many, many more computers in the world and using up many more computer cycles, that also happens to be called the Game of Life. It not only allows a child to have three parents, it uses a program’s ability to enforce rules to force all children to be the result of a ménage à trois.

And that is an insidious, agenda-pushing conspiracy worthy of an over-reaction.

GIF/Javascript Polyglots

One of the nice aspects of working on Caja has been the people I’ve had a chance to meet and work with. Their ideas have helped inspire Caja’s design and implementation. Other times, they have inspired the following kind of wackiness:

<script src="thinkfu-js.gif"></script>

<img src="thinkfu-js.gif">

That’s right - that is the same GIF that is being interpreted as a script and as an image.

There are no content-sniffing tricks going on here, but it was content-sniffing that inspired me to create this particular trick. In the space of a week, chance conversations with Arturo Bejar and Doug Crockford about content-sniffing reminded me of Perl::Visualize, a GIF/Perl polyglot generator I’d written back in 2003. In humanspeak, a polyglot is a person who speaks more than one language. In computerspeak, a polyglot is a snippet of code which is a valid program in two or more languages. Writing polyglots can be a fun mental exercise and if you have never tried it, I thoroughly recommend it.

The thing that made Perl::Visualize clever is that one of the languages is GIF, as in the picture format. One way of thinking about a GIF viewer is as an interpreter that takes a peculiar programming language as input and, as a result of executing that program, decodes an image. Perl::Visualize takes an arbitrary GIF image and an arbitrary perl program and creates a combined file which can be interpreted as an image and as a perl program.

This makes some rather nifty tricks possible. For example, you can embed a perl program in a picture of its own control-flow graph. Now you have a program which can explain itself when you throw it at a picture viewer and run itself when you throw it at a perl interpreter. It takes literate programming to its logical conclusion — I call it aesthetic programming and I am sure it will have Knuth laughcrying.

Unfortunately doing polyglots with perl impresses no one. People unfamiliar with perl think that because perl can look like line-noise, all line-noise is valid perl. It turns out though that the tricks that made GIF/Perl polyglots possible can be ported to Javascript rather easily. I’ll describe how GIF/Javascript polyglots work and a script for generating them in another post, but it’s worth noting here that the image above is a perfectly valid GIF file just as it is a perfectly valid javascript program (in fact - it’s even a valid Caja program!).

An image tag expects its src attribute to point to content which parses correctly as an image, just as a script tag expects its src attribute to point to a javascript file. The tag specifies a context in which content of a particular type is expected. If the only information a browser used to render content was the context created for it by the surrounding tag, things would be simple. But things in the browser world are never simple. When a server sends a file, it also sends that file’s MIME type in a Content-Type header. All is well when the Content-Type the server asserts is consistent with the expected context in which that content gets used. What happens when the server does not send a Content-Type? What happens when a file with one Content-Type is sent when a different type is expected?

Sadness happens.

Some browsers consider the content-type the server asserts to be authoritative and if the content fails to parse as that type, the content is not rendered. Others ignore the server-asserted type and try to guess its type by sniffing the content. This sniffing can take the form of heuristics like the suffix of the file name in the URL that specifies it, the “magic” first couple of bytes of the content, or simply trying to parse the file with different parsers until one fits. The type of parser tried is sometimes constrained by the particular tag (fr’instance, content expected by an img tag would only be parsed according to native image formats supported by the browser). The problem is further exacerbated by plugins like Java and Flash and by different types of caches and “file save” features in browsers which may or may not remember what content-type was asserted by the server.
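To see why the “magic bytes” heuristic alone cannot settle the type, here is a small sketch. The helpers looksLikeGif and parsesAsScript are hypothetical and far simpler than what any real browser does, and the sample content is not a valid GIF and not the polyglot shown above; it only shows that content which passes a magic-byte check can also parse as a script.

```javascript
// Hypothetical sniffers; real browsers use more elaborate, browser-specific heuristics.

// Magic-byte check: a GIF file starts with the ASCII signature GIF87a or GIF89a.
function looksLikeGif(bytes) {
  const magic = String.fromCharCode(...bytes.slice(0, 6));
  return magic === "GIF87a" || magic === "GIF89a";
}

// "Try a parser and see if it fits": parse the text as a function body without running it.
function parsesAsScript(text) {
  try {
    new Function(text);
    return true;
  } catch (e) {
    return false;
  }
}

// GIF89a happens to be a legal JavaScript identifier, so this text satisfies both checks.
const content = "GIF89a = 1;";
const bytes = Array.from(content, (ch) => ch.charCodeAt(0));

console.log(looksLikeGif(bytes));     // true
console.log(parsesAsScript(content)); // true
```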

In the programming languages world, this kind of thing would be called duck-typing (if it walks like a duck and quacks like a duck, treat it like a duck).

In web world, this is completely busted.

Browsers perform content-sniffing ostensibly in the interest of usability so even badly configured servers can continue to “work”. The problem here is that a browser gives different types of content different amounts of access. If you can fool the browser into thinking one type of content is actually another you can bypass the restrictions placed on the actual content’s access. For example, an HTML page is allowed to load external images, stylesheets and scripts. In this case the security context these resources execute in is derived from the URL of the page that these resources are embedded in. On the other hand, if the type of content being loaded is Flash or Java applet say, the security context is derived from the URL of the applet object itself. If the browser uses heuristics and gets confused between a Flash object and an image, there are real security implications! It was this type of confusion which was the source of the GIFAR attack.

What are the security implications of GIF/Javascript polyglots? Since images and javascript share the same same-origin policy, getting a browser to confuse one for the other does not result in an obvious exploit. However, it does re-emphasize the lesson from the GIFAR exploit: blacklisting or recoding particular files is not going to be sufficient while:

  • we’re able to construct data that can be validly parsed as two or more types;
  • browsers sniff to determine content-type; and
  • the security context a resource executes in depends on its content-type.

(Thanks to Mike “The Human Linter” Samuel for correcting errors in this post)

I’m An Idiot

Sadly, this is a true story.  At least I learned about the OS X say command.

Randall is awesome - it impresses me how XKCD can be simultaneously so universal and yet so surreal.  Today I received the seventh email from friends asking if the "true story" this strip is based on was my own experience breaking into my apartment a few years ago.  Sadly, it’s not - for one, the intersection of my IRC friends and Mac friends is a null set.  Sharing this story got me a free round of drinks at a pub in Boston once, which was pretty neat - though not nearly as awesome as having an XKCD comic based on you.