Subject: Javascript Security for Dummies
From: Toby Inkster <tai@g5n.co.uk>
Date: Thu, 04 Mar 2010 19:37:49 +0000
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <1267731469.30377.38.camel@ophelia2.g5n.co.uk>

Apologies for cutting out of the telecon a few minutes early. As I left
I volunteered to write a quick summary of how Javascript/JSON/JSONP/CORS
relate to each other and the various security issues involved.

Let's start with the basics: Javascript (more properly known as
ECMAScript these days) is a scripting language with various
implementations, the best known being the ones that are embedded in most
modern, graphical browsers. Javascript, as implemented in browsers, runs
in a sandbox to prevent maliciously crafted web pages from doing any
damage to the visitor's machine.

A good number of years ago, Microsoft implemented a proprietary
ActiveX object (XMLHTTP) which allowed scripts to perform HTTP requests
and make use of the responses. Mozilla implemented some fairly similar
functionality as a native object, and eventually other browsers followed
(using the Mozilla syntax). This feature is called XMLHttpRequest or
XHR - it's a bit of a misnomer as it's not restricted to retrieving XML.

For security reasons, XHR requests are only allowed to be performed to
URLs on the same domain as the page the script is running in (which is
not necessarily where the script file itself was loaded from). This is a
good thing because you don't want http://evil.example/badpage.html to be
able to perform an XHR to http://bank.example/account-statement.html and
read the result, especially given that the XHR would be sent with all
the applicable cookies in the browser's cookie jar.

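(Aside: technically it's not cross-domain requests that are disallowed,
but cross-origin requests. An origin is the combination of scheme, host
and port, so http://bank.example and https://bank.example are different
origins, and so are foo.example.com and bar.example.com. The woolier
concept is the one browsers use for cookies: there, foo.example.com and
bar.example.com can share cookies scoped to example.com, but foo.co.uk
and bar.co.uk cannot share cookies scoped to co.uk, even though, from a
DNS viewpoint, all four are third-level domain names. This is all being
standardised currently.)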
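
The origin rule can be made concrete. A minimal sketch, assuming the scheme/host/port definition of an origin (as in the Web Origin Concept work and the WHATWG URL Standard); sameOrigin is a hypothetical helper name, not a browser API:

```javascript
// Hypothetical helper: two URLs are same-origin when scheme, host and port
// all match. The WHATWG URL class (built into browsers and Node.js 10+)
// serialises the origin with default ports already normalised away.
function sameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  // URLs with opaque origins (e.g. data: URLs) serialise as "null" and are
  // never same-origin with anything, including each other.
  if (originA === "null" || originB === "null") return false;
  return originA === originB;
}

console.log(sameOrigin("http://bank.example/a.html", "http://bank.example/b.html")); // true
console.log(sameOrigin("http://foo.example.com/", "http://bar.example.com/"));       // false (hosts differ)
console.log(sameOrigin("http://bank.example/", "https://bank.example/"));             // false (schemes differ)
console.log(sameOrigin("http://bank.example/", "http://bank.example:80/"));           // true (80 is http's default)
```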

Douglas Crockford "discovered" JSON. He maintains that he didn't invent
it, just realised that Javascript contained a useful bit of syntax that
could be standardised on. JSON is a restricted subset of Javascript's
notation for objects and arrays. (JSON = Javascript Object Notation.)
JSON is a data format that allows strings, numbers, booleans, null,
arrays and associative arrays to be represented. In many ways it can be
considered a competitor to XML. It's also pretty similar to YAML (though
the oft-quoted statement that it's a subset of YAML is only approximately
true: YAML 1.2 was revised to accept nearly all JSON, but edge cases
remain).
Here's how a person might be represented in JSON:

 {
  "name"     : "Toby Inkster" ,
  "homepage" : "http://tobyinkster.co.uk/" ,
  "mbox"     : "mailto:mail@tobyinkster.co.uk"
 }
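
Because JSON is only data, a script can turn it into objects without executing anything. A minimal sketch in plain Javascript (runs in a browser or Node.js); tryParse is a hypothetical helper:

```javascript
// The person record from above, as the text a server would send.
const personText = `{
  "name"     : "Toby Inkster" ,
  "homepage" : "http://tobyinkster.co.uk/" ,
  "mbox"     : "mailto:mail@tobyinkster.co.uk"
}`;

// JSON.parse only builds data structures; it never runs code.
const person = JSON.parse(personText);
console.log(person.name); // Toby Inkster

// Anything outside the JSON grammar, such as a function call, is rejected
// with a SyntaxError instead of being executed.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}
console.log(tryParse('alert("hi")').ok); // false
```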

Getting back to XHR: people often want to be able to request data from
other origins, circumventing the same-origin policy enforced by
browsers. With a little bit of extra syntax, JSON can be useful for
this. This extra syntax is called JSONP. (JSONP = JSON with Padding.)

The way that JSONP works is that instead of supplying a JSON response,
the server responding with the data sends a Javascript response, like
this (usually the requesting page can choose the name of the callback
function through a query-string parameter):

 callback_function({
  "name"     : "Toby Inkster" ,
  "homepage" : "http://tobyinkster.co.uk/" ,
  "mbox"     : "mailto:mail@tobyinkster.co.uk"
 });
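
On the server side, producing a JSONP response is just string wrapping. A minimal sketch, assuming a query-string parameter named callback (real services pick their own name) and a hypothetical helper jsonpResponse:

```javascript
// Hypothetical server-side helper: wrap a data object in a JSONP response.
// The parameter name "callback" is an assumption; services choose their own.
function jsonpResponse(queryString, data) {
  const params = new URLSearchParams(queryString); // a leading "?" is ignored
  const callback = params.get("callback") || "callback_function";
  // Only accept plain (optionally dotted) identifiers, so a client cannot
  // inject arbitrary code through the callback name itself.
  if (!/^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(callback)) {
    throw new Error("invalid callback name");
  }
  return {
    contentType: "application/javascript", // it is a script, not JSON
    body: `${callback}(${JSON.stringify(data)});`,
  };
}

const res = jsonpResponse("?callback=showPerson", { name: "Toby Inkster" });
console.log(res.body); // showPerson({"name":"Toby Inkster"});
```

The identifier check matters because the callback name is echoed into executable code; it does nothing, however, to protect the consuming page from the server itself.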

How does this circumvent XHR's same-origin policy? Answer: it doesn't.
But it eliminates the need to use XHR at all. The page requesting the
data doesn't need to perform an XHR; it just defines a function called
callback_function to deal with the data, then loads the JSONP file
using a standard HTML <script src> element. The browser downloads and
executes the script, which calls the function with the data as its
argument. This works because the same-origin policy has never applied
to loading scripts with <script src>.
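
A minimal sketch of the consuming side. The data URL in the comment is hypothetical, and since <script src> only exists in browsers, the fetch-and-execute step is emulated here with the Function constructor, which runs a string as global code:

```javascript
// In a browser the page would do roughly:
//   const s = document.createElement("script");
//   s.src = "http://data.example/person.js?callback=callback_function";
//   document.head.appendChild(s);
// and the browser would fetch and run whatever comes back.

const received = [];

// 1. The page defines the callback globally so the loaded script can find it.
globalThis.callback_function = function (data) {
  received.push(data);
};

// 2. The text the remote server sent back (a JSONP response).
const responseText =
  'callback_function({"name": "Toby Inkster", "homepage": "http://tobyinkster.co.uk/"});';

// 3. Emulate <script src>: execute the response as code.
new Function(responseText)();

console.log(received[0].name); // Toby Inkster
```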

However, this opens up a big security hole. Suppose that the server
supplying a JSONP response is compromised, or its owner just decides to
turn to the dark side. The server can send arbitrary Javascript (i.e.
not JSONP) and the browser will execute it unquestioningly, with the
same privileges as the page's own scripts. This Javascript could be used
to steal cookies, passwords and other privileged information from the
page it was included in. Not nice.
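
The hole is easy to reproduce. A minimal sketch, again emulating <script src> with the Function constructor so it runs outside a browser; secret and stolen are hypothetical stand-ins for page data such as document.cookie and for the attacker's way of sending it home:

```javascript
// Emulate <script src>: run the response text as global code, exactly as
// the browser would, with no inspection of what it contains.
function loadAsScript(text) {
  new Function(text)();
}

// Stand-in for privileged data held by the including page.
globalThis.secret = "session=abc123";
// Stand-in for the attacker's exfiltration channel.
globalThis.stolen = [];
globalThis.callback_function = function (data) { /* page's normal handler */ };

// A compromised server grabs the secret, then still calls the callback so
// the page notices nothing unusual.
const malicious =
  'stolen.push(secret); callback_function({"name": "Toby Inkster"});';

loadAsScript(malicious);
console.log(globalThis.stolen[0]); // session=abc123

// The same bytes read as plain JSON with JSON.parse are refused outright.
let parsedOk = true;
try {
  JSON.parse(malicious);
} catch (err) {
  parsedOk = false;
}
console.log(parsedOk); // false
```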

CORS (Cross-Origin Resource Sharing) is another way around the
same-origin policy, but this time it's not a hack. It's a set of HTTP
headers that a server can include in a response to indicate that the
resource is safe to be retrieved by cross-origin requests. So if, say,
http://bank.example/homepage.html contains no private data and is
perfectly safe for other sites to have access to, then it could send a
CORS header allowing http://evil.example/badpage.html to try its worst.
http://bank.example/account-statement.html wouldn't send the CORS
header, so it would be protected by the default same-origin policy.
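
A simplified sketch of both sides of that decision. The path list and the helper names responseHeaders and scriptMayRead are hypothetical; the browser-side check ignores credentialed requests, preflight (OPTIONS) requests and the other Access-Control-* headers:

```javascript
// Hypothetical server-side policy: resources that are safe to share
// cross-origin. The path is the example from the text.
const PUBLIC_PATHS = new Set(["/homepage.html"]);

function responseHeaders(path) {
  const headers = { "Content-Type": "text/html" };
  if (PUBLIC_PATHS.has(path)) {
    // "*" lets any origin read the response. Browsers do not accept "*"
    // for requests sent with credentials (cookies), so it suits public data.
    headers["Access-Control-Allow-Origin"] = "*";
  }
  return headers;
}

// Simplified version of the browser's check. The request may still reach
// the server; CORS decides whether the requesting script sees the response.
function scriptMayRead(requestOrigin, headers) {
  const allowed = headers["Access-Control-Allow-Origin"];
  return allowed === "*" || allowed === requestOrigin;
}

const requester = "http://evil.example";
console.log(scriptMayRead(requester, responseHeaders("/homepage.html")));          // true
console.log(scriptMayRead(requester, responseHeaders("/account-statement.html"))); // false
```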

So how does this apply to RDFa vocabularies/profiles?

If vocabularies are hosted on a separate server to the pages making use
of them, then Javascript implementations of RDFa would need to make a
cross-origin request to read them. (Actually they could make a
same-origin request to a proxying script, but that's not an especially
elegant solution.)

If we want such Javascript implementations of RDFa to be possible, that
leaves two options:

 1. Serve up the vocabulary document as JSONP; or
 2. Serve it up as something else plus CORS headers.

#1 is problematic because, as I said, JSONP is not nice, safe JSON,
despite the similar names. JSONP is Javascript.

#2 is problematic because CORS is a very new feature. Many of the newest
browsers support it (including IE8, although IE8 exposes it through a
separate XDomainRequest object rather than through XHR), but if you want
your script to work in downlevel browsers, this is not your solution.
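
Scripts that want to use CORS where it exists can feature-detect it. A minimal sketch using the commonly cited withCredentials check; the function name crossOriginTransport and its return values are hypothetical, and the env parameter stands in for window so the sketch can run outside a browser. What to do when it returns "none" is left to the caller:

```javascript
// Hypothetical helper: report which cross-origin mechanism a browser offers.
// `env` stands in for the browser's global object (pass `window` in a page).
function crossOriginTransport(env) {
  // XMLHttpRequest objects that support CORS expose a withCredentials
  // property; older ones do not.
  if (env.XMLHttpRequest && "withCredentials" in new env.XMLHttpRequest()) {
    return "xhr";
  }
  // IE8 and IE9 expose CORS through a separate XDomainRequest object.
  if (typeof env.XDomainRequest !== "undefined") {
    return "xdr";
  }
  // Downlevel browser: no cross-origin retrieval without JSONP or a proxy.
  return "none";
}

// Stand-in objects for illustration only.
class CorsCapableXHR { constructor() { this.withCredentials = false; } }
class LegacyXHR {}

console.log(crossOriginTransport({ XMLHttpRequest: CorsCapableXHR })); // xhr
console.log(crossOriginTransport({ XMLHttpRequest: LegacyXHR, XDomainRequest: function () {} })); // xdr
console.log(crossOriginTransport({ XMLHttpRequest: LegacyXHR })); // none
```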

In my next message I'll outline how my RDFa vocab proposal (which is
slightly different to Manu's) makes this a moot point: it makes
retrieval of the vocab document optional (a SHOULD requirement rather
than a MUST) and provides a fallback for cases such as browsers that
don't implement CORS.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Thursday, 4 March 2010 19:39:03 UTC