- From: Toby Inkster <tai@g5n.co.uk>
- Date: Thu, 04 Mar 2010 19:37:49 +0000
- To: RDFa WG <public-rdfa-wg@w3.org>
Apologies for cutting out of the telecon a few minutes early. As I left I volunteered to write a quick summary of how JavaScript/JSON/JSONP/CORS relate to each other and the various security issues involved.

Let's start with the basics: JavaScript (more properly known as ECMAScript these days) is a scripting language with various implementations, the best known being the ones that are embedded in most modern, graphical browsers. JavaScript, as implemented in browsers, runs in a sandbox to prevent maliciously crafted web pages from doing any damage to the visitor's machine.

A good number of years ago, Microsoft implemented a proprietary extension to JavaScript which allowed scripts to perform HTTP requests and make use of the responses. Mozilla implemented some fairly similar functionality, and eventually other browsers followed (using the Mozilla syntax). This feature is called XMLHttpRequest or XHR - it's a bit of a misnomer as it's not restricted to retrieving XML.

For security reasons, XHR requests may only be made to URLs on the same domain as the page running the script. This is a good thing because you don't want <http://evil.example/badpage.html> to be able to perform an XHR to <http://bank.example/account-statement.html>, especially given that the XHR would be sent with all the applicable cookies in the browser's cookie jar.

(Aside: technically it's not cross-domain requests that are disallowed, but cross-origin requests. An origin is the combination of scheme, host and port, so foo.example.com and bar.example.com are different origins even though they share a parent domain, and so are http://example.com and https://example.com. There is a looser "same site" grouping, under which foo.example.com and bar.example.com belong together but foo.co.uk and bar.co.uk do not, despite the fact that from a DNS viewpoint all four are third-level domain names. That grouping applies to cookies and to pages that relax the check via document.domain, not to XHR. This is all being standardised currently.)

Douglas Crockford "discovered" JSON. He maintains that he didn't invent it, just realised that JavaScript contained a useful bit of syntax that could be standardised on.
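
To make the origin aside above a little more concrete, here is a minimal sketch of the comparison, assuming the scheme/host/port definition of an origin. The helper name sameOrigin is hypothetical (it is not a browser API), and the URLs are the example ones used in this message. It relies on the standard URL object available in current browsers and in Node.js.

```javascript
// Hypothetical helper, not part of any browser API: compares two absolute
// URLs by origin, i.e. by scheme, host and port.
function sameOrigin(a, b) {
  // URL.origin serialises scheme + host + port, omitting default ports.
  return new URL(a).origin === new URL(b).origin;
}

// Same scheme, host and port: same origin.
console.log(sameOrigin("http://bank.example/a.html", "http://bank.example/b.html")); // true

// Different hosts: the case the same-origin policy is there to block.
console.log(sameOrigin("http://evil.example/badpage.html",
                       "http://bank.example/account-statement.html")); // false

// Sibling subdomains are different hosts, hence different origins.
console.log(sameOrigin("http://foo.example.com/", "http://bar.example.com/")); // false

// Same host but a different scheme is also a different origin.
console.log(sameOrigin("http://bank.example/", "https://bank.example/")); // false
```

This only illustrates the comparison itself; a browser's actual enforcement involves more than a string comparison.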

JSON is a restricted subset of JavaScript's notation for objects and arrays. (JSON = JavaScript Object Notation.) JSON is a data format that allows strings, numbers, booleans, null, arrays and associative arrays to be represented. In many ways it can be considered a competitor to XML. It's also pretty similar to YAML (though the oft-quoted statement that it's a subset of YAML is not strictly true of YAML 1.1; YAML 1.2 was revised to make it so).

Here's how a person might be represented in JSON:

    {
      "name" : "Toby Inkster" ,
      "homepage" : "http://tobyinkster.co.uk/" ,
      "mbox" : "mailto:mail@tobyinkster.co.uk"
    }

Getting back to XHR, often people want to be able to request data from other origins, circumventing the same-origin policy enforced by browsers. With a little bit of extra syntax, JSON can be useful for this. This extra syntax is called JSONP. (JSONP = JSON with Padding.)

The way that JSONP works is that instead of supplying a JSON response, the server responding with the data sends a JavaScript response, like this (usually the name of the callback function is configurable via a query string parameter):

    callback_function({
      "name" : "Toby Inkster" ,
      "homepage" : "http://tobyinkster.co.uk/" ,
      "mbox" : "mailto:mail@tobyinkster.co.uk"
    });

How does this circumvent XHR's same-origin policy? Answer: it doesn't. But it eliminates the need to use XHR at all. The page requesting the data doesn't need to perform an XHR request; it just defines a function called callback_function to deal with the data, then loads the JSONP file using a standard HTML <script src> element. The browser downloads and executes the script, which calls the function with the data as a parameter.

However, this opens up a big security hole. Suppose that the server supplying a JSONP response is compromised, or its owner just decides to turn to the dark side. The server can send arbitrary JavaScript (i.e. not JSONP) and the browser will execute it unquestioningly.
This JavaScript could be used to steal cookies, passwords and other privileged information from the page it was included in. Not nice.

CORS is another way around the same-origin policy, but this time it's not a hack. It's a set of HTTP headers that a server can send in its response for a URL to indicate that the response is safe to be retrieved in cross-origin requests. So if, say, <http://bank.example/homepage.html> contains no private data and is perfectly safe for other sites to have access to, then it could set a CORS header to allow <http://evil.example/badpage.html> to try its worst. <http://bank.example/account-statement.html> wouldn't send the CORS header so would be protected by the default same-origin policy.

So how does this apply to RDFa vocabularies/profiles?

If vocabularies are hosted on a separate server to the pages making use of them, then JavaScript implementations of RDFa would need to make a cross-origin request to read them. (Actually they could make a same-origin request to a proxying script, but that's not an especially elegant solution.)

If we want such JavaScript implementations of RDFa to be possible, there are two options:

1. Serve up the vocabulary document as JSONP; or
2. Serve it up as something else plus CORS headers.

#1 is problematic because, as I said, JSONP is not nice, safe JSON, despite the similar names. JSONP is JavaScript.

#2 is problematic because CORS is a very new feature. Many of the newest browsers support it (including IE8, though via its own XDomainRequest object rather than XHR), but if you want your script to work in downlevel browsers, this is not your solution.

In my next message I'll outline how my RDFa vocab proposal (which is slightly different to Manu's) makes this a moot point by saying that retrieval of the vocab document is optional - a SHOULD requirement rather than a MUST - and provides a fallback for cases such as browsers which don't implement CORS.

--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Thursday, 4 March 2010 19:39:03 UTC