- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 1 Feb 2010 11:04:09 -0500
- To: Tyler Close <tyler.close@gmail.com>
- Cc: www-tag@w3.org
- Message-ID: <OF8C38F5AD.B708DD79-ON852576BB.000190DB-852576BD.0057F615@lotus.com>
Tyler: Here are some comments on web-keys and the web-key paper [1], including a response to your email [2]. I'll be mixing in quotes from your email and the paper. I'll leave for a separate email responses to your specific concerns about our TAG findings and their advice on secrets in URIs.

By the way, it may be of passing interest that a very long time ago I spent nearly two years consulting on a capability-based operating system [3] at the Stanford CS department. So, while I don't claim any expertise on recent developments, I did at one point have a pretty good understanding of the fundamental principles of capability-based computing, and some experience designing an actual system.

Is there a need for some new access and protection model for the Web?

A lot of the web-key paper is devoted to justifying the need for something other than the password-for-a-site + cookies + same-origin policy that are in widespread use on the Web. While I don't necessarily agree with every point made, I do agree with the conclusions: those mechanisms have serious drawbacks, CSRF is a serious problem, fine-grained access control is desirable, etc. So, looking for alternative security mechanisms is indeed worthwhile.

Is the Web a good foundation for a capability-based system?

The fundamental premise of the web-key work is that the Web, used carefully, can approximate the characteristics of a capability-based system. That is, web-keys use URIs as "capabilities", which can be informally described as tokens that simultaneously provide addressing of, and convey permission to access, a resource. The classic security model for a capability system involves using mechanisms of the system itself to grant or transfer capabilities in a protected way. That is, capability tokens are usually managed by the protected kernel of the system, much as file handles are managed in Unix, and transmission of a capability from one user to another is mediated by system mechanisms that ensure capabilities are given only to those who should have them. Indeed, in the purest form of capability system, the ability to transfer a capability is itself a capability.

The RFCs that define URIs, HTTP, HTTPS and the associated http and https URI schemes do not, by my reading, provide such an architecture or such guarantees. Unlike a classic capability-based system, in which capabilities are managed by a protected kernel or similar hardware/software, URIs retrieved from servers (e.g. links in a Web page) are returned in clear text to user agents that are free to copy those URIs most anywhere. Transmission using HTTPS is reasonably well protected, but subsequent manipulation or redistribution by user agents mostly isn't. Ironically, the web-key paper makes just this point, discussing some of the pertinent RFCs:

> Both RFC 2616 on HTTP/1.1 [HTTP] and RFC 3986 on the URI [URI]
> provide security guidance advising against the inclusion of
> sensitive information in a URI. The text from Section 7.5 of
> RFC 3986 provides a good summary of the arguments presented:
> "URI producers should not provide a URI that contains a
> username or password that is intended to be secret. URIs are
> frequently displayed by browsers, stored in clear text
> bookmarks, and logged by user agent history and intermediary
> applications (proxies)."

The problem is, the web-key itself (the hash) is a secret, and is therefore subject to this advice. The whole model for web-keys is: use them when you're confident that the URIs won't wind up in unintended places.

The above quotes make clear that user agents are, with some important exceptions, not responsible for protecting the text of links that they receive. While the particular secrets mentioned are usernames and passwords, the advice about URIs being displayed, stored in clear text, etc., clearly applies to all information that must be protected. Also: RFC 2818 [4] is the URI scheme registration for the https scheme, and as far as I can see it does not mandate or even suggest limiting the transmission or transcription of https scheme URIs.
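To be concrete about what's at stake, here is a minimal sketch of how a server might mint such a URI. This is purely my own illustration, not code from the paper or the Waterken implementation; the helper name and the 128-bit token size are my assumptions:

    // Illustrative only: mint an unguessable "web-key" style URI.
    // The random token plays the role of a capability: knowing the
    // URI is simultaneously the address of the resource and the
    // permission to access it.
    var crypto = require('crypto');

    function mintWebKey(baseUrl) {
      // 16 random bytes ~ 128 bits of entropy, base64url-encoded
      var key = crypto.randomBytes(16).toString('base64')
                      .replace(/\+/g, '-').replace(/\//g, '_')
                      .replace(/=+$/, '');
      return baseUrl + '#' + key; // the key rides in the fragment
    }

    // e.g. mintWebKey('https://www.example.com/app/')
    //   -> 'https://www.example.com/app/#kJ9xQ2vUw1H7TZo'

Nothing in the relevant RFCs constrains what happens to that string once it leaves the server, which is exactly the issue.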
The web-key paper goes on to try to deal with some of the particular exposures case by case: it points out that with care, browser caches can be emptied promptly (which only reduces but does not eliminate risk), that shoulder-surfing isn't always a real issue, etc. The problem is that one can only deal with security by cases if the cases in question are known and bounded. That's not true of the Web. User agents can, without violating RFCs, do all sorts of problematic things we might not have even thought of. Ironically, the web-key paper in an earlier section discusses just the sort of unbounded innovation that's perfectly consistent with the pertinent RFCs, but problematic for the web-key security model:

> Many modern browsers include an option to report each visited
> URL to a central phishing detection service. The IE7
> implementation of this feature first truncates the URL to omit
> the query string. The IEblog indicates this approach was taken
> to protect user privacy and security [phishing filter].
> Unfortunately, this precaution is not taken in other browsers.
> Users of these other browsers who enable online phishing
> detection must trust that confidentiality is adequately
> maintained by the remote service. Automatically extracting data
> from an end-to-end encrypted communications channel and
> transmitting it to a third party defeats the intent of the
> encryption. Hopefully this iatrogenic security flaw can be
> fixed in future releases of these other browsers.

The paper claims that these systems represent a security "flaw"; I view them somewhat differently. In particular, I'm not aware of any normative specification that they violate, and it's not clear that what they're doing represents bad practice; there's nothing in Web architecture that requires a centralized implementation of a user agent. The browsers mentioned above are choosing to delegate an aspect of their security logic to an off-site service -- that seems fine to me if it meets their needs. In the future, other user agents may do different things in this spirit. I don't think you can stamp them out "by cases"; you have to admit that the Web architecture does not bound, very much, what user agents do with the links that they find in pages retrieved from the Web.
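To make the contrast in the quoted passage concrete, here is a rough sketch of the two reporting behaviors. This is my own illustration, not any browser's actual code:

    // Illustrative only: what a phishing-detection reporter might
    // send to its central service.
    function ie7StyleReport(url) {
      // IE7 reportedly truncates the URL at the query string
      return url.split('?')[0];
    }

    function fullUrlReport(url) {
      // Sending the whole URL discloses any secret embedded in it
      // to the third-party service
      return url;
    }

    // ie7StyleReport('https://example.com/app/?key=mhbqcmmva5ja3')
    //   -> 'https://example.com/app/'
    // fullUrlReport('https://example.com/app/?key=mhbqcmmva5ja3')
    //   -> 'https://example.com/app/?key=mhbqcmmva5ja3'

Note that a reporter reading the full document.location could equally well transmit a fragment; nothing in the specifications forbids it.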
Furthermore, we can't even bound the cases of interest to links retrieved using HTTP or HTTPS; URIs are normally dereferenced using HTTP or similar Web protocols, but they are also passed around in many other ways, including in filesystems and in emails. Ironically, the email to which I am replying has triggered an example of just this exposure:

> The TAG understands that unguessable URLs are used for access-
> control by many of the most popular sites on the Web. For
> example, this email contains a Google Docs URL [1] for a
> document I have chosen to make readable by all readers of this
> mailing list, even those who have never used Google Docs.

This says that the intention is to give access "to readers of [the www-tag@w3.org] mailing list", but it so happens that the owners of the www-tag mailing list have chosen to archive it publicly. Not surprisingly, search engines can find the email there, and so it turns out that the Google Doc is available not just to "all readers of this mailing list", but to everyone on the Web. To see this, try a Bing search for "web-key google docs unguessable" [5] (note that there's no reference to the mailing list in the query). You'll see that the email comes up, and of course it has the link to the Google Doc. Indeed, the email with the capability is available directly from the Bing cache [6], without going anywhere near W3C servers. Of course, by emailing the web-key to me and to others, you have also (atypically of more rigorous capability-based systems) delegated to us the ability to pass on the capability. All I would have to do is cc: some other list on this email, post the link on my blog, etc. Again, all of this is in conformance with applicable RFCs; there's nothing that should be "fixed", and it's not clear that these things can be fixed at this late date.

Are web-key style capabilities a bad idea?

Several commentators have pointed out, correctly, that web-keys or similar techniques have been deployed on the Web, and the Google Docs example illustrates this. So, someone thinks they're a good idea and is getting value out of them. True, but we've also just shown that the semantics are, in some ways, fragile. I presume that the engineers at Google and similar sites are aware of these limitations, and that they find web-keys provide some added protection anyway. That's fine, but I don't think we can then claim that user agents, anti-phishing schemes, etc. are "broken" just because they are inconvenient for the web-key security model.

I'll respond separately with more specifics regarding the Metadata In URI finding, but roughly what I'd suggest is:

* Stick with the suggestion that one "should not" put secrets into URIs. I think that's good advice, and the RFCs quoted above support it.

* We could/should change the finding to indicate that, although the Web does not in general guarantee the confidentiality or careful management of link URIs, there are often circumstances in which the practical risk of leakage may be sufficiently low that encoding capabilities in a URI may in fact be a useful tradeoff. Such implementations are to that degree acceptable, but the burden is on the designers to deal with the associated risks; it is not anticipated that a large-scale effort will be made to manage the distribution of URIs more tightly than is the case today.

Use of fragment identifiers

The web-key paper suggests: "Putting the unguessable permission key in the fragment segment produces an https URL that looks like: <https://www.example.com/app/#mhbqcmmva5ja3>."
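Part of the motivation, as I understand the paper, is that user agents do not transmit the fragment to the server when dereferencing the URI, so the key stays out of server logs and Referer headers while remaining visible to client-side script. Roughly:

    // Illustrative only. Dereferencing
    //   https://www.example.com/app/#mhbqcmmva5ja3
    // produces an HTTP request that omits the fragment entirely:
    //
    //   GET /app/ HTTP/1.1
    //   Host: www.example.com
    //
    // yet script running in the returned page can still recover it:
    var key = window.location.hash.substring(1); // "mhbqcmmva5ja3"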
The normative specification of fragment identifiers in RFC 3986 [7] says:

"The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations."

So far, so good, I think.

"The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced."

So, now we have to look at the media type of the retrieved representations. The paper describes the implementation this way:

> For some set of resources, all issued web-keys use the same path
> and differ only in the fragment. The representation served for
> the corresponding Request-URI is a skeleton HTML page specifying
> an onload event handler. When invoked, the onload handler
> extracts the key from the document.location provided by the DOM
> API. The handler then constructs a new https URL that includes
> the key as a query string argument. This new URL is made the
> target of a GET request sent using the XMLHttpRequest API. The
> response to this request is a representation of the referenced
> resource.
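In code, the technique described above might look roughly like the following. This is my own reconstruction from the paper's prose, not the Waterken implementation; the '/app/data' path, the 's' parameter name, and the 'content' element are all invented for illustration:

    // Illustrative reconstruction of the skeleton page's script.
    window.onload = function () {
      // Extract the key from the fragment; the server never saw it
      var key = window.location.hash.substring(1);

      // Re-send the key as a query string argument on a same-site URL
      var url = 'https://www.example.com/app/data?s=' +
                encodeURIComponent(key);

      var req = new XMLHttpRequest();
      req.open('GET', url, true);
      req.onreadystatechange = function () {
        if (req.readyState === 4 && req.status === 200) {
          // Graft the retrieved representation into the skeleton
          // page's DOM (the step discussed under "the dynamically
          // constructed DOM" below)
          document.getElementById('content').innerHTML =
            req.responseText;
        }
      };
      req.send(null);
    };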
The media type registration for text/html is at [8]. The pertinent section says:

"For documents labeled as text/html, the fragment identifier designates the correspondingly named element; any element may be named with the "id" attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may be named with a "name" attribute. This is described in detail in [HTML40] section 12."

I think it's fair to say that the web-key use of fragment identifiers to designate external documents is therefore in violation of the pertinent normative RFCs, so that's a concern. I understand that Ajax applications in general are putting some stress on the architecture of fragment ids, and web-keys are somewhat in that spirit. Indeed, the TAG has done one round of work to explore client-side use of fragment ids in AJAX applications; maybe we should expand that to consider server-side innovations like web-keys as well. Nonetheless, I believe that the use of fragids in web-keys is, at least for now, nonconforming to the pertinent RFCs, and in that way makes the Web less self-describing. Another drawback of web-key fragids is that one loses the ability to use fragment ids for their intended purpose in HTML, i.e. to directly reference some element within the document.

The dynamically constructed DOM

The suggested web-key implementation, in which JavaScript dynamically updates the DOM of the root document to reflect the contents of the dynamically retrieved document, also represents complexity that I find somewhat unfortunate. That is, the HTML document retrieved by the "URL that includes the key as a query string argument" is not treated according to the usual rules of text/html, but rather is grafted into an existing DOM. I can't point to specific breakage that results from this, but it's at least a bit troubling that this document is not processed in the usual manner. I can't quite decide whether I think this is a serious concern.

Conclusions

I've tried to be somewhat careful and specific in setting out these concerns. I hope that won't be viewed as inflammatory or piling on. As I said at the top of my note, web-keys do address a real need. The widely used mechanisms on the Web do have problems. My personal bottom line for the moment is: it's inappropriate to assume that clients, email systems, and the like will in general limit the distribution of, or protect the storage of, URIs. The normative RFCs don't require that they do (Tyler does point to one admonition in RFC 2616, but it covers a quite narrow case). So, those using URIs as capabilities should be responsible for the risk that those URIs will wind up in unintended places.

I think the advice in the Metadata finding that one "should not" put secrets in URIs appropriately reflects these risks, and the advice in the normative RFCs. I would have no objection to keeping the existing good practice note, but adding a section indicating that in practice some systems do get value out of putting secrets into URIs, but that the burden is on those systems to do so only when the risks are deemed acceptable for the purpose. I also remain somewhat troubled that the web-key trick of using the fragment id seems to violate the pertinent specifications and forfeits the use of the fragid for its intended purpose; I'm also at least a bit concerned about the dynamic construction of the DOM.

I hope these comments are useful. Again, I'm speaking for myself; I don't believe that the TAG as a whole has taken a position on web-keys.

Noah

[1] http://waterken.sf.net/web-key
[2] http://lists.w3.org/Archives/Public/www-tag/2010Jan/0100.html
[3] ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/945/CS-TR-83-945.pdf
[4] http://tools.ietf.org/html/rfc2818
[5] http://www.bing.com/search?q=web-key+google+docs+unguessable&form=OSDSRC
[6] http://cc.bingj.com/cache.aspx?q=%22web+key%22+google+docs+unguessable&d=346695414695&mkt=en-US&setlang=en-US&w=3723de06,f1f4d797
[7] http://tools.ietf.org/html/rfc3986#section-3.5
[8] http://www.rfc-editor.org/rfc/rfc2854.txt

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Tyler Close <tyler.close@gmail.com>
Sent by: www-tag-request@w3.org
01/23/2010 05:24 AM

To: noah_mendelsohn@us.ibm.com
cc: www-tag@w3.org
Subject: Re: Draft minutes of TAG teleconference of 21 January 2010

I understand that sometimes meaning is lost in email and especially in meeting transcripts, so I just want to check that I understand the current status of the discussion on ACTION-278.

1. The TAG does not dispute any of the arguments made in my web-key paper <http://waterken.sf.net/web-key>.

2. The TAG understands that unguessable URLs are used for access-control by many of the most popular sites on the Web. For example, this email contains a Google Docs URL [1] for a document I have chosen to make readable by all readers of this mailing list, even those who have never used Google Docs. Had I not so chosen, these readers would not have access and I could have shared access with a smaller group of people, or no one at all.

3. Some members of the TAG believe that an unguessable https URL is a "password in the clear", but that sending someone a URL and a separate password to type into the web page is not a "password in the clear".

4. The TAG is currently sticking to its finding that prohibits use of the web-key technique because Noah Mendelsohn says: "I don't like that". There are no other substantive arguments that I could attempt to refute.

5. The TAG does not dispute my argument that the current finding is self-contradictory.

I'm hoping there is some significant nuance I have missed. If so, please point out which of the above statements is false and exactly why, so that I can engage with that part of the discussion.

--Tyler
Received on Monday, 1 February 2010 16:01:52 UTC