- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 04 Nov 2010 20:12:00 +0900
- To: Dave Thaler <dthaler@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>
- CC: Yaron Goland <yarong@microsoft.com>
Hello Dave,

I'm re-forwarding your mail to the IRI WG list, just in case somebody has some comments.

Regards, Martin.

On 2010/11/04 1:41, Dave Thaler wrote:
>> -----Original Message-----
>> From: precis-bounces@ietf.org [mailto:precis-bounces@ietf.org] On Behalf Of "Martin J. Dürst"
>> Sent: Monday, November 01, 2010 10:53 PM
>> To: precis@ietf.org
>> Cc: Yaron Goland
>> Subject: Re: [precis] Canonicalization of IRIs in security contexts
>>
>> Yaron sent the mail below to the IRI WG about half a year ago. My guess is that nobody has yet looked at it because it's simply overwhelming.
>>
>> It just occurred to me that this mail might contain some material that is in one way or another relevant to precis, so I'm forwarding it for future reference.
>>
>> Regards, Martin.
>
> I plan to start an I-D on this topic in the near future (after Beijing), so would welcome any pointers or other contributions.
>
> -Dave
>>
>> On 2010/04/06 8:32, Yaron Goland wrote:
>>> Of late I've been worrying about the use of URIs/IRIs in security contexts. So I wrote up a paper that explores some of the issues and have included it below. I shared this paper with Ted Hardie, Larry Masinter and Dave Thaler. We were mostly discussing who should actually own worrying about this problem. Ted suggested that NewPrep (assuming it gets created as a WG) should own this. Larry just asked that we move this discussion to the IRI mailing list as the IRI WG is now worrying about security considerations. So here is the paper.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>>
>>> Yaron
>>>
>>> Secure Comparison of URIs and IRIs in security token environments
>>> Current purpose of this document
>>> The purpose of this paper is to motivate that a problem exists with URI canonicalization in the context of security token environments and that this problem needs to be resolved.
>>>
>>> This paper does not contain nor attempt to contain an exhaustive collection of URI canonicalization issues. Rather it contains what is hoped to be a sufficiently large collection of canonicalization issues to motivate the need for a solution.
>>> Problem Description
>>> This paper looks at issues related to using URIs in secure ways in security token based access control systems. Examples of such systems include WS-*, SAML-P and OAuth WRAP. In such systems a variety of participants in the security infrastructure are identified by URIs. For example, requesters of security tokens are sometimes identified with URIs. The issuers of security tokens and the relying parties who are intended to consume security tokens are frequently identified by URIs. Claims in security tokens often have their types defined using URIs and the values of the claims can also be URIs.
>>>
>>> The most common operation on URIs in a security token context is a straightforward comparison. For example, a relying party is consuming a security token. The relying party will want to look up the name of the issuer of the security token, which can be a URI, in their local database and find the keying material associated with that issuer. The relying party will then use the keying material to validate the security token. This pattern requires a simple comparison of the submitted URIs with recorded URIs.
>>>
>>> As outlined in the rest of this document there are a number of decisions that a canonicalizer can make when canonicalizing URIs for comparison purposes. For example, some URI canonicalizers will strip out fragments so that http://example.com/foo#1234 and http://example.com/foo will be treated as equal. Similar treatment is also provided for userinfo, e.g. http://joe:password@example.com/foo will be treated the same as http://example.com/foo. And all of this is before even beginning to think through Unicode issues such as how to deal with case insensitive environments.
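
To make the fragment and userinfo point concrete, here is a rough sketch in Python of two such canonicalizers. The function names strip_fragment and strip_userinfo are made up for illustration and are not taken from any particular product; the only library used is the standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_fragment(uri: str) -> str:
    """Hypothetical canonicalizer: drop the fragment before comparing."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def strip_userinfo(uri: str) -> str:
    """Hypothetical canonicalizer: drop everything before the last '@' in the authority."""
    parts = urlsplit(uri)
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


a = "http://example.com/foo#1234"
b = "http://example.com/foo"
c = "http://joe:password@example.com/foo"

# A plain string comparison keeps these apart ...
plain_equal = (a == b)
# ... while the two canonicalizers above fold them together.
fragment_equal = (strip_fragment(a) == strip_fragment(b))
userinfo_equal = (strip_userinfo(c) == b)
```

Whether two URIs are "the same" therefore depends entirely on which of these steps a given implementation happens to apply.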
>>>
>>> The reason these inconsistencies matter is that they open up potential security holes. For example, the Foo corporation has paid money to the example.com corporation for access to the stuff service. The Foo corporation allows its employees to create accounts on the stuff service, so that user Joe could get the account http://example.com/stuff/FooCorp/joe and the user Jane could get http://example.com/stuff/FooCorp/Jane. It turns out, however, that Foo Corp's canonicalizer honors fragments for comparison purposes. So Jack, who is a malicious employee of Foo Corp, asks to create an account at example.com with the name joe#stuff. Foo Corp's URI logic checks its records for accounts it has created with stuff and sees that there is no account with the name joe#stuff so, in its records, it associates the account joe#stuff with Jack and will only issue tokens good for use with http://example.com/stuff/FooCorp/joe#stuff to Jack.
>>>
>>> Jack, the attacker, goes to the security token service at Foo Corp and asks for a security token good for http://example.com/stuff/FooCorp/joe#stuff. Foo Corp is happy to issue the token since Jack is the legitimate owner (in Foo Corp's eyes) of the joe#stuff account. Jack then submits the security token in a request to http://example.com/stuff/FooCorp/joe.
>>>
>>> But example.com uses a URI canonicalizer that, for the purposes of checking equality, ignores fragments. So when example.com looks in the security token to see if the requester has permission from Foo Corp to access the given account it successfully matches the URI in the security token, http://example.com/stuff/FooCorp/joe#stuff, with the request-URI http://example.com/stuff/FooCorp/joe.
>>>
>>> Leveraging the inconsistencies in the canonicalizers used by Foo Corp and example.com, Jack is able to successfully launch an elevation of privilege attack.
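
The attack above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: foo_corp_key and example_com_key are hypothetical stand-ins for the two canonicalizers, and the dictionary stands in for Foo Corp's account records. Nothing here describes a real token service.

```python
from urllib.parse import urldefrag


def foo_corp_key(uri: str) -> str:
    """Hypothetical Foo Corp canonicalizer: fragments are significant."""
    return uri


def example_com_key(uri: str) -> str:
    """Hypothetical example.com canonicalizer: fragments are ignored."""
    return urldefrag(uri).url


# Foo Corp's records: account URI -> owner.
accounts = {"http://example.com/stuff/FooCorp/joe": "Joe"}

# Jack asks for an account named joe#stuff.
jack_uri = "http://example.com/stuff/FooCorp/joe#stuff"
known = {foo_corp_key(u) for u in accounts}
looks_new_to_foo_corp = foo_corp_key(jack_uri) not in known
if looks_new_to_foo_corp:
    accounts[jack_uri] = "Jack"

# Foo Corp issues Jack a token for jack_uri; Jack sends it with a request for Joe's URI.
token_uri = jack_uri
request_uri = "http://example.com/stuff/FooCorp/joe"
accepted_by_example_com = example_com_key(token_uri) == example_com_key(request_uri)
```

Each side is internally consistent; the hole only appears because the two sides disagree about what counts as equal.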
>>> What's up with the colors and the weird SCUXXX identifiers?
>>> I track requirements using unique identifiers. So each requirement gets an identifier of the form SCUXXX where XXX are three alphabetic letters. There is no meaning to each identifier. I just generate them as I need them. I use a dedicated style for the requirements both to highlight them and also to make it easy to generate a table of them automatically at the end of the doc.
>>> Relative URIs
>>> Is it possible to have meaningful URI comparisons involving relative URIs or do we require that all URIs are fully qualified before being submitted to the canonicalization algorithm?
>>>
>>> SCUAAA - A secure URI canonicalization profile MUST define if it allows relative URIs.
>>>
>>> Hostname or URI resolution
>>> Some systems (specifically Java) used to follow the rule that if two host names resolved to the same IP then the host names were considered equal. But with the introduction of virtual hosting and dynamic IP addresses this method of comparison cannot be relied upon.
>>>
>>> In addition a comparison mechanism which relies on the ability to resolve identifiers like host names to other identifiers like IP addresses inherently leaks information about security decisions to outsiders since these kinds of queries are often publicly viewable (e.g. someone could track DNS traffic and from that determine who an entity was likely getting security tokens from or being asked to generate security tokens to). So are there security issues in requiring name resolution as part of the canonicalization algorithm?
>>>
>>> And, if a canonicalization algorithm does require some kind of network access to work, how does it function in network restricted or offline contexts?
>>>
>>> SCUAAB - A secure URI canonicalization profile MUST define if it requires network access in order to canonicalize a URI.
>>>
>>> SCUAAS - A secure URI canonicalization profile MUST define if it compares host name values to host name values or if it requires the host name to first be resolved to an IP address or some other underlying identifier as part of the canonicalization process.
>>>
>>> Fragment components
>>> Some URI formats include fragment identifiers. These are typically handles to locations within a resource and are used for local reference. A classic example is the use of fragments in HTTP URLs where a URL of the form http://foo.com/blah.html#ick means "retrieve the resource http://foo.com/blah.html and, once it has arrived locally, find the HTML anchor named 'ick' and display that".
>>>
>>> So, for example, when a user clicks on the link http://foo.com/blah.html#baz a browser will check its cache by doing a URI comparison for http://foo.com/blah.html and if the resource is present in the cache a match is declared.
>>>
>>> SCUAAC - A secure URI canonicalization profile MUST define how URI fragments are to be treated as part of the canonicalization process.
>>>
>>> Query components
>>> Similar to fragments, there is the question of whether http://foo.com/blah and http://foo.com/blah? are equal or different.
>>>
>>> SCUAAR - A secure URI canonicalization profile MUST define how query components of URIs are to be treated as part of the canonicalization process.
>>>
>>> But what about the values in a query component? Should http://foo.com/blah?ick=bick&foo=bar be considered equal to http://foo.com/blah?foo=bar&ick=bick?
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define if it will allow for the re-ordering of query argument values and if so, how.
>>>
>>> URI Scheme names
>>> RFC 3986 defines URI schemes as being case insensitive and in section 6.2.2.1 specifies that scheme names should be normalized to lower case characters. But separately it specifies that percent-encoded characters should be normalized to upper case characters. Do we want this inconsistency?
>>>
>>> SCUAAF - A secure URI canonicalization profile MUST define how URI scheme names are to be normalized (e.g. to upper or lower case?)
>>>
>>> Host names
>>>
>>> SCUAAM - A secure URI canonicalization profile MUST define how URI host names are to be normalized (e.g. to upper or lower case characters?)
>>>
>>> Userinfo
>>> RFC 3986 defines the userinfo production that allows arbitrary data about the user of the URI to be placed before @ signs in URIs. For example: http://joe:jane:jack:yo@example.com/bar has the value "joe:jane:jack:yo" as its userinfo. When canonicalizing a URI in a security context should the userinfo be left in? Some URI comparison services for example treat http://joe:ick@example.com and http://example.com as being equal.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify what is to happen to any userinfo included in a URI during the canonicalization process.
>>>
>>> IPv6 Host Names
>>> IPv6 names have a wide variety of alternate but semantically identical syntaxes.
>>>
>>> SCUAAK - A secure URI canonicalization profile MUST define how IPv6 addresses are canonicalized to a standard format.
>>>
>>> IPv4 Host Names
>>> The BNF for URIs is ambiguous when it comes to distinguishing IPv4 addresses from registered names. RFC 3986 tries to resolve this ambiguity by arguing that when processing a host name if it matches the IPv4 production IPv4address then it is an IPv4 address otherwise it is a reg-name. But this solution seems on its face unsatisfying as it is likely to be confusing to normal users. Can we really expect a normal user when dealing with a security context to fully grasp that 12.12.12.12 will be treated as an IPv4 address and not as a DNS host name? Maybe IPv4 addresses should just be banned from canonicalization because of the confusion they can cause? Or perhaps domain names that look like IPv4 addresses should be banned? This is similar in spirit to the homograph problem in Unicode.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles IPv4 addresses and the ambiguities of IPv4 versus reg-names.
>>>
>>> DNS versus non-DNS names
>>> RFC 3986 explicitly allows for the idea that host names might not be DNS names (or IP addresses). But no mechanism is provided to explicitly indicate when a host name is not a DNS name. This can lead to potential security issues if the sender of a URI thinks they are referring to a non-DNS name while the receiver of the URI believes that the host name is a DNS name.
>>>
>>> SCUAAT - A secure URI canonicalization profile MUST define if non-DNS/IP names are allowed as host names.
>>>
>>> Punycode versus non-ASCII Host name characters
>>> RFC 3986 in section 3.2.2 specifically allows for the use of URL encoded UTF-8 characters in the host name, in addition to the use of IDNA names. This creates an ambiguity for canonicalization since it isn't clear if all host names that involve international characters should be canonicalized to IDNA names or perhaps IDNA names and host names with international characters are considered mutually exclusive?
>>>
>>> SCUAAU - A secure URI canonicalization profile MUST define the canonicalization relationship of host names with internationalized characters and IDNA names.
>>>
>>> Path Segment Normalization
>>> RFC 3986 supports the use of path segment values such as ./ or ../ for relative URLs. Strictly speaking including such path segment values in a fully qualified URI is syntactically illegal but RFC 3986 nevertheless defines an algorithm to remove them (see section 5.2.4 of RFC 3986).
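
For reference, the dot-segment removal that RFC 3986 describes (the remove_dot_segments routine in its section 5.2.4) can be sketched as below. This is my own transcription of the steps into Python, not code from the RFC, and it works on the path component only.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of the RFC 3986 remove_dot_segments steps, applied to a path string."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]           # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]           # "/../x" becomes "/x" ...
            if output:
                output.pop()          # ... and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            cut = path.find("/", start)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


normalized = remove_dot_segments("/a/b/c/./../../g")
```

With the input shown, the result is "/a/g", which matches the example given in the RFC. Two canonicalizers that disagree on whether to run this step will disagree on whether /a/b/../g and /a/g name the same thing.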
>>>
>>> SCUAAP - A secure URI canonicalization profile MUST define if "." or ".." characters are allowed as relative references in fully qualified URIs and if so how they are to be canonicalized.
>>>
>>> Percent Encoding
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define how to canonicalize percent encoded characters that are not going to be unencoded.
>>>
>>> RFC 3986 actually specifies that alphabetic characters in percent encoding (which are required to be in US-ASCII) should be canonicalized to upper case, which is inconsistent with how host names and scheme names are treated.
>>>
>>> SCUAAZ - A secure URI canonicalization profile MUST define if characters that are percent encoded but do not require percent encoding should be decoded as part of the canonicalization process.
>>>
>>> The previous, btw, assumes that we can even tell when a character didn't need encoding. For example, a delimiter character like "/" often needs encoding so if we see one encoded, especially in a scheme we don't explicitly support, it's ambiguous if it was unnecessarily encoded. On the other hand if we see the letter "a" encoded it's highly unlikely that was unnecessary. But is it guaranteed that it is unnecessary? Section 2.3 of RFC 3986 defines a set of characters it argues should be decoded but is that decoding required in the canonicalization process?
>>>
>>> SCUABA - A secure URI canonicalization profile MUST define when, if ever, it requires percent encoded characters to be decoded.
>>>
>>> Unicode
>>>
>>> SCUABF - I need a stiff drink before I even begin to think about this section. But http://unicode.org/reports/tr36/ makes for some motivational reading. Or for those with a more visual bent - http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf.
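
Going back to the percent-encoding questions (SCUAAY, SCUAAZ, SCUABA), here is one possible policy sketched in Python: upper-case the hex digits of every escape, and decode an escape only when it stands for one of the unreserved characters from section 2.3 of RFC 3986. The function name and the choice of policy are assumptions for illustration; a profile could just as well choose never to decode.

```python
import re

# Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(text: str) -> str:
    """Hypothetical policy: decode unreserved escapes, upper-case all others."""

    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                        # e.g. %61 -> a, %7e -> ~
        return "%" + match.group(1).upper()    # e.g. %2f -> %2F, left encoded

    return _ESCAPE.sub(fix, text)


result = normalize_percent_encoding("/%7euser/%2fa%61")
```

Note that the encoded "/" is deliberately left alone, which is exactly the ambiguity raised above: the sketch cannot know whether that escape was necessary.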
>>>
>>> Transcription
>>> One of the key goals of the URI design was to enable human transcription of URIs. But is this a goal for canonicalization in a secure context? Should secure canonicalization just worry about having an easy to generate machine readable format or is there a requirement that the output of the canonicalization be transcribable?
>>>
>>> SCUABD - A secure URI canonicalization profile MUST define if transcription of the canonicalized URIs it produces is a goal.
>>>
>>> Handling unrecognized schemes
>>> Is it ever safe for a canonicalizer to canonicalize an unrecognized URI/IRI scheme type? For example, a new URI scheme type IPPY might have a default port of X. Therefore IPPY://foo.com:X and IPPY://foo.com should be treated as equivalent since X is the default port for the IPPY scheme. But a canonicalizer that doesn't know the IPPY scheme also will not know its default port and so cannot safely canonicalize a URI with an unrecognized scheme. Similar issues apply when dealing with default hosts. A canonicalizer dealing with a file URL that didn't know that localhost is a reserved host value and equivalent to an empty host couldn't canonicalize in a reasonable way.
>>>
>>> SCUABC - A secure URI canonicalization profile MUST specify if the canonicalizer is allowed to canonicalize unrecognized URI schemes and if so, how.
>>>
>>> Handling unrecognized IP address types
>>> RFC 3986 introduces an extension point to enable future changes to the IP address format using the IPvFuture production. But can a canonicalizer safely deal with an IP syntax it doesn't explicitly recognize? The example of IPv6, which has many forms with the same semantic content, is instructive, as a canonicalizer that encountered an IPv6 address but didn't recognize such addresses could not perform necessary canonicalization.
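
As an illustration of the IPv6 point, the sketch below uses Python's standard ipaddress module to fold several spellings of one address into a single form, and declines to touch anything that looks like an IPvFuture literal. The function name canonical_ip_literal and the "return None when unsure" policy are assumptions of mine, not something the paper or RFC 3986 prescribes.

```python
import ipaddress
from typing import Optional


def canonical_ip_literal(host: str) -> Optional[str]:
    """Return a canonical bracketed IPv6 literal, or None if the form is not recognized."""
    if not (host.startswith("[") and host.endswith("]")):
        return None                    # not an IP literal at all
    inner = host[1:-1]
    if inner[:1] in ("v", "V"):
        return None                    # IPvFuture: syntax unknown, do not guess
    try:
        return "[" + ipaddress.IPv6Address(inner).compressed + "]"
    except ValueError:
        return None                    # malformed address


spellings = [
    "[2001:db8:0:0:0:0:0:1]",
    "[2001:0DB8::1]",
    "[2001:db8::0:1]",
]
canonical_forms = {canonical_ip_literal(s) for s in spellings}
future_form = canonical_ip_literal("[v7.fe80]")
```

All three spellings collapse to one value, while the IPvFuture literal is reported as not canonicalizable; what a comparison should do with that answer is the open question behind SCUABE.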
>>>
>>> SCUABE - A secure URI canonicalization profile MUST specify if the canonicalizer is allowed to canonicalize unrecognized IP address formats and if so, how.
>>>
>>> Handling syntactically illegal URIs
>>> What happens if a URI that is submitted for canonicalization is syntactically illegal? Do we try to canonicalize around the errors or just reject the URI altogether? This all assumes that the canonicalization profile even requires detecting if the URI is syntactically legal in the first place.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles URIs that are syntactically illegal.
>>>
>>> Which canonicalization profile is being used?
>>> Can we really have a single canonicalization profile or do we need multiple ones? At a minimum I would imagine that we would have one profile for environments that treat URIs in a case sensitive manner and another for URIs in a case insensitive manner.
>>>
>>> SCUABH - A secure URI canonicalization profile MUST specify how many different canonicalization profiles it supports.
>>>
>>> And if there is more than one canonicalization profile doesn't this place requirements on security token formats and protocols that use the canonicalization mechanism to explicitly define which profile they expect will be used with a particular URI?
>>>
>>> SCUABI - A secure URI canonicalization profile MUST specify what requirements, if any, it places on formats or protocols that leverage the profile.
>>>
>>> Proposed Requirements
>>> This is where the actual URI canonicalization profile(s) would go.
>>> Q&A
>>> This is where we would answer questions about the tradeoffs and design choices about the canonicalization profile(s).
>>> Appendix
>>> General Requirements
>>>
>>> SCUAAA - A secure URI canonicalization profile MUST define if it allows relative URIs.
>>>
>>> SCUAAB - A secure URI canonicalization profile MUST define if it requires network access in order to canonicalize a URI.
>>>
>>> SCUAAS - A secure URI canonicalization profile MUST define if it compares host name values to host name values or if it requires the host name to first be resolved to an IP address or some other underlying identifier as part of the canonicalization process.
>>>
>>> SCUAAC - A secure URI canonicalization profile MUST define how URI fragments are to be treated as part of the canonicalization process.
>>>
>>> SCUAAR - A secure URI canonicalization profile MUST define how query components of URIs are to be treated as part of the canonicalization process.
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define if it will allow for the re-ordering of query argument values and if so, how.
>>>
>>> SCUAAF - A secure URI canonicalization profile MUST define how URI scheme names are to be normalized (e.g. to upper or lower case?)
>>>
>>> SCUAAM - A secure URI canonicalization profile MUST define how URI host names are to be normalized (e.g. to upper or lower case characters?)
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify what is to happen to any userinfo included in a URI during the canonicalization process.
>>>
>>> SCUAAK - A secure URI canonicalization profile MUST define how IPv6 addresses are canonicalized to a standard format.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles IPv4 addresses and the ambiguities of IPv4 versus reg-names.
>>>
>>> SCUAAT - A secure URI canonicalization profile MUST define if non-DNS/IP names are allowed as host names.
>>>
>>> SCUAAU - A secure URI canonicalization profile MUST define the canonicalization relationship of host names with internationalized characters and IDNA names.
>>>
>>> SCUAAP - A secure URI canonicalization profile MUST define if "." or ".." characters are allowed as relative references in fully qualified URIs and if so how they are to be canonicalized.
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define how to canonicalize percent encoded characters that are not going to be unencoded.
>>>
>>> SCUAAZ - A secure URI canonicalization profile MUST define if characters that are percent encoded but do not require percent encoding should be decoded as part of the canonicalization process.
>>>
>>> SCUABA - A secure URI canonicalization profile MUST define when, if ever, it requires percent encoded characters to be decoded.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST define if transcription of the canonicalized URIs it produces is a goal.
>>>
>>> SCUABC - A secure URI canonicalization profile MUST specify if the canonicalizer is allowed to canonicalize unrecognized URI schemes and if so, how.
>>>
>>> SCUABE - A secure URI canonicalization profile MUST specify if the canonicalizer is allowed to canonicalize unrecognized IP address formats and if so, how.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles URIs that are syntactically illegal.
>>>
>>> SCUABH - A secure URI canonicalization profile MUST specify how many different canonicalization profiles it supports.
>>>
>>> SCUABI - A secure URI canonicalization profile MUST specify what requirements, if any, it places on formats or protocols that leverage the profile.
>>>
>>> Implementation Requirements
>>> No table of contents entries found.
>>> Open Issues
>>>
>>> SCUABF - I need a stiff drink before I even begin to think about this section. But http://unicode.org/reports/tr36/ makes for some motivational reading. Or for those with a more visual bent - http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf.
>>>
>>> Last Used ID
>>> SCUABI
>>>
>>
>> --
>> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
>> #-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
>> _______________________________________________
>> precis mailing list
>> precis@ietf.org
>> https://www.ietf.org/mailman/listinfo/precis
>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 4 November 2010 11:12:51 UTC