Re: [precis] Canonicalization of IRIs in security contexts

From: Martin J. Dürst <duerst@it.aoyama.ac.jp> · Date: Thu, 04 Nov 2010 20:12:00 +0900

Hello Dave,

I'm re-forwarding your mail to the IRI WG list, just in case somebody 
has some comments.

Regards,   Martin.

On 2010/11/04 1:41, Dave Thaler wrote:
>> -----Original Message-----
>> From: precis-bounces@ietf.org [mailto:precis-bounces@ietf.org] On Behalf Of
>> "Martin J. Dürst"
>> Sent: Monday, November 01, 2010 10:53 PM
>> To: precis@ietf.org
>> Cc: Yaron Goland
>> Subject: Re: [precis] Canonicalization of IRIs in security contexts
>>
>> Yaron sent the mail below to the IRI WG about half a year ago. My guess is that
>> nobody has yet looked at it because it's simply overwhelming.
>>
>> It just occurred to me that this mail might contain some material that is in one
>> way or another relevant to precis, so I'm forwarding it for future reference.
>>
>> Regards,   Martin.
>
> I plan to start an I-D on this topic in the near future (after Beijing), so would
> welcome any pointers or other contributions.
>
> -Dave
>>
>> On 2010/04/06 8:32, Yaron Goland wrote:
>>> Of late I've been worrying about the use of URIs/IRIs in security contexts. So I
>> wrote up a paper that explores some of the issues and have included it below. I
>> shared this paper with Ted Hardie, Larry Masinter and Dave Thaler. We were
>> mostly discussing who should actually own worrying about this problem. Ted
>> suggested that NewPrep (assuming it gets created as a WG) should own this.
>> Larry just asked that we move this discussion to the IRI mailing list as the IRI WG
>> is now worrying about security considerations. So here is the paper.
>>>
>>> Thoughts?
>>>
>>>                                   Thanks,
>>>
>>>                                                   Yaron
>>>
>>> Secure Comparison of URIs and IRIs in security token environments
>>>
>>> Current purpose of this document
>>> The purpose of this paper is to motivate that a problem exists with URI
>>> canonicalization in the context of security token environments and that
>>> this problem needs to be resolved.
>>>
>>> This paper does not contain nor attempt to contain an exhaustive collection of
>> URI canonicalization issues. Rather it contains what is hoped to be a sufficiently
>> large collection of canonicalization issues to motivate the need for a solution.
>>> Problem Description
>>> This paper looks at issues related to using URIs in secure ways in security token
>> based access control systems. Examples of such systems include WS-*, SAML-P
>> and OAuth WRAP. In such systems a variety of participants in the security
>> infrastructure are identified by URIs. For example, requesters of security tokens
>> are sometimes identified with URIs. The issuers of security tokens and the
>> relying parties who are intended to consume security tokens are frequently
>> identified by URIs. Claims in security tokens often have their types defined using
>> URIs and the values of the claims can also be URIs.
>>>
>>> The most common operation on URIs in a security token context is a
>> straightforward comparison. For example, a relying party is consuming a security token.
>> The relying party will want to look up the name of the issuer of the security
>> token, which can be a URI, in their local database and find the keying material
>> associated with that issuer. The relying party will then use the keying material to
>> validate that the security token is valid. This pattern requires a simple
>> comparison of the submitted URIs with recorded URIs.
>>>
>>> As outlined in the rest of this document there are a number of decisions that a
>> canonicalizer can make when canonicalizing URIs for comparison purposes. For
>> example, some URI canonicalizers will strip out fragments so that
>> http://example.com/foo#1234 and http://example.com/foo will be treated as
>> equal. Similar treatment is also provided for userinfo, e.g.
>> http://joe:password@example.com/foo will be treated the same as
>> http://example.com/foo. And all of this is before even beginning to think
>> through Unicode issues such as how to deal with case insensitive environments.
>>>
>>> The reason these inconsistencies matter is that they open up potential security
>> holes. For example, the Foo corporation has paid money to the example.com
>> corporation for access to the stuff service. The Foo corporation allows its
>> employees to create accounts on the stuff service, so user Joe could get
>> the account http://example.com/stuff/FooCorp/joe and the user Jane could get
>> http://example.com/stuff/FooCorp/Jane. It turns out, however, that Foo Corp's
>> canonicalizer honors fragments for comparison purposes. So Jack, who is a
>> malicious employee of Foo Corp, asks to create an account at example.com
>> with the name joe#stuff. Foo Corp's URI logic checks its records for accounts it
>> has created with stuff and sees that there is no account with the name joe#stuff
>> so, in its records, it associates the account joe#stuff with Jack and will only issue
>> tokens good for use with http://example.com/stuff/FooCorp/joe#stuff to Jack.
>>>
>>> Jack, the attacker, goes to the security token service at Foo Corp and asks for
>> a security token good for http://example.com/stuff/FooCorp/joe#stuff.
>> FooCorp is happy to issue the token since Jack is the legitimate owner (in Foo
>> Corp's eyes) of the joe#stuff account. Jack then submits the security token in a
>> request to http://example.com/stuff/FooCorp/joe.
>>>
>>> But example.com uses a URI canonicalizer that, for the purposes of checking
>> equality, ignores fragments. So when example.com looks in the security token
>> to see if the requester has permission from Foo Corp to access the given
>> account it successfully matches the URI in the security token,
>> http://example.com/stuff/FooCorp/joe#stuff with the request-URI
>> http://example.com/stuff/FooCorp/joe.
>>>
>>> Leveraging the inconsistencies in the canonicalizers used by Foo Corp and
>> example.com, Jack is able to successfully launch an elevation of privilege attack.
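
The mismatch described above can be reproduced in a few lines. This is only a sketch: the two canonicalizer functions are hypothetical stand-ins for Foo Corp's and example.com's logic, not anything taken from a real product.

```python
from urllib.parse import urldefrag


def canon_keep_fragment(uri: str) -> str:
    """Hypothetical Foo Corp rule: the fragment is part of the identity."""
    return uri


def canon_drop_fragment(uri: str) -> str:
    """Hypothetical example.com rule: the fragment is ignored."""
    return urldefrag(uri).url


token_uri = "http://example.com/stuff/FooCorp/joe#stuff"
request_uri = "http://example.com/stuff/FooCorp/joe"

# Foo Corp sees two distinct accounts, so it is willing to issue Jack a token.
issuer_sees_same = canon_keep_fragment(token_uri) == canon_keep_fragment(request_uri)

# example.com sees one account, so it accepts Jack's token for joe's account.
relying_party_sees_same = canon_drop_fragment(token_uri) == canon_drop_fragment(request_uri)

print(issuer_sees_same, relying_party_sees_same)
```

Either rule may be defensible on its own; the hole appears only because the issuer and the relying party apply different ones.
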
>>> What's up with the colors and the weird SCUXXX identifiers?
>>> I track requirements using unique identifiers. So each requirement gets an
>> identifier of the form SCUXXX where XXX are three alphabetic letters. There is no
>> meaning to each identifier. I just generate them as I need them. I use a
>> dedicated style for the requirements both to highlight them and also to make it
>> easy to generate a table of them automatically at the end of the doc.
>>> Relative URIs
>>> Is it possible to have meaningful URI comparisons involving relative URIs or do
>> we require that all URIs are fully qualified before being submitted to the
>> canonicalization algorithm?
>>>
>>>
>>> SCUAAA - A secure URI canonicalization profile MUST define if it allows
>> relative URIs.
>>>
>>> Hostname or URI resolution
>>> Some systems (specifically Java) used to follow the rule that if two host names
>> resolved to the same IP then the host names were considered equal. But with
>> the introduction of virtual hosting and dynamic IP addresses this method of
>> comparison cannot be relied upon.
>>>
>>> In addition a comparison mechanism which relies on the ability to resolve
>> identifiers like host names to other identifiers like IP addresses inherently leaks
>> information about security decisions to outsiders since these kinds of queries are
>> often publicly viewable (e.g. someone could track DNS traffic and from that
>> determine who an entity was likely getting security tokens from or being asked
>> to generate security tokens to). So are there security issues in requiring name
>> resolution as part of the canonicalization algorithm?
>>>
>>> And, if a canonicalization algorithm does require some kind of network access
>> to work, how does it function in network restricted or offline contexts?
>>>
>>>
>>> SCUAAB - A secure URI canonicalization profile MUST define if it requires
>> network access in order to canonicalize a URI.
>>>
>>>
>>> SCUAAS - A secure URI canonicalization profile MUST define if it compares host
>> name values to host name values or if it requires the host name to first be
>> resolved to an IP address or some other underlying identifier as part of the
>> canonicalization process.
>>>
>>> Fragment components
>>> Some URI formats include fragment identifiers. These are typically handles to
>> locations within a resource and are used for local reference. A classic example is
>> the use of fragments in HTTP URLs where a URL of the form
>> http://foo.com/blah.html#ick means "retrieve the resource
>> http://foo.com/blah.html and once it has arrived locally find the HTML anchor
>> named 'ick' and display that".
>>>
>>> So, for example, when a user clicks on the link http://foo.com/blah.html#baz a
>> browser will check its cache by doing a URI comparison for
>> http://foo.com/blah.html and if the resource is present in the cache a match is
>> declared.
>>>
>>>
>>> SCUAAC - A secure URI canonicalization profile MUST define how URI
>> fragments are to be treated as part of the canonicalization process.
>>>
>>> Query components
>>> Similar to fragments, there is the question of whether http://foo.com/blah and
>> http://foo.com/blah? are equal or different.
>>>
>>>
>>> SCUAAR - A secure URI canonicalization profile MUST define how query
>> components of URIs are to be treated as part of the canonicalization process.
>>>
>>> But what about the values in a query component? Should
>> http://foo.com/blah?ick=bick&foo=bar be considered equal to
>> http://foo.com/blah?foo=bar&ick=bick?
>>>
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define if it will allow for
>> the re-ordering of query argument values and if so, how.
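
One possible re-ordering rule, sketched with Python's urllib.parse. The function name sort_query is made up for illustration, and sorting by (name, value) is just one policy a profile could pick.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query(uri: str) -> str:
    """Sort query arguments by (name, value); an assumed policy, not a standard one."""
    parts = urlsplit(uri)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Caveat: sorting also reorders repeated names, which some services treat
    # as significant, and urlencode() re-encodes the values it is given.
    pairs.sort()
    return urlunsplit(parts._replace(query=urlencode(pairs)))


a = sort_query("http://foo.com/blah?ick=bick&foo=bar")
b = sort_query("http://foo.com/blah?foo=bar&ick=bick")
print(a == b)
```

Under this rule the two URIs from the example compare equal. A profile that considers argument order significant would simply skip the sort.
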
>>>
>>> URI Scheme names
>>> RFC 3986 defines URI schemes as being case insensitive and in section 6.2.2.1
>> specifies that scheme names should be normalized to lower case characters. But
>> separately it specifies that percent-encoded characters should be normalized to
>> upper case characters. Do we want this inconsistency?
>>>
>>>
>>> SCUAAF - A secure URI canonicalization profile MUST define how URI
>>> scheme names are to be normalized (e.g. to upper or lower case?)
>>>
>>> Host names
>>>
>>> SCUAAM - A secure URI canonicalization profile MUST define how URI
>>> host names are to be normalized (e.g. to upper or lower case
>>> characters?)
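
For the scheme and host name requirements, a minimal sketch of lower-casing both while leaving userinfo and path alone. It assumes ASCII host names; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_scheme_and_host(uri: str) -> str:
    """Lower-case the scheme and the host[:port] part; leave everything else as is."""
    parts = urlsplit(uri)
    # Split off any userinfo so that it is not case-folded along with the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(parts._replace(scheme=parts.scheme.lower(), netloc=netloc))


print(lowercase_scheme_and_host("HTTP://Joe@Example.COM/Foo"))
```

Note that the path keeps its case here, which is exactly the kind of decision a profile for case-insensitive environments might make differently.
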
>>>
>>> Userinfo
>>> RFC 3986 defines the userinfo production that allows arbitrary data about the
>> user of the URI to be placed before @ signs in URIs. For example:
>> http://joe:jane:jack:yo@example.com/bar has the value "joe:jane:jack:yo" as its
>> userinfo. When canonicalizing a URI in a security context should the userinfo
>> be left in? Some URI comparison services for example treat
>> http://joe:ick@example.com and http://example.com as being equal.
>>>
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify what is to
>> happen to any userinfo included in a URI during the canonicalization process.
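
A sketch of the "strip userinfo" behaviour mentioned above, so the effect is visible. The helper split_userinfo is hypothetical; a stricter profile could instead reject any URI for which the returned userinfo is non-empty.

```python
from urllib.parse import urlsplit, urlunsplit


def split_userinfo(uri: str) -> tuple[str, str]:
    """Return (userinfo, uri_without_userinfo)."""
    parts = urlsplit(uri)
    # Everything before the last "@" in the authority is userinfo.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    return userinfo, urlunsplit(parts._replace(netloc=hostport))


print(split_userinfo("http://joe:jane:jack:yo@example.com/bar"))
```

With stripping, http://joe:ick@example.com and http://example.com become indistinguishable, which is the behaviour the paragraph above warns about.
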
>>>
>>> IPv6 Host Names
>>> IPv6 names have a wide variety of alternate but semantically identical
>> syntaxes.
>>>
>>>
>>> SCUAAK - A secure URI canonicalization profile MUST define how IPv6
>> addresses are canonicalized to a standard format.
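
To make the IPv6 point concrete, here is a sketch using Python's standard ipaddress module, whose compressed form is one candidate for a standard format. Whether a profile should adopt that particular form is an open choice, not something this paper settles.

```python
import ipaddress


def canonical_ipv6(text: str) -> str:
    """Parse an IPv6 literal and return its lower-case, zero-compressed form."""
    return ipaddress.IPv6Address(text).compressed


forms = [
    "2001:0DB8:0000:0000:0000:0000:0000:0001",
    "2001:db8:0:0:0:0:0:1",
    "2001:DB8::1",
]
print({canonical_ipv6(f) for f in forms})
```

All three spellings collapse to a single string, so a plain string comparison after this step treats them as the same host.
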
>>>
>>> IPv4 Host Names
>>> The BNF for URIs is ambiguous when it comes to distinguishing IPv4 addresses
>> from registered names. RFC 3986 tries to resolve this ambiguity by arguing that
>> when processing a host name if it matches the IPv4 production IPv4address then
>> it is an IPv4 address otherwise it is a reg-name. But this solution seems on its
>> face unsatisfying as it is likely to be confusing to normal users. Can we really
>> expect a normal user when dealing with a security context to fully grasp that
>> 12.12.12.12 will be treated as an IPv4 address and not as a DNS host name?
>> Maybe IPv4 addresses should just be banned from canonicalization because of
>> the confusion they can cause? Or perhaps domain names that look like IPv4
>> addresses should be banned? This is similar in spirit to the homograph problem in
>> Unicode.
>>>
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles
>> IPv4 addresses and the ambiguities of IPv4 versus reg-names.
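
The "first match wins" rule can be sketched as follows. The regular expression is my transcription of the IPv4address and dec-octet productions from RFC 3986, and classify_host is a hypothetical name.

```python
import re

# dec-octet from RFC 3986: 0-255 with no leading zeros.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4_RE = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def classify_host(host: str) -> str:
    """Return which production the host is treated as under the first-match rule."""
    return "IPv4address" if IPV4_RE.fullmatch(host) else "reg-name"


for host in ("12.12.12.12", "12.12.12.256", "012.12.12.12", "12.12.12"):
    print(host, classify_host(host))
```

Only the first host is an IPv4 address; the other three fall through to reg-name even though a casual reader would probably call them "IP addresses" too, which illustrates the confusion described above.
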
>>>
>>> DNS versus non-DNS names
>>> RFC 3986 explicitly allows for the idea that host names might not be DNS
>> names (or IP addresses). But no mechanism is provided to explicitly indicate
>> when a host name is not a DNS name. This can lead to potential security issues if
>> the sender of a URI thinks they are referring to a non-DNS name while the
>> receiver of the URI believes that the host name is a DNS Name.
>>>
>>>
>>> SCUAAT - A secure URI canonicalization profile MUST define if non-DNS/IP
>> names are allowed as host names.
>>>
>>> Punycode versus non-ASCII Host name characters
>>> RFC 3986 in section 3.2.2 specifically allows for the use of URL encoded UTF-8
>> characters in the host name, in addition to the use of IDNA names. This creates
>> an ambiguity for canonicalization since it isn't clear if all host names that
>> involve international characters should be canonicalized to IDNA names or
>> perhaps IDNA names and host names with international characters are
>> considered mutually exclusive?
>>>
>>>
>>> SCUAAU - A secure URI canonicalization profile MUST define the
>> canonicalization relationship of host names with internationalized characters
>> and IDNA names.
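
One possible answer is to map every spelling to the IDNA (xn--) form before comparing. The sketch below does that with Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so treat it as an illustration rather than a recommendation. The function name and the example host are made up.

```python
from urllib.parse import unquote


def host_to_a_label(host: str) -> str:
    """Percent-decode a host as UTF-8, then convert it to its ASCII IDNA form."""
    decoded = unquote(host, encoding="utf-8", errors="strict")
    return decoded.encode("idna").decode("ascii").lower()


spellings = [
    "b%C3%BCcher.example",      # percent-encoded UTF-8
    "b\u00fccher.example",      # raw non-ASCII characters
    "xn--bcher-kva.example",    # already in IDNA form
]
print({host_to_a_label(s) for s in spellings})
```

A profile that instead treats the spellings as mutually exclusive would reject two of the three inputs rather than convert them.
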
>>>
>>> Path Segment Normalization
>>> RFC 3986 supports the use of path segment values such as ./ or ../ for relative
>> URLs. Strictly speaking including such path segment values in a fully qualified URI
>> is syntactically illegal but RFC 3986 nevertheless defines an algorithm to remove
>> them (see section 5.2.4 of RFC 3986).
>>>
>>>
>>> SCUAAP - A secure URI canonicalization profile MUST define if "." or ".."
>> characters are allowed as relative references in fully qualified URIs and if so how
>> they are to be canonicalized.
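
For reference, the dot-segment removal algorithm from RFC 3986 (the "remove_dot_segments" routine) is short enough to sketch directly. This is my transcription of the steps in the RFC, so check it against the text before relying on it.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path, following the RFC 3986 steps."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]            # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]            # "/../x" becomes "/x" ...
            if output:
                output.pop()           # ... and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


print(remove_dot_segments("/stuff/FooCorp/jane/../joe"))
```

The last line shows why this matters in the account scenario earlier in the paper: a path that names jane and then backs out resolves to joe's account.
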
>>>
>>> Percent Encoding
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define how to
>> canonicalize percent encoded characters that are not going to be unencoded.
>>>
>>> RFC 3986 actually specifies that alphabetic characters in percent encoding
>> (which are required to be in US-ASCII) should be canonicalized to upper case,
>> which is inconsistent with how host names and scheme names are treated.
>>>
>>>
>>> SCUAAZ - A secure URI canonicalization profile MUST define if characters that
>> are percent encoded but do not require percent encoding should be decoded as
>> part of the canonicalization process.
>>>
>>> The previous, btw, assumes that we can even tell when a character didn't need
>> encoding. For example, a delimiter character like "/" often needs encoding so if
>> we see one encoded, especially in a scheme we don't explicitly support, it's
>> ambiguous if it was unnecessarily encoded. On the other hand if we see the
>> letter "a" encoded it's highly likely that was unnecessary. But is it guaranteed
>> that it is unnecessary? Section 2.3 of RFC 3986 defines a set of characters it
>> argues should be decoded but is that decoding required in the canonicalization
>> process?
>>>
>>>
>>> SCUABA - A secure URI canonicalization profile MUST define when, if ever, it
>> requires percent encoded characters to be decoded.
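
A sketch of one combined policy for the percent-encoding requirements: decode only the characters that section 2.3 of RFC 3986 lists as unreserved, and upper-case the hex digits of every escape that is kept. The function name is hypothetical, and whether decoding should happen at all is precisely the open question above.

```python
import re

# Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(text: str) -> str:
    """Decode escapes of unreserved characters; upper-case all other escapes."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    # A single pass: "%2561" stays "%2561" and is not decoded twice into "a".
    return PCT.sub(fix, text)


print(normalize_percent_encoding("/%61/b%2fc/%7Euser"))
```

The encoded "/" is kept as %2F because decoding it would change the path structure, which is the delimiter ambiguity discussed above.
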
>>>
>>> Unicode
>>>
>>> SCUABF - I need a stiff drink before I even begin to think about this section.
>> But http://unicode.org/reports/tr36/ makes for some motivational reading. Or
>> for those with a more visual bent -
>> http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf.
>>>
>>> Transcription
>>> One of the key goals of the URI design was to enable human transcription of
>> URIs. But is this a goal for canonicalization in a secure context? Should secure
>> canonicalization just worry about having an easy to generate machine readable
>> format or is there a requirement that the output of the canonicalization be
>> transcribable?
>>>
>>>
>>> SCUABD - A secure URI canonicalization profile MUST define if transcription of
>> the canonicalized URIs it produces is a goal.
>>>
>>> Handling unrecognized schemes
>>> Is it ever safe for a canonicalizer to canonicalize an unrecognized URI/IRI
>> scheme type? For example, a new URI scheme type IPPY might have a default
>> port of X. Therefore IPPY://foo.com:X and IPPY://foo.com should be treated as
>> equivalent since X is the default port for the IPPY scheme. But a canonicalizer
>> that doesn't know the IPPY scheme also will not know its default port and so
>> cannot safely canonicalize a URI with an unrecognized scheme. Similar issues
>> apply when dealing with default hosts. A canonicalizer dealing with a file URL
>> that didn't know that localhost is a reserved host value and equivalent to an
>> empty host couldn't canonicalize in a reasonable way.
>>>
>>>
>>> SCUABC - A secure URI canonicalization profile MUST specify if the
>> canonicalizer is allowed to canonicalize unrecognized URI schemes and if so,
>> how.
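
A sketch of the conservative option, where the canonicalizer refuses to touch schemes it does not know. The table of default ports is an assumption limited to http and https, and drop_default_port is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of schemes this canonicalizer "recognizes".
DEFAULT_PORTS = {"http": 80, "https": 443}


def drop_default_port(uri: str) -> str:
    """Remove an explicit default port, but only for recognized schemes."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        # Without knowing the scheme we cannot know its default port.
        raise ValueError(f"unrecognized scheme: {parts.scheme!r}")
    if parts.port == DEFAULT_PORTS[parts.scheme]:
        parts = parts._replace(netloc=parts.netloc.rsplit(":", 1)[0])
    return urlunsplit(parts)


print(drop_default_port("http://foo.com:80/x"))
```

Passing an IPPY URI to this function raises ValueError. The alternative policy of passing unknown schemes through unchanged is also possible, but then IPPY://foo.com:X and IPPY://foo.com would never compare equal.
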
>>>
>>> Handling unrecognized IP address types
>>> RFC 3986 introduces an extension point to enable future changes to the IP
>> address format using the IPvFuture production. But can a canonicalizer safely
>> deal with an IP syntax it doesn't explicitly recognize? The example of IPv6
>> which has many forms with the same semantic content is instructive as a
>> canonicalizer that encountered an IPv6 address but didn't recognize such
>> addresses could not perform necessary canonicalization.
>>>
>>>
>>> SCUABE - A secure URI canonicalization profile MUST specify if the
>> canonicalizer is allowed to canonicalize unrecognized IP address formats and if
>> so, how.
>>>
>>> Handling syntactically illegal URIs
>>> What happens if a URI that is submitted for canonicalization is syntactically
>> illegal? Do we try to canonicalize around the errors or just reject the URI
>> altogether? This all assumes that the canonicalization profile even requires
>> detecting if the URI is syntactically legal in the first place.
>>>
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles
>> URIs that are syntactically illegal.
>>>
>>> Which canonicalization profile is being used?
>>> Can we really have a single canonicalization profile or do we need multiple
>> ones? At a minimum I would imagine that we would have one profile for
>> environments that treat URIs in a case sensitive manner and another for URIs in
>> a case insensitive manner.
>>>
>>>
>>> SCUABH - A secure URI canonicalization profile MUST specify how many
>> different canonicalization profiles it supports.
>>>
>>> And if there is more than one canonicalization profile doesn't this place
>> requirements on security token formats and protocols that use the
>> canonicalization mechanism to explicitly define which profile they expect will be
>> used with a particular URI?
>>>
>>>
>>> SCUABI - A secure URI canonicalization profile MUST specify what
>> requirements, if any, it places on formats or protocols that leverage the profile.
>>>
>>> Proposed Requirements
>>> This is where the actual URI canonicalization profile(s) would go.
>>> Q&A
>>> This is where we would answer questions about the tradeoffs and design
>> choices about the canonicalization profile(s).
>>> Appendix
>>> General Requirements
>>>
>>> SCUAAA - A secure URI canonicalization profile MUST define if it allows
>> relative URIs.
>>>
>>> SCUAAB - A secure URI canonicalization profile MUST define if it requires
>> network access in order to canonicalize a URI.
>>>
>>> SCUAAS - A secure URI canonicalization profile MUST define if it compares host
>> name values to host name values or if it requires the host name to first be
>> resolved to an IP address or some other underlying identifier as part of the
>> canonicalization process.
>>>
>>> SCUAAC - A secure URI canonicalization profile MUST define how URI
>> fragments are to be treated as part of the canonicalization process.
>>>
>>> SCUAAR - A secure URI canonicalization profile MUST define how query
>> components of URIs are to be treated as part of the canonicalization process.
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define if it will allow for
>> the re-ordering of query argument values and if so, how.
>>>
>>> SCUAAF - A secure URI canonicalization profile MUST define how URI
>>> scheme names are to be normalized (e.g. to upper or lower case?)
>>>
>>> SCUAAM - A secure URI canonicalization profile MUST define how URI
>>> host names are to be normalized (e.g. to upper or lower case
>>> characters?)
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify what is to
>> happen to any userinfo included in a URI during the canonicalization process.
>>>
>>> SCUAAK - A secure URI canonicalization profile MUST define how IPv6
>> addresses are canonicalized to a standard format.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles
>> IPv4 addresses and the ambiguities of IPv4 versus reg-names.
>>>
>>> SCUAAT - A secure URI canonicalization profile MUST define if non-DNS/IP
>> names are allowed as host names.
>>>
>>> SCUAAU - A secure URI canonicalization profile MUST define the
>> canonicalization relationship of host names with internationalized characters
>> and IDNA names.
>>>
>>> SCUAAP - A secure URI canonicalization profile MUST define if "." or ".."
>> characters are allowed as relative references in fully qualified URIs and if so how
>> they are to be canonicalized.
>>>
>>> SCUAAY - A secure URI canonicalization profile MUST define how to
>> canonicalize percent encoded characters that are not going to be unencoded.
>>>
>>> SCUAAZ - A secure URI canonicalization profile MUST define if characters that
>> are percent encoded but do not require percent encoding should be decoded as
>> part of the canonicalization process.
>>>
>>> SCUABA - A secure URI canonicalization profile MUST define when, if ever, it
>> requires percent encoded characters to be decoded.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST define if transcription of
>> the canonicalized URIs it produces is a goal.
>>>
>>> SCUABC - A secure URI canonicalization profile MUST specify if the
>> canonicalizer is allowed to canonicalize unrecognized URI schemes and if so,
>> how.
>>>
>>> SCUABE - A secure URI canonicalization profile MUST specify if the
>> canonicalizer is allowed to canonicalize unrecognized IP address formats and if
>> so, how.
>>>
>>> SCUABD - A secure URI canonicalization profile MUST specify how it handles
>> URIs that are syntactically illegal.
>>>
>>> SCUABH - A secure URI canonicalization profile MUST specify how many
>> different canonicalization profiles it supports.
>>>
>>> SCUABI - A secure URI canonicalization profile MUST specify what
>> requirements, if any, it places on formats or protocols that leverage the profile.
>>>
>>> Implementation Requirements
>>> Open Issues
>>>
>>> SCUABF - I need a stiff drink before I even begin to think about this section.
>> But http://unicode.org/reports/tr36/ makes for some motivational reading. Or
>> for those with a more visual bent -
>> http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf.
>>>
>>> Last Used ID
>>> SCUABI
>>>
>>>
>>>
>>
>> --
>> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
>> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
>> _______________________________________________
>> precis mailing list
>> precis@ietf.org
>> https://www.ietf.org/mailman/listinfo/precis
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp