FW: "canonical" URIs

The message below was posted to the TAG list and points out that we fail to
say how equivalence comparisons are performed on anyURI values (e.g., when
checking a literal against an enumeration).  I'd also note that we say nothing
of this kind about many other types (string, etc.); we rely on phrases like
"if the {value} is in the value space...".

I suggest we do what we can in this regard as errata, and that a more formal
approach be added to our candidate requirements for 1.1.

pvb
> -----Original Message-----
> From:	David Orchard [SMTP:david.orchard@bea.com]
> Sent:	Tuesday, March 19, 2002 10:05 AM
> To:	www-tag@w3.org
> Subject:	RE: "canonical" URIs
> 
> TAG members,
> 
> I don't see URI comparison officially listed as a TAG issue.  I'd like
> Joseph/Stephen's issue added to the TAG issues list.
> 
> Equivalence rules for URIs are defined by the URI scheme.  HTTP has a
> section on URI comparison.
> 
> However, XML does not have a default comparison function for the XML
> Schema anyURI data type.  I think a reasonable approach would be to say
> that the default comparison function for anyURI is the HTTP URI comparison
> algorithm, but that any scheme can override it.
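The HTTP comparison rules Dave refers to are in RFC 2616, section 3.2.3: compare the URIs octet by octet, except that an empty or absent port equals the default port, scheme and host names compare case-insensitively, an empty abs_path equals "/", and a %HH escape of a character outside the RFC 2396 reserved and unsafe sets equals the character itself. A minimal Python sketch of that rule follows. It assumes http URIs only (default port 80), and the helper names are hypothetical, not taken from any schema or TAG document:

```python
import re
from urllib.parse import urlsplit

# RFC 2396 "unreserved" characters (alphanum plus "mark"): everything that is
# neither reserved nor unsafe, so its %HH escape is equivalent to the character.
UNRESERVED = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(s: str) -> str:
    """Replace %HH escapes of unreserved characters; leave all other escapes alone."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return ESCAPE.sub(repl, s)


def http_uri_key(uri: str) -> tuple:
    """Hypothetical helper: a comparison key following RFC 2616 section 3.2.3."""
    parts = urlsplit(uri)
    return (
        parts.scheme.lower(),                  # scheme names: case-insensitive
        parts.username, parts.password,        # user-info, if any: case-sensitive
        parts.hostname,                        # .hostname is already lower-cased
        80 if parts.port is None else parts.port,  # absent/empty port == http default
        decode_unreserved(parts.path or "/"),  # empty abs_path == "/"
        decode_unreserved(parts.query),
        decode_unreserved(parts.fragment),
    )


def http_uris_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b match under the RFC 2616 rules."""
    return http_uri_key(a) == http_uri_key(b)
```

Under these rules "http://foo.com/stuff" and "HTTP://Foo.COM:80/stuff" compare equal, but "http://foo.com/otherstuff/../stuff" and "http://foo.com/stuff" still compare unequal. That is the gap in Stephen's access-control example further down.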
> 
> Cheers,
> Dave
> 
> 
> > -----Original Message-----
> > From: www-tag-request@w3.org
> > [mailto:www-tag-request@w3.org] On Behalf Of Joseph Reagle
> > Sent: Tuesday, February 19, 2002 11:40 AM
> > To: www-tag@w3.org
> > Cc: Phillip Hallam-Baker; xme; Merlin Hughes; duerst@w3.org
> > Subject: Re: "canonical" URIs
> >
> > Stephen has asked an interesting question below that I expect will be
> > important to any activity that uses URIs as identifiers in the context
> > of a semantic/security application: when are two URI variants considered
> > identical?
> >
> > My first impulse was to check the XML namespace spec: "[Definition:] URI
> > references which identify namespaces are considered identical when they
> > are exactly the same character-for-character." [a]
> >
> > [a] http://www.w3.org/TR/REC-xml-names/
> >
> > However, this could benefit from further specificity. What about the
> > following sort of issues?
> >
> >   The URI attribute identifies a data object using a URI-Reference,
> >   as specified by RFC2396 [URI]. The set of allowed characters for
> >   URI attributes is the same as for XML, namely [Unicode]. However,
> >   some Unicode characters are disallowed from URI references
> >   including all non-ASCII characters and the excluded characters
> >   listed in RFC2396 [URI, section 2.4]. However, the number sign (#),
> >   percent sign (%), and square bracket characters re-allowed in RFC 2732
> >   [URI-Literal] are permitted. Disallowed characters must be escaped as
> >   follows: ...
> >   http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/#sec-URI
> >
> > I spoke to TimBL briefly about the question. He enumerated many of the
> > places one might look for equivalence in the "URI stack", *while* stating
> > that clearly one wouldn't want to address all these layers, given the
> > complexity and processing required:
> >   URI spec
> > 	string = string
> >   HTTP DNS
> > 	W3.org = w3.org
> >   DNS LOOKUP
> > 	www.w3.org   <-- CNAME --  w3.org
> >   HTTP REDIRECT
> > 	/foo --REDIRECT--> /foo/
> >   RDF
> > 	/foo = /bar
> >
> > Consequently, character-by-character comparison is probably the most
> > straightforward approach, assuming one addresses the character encoding
> > issues well.
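Here is a minimal sketch of "character-by-character comparison ... assuming one addresses the character encoding issues". Both strings are first brought to a single escaped form and then compared with plain string equality. The quoted XML-Signature text elides its exact escaping rule. This sketch therefore assumes the convention XLink uses: encode each disallowed character as UTF-8 and write every byte as %HH with upper-case hex. The helper names are hypothetical:

```python
# Characters RFC 2396 section 2.4.3 excludes from URI references, minus "#",
# "%", "[" and "]", which the XML-Signature text quoted above permits.
# Controls, space and DEL (and all non-ASCII) are caught by the range checks below.
EXCLUDED = frozenset('<>"{}|\\^`')


def escape_uri_reference(s: str) -> str:
    """Escape disallowed characters as UTF-8 bytes written %HH (an assumed rule)."""
    out = []
    for ch in s:
        code = ord(ch)
        if code <= 0x20 or code >= 0x7F or ch in EXCLUDED:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def same_reference(a: str, b: str) -> bool:
    """Character-by-character comparison after escaping both strings."""
    return escape_uri_reference(a) == escape_uri_reference(b)
```

With this sketch, "http://foo.com/caf\u00e9" and "http://foo.com/caf%C3%A9" compare equal. However, "%c3%a9" and "%C3%A9" still differ character by character, so hex-digit case is one more encoding issue an application would have to settle.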
> >
> > Stephen is presently using "absolute URIs" with RFC2396 equivalence (see
> > below). This seems fairly straightforward as well, though it says "if the
> > URI is case insensitive ...". I think it might be useful to specify
> > whether case *is* relevant or not for that application. Any thoughts?
> >
> > Also, my broader question to the TAG is: does this seem like a
> > worthwhile issue to address for all of our specifications? I also expect
> > that the validation/augmentation of URIs of type anyURI in schema may be
> > relevant to this question, but I haven't thought about it too carefully.
> >
> > [1] On Thursday 14 February 2002 06:01, Stephen Farrell wrote:
> > > ...
> > > The OASIS security committee's [1] SAML spec [2] is about access
> > > control. One of its messages is of the form "can fred see
> > > http://foo.com/stuff" with a minimal answer being "yes/no".
> > >
> > > Now, we're trying to figure out a good way to tell implementors not
> > > to fall for the following scenario:
> > >
> > > Q: "can fred see http://foo.com/stuff" A: no
> > > Q: "can fred see HTTP://Foo.COM:80/stuff" A: no
> > > Q: "can fred see http://foo.com/otherstuff/../stuff" A: yes
> > >
> > > This involves us in giving some guidance on a "canonical
> > > form" of URI, at least for URLs that can be dereferenced via
> > > HTTP.
> > >
> > > My best bet so far is the following:
> > >
> > >    By the "canonical form" of a URI we mean an absolute URI (i.e. no
> > >    relative URIs) which is the shortest of all the equivalent URI
> > >    strings, where URI equivalence is defined according to [RFC2396].
> > >    For example, the URI "http://foo.com:80/go/../go/to/" is not in
> > >    canonical form, but "http://foo.com/go/to" is in canonical form.
> > >    Note that if a URI is partly or entirely case-insensitive, then
> > >    there will be more than one "canonical form" for that URI, such
> > >    that a case-sensitive matching rule would consider the strings
> > >    different (e.g. "HTTP://Foo.cOm/go/to" is "another" canonical
> > >    form of the URL above).
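For http URIs, Stephen's definition can be approximated by choosing lower case for the case-insensitive parts, so that exactly one string results. The following is a minimal sketch under stated assumptions: it handles http only (default port 80) and ignores %-escape normalization, user-info and IPv6 literals. The helper names are hypothetical, and this is not the SAML specification's rule:

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path, keeping a trailing "/"."""
    segments = path.split("/")[1:]  # drop the empty string before the leading "/"
    out = []
    for i, seg in enumerate(segments):
        if seg == "..":
            if out:          # ".." above the root is simply dropped
                out.pop()
        elif seg != ".":
            out.append(seg)
        if seg in (".", "..") and i == len(segments) - 1:
            out.append("")   # "/a/b/.." and "/a/." both name the directory "/a/"
    return "/" + "/".join(out)


def canonical_http_uri(uri: str) -> str:
    """Hypothetical helper: one canonical string for an absolute http URI."""
    parts = urlsplit(uri)  # urlsplit lower-cases the scheme
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("sketch only handles absolute http URIs")
    if "@" in parts.netloc or "[" in parts.netloc:
        raise ValueError("user-info and IPv6 literals are out of scope here")
    host = parts.hostname    # .hostname is already lower-cased
    port = parts.port
    netloc = host if port in (None, 80) else f"{host}:{port}"
    path = remove_dot_segments(parts.path or "/")
    return urlunsplit(("http", netloc, path, parts.query, parts.fragment))
```

Under this sketch, all three of Stephen's queries about fred map to "http://foo.com/stuff", so an access decision keyed on the canonical string gives one answer. The sketch keeps the trailing slash, turning "http://foo.com:80/go/../go/to/" into "http://foo.com/go/to/" rather than the "http://foo.com/go/to" given above. Equating "/foo/" with "/foo" belongs to the HTTP-redirect layer in TimBL's list, not to RFC 2396 syntax.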
> > >
> > >
> > > Ta,
> > > Stephen.
> > >
> > > [1] http://www.oasis-open.org/committees/security/
> > > [2] http://www.oasis-open.org/committees/security/docs/draft-sstc-core-25.pdf
> > > [RFC2396] ftp://ftp.isi.edu/in-notes/rfc2396.txt
> >
> > --
> >
> > Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
> > W3C Policy Analyst                mailto:reagle@w3.org
> > IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
> > W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
> >
> >

Received on Tuesday, 19 March 2002 18:15:30 UTC