
Fwd: URI canonicalization (issues identity-101, indent-100)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 01 Mar 2005 08:30:32 +0900
To: public-iri@w3.org
Cc: Roy T. Fielding <fielding@gbiv.com>

This issue was brought up by Roy Fielding on the Atompub mailing
list. I have added it to the IRI issues page at

I have also added another issue, which is extremely minor editorial,
at http://www.w3.org/International/iri-edit#indent-100.

This is a first step for an eventual move to IETF Draft Standard.

Regards,    Martin.

 >Cc: Atom Syntax <atom-syntax@imc.org>
 >From: "Roy T. Fielding" <fielding@gbiv.com>
 >Subject: Re: URI canonicalization
 >Date: Tue, 1 Feb 2005 18:25:35 -0800
 >To: Martin Duerst <duerst@w3.org>
 >List-Archive: <http://www.imc.org/atom-syntax/mail-archive/>

 >On Jan 31, 2005, at 11:56 PM, Martin Duerst wrote:
 >> At 14:27 05/02/01, Roy T. Fielding wrote:
 >> >That would be a falsehood.  Identifiers are not subject to
 >> >"simplification" -- they are either equivalent or not.  We can
 >> >add all of the implementation requirements we like to prevent
 >> >software from detecting false negatives, but that doesn't change
 >> >the fact that equivalent identifiers always identify the same
 >> >resource.  It is the author's responsibility to use URIs
 >> >(or IRIs) that are actually different, not the responsibility
 >> >of the protocol or implementation.
 >> It's okay for some network-oriented usage to work that way.
 >> The network can fail any moment, anyway. But it's not helpful
 >> at all in some contexts that are not network-oriented. A very
 >> good case in point would be XML Namespaces. In that case,
 >> unless there is a single rule for comparison that can be used
 >> by all implementations, it would lead to some XML being
 >> declared namespace valid by one XML processor, and not
 >> namespace valid by some other XML processor.
 >Where is this myth coming from?  If a single document uses two
 >different URIs to name the same namespace (or multiple documents
 >using different URIs are merged) then an XML processor will consider
 >those qualified names as different names. That does not change their
 >validity whatsoever. It doesn't even change their data model, since
 >to be equivalent the two URIs must refer to the same namespace and
 >therefore the same data model. Validity is a property of DTDs and
 >Schema, not XML namespace processing, and neither DTDs nor Schema
 >are able to redefine the meaning of URIs.  The only purpose of the
 >comparison algorithm is to allow those technologies to take the
 >shortcut of evaluating structures based solely on the strcmp URI
 >comparison, leaving false-negatives to be resolved by the author.
 >> >I am disappointed that a MUST requirement was added to IRI in the
 >> >last draft without working group review.  This part
 >> >
 >> >   Applications using IRIs as identity tokens with no relationship to a
 >> >   protocol MUST use the Simple String Comparison (see section 5.3.1).
 >> >   All other applications MUST select one of the comparison practices
 >> >   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
 >> >   conversion, select one of the comparison practices from the URI
 >> >   comparison ladder in [RFC3986], section 6.2)
 >> >
 >> >is completely missing the point of the ladder.
 >> It was added due to a security review initiated by the IESG.
 >> I think I wrote the actual text, but the suggestion for
 >> the MUSTs didn't come from the authors, but from the security
 >> expert.
 >*grumble* If I had a dime for every time an IETF "security expert"
 >screwed up an application protocol, I'd at least have a grande
 >caramel macchiato to numb this headache.  It is never a good idea
 >to add requirements that make an application non-compliant for
 >performing additional security checks.
 >> The difference from the URI spec is not a good thing. But I disagree
 >> with your explanation. Both XML Namespaces and RDF (which is
 >> based on XML Namespaces) *require* character-by-character
 >> comparison, for good reasons.
 >No, they require strcmp comparison for certain procedures related
 >to matching names.  They cannot require implementations to ignore
 >the equivalence of some URIs because that would change the meaning
 >of the statements being made, particularly for RDF.
 >> Otherwise, concepts such as
 >> conformance to XML Namespaces or equivalence of RDF statements
 >> would just hang in the air.
 >No, they don't -- the processing algorithm does not change the
 >meaning of the identifiers.  It only defines minimal conformance
 >criteria for implementations.  RDF statements do not know the
 >meaning of the URIs used to create those statements, nor do they
 >know what URIs are equivalent, but not knowing that two URIs are
 >equivalent does not change the fact that any statements made about
 >the resource of one URI must also be valid for the resource of the
 >other equivalent URI, since URIs cannot be equivalent if they
 >do not both identify the same resource.  In other words, there
 >are no closed-world theories of equivalence that override the
 >universality of URIs.  The same applies to IRIs.
 >> Security protocols are of course
 >> another area of application where consistent behavior is
 >> extremely important. "Sometimes it may match, sometimes not"
 >> or "if you know all the schemes and protocols involved, the
 >> server used on the other side, and the intent of the creator/
 >> maintainer of the resource for what to do about it in the future,
 >> you'll get consistent results" are just sometimes not good enough.
 >Security protocols make comparisons based on what is being secured,
 >not based on some abstract theory.  They are capable of defining
 >that for themselves, consistently, and with respect to the resources
 >being secured rather than one identifier used to access those
 >resources.  The IRI spec doesn't know enough about an application's
 >needs to declare one form of comparison to be better than others.
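
The distinction argued over in this thread, between Simple String
Comparison and a higher rung of the comparison ladder, can be
illustrated with a minimal Python sketch. It is not part of the
original message. The function names (simple_string_equal,
syntax_normalize, syntax_equal) and the example URIs are assumptions
made for illustration, and the normalization shown covers only case
normalization of scheme and host plus percent-encoding normalization,
roughly in the spirit of RFC 3986, section 6.2.2; it is not a
complete implementation.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters per RFC 3986, section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def simple_string_equal(a, b):
    """Character-by-character comparison (the 'strcmp' rung)."""
    return a == b


def _normalize_escapes(text):
    """Decode escapes of unreserved characters; uppercase the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def syntax_normalize(uri):
    """Hypothetical helper: partial syntax-based normalization."""
    parts = urlsplit(uri)
    # Lowercase only the host part; userinfo is case-sensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        _normalize_escapes(parts.path),
        _normalize_escapes(parts.query),
        _normalize_escapes(parts.fragment),
    ))


def syntax_equal(a, b):
    """Comparison after the partial normalization above."""
    return syntax_normalize(a) == syntax_normalize(b)


a = "HTTP://Example.COM/%7euser"
b = "http://example.com/~user"
print(simple_string_equal(a, b))  # False: a false negative
print(syntax_equal(a, b))         # True
```

Under these assumptions the two URIs fail the simple string
comparison but match after normalization. This is the kind of false
negative the thread refers to: the cheaper comparison never reports
two different resources as the same, but it can miss an equivalence
that a higher rung of the ladder would detect.
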
Received on Monday, 28 February 2005 23:33:00 GMT
