Re: Concern with URI normalization text; proposed change from Jeremy Carroll on 2004-04-07 (uri@w3.org from April 2004)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Wed, 07 Apr 2004 17:28:02 +0100
To: "David G. Durand" <david.durand@ingenta.com>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Graham Klyne <gk@ninebynine.org>, uri@w3.org, Pat Hayes <phayes@ihmc.us>
Message-ID: <40742C12.7020405@hplb.hpl.hp.com>
David G. Durand wrote:

> At 1:23 PM -0700 4/6/04, Roy T. Fielding wrote:
> 
>>> For example, if a URI http:/example.org/namespace is used as an XML 
>>> Namespace the similar URIs HTTP:/example.org/namespace or
>>> http:/example.org:80/namespace are not names for the same XML namespace.
>>
>>
>> Yes they are.  The processing algorithm does not need to consider them
>> to be the same name, but XML cannot change the meaning of a URI.
> 
> 
> I don't
> 
> Actually, this isn't necessarily so. The W3C standard defines what a 
> namespace is, and since it has one property (identity, as defined under 
> an explicit equivalence relation: "string-equal-to"), these do not 
> identify the same namespace.
> 
> The confusion, and it is a real one, is that both URIs will return the 
> _same_ resource when interpreted as URIs.
> 
> The lack of an isomorphism between namespace identity and and resource 
> identity is a real source of confusion, but it doesn't mean that either 
> the W3C definition of namespace identity, or the IETF definition of URI 
> identity are wrong. Namespaces are identified by character strings, 
> chosen according to an algorithm also used to identify web resources, 
> and for namespaces notion of identity is string equality. URIs can also 
> be compared as to whether they can be guaranteed to resolve to the same 
> resource. This comparison produces different results.
> 
> I think it would be great if these could be harmonized, but it's proven 
> very difficult.
> 
>>> Thus techniques on the comparison ladder other than byte-for-byte 
>>> comparison do not work for comparing namespace URIs when used for the 
>>> purpose of naming namespaces. This counterexample (and others, for 
>>> example from RDF), indicates that the sentence is false.
>>
>>
>> They work just fine for comparing namespaces.  What they don't work for
>> is testing compliance with W3C recommendations, for which only the
>> simplest form of comparison is recognized.  XML Namespaces accepts the
>> presence of false negatives without any harm to the processing model.
>> Additional normalization of the namespace URI will not be incorrect;
>> it just won't be very useful, since the purpose of namespace names
>> is to avoid name collisions, not ensure name consistency.
> 
> 
> The problem is that it is legal (if confusing) for me to use both 
> HTTP:/example.org/namespace and http:/example.org:80/namespace to 
> represent entirely different namespaces, as long as I have rights to 
> assign names for example.org. I could, for instance, place the year of 
> definition of the namespace in the port field.
> 
> XML software is written with the assumption that string comparison is 
> the only form of identity, data exists that depends on it. Programs 
> exist that create namespace
> 
> That's why using non-absolute URIs in namespace definitions, while 
> legal, is generally considered bad practice. Unfortunately it's not an 
> uncommon practice.
> 
> I would like to see the divergence between these two semantics for URIs 
> to be eliminated. But I don't know if it can.
> 
> One way might be just to create a new URI scheme: xml-namespace: which 
> would have a syntax:
> 
> namespace-uri = xml-namespace '//' domain-name ( '/' path-component  ) *
> 
> Then we could keep the W3C rule, and say that URI comparison and 
> namespace comparison are only guaranteed to be the same for the 
> xml-namespace URI scheme, and that there is no URI normalization 
> possible for xml-namespace URIs.
> 
> Another way might be to transition first to a w3c Namespaces II in which 
> certain forms of URI would be made illegal for use in namespace 
> declarations. This could include explicit ports of any sort, relative 
> URIs, and anything else that a reasonable normalizer would change. Then, 
> someday, if the need was felt, it would be possible for normalization of 
> namespace URIs to be allowed in XML processors.
> 
> Pragmatically, I worry that different URI schemes have varying 
> "reasonable" normalization rules, and so I wonder how XML processors 
> will be able to dependably recognize identity for namespace names.
> 
> Just a few thoughts...
> 
>    -- David

I believe there is already text discouraging the use of URIs in non-normal 
form (i.e. for which this divergence matters).
The change proposed by Graham and rejected by Roy was trying to clarify 
that the comparison ladder while important for all is only strictly 
built-in when doing retrieval, and not when naming namespaces or 
considering the OWL document

<rdf:Description rdf:about="http://example.org/">
    <owl:sameAs rdf:resource="HTTP://example.org:80"/>
</rdf:Description>

(namespace decl omitted)

Jeremy
Received on Wednesday, 7 April 2004 12:33:46 UTC