Re: Concern with URI normalization text; proposed change from David G. Durand on 2004-04-07 (uri@w3.org from April 2004)

From: David G. Durand <david.durand@ingenta.com>
Date: Thu, 08 Apr 2004 05:51:23 +0900
To: uri@w3.org
Message-Id: <4.2.0.58.J.20040408055117.00ac9118@localhost>
At 1:23 PM -0700 4/6/04, Roy T. Fielding wrote:
>>For example, if a URI http:/example.org/namespace is used as an XML 
>>Namespace the similar URIs HTTP:/example.org/namespace or
>>http:/example.org:80/namespace are not names for the same XML namespace.
>
>Yes they are.  The processing algorithm does not need to consider them
>to be the same name, but XML cannot change the meaning of a URI.

I don't

Actually, this isn't necessarily so. The W3C standard defines what a 
namespace is, and since it has one property (identity, as defined under an 
explicit equivalence relation: "string-equal-to"), these do not identify 
the same namespace.

The confusion, and it is a real one, is that both URIs will return the 
_same_ resource when interpreted as URIs.

The lack of an isomorphism between namespace identity and and resource 
identity is a real source of confusion, but it doesn't mean that either the 
W3C definition of namespace identity, or the IETF definition of URI 
identity are wrong. Namespaces are identified by character strings, chosen 
according to an algorithm also used to identify web resources, and for 
namespaces notion of identity is string equality. URIs can also be compared 
as to whether they can be guaranteed to resolve to the same resource. This 
comparison produces different results.

I think it would be great if these could be harmonized, but it's proven 
very difficult.

>>Thus techniques on the comparison ladder other than byte-for-byte 
>>comparison do not work for comparing namespace URIs when used for the 
>>purpose of naming namespaces. This counterexample (and others, for 
>>example from RDF), indicates that the sentence is false.
>
>They work just fine for comparing namespaces.  What they don't work for
>is testing compliance with W3C recommendations, for which only the
>simplest form of comparison is recognized.  XML Namespaces accepts the
>presence of false negatives without any harm to the processing model.
>Additional normalization of the namespace URI will not be incorrect;
>it just won't be very useful, since the purpose of namespace names
>is to avoid name collisions, not ensure name consistency.

The problem is that it is legal (if confusing) for me to use both 
HTTP:/example.org/namespace and http:/example.org:80/namespace to represent 
entirely different namespaces, as long as I have rights to assign names for 
example.org. I could, for instance, place the year of definition of the 
namespace in the port field.

XML software is written with the assumption that string comparison is the 
only form of identity, data exists that depends on it. Programs exist that 
create namespace

That's why using non-absolute URIs in namespace definitions, while legal, 
is generally considered bad practice. Unfortunately it's not an uncommon 
practice.

I would like to see the divergence between these two semantics for URIs to 
be eliminated. But I don't know if it can.

One way might be just to create a new URI scheme: xml-namespace: which 
would have a syntax:

namespace-uri = xml-namespace '//' domain-name ( '/' path-component  ) *

Then we could keep the W3C rule, and say that URI comparison and namespace 
comparison are only guaranteed to be the same for the xml-namespace URI 
scheme, and that there is no URI normalization possible for xml-namespace URIs.

Another way might be to transition first to a w3c Namespaces II in which 
certain forms of URI would be made illegal for use in namespace 
declarations. This could include explicit ports of any sort, relative URIs, 
and anything else that a reasonable normalizer would change. Then, someday, 
if the need was felt, it would be possible for normalization of namespace 
URIs to be allowed in XML processors.

Pragmatically, I worry that different URI schemes have varying "reasonable" 
normalization rules, and so I wonder how XML processors will be able to 
dependably recognize identity for namespace names.

Just a few thoughts...

    -- David
--

David G. Durand
Director, Electronic Publishing Services
Ingenta
111R Chestnut St.
Providence, RI  02903 USA

T: +1 401-331-2014 x111
T: +1 401-935-5317 Mobile
E: david.durand@ingenta.com
Received on Wednesday, 7 April 2004 17:03:05 UTC