
Re: [URIEquivalence-15] Namespaces in XML -- URI, IRIs and equivalence

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 06 Jun 2002 23:18:18 +0200
To: Misha.Wolf@reuters.com
Cc: www-tag@w3.org
Message-ID: <ts5vfug392fag826trbjt3nomgjg1q9s66@4ax.com>

* Misha.Wolf@reuters.com wrote:
>AGREED: Namespace URIs should allow the same range of characters as XML
>System Identifiers.

I disagree. Namespace URIs must adhere to the syntax constraints in RFC
2396 in order to *be* URIs at all. If they do not, they are not URIs,
regardless of whether they are used to identify namespaces.

>   [Definition: URI references which identify namespaces are considered
>   identical when they are exactly the same character-for-character.]

>AGREED: It is unclear what is meant by "functionally equivalent" within
>the context of the Namespaces specification.

I agree; functional equivalence is not defined anywhere. Syntactic
equivalence is defined in RFC 2396 and in URI scheme registrations.

>AGREED: The "character-for-character" comparison should be specified as
>being case sensitive, eg:

I disagree. In order to call Namespace URIs "URIs" at all, the
equivalence rules for URIs in general should be used to determine
equivalence.

>-  "a" is not the same as "A"
>-  "http" is not the same as "HTTP"

Right according to the XML Namespaces specification, wrong according to
RFC 2396 and related specifications. There, equivalence depends on
context, i.e. "a" can be the same as "A" or it can be something
different.
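
A minimal sketch of the two views, assuming Python's urllib.parse as the
parser. The helper names namespace_identical and uri_equivalent are
illustrative only and come from neither specification. The second
function treats only the scheme and the host as case-insensitive, which
is one reading of RFC 2396 for server-based schemes such as http.

```python
from urllib.parse import urlsplit


def namespace_identical(a: str, b: str) -> bool:
    """Namespaces in XML rule: identical only if equal character-for-character."""
    return a == b


def _key(uri: str):
    p = urlsplit(uri)
    # urlsplit lowercases the scheme, and .hostname is the host in lower case.
    # Userinfo, port, path, query and fragment are compared as written.
    return (p.scheme, p.username, p.password, p.hostname, p.port,
            p.path, p.query, p.fragment)


def uri_equivalent(a: str, b: str) -> bool:
    """Assumed rule: scheme and host are case-insensitive, the rest is not."""
    return _key(a) == _key(b)


x = "http://www.example.org/ns"
y = "HTTP://WWW.EXAMPLE.ORG/ns"
print(namespace_identical(x, y))  # False: "http" is not "HTTP"
print(uri_equivalent(x, y))       # True: scheme and host ignore case
print(uri_equivalent(x, "http://www.example.org/NS"))  # False: path keeps case
```

So the same pair of strings names two namespaces under one rule and one
resource under the other.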

>AGREED: The Namespaces specification should make clear whether:
>-  "%6A" is the same as "j"

According to the Namespaces specification it is not; those are different
character sequences.

>AGREED: The Namespaces specification should make clear whether:
>-  "%6A" is the same as "%6a"

Same as above. If they are equivalent, "http://www.example.org" and
"http://www.example.org/" should be considered equivalent, too, i.e.,
the Namespace processor needs to parse URIs and create the canonical
form of the URI.
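
A rough sketch of what such a canonicalising step could look like,
assuming Python and its urllib.parse module. The function name canonical
and the exact set of rules are assumptions made for illustration: escapes
of RFC 2396 "unreserved" characters are decoded, the remaining escapes
get upper-case hex digits, and an empty path on an http URI becomes "/".
Neither the Namespaces specification nor RFC 2396 prescribes this
particular procedure.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 2396 "unreserved" characters: alphanumerics plus the "mark" set.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()"
)


def _normalize_escapes(text: str) -> str:
    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch                         # "%6A" and "%6a" both become "j"
        return "%" + match.group(1).upper()   # "%c3" becomes "%C3"
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def canonical(uri: str) -> str:
    p = urlsplit(uri)  # the scheme comes back lower-cased
    path = _normalize_escapes(p.path)
    if p.scheme in ("http", "https") and p.netloc and path == "":
        path = "/"     # assumed http rule: empty path equals "/"
    # Simplification: lower-cases the whole authority, including any userinfo.
    return urlunsplit((p.scheme, p.netloc.lower(), path,
                       _normalize_escapes(p.query),
                       _normalize_escapes(p.fragment)))


print(canonical("http://www.example.org"))      # http://www.example.org/
print(canonical("http://www.example.org/%6a"))  # http://www.example.org/j
```

With a step like this in place, two Namespace names would be compared by
testing canonical(a) == canonical(b) instead of a == b.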

More important to the I18N WG should be whether
"http://www.example.org/~björn/" is equivalent to
"http://www.example.org/~bj%C3%B6rn/" and
"http://www.example.org/~bj%c3%b6rn/" and
"http://www.example.org/~bj%C3%b6rn/" and
"http://www.example.org/~bj%c3%B6rn/", or whether
"http://www.example.org/~björn/" is a valid Namespace identifier at all.
I'd say either it is not valid, or the given examples are all
equivalent, but the latter definition of equivalence is an incompatible
change to the Namespaces Specification.
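
To make the five spellings concrete, here is a small sketch in Python.
It assumes the conversion that the IRI drafts describe: non-ASCII
characters are encoded as UTF-8 and each byte is percent-escaped. The
NFC normalisation step and the function names iri_to_uri and
same_identifier are assumptions for illustration, not part of any
specification cited here.

```python
import re
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Percent-escape every non-ASCII character as its UTF-8 bytes."""
    iri = unicodedata.normalize("NFC", iri)  # assumed: compose "o" + diaeresis
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def _uppercase_escapes(uri: str) -> str:
    # Hex digits in an escape are case-insensitive: "%c3" means "%C3".
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), uri)


def same_identifier(a: str, b: str) -> bool:
    return _uppercase_escapes(iri_to_uri(a)) == _uppercase_escapes(iri_to_uri(b))


variants = [
    "http://www.example.org/~bj\u00f6rn/",
    "http://www.example.org/~bj%C3%B6rn/",
    "http://www.example.org/~bj%c3%b6rn/",
    "http://www.example.org/~bj%C3%b6rn/",
    "http://www.example.org/~bj%c3%B6rn/",
]
print(all(same_identifier(variants[0], v) for v in variants))  # True
print(len(set(variants)))  # 5: all distinct character-for-character
```

Under the character-for-character rule these are five different
Namespace names. Under the rule sketched here they are one.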

IMHO, the whole XML Namespaces Specification is broken: it uses URIs
that are not URIs (since URI equivalence is not the same as Namespace
identifier equivalence) and HTTP URIs that one cannot GET (at least,
there is no constraint that Namespace URIs need to be GETable).

If I had to reinvent the Namespaces specification, I would have
specified

  * Namespaces are defined through IRIs
  * Namespace equivalence is defined by the general IRI equivalence
    rules and the specific scheme equivalence rules
  * only W3C normalized absolute IRIs are allowed
  * The only allowed scheme is the newly registered "ns" scheme

The ns scheme would use the domain name system as namespace management
system, e.g., ns:www.w3.org:xhtml or something like that.

This is harder to implement and more expensive than the "Namespaces are
arbitrary strings and equivalence is determined by case-sensitive match"
mantra currently in use, but certainly more usable.

This does not solve all problems either. I think the namespace should
be dereferenceable, and dereferencing such an IRI should yield some
valuable information about the namespace. But we run into a problem: if
the information is valuable, applications will rely on it (and they
possibly should), which requires the resource to always be
dereferenceable; otherwise resources in that namespace would stop
working once the namespace is no longer dereferenceable. The domain name
system is not able to guarantee this, and neither is anything else, but
the domain name system has proven to be very unstable. We need a system
of IRN (Internationalized Uniform Resource Name) resolution for a more
trustworthy guarantee. However that will look, machines may stop working
at any point in time, and this must not cause documents to stop working
as well. Hence it is possibly an honorable goal to have dereferenceable
namespace identifiers, but impossible to implement it while the
information stays valuable.

regards.
Received on Thursday, 6 June 2002 17:18:27 UTC
