Re: [Minutes] 27 Jan 2003 TAG teleconf (httpRange-14, arch doc, IRIEverywhere-27, binaryXML-30, xmlProfiles-29) from Martin Duerst on 2003-02-05 (www-tag@w3.org from February 2003)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 04 Feb 2003 20:20:53 -0500
To: Chris Lilley <chris@w3.org>, www-international@w3.org
Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Max Froumentin <mf@w3.org>, Michel Suignard <michelsu@microsoft.com>, emmanuel@w3.org
Message-Id: <4.2.0.58.J.20030204185306.027dab20@localhost>

Hello Chris,

At 22:51 03/02/03 +0100, Chris Lilley wrote:

>On Monday, February 3, 2003, 8:54:32 PM, Martin wrote:
>
>
>MD> At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:
>
> >>Minutes of the 27 Jan 2003 TAG teleconf available as
> >>HTML [1] and as text below.

> >>    DanCon, you wanted to suggest the value of having %7E specified to be
> >>           equivalent to %7e is purely aesthetic, and not *nearly* worth
> >>           the cost.
>
>OK so lets look at knock on effects here. Dan was not, I claim,
>looking at these effects when he made his comment;  in which case his
>position seems very reasonable. After all these escapes are used very
>infrequently.

I certainly agree that this is not as easy as it may look from Dan's comment.


>But it is very damaging.  It would scupper IRIs.

I'm not exactly sure this is the case. I hope we can all
look at the arguments in detail.


>Suppose there is some Unicode character FOO and it maps to %ab%cd%ef
>in UTF-8 (it won't map from those precise values, there is no such
>character, this is just an example).
>
>It would be highly desirable for FOO used in an IRI and the hexified
>version of FOO used in a URI to compare the same when comparing two
>URIs.

Yes indeed. Please note that nobody is saying that they should
never compare equal. I think that *at the very minimum*,
it would be very good to nail down that for *resolution*,
this equivalence always has to apply. This is what the
IRI spec currently assumes, and my understanding is that
it coincides with current practice.
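
To make the resolution case concrete, here is a minimal sketch of the
IRI-to-URI step, assuming the mapping is "percent-encode every non-ASCII
character as its UTF-8 bytes". Since FOO is fictional, the real character
U+00E9 stands in for it; the function name iri_to_uri and the example.org
address are illustrative assumptions, not anything taken from the IRI draft.

```python
from urllib.parse import quote, unquote

def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8; leave ASCII untouched."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in iri
    )

iri = "http://example.org/\u00e9"      # U+00E9 stands in for FOO
uri = iri_to_uri(iri)                  # escapes come out in upper-case hex

# Decoding accepts either hex case, so both spellings lead back to the IRI.
back_from_upper = unquote(uri)
back_from_lower = unquote(uri.lower())
```

For resolution, then, the three spellings all end up naming the same
octets; the open question in this thread is only whether comparison
(for example of namespace names) should behave the same way.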


>If this is not done, then IRI-URI is a one-way street.
>
>For this to work in any sensible manner, then clearly it is not enough
>for FOO to compare the same as %ab%cd%ef. It also has to compare the
>same as %AB%CD%EF and %Ab%cd%eF and ....

I agree that having FOO == %ab%cd%ef but FOO != %AB%CD%EF, on any
level of equivalence, is a very bad idea. Having %ab%cd%ef == %AB%CD%EF
but FOO != %ab%cd%ef (and therefore FOO != %AB%CD%EF) is slightly
more logical, but doesn't really cut it (and I won't consider it
any further in this message).
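
The levels of equivalence discussed here can be sketched as three
comparison functions. This is only an illustration under stated
assumptions: the function names are made up, U+00E9 again stands in for
FOO, and the decoding variant is deliberately naive (it also decodes
escaped reserved characters such as %2F, which a real comparison would
have to treat more carefully).

```python
import re
from urllib.parse import unquote

ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def same_literal(a: str, b: str) -> bool:
    """Character-for-character comparison, with no normalization at all."""
    return a == b

def same_ignoring_escape_case(a: str, b: str) -> bool:
    """Treat %ab and %AB as equal, but never decode an escape."""
    def upper(s: str) -> str:
        return ESCAPE.sub(lambda m: m.group(0).upper(), s)
    return upper(a) == upper(b)

def same_after_decoding(a: str, b: str) -> bool:
    """Decode every escape as UTF-8 first, so FOO equals both hex spellings."""
    return unquote(a) == unquote(b)

direct = "http://example.org/\u00e9"
lower = "http://example.org/%c3%a9"
upper = "http://example.org/%C3%A9"
```

With these inputs, same_literal keeps all three spellings apart,
same_ignoring_escape_case equates only the two escaped forms (the
"slightly more logical" middle option), and same_after_decoding equates
all three.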


>The third way, the way where %ab is not equal to %AB, means that we
>can just give up on making FOO compare equal to %ab%cd%ef and thus, we
>can just give up on any roundtripping from IRI to URI and thus, IRI
>becomes merely a theoretical possibility. It becomes something that
>exists in a spec but actual XML files contain a bunch of illegible
>hexified nonsense.

I get the impression that you are jumping to conclusions here a bit
too quickly. XML is perfectly capable of representing FOO under
all circumstances without %-escapes, by using the characters directly,
or using numeric character references. For this example, let's just
pretend that &#xfoo; is the numeric character reference for FOO.

The area where FOO <-> %ab%cd%ef round tripping is most beneficial
and important for IRIs is *outside* XML, in technologies
(including paper) that have no way to represent FOO. But even
in those cases, e.g. to tell somebody with an old email client
(like myself) how to use a particular namespace, it's always
possible to say: "well, just use xmlns:foo='http://example.org/&#xfoo;'."
In contrast to %ab%cd%ef, &#xfoo; has the advantage that it is
understood by every XML processor. So helping people to use
&#xfoo; rather than %ab%cd%ef may actually have some benefits.
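
A small sketch of that point, using Python's ElementTree as one example
of an XML processor and the real reference &#xE9; in place of the
pretend &#xfoo;. The helper namespace_of and the example.org names are
assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# The same namespace name written three ways in the xmlns declaration.
doc_ncr = '<foo:a xmlns:foo="http://example.org/&#xE9;"/>'
doc_direct = '<foo:a xmlns:foo="http://example.org/\u00e9"/>'
doc_escaped = '<foo:a xmlns:foo="http://example.org/%C3%A9"/>'

def namespace_of(xml_text: str) -> str:
    """Return the namespace name of the root element."""
    tag = ET.fromstring(xml_text).tag      # has the form "{namespace}local"
    return tag[1:tag.index("}")]

ns_ncr = namespace_of(doc_ncr)
ns_direct = namespace_of(doc_direct)
ns_escaped = namespace_of(doc_escaped)
```

The parser resolves the character reference before it ever looks at the
namespace name, so ns_ncr and ns_direct are the same string, while
ns_escaped keeps its %-escapes and is a different namespace name.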


>MD> Currently, Namespaces in XML 1.1 (Candidate Rec) specifies that for
>MD> purposes of namespace equivalence, '%7e', '%7E', and '~' are different
>MD> (see http://www.w3.org/TR/xml-names11/#IRIComparison).
>
>Yes. This should change.

Well, again, this may well be the case, but let's not jump to
conclusions too quickly. I have put up some new versions of
the examples from Max Froumentin that I sent yesterday at
http://www.w3.org/2003/02/uriEquivTest/. test1.xsl is the
stylesheet for test1.xml, and test2.xsl is the stylesheet
for test2.xml. You can have a look at
    http://www.w3.org/2003/02/uriEquivTest/test1.xml
and
    http://www.w3.org/2003/02/uriEquivTest/test2.xml
directly in an XML/XSL-enabled browser, and then look at
the source and the stylesheet to see what's going on.
http://www.w3.org/2003/02/uriEquivTest/test2.xml is
in particular very interesting because it shows a
namespace IRI that is already working exactly as
described at http://www.w3.org/TR/xml-names11/#IRIComparison
in the following circumstances:

    - XSLT stylesheet invoked via stylesheet PI on Internet Explorer 6
    - XSLT stylesheet invoked via stylesheet PI on Netscape Navigator 6
    - Xalan Java 2
    - msxsl with version parameter set to '4.0'
    - msxsl with version parameter set to '3.0'

I'm looking forward to receiving other test results,
and suggestions for improvements to the tests.
(Thanks to Emmanuel Pietriga for some help.)
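
For anyone who wants to try the same kind of check without an XSLT
engine, here is a rough sketch using Python's ElementTree. It is not a
copy of test1.xml or test2.xml; the document, the example.org namespace
names and the helper count_items are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Three declarations that differ only in how the tilde is spelled.
DOC = """<root xmlns:a="http://example.org/%7euser"
      xmlns:b="http://example.org/%7Euser"
      xmlns:c="http://example.org/~user">
  <a:item/><b:item/><c:item/>
</root>"""

root = ET.fromstring(DOC)
tags = [child.tag for child in root]

def count_items(namespace: str) -> int:
    """Count child elements named 'item' in exactly this namespace."""
    wanted = "{" + namespace + "}item"
    return sum(1 for child in root if child.tag == wanted)

distinct_namespaces = len(set(tags))
```

A processor that compares namespace names character for character, as
described at #IRIComparison, sees three different namespaces here, and
each spelling matches exactly one element.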

If we take the consistency of these test results as an indication,
it may well be that the caution given at
http://www.w3.org/TR/xml-names11/#IRIs ("Users defining namespaces
are advised to restrict namespace names to URIs until software
supporting IRIs is in common use."), at least for the purpose
of pure namespace use (as opposed to any retrieval that one
may want to do on the namespace URI), turns out to be unnecessary.
IRIs already work, as per spec. What more could we wish for?

As I hinted at in a previous mail, and again assuming the state
of implementations with respect to IRIs suggested by the above tests,
the danger in trying to force every namespace implementation to
compare FOO == %ab%cd%ef == %AB%CD%EF,... is not only general
chaos and incompatibilities among implementations, but also some
serious damage to the image of IRIs. There is a good chance
that for namespaces, nobody will seriously rely on
         L == %4c == %4C
because nobody really sees a point in bothering with %4c or %4C.
But on the other hand, if told that they can expect it, people
will try to rely on
         FOO == %ab%cd%ef == %AB%CD%EF
(when they could easily use FOO == &#xfoo;), and will discover
that it sometimes works and sometimes doesn't. That will easily
lead them to conclude that "IRIs don't work." Now *that*
would indeed be very damaging to IRIs.


In summary:

- For namespaces, there is no need to roundtrip URI <-> IRI
   (i.e. FOO <-> %ab%cd%ef) in XML, because we have &#xfoo;

- Current namespace implementations (as far as I was able to test)
   already support IRIs as described at
   http://www.w3.org/TR/xml-names11/#IRIComparison

- There is a clear potential that by forcing namespace behavior
   to change, IRIs will be affected more negatively than URIs,
   and more negatively than in the current state.


I hope that you can consider these arguments.

Regards,    Martin.
Received on Tuesday, 4 February 2003 20:25:26 UTC