Re: [Minutes] 27 Jan 2003 TAG teleconf (httpRange-14, arch doc, IRIEverywhere-27, binaryXML-30, xmlProfiles-29) from Chris Lilley on 2003-02-06 (www-international@w3.org from January to March 2003)

From: Chris Lilley <chris@w3.org>
Date: Thu, 6 Feb 2003 03:57:26 +0100
To: Martin Duerst <duerst@w3.org>
CC: www-international@w3.org, "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Max Froumentin <mf@w3.org>, Michel Suignard <michelsu@microsoft.com>, <emmanuel@w3.org>
Message-ID: <9331297423.20030206035726@w3.org>

On Wednesday, February 5, 2003, 2:20:53 AM, Martin wrote:

MD> Hello Chris,

MD> At 22:51 03/02/03 +0100, Chris Lilley wrote:

>>The third way, the way where %ab is not equal to %AB, means that we
>>can just give up on making FOO compare equal to %ab%cd%ef and thus, we
>>can just give up on any roundtripping from IRI to URI and thus, IRI
>>becomes merely a theoretical possibility. It becomes something that
>>exists in a spec but actual XML files contain a bunch of illegible
>>hexified nonsense.

MD> I get the impression that you are jumping to conclusions here a bit
MD> too quickly.

No, but I can see why you would think that.

MD> XML is perfectly capable of representing FOO under
MD> all circumstances without %-escapes, by using the characters directly,
MD> or using numeric character references.

Yes, it is perfectly capable. My point was that if IRI to URI were a
one-way trip, then people would ignore the fact that the IRI could
easily be represented in XML; *for portability* and *to be robust*
against any hex escaping that might happen later down the road, they
would always fully escape the IRI to start with.

MD> For this example, let's just
MD> pretend that &#xfoo; is the numeric character reference for FOO.

Clearly this is possible. My point was that if the character was not
defined to be the same as the hex escape then people would not use
this possibility.

MD> The area where FOO <-> %ab%cd%ef round tripping is most beneficial
MD> and important for IRIs is *outside* XML and other technologies
MD> (including paper) that have no way to represent FOO.

Right. So, to guard against any hexifying that might happen,
technologies that can represent FOO would represent it as %AB%CD%EF.
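
To make the round trip concrete, here is a minimal sketch. It assumes
(none of this is fixed by the thread) that the escapes are the UTF-8
bytes of the character, and it uses U+00E9 as a stand-in for FOO. The
function names are hypothetical.

```python
import re

# Matches runs of %-escapes whose byte value is >= 0x80 (non-ASCII bytes only).
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_to_uri(iri: str) -> str:
    """Escape each non-ASCII character as the %XX form of its UTF-8 bytes."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in iri
    )


def uri_to_iri(uri: str) -> str:
    """Turn escaped UTF-8 sequences back into characters; leave ASCII escapes alone."""
    def decode(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8, keep the escapes as written
    return _HIGH_ESCAPES.sub(decode, uri)


iri = "http://example.org/\u00e9"  # U+00E9 stands in for FOO (an assumption)
print(iri_to_uri(iri))
print(uri_to_iri("http://example.org/%c3%a9") == iri)
print(iri_to_uri(uri_to_iri("http://example.org/%c3%a9")))
```

Note that a name written with lower-case escapes comes back from the
round trip with upper-case ones, so the trip only returns "the same"
name if %c3%a9 and %C3%A9 compare equal.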

MD> But even
MD> in those cases, e.g. to tell somebody with an old email client
MD> (like myself) how to use a particular namespace, it's always
MD> possible to say: "well, just use xmlns:foo='http://example.org/&#xfoo;'."

As long as I told you in an XML message and not, for example, plain
text.

MD> In contrast to %ab%cd%ef, &#xfoo; has the advantage that it is
MD> understood by every XML processor. So helping people to use
MD> &#xfoo; rather than %ab%cd%ef may actually have some benefits.

I think you miss my point completely here. Whether people have to use
an NCR or are able to type the character in directly is irrelevant.
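
For reference, the difference between the two forms at the XML level
can be sketched as follows. This uses Python's standard
xml.etree.ElementTree parser as one example of an XML processor, with
U+00E9 as a stand-in for FOO; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def namespace_of(xml_text: str) -> str:
    """Return the namespace name of the root element as the parser reports it."""
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified names as "{namespace}local".
    return root.tag[1:].split("}", 1)[0]


with_char = "<foo:a xmlns:foo='http://example.org/\u00e9'/>"
with_ncr = "<foo:a xmlns:foo='http://example.org/&#xE9;'/>"
with_escape = "<foo:a xmlns:foo='http://example.org/%C3%A9'/>"

# The NCR is resolved by the parser, so it yields the same name as the literal character.
print(namespace_of(with_ncr) == namespace_of(with_char))
# The %-escape is opaque to the parser and stays in the name as written.
print(namespace_of(with_escape) == namespace_of(with_char))
```

The parser treats the character and the NCR identically, which is why
the choice between them does not bear on the question of whether the
character and its hex escape name the same namespace.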

>>MD> Currently, Namespaces in XML 1.1 (Candidate Rec) specifies that for
>>MD> purposes of namespace equivalence, '%7e', '%7E', and '~' are different
>>MD> (see http://www.w3.org/TR/xml-names11/#IRIComparison).
>>
>>Yes. This should change.

MD> Well, again, this may well be the case, but let's not jump to
MD> conclusions too quickly. I have put up some new versions of
MD> the examples from Max Froumentin that I sent yesterday at
MD> http://www.w3.org/2003/02/uriEquivTest/. test1.xsl is the
MD> stylesheet for test1.xml, and test2.xsl is the stylesheet
MD> for test2.xml. You can have a look at
MD>     http://www.w3.org/2003/02/uriEquivTest/test1.xml
MD> and
MD>     http://www.w3.org/2003/02/uriEquivTest/test2.xml
MD> directly in an XML/XSL-enabled browser, and then look at
MD> the source and the stylesheet to see what's going on.
MD> http://www.w3.org/2003/02/uriEquivTest/test2.xml is
MD> in particular very interesting because it shows a
MD> namespace IRI that is already working exactly as
MD> described at http://www.w3.org/TR/xml-names11/#IRIComparison
MD> in the following circumstances:

MD>     - XSLT stylesheet invoked via stylesheet PI on Internet Explorer 6
MD>     - XSLT stylesheet invoked via stylesheet PI on Netscape Navigator 6
MD>     - Xalan Java 2
MD>     - msxsl with version parameter set to '4.0'
MD>     - msxsl with version parameter set to '3.0'

MD> I'm looking forward to receiving other test results,
MD> and suggestions for improvements to the tests.
MD> (Thanks to Emmanuel Pietriga for some help)

The test results are very interesting. However, they tell us what has
been implemented so far, not what should be specified.

MD> If we take the consistency of these test results as an indication,
MD> it may well be that the caution given at
MD> http://www.w3.org/TR/xml-names11/#IRIs ("Users defining namespaces
MD> are advised to restrict namespace names to URIs until software
MD> supporting IRIs is in common use."), at least for the purpose
MD> of pure namespace use (as opposed to any retrieval that one
MD> may want to do on the namespace URI), turns out to be unnecessary.
MD> IRIs already work, as per spec. What better could we wish for?

MD> As I hinted at in a previous mail, and again assuming the state
MD> of implementations with respect to IRIs suggested by the above tests,
MD> the danger of trying to force every namespace implementation to
MD> compare FOO == %ab%cd%ef == %AB%CD%EF,... is not only general
MD> chaos and incompatibilities among implementations, but also some
MD> serious damage to the image of IRIs. There is quite some chance
MD> that for namespaces, nobody will seriously rely on
MD>          L == %4c == %4C
MD> because nobody really sees a point in bothering with %4c or %4C.
MD> But on the other hand, if told that they can expect it, people
MD> will try to rely on
MD>          FOO == %ab%cd%ef == %AB%CD%EF
MD> (when they easily could use FOO == &#xfoo;), and will discover
MD> that it sometimes works and sometimes doesn't. That will easily
MD> lead them to conclude that "IRIs don't work.". Now *that*
MD> would indeed be very damaging to IRIs.

And this is exactly why, to obtain the desirable result for IRI that
FOO == %ab%cd%ef == %AB%CD%EF, the otherwise not apparently needed
result that %4c == %4C has to be implemented for URI.

Whether that is implemented as L == %4c == %4C or (more likely) as
L != %4c but %4c == %4C (so IRI specifically does not affect the meaning
of the ASCII characters, but the currently poorly specified beyond-ASCII
characters compare equal to their hexified equivalents) is for discussion.
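
The two options can be sketched as comparison keys. This is only an
illustration under stated assumptions: escapes are taken to be UTF-8
bytes, U+00E9 stands in for FOO, the set of ASCII characters whose
escapes get decoded in the first option (letters, digits, "-._~") is my
choice for the example, and all names are hypothetical.

```python
import re
import string

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
# Assumption: ASCII characters whose escaped and literal forms are treated as equal.
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _escape_non_ascii(name: str) -> str:
    """Write each non-ASCII character as the %XX form of its UTF-8 bytes."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in name
    )


def key_full(name: str) -> str:
    """Option 1: L == %4c == %4C, and FOO equals its escapes."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in _UNRESERVED else "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, _escape_non_ascii(name))


def key_escapes_only(name: str) -> str:
    """Option 2: L != %4c, but %4c == %4C, and FOO equals its escapes."""
    return _ESCAPE.sub(lambda m: "%" + m.group(1).upper(), _escape_non_ascii(name))


base = "http://example.org/"
print(key_full(base + "L") == key_full(base + "%4c"))
print(key_escapes_only(base + "L") == key_escapes_only(base + "%4c"))
print(key_escapes_only(base + "\u00e9") == key_escapes_only(base + "%c3%a9"))
```

Two names would then be equal when their keys are equal. Under both
keys the non-ASCII character matches its escapes in either hex case;
the options differ only in whether an ASCII letter matches its own
escape.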


MD> In summary:

MD> - For namespaces, there is no need to roundtrip URI <-> IRI
MD>    (i.e. FOO <-> %ab%cd%ef) in XML, because we have &#xfoo;

That may be so, but it has not been shown.

MD> - Current namespace implementations (as far as I was able to test)
MD>    already support IRIs as described at
MD>    http://www.w3.org/TR/xml-names11/#IRIComparison

MD> - There is a clear potential that by forcing namespace behavior
MD>    to change, IRIs will be affected more negatively than URIs,
MD>    and more negatively than in the current state.


MD> I hope that you can consider these arguments.

Of course. It may be that IRI to URI being a one-way trip is the best
that we can do. But I would prefer to try and do better.


-- 
 Chris                            mailto:chris@w3.org
Received on Wednesday, 5 February 2003 21:57:37 UTC