- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 04 Feb 2003 20:20:53 -0500
- To: Chris Lilley <chris@w3.org>, www-international@w3.org
- Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Max Froumentin <mf@w3.org>, Michel Suignard <michelsu@microsoft.com>, emmanuel@w3.org
Hello Chris,

At 22:51 03/02/03 +0100, Chris Lilley wrote:
>On Monday, February 3, 2003, 8:54:32 PM, Martin wrote:
>
>MD> At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:
>
> >>Minutes of the 27 Jan 2003 TAG teleconf available as
> >>HTML [1] and as text below.
>
> >> DanCon, you wanted to suggest the value of having %7E specified to be
> >> equivalent to %7e is purely aesthetic, and not *nearly* worth
> >> the cost.
>
>OK so lets look at knock on effects here. Dan was not, I claim,
>looking at these effects when he made his comment; in which case his
>position seems very reasonable. After all these escapes are used very
>infrequently.

I clearly agree that this is not as easy as it may look from
Dan's comment.

>But it is very damaging. It would scupper IRIs.

I'm not exactly sure this is the case. I hope we can all look at
the arguments in detail.

>Suppose there is some Unicode character FOO and it maps to %ab%cd%ef
>in UTF-8 (it won't map from those precise values, there is no such
>character, this is just an example).
>
>It would be highly desirable for FOO used in an IRI and the hexified
>version of FOO used in a URI to compare the same when comparing two
>URIs.

Yes indeed. Please note that we are not talking about them never
to compare equal. I think *at the very minimum*, it would be very
good to nail down that for *resolution*, this equivalence always
has to apply. This is what the IRI spec currently assumes, and my
understanding is that it coincides with current practice.

>If this is not done, then IRI-URI is a one-way street.
>
>For this to work in any sensible manner, then clearly it is not enough
>for FOO to compare the same as %ab%cd%ef. It also has to compare the
>same as %AB%CD%EF and %Ab%cd%eF and ....

I agree that having FOO == %ab%cd%ef but FOO != %AB%CD%EF,
on any level of equivalence, is a very bad idea.
Having %ab%cd%ef == %AB%CD%EF but FOO != %ab%cd%ef (and therefore
FOO != %AB%CD%EF) is slightly more logical, but doesn't really cut it
(and I won't consider it anymore later in this message).

>The third way, the way where %ab is not equal to %AB, means that we
>can just give up on making FOO compare equal to %ab%cd%ef and thus, we
>can just give up on any roundtripping from IRI to URI and thus, IRI
>becomes merely a theoretical possibility. It becomes something that
>exists in a spec but actual XML files contain a bunch of illegible
>hexified nonsense.

I get the impression that you are jumping to conclusions here a bit
too quickly. XML is perfectly capable of representing FOO under all
circumstances without %-escapes, by using the characters directly,
or using numeric character references. For this example, let's just
pretend that &#xFoo; is the numeric character reference for FOO.

The area where FOO <-> %ab%cd%ef round tripping is most beneficial
and important for IRIs is *outside* XML and other technologies
(including paper) that have no way to represent FOO. But even in
those cases, e.g. to tell somebody with an old email client (like
myself) how to use a particular namespace, it's always possible
to say: "well, just use xmlns:foo='http://example.org/&#xFoo;'."
In contrast to %ab%cd%ef, &#xFoo; has the advantage that it is
understood by every XML processor. So helping people to use
&#xFoo; rather than %ab%cd%ef may actually have some benefits.

>MD> Currently, Namespaces in XML 1.1 (Candidate Rec) specifies that for
>MD> purposes of namespace equivalence, '%7e', '%7E', and '~' are different
>MD> (see http://www.w3.org/TR/xml-names11/#IRIComparison).
>
>Yes. This should change.

Well, again, this may well be the case, but let's not jump to
conclusions too quickly.

I have put up some new versions of the examples from Max Froumentin
that I sent yesterday at http://www.w3.org/2003/02/uriEquivTest/.
test1.xsl is the stylesheet for test1.xml, and test2.xsl is the
stylesheet for test2.xml. You can have a look at
http://www.w3.org/2003/02/uriEquivTest/test1.xml and
http://www.w3.org/2003/02/uriEquivTest/test2.xml
directly in an XML/XSL-enabled browser, and then look at the source
and the stylesheet to see what's going on.

http://www.w3.org/2003/02/uriEquivTest/test2.xml is in particular
very interesting because it shows a namespace IRI that is already
working exactly as described at
http://www.w3.org/TR/xml-names11/#IRIComparison
in the following circumstances:

- XSLT stylesheet invoked via stylesheet PI on Internet Explorer 6
- XSLT stylesheet invoked via stylesheet PI on Netscape Navigator 6
- Xalan Java 2
- msxsl with version parameter set to '4.0'
- msxsl with version parameter set to '3.0'

I'm looking forward to receiving other test results, and suggestions
for improvements to the tests. (Thanks to Emmanuel Pietriga for some
help.)

If we take the consistency of these test results as an indication,
it may well be that the caution given at
http://www.w3.org/TR/xml-names11/#IRIs
("Users defining namespaces are advised to restrict namespace names
to URIs until software supporting IRIs is in common use."), at least
for the purpose of pure namespace use (as opposed to any retrieval
that one may want to do on the namespace URI), turns out to be
unnecessary. IRIs already work, as per spec. What better could we
wish for?

As I hinted at in a previous mail, and again assuming the state of
implementations with respect to IRIs suggested by the above tests,
the danger in trying to force every namespace implementation to
compare FOO == %ab%cd%ef == %AB%CD%EF,... is not only general chaos
and incompatibilities among implementations, but also some serious
damage to the image of IRIs.

There is quite some chance that for namespaces, nobody will seriously
rely on L == %4c == %4C because nobody really sees a point in
bothering with %4c or %4C.
But on the other hand, if told that they can expect it, people will
try to rely on FOO == %ab%cd%ef == %AB%CD%EF (when they easily could
use FOO == &#xFoo;), and will discover that it sometimes works and
sometimes doesn't. That will easily lead them to conclude that
"IRIs don't work". Now *that* would indeed be very damaging to IRIs.

In summary:

- For namespaces, there is no need to roundtrip URI <-> IRI
  (i.e. FOO <-> %ab%cd%ef) in XML, because we have &#xFoo;
- Current namespace implementations (as far as I was able to test)
  already support IRIs as described at
  http://www.w3.org/TR/xml-names11/#IRIComparison
- There is a clear potential that by forcing namespace behavior to
  change, IRIs will be affected more negatively than URIs, and more
  negatively than in the current state.

I hope that you can consider these arguments.

Regards,    Martin.
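
For illustration, the equivalence levels discussed in this message can
be sketched in Python roughly as follows. This is only a sketch under
stated assumptions, not what any namespace processor actually does:
the function names are hypothetical, the real character U+00E9 (UTF-8
bytes C3 A9) stands in for the imaginary FOO, and the escape-insensitive
comparison deliberately does not decode escaped ASCII, so '~' and '%7E'
still differ.

```python
import re
from urllib.parse import quote

# Matches one %-escape such as %7e or %7E.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def literal_equal(a: str, b: str) -> bool:
    """Character-for-character comparison (the Namespaces 1.1 behaviour)."""
    return a == b


def normalize_escape_case(uri: str) -> str:
    """Upper-case the hex digits of every %-escape; leave the rest alone."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), uri)


def iri_to_uri(iri: str) -> str:
    """%-escape the UTF-8 bytes of each non-ASCII character.

    ASCII characters, including existing %-escapes, are left untouched.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


def escape_insensitive_equal(a: str, b: str) -> bool:
    """Hypothetical looser level: FOO == %ab%cd%ef == %AB%CD%EF."""
    return normalize_escape_case(iri_to_uri(a)) == normalize_escape_case(iri_to_uri(b))


# Assumption: U+00E9 plays the role of FOO in this example.
iri = "http://example.org/\u00e9"
lower = "http://example.org/%c3%a9"
upper = "http://example.org/%C3%A9"

# Literal comparison keeps all three apart.
print(literal_equal(iri, lower), literal_equal(lower, upper))

# The looser level treats all three as the same name.
print(escape_insensitive_equal(iri, lower), escape_insensitive_equal(lower, upper))
```

Under the literal comparison, a namespace author who wants FOO simply
writes the character itself or its numeric character reference; the
looser comparison only matters if people start mixing the escaped and
unescaped spellings.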
Received on Tuesday, 4 February 2003 20:25:26 UTC