- From: Chris Lilley <chris@w3.org>
- Date: Thu, 6 Feb 2003 03:57:26 +0100
- To: Martin Duerst <duerst@w3.org>
- CC: www-international@w3.org, "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Max Froumentin <mf@w3.org>, Michel Suignard <michelsu@microsoft.com>, <emmanuel@w3.org>
On Wednesday, February 5, 2003, 2:20:53 AM, Martin wrote:

MD> Hello Chris,

MD> At 22:51 03/02/03 +0100, Chris Lilley wrote:
>>The third way, the way where %ab is not equal to %AB, means that we
>>can just give up on making FOO compare equal to %ab%cd%ef and thus, we
>>can just give up on any roundtripping from IRI to URI and thus, IRI
>>becomes merely a theoretical possibility. It becomes something that
>>exists in a spec but actual XML files contain a bunch of illegible
>>hexified nonsense.

MD> I get the impression that you are jumping to conclusions here a bit
MD> too quickly.

No, but I can see why you would think that.

MD> XML is perfectly capable of representing FOO under
MD> all circumstances without %-escapes, by using the characters directly,
MD> or using numeric character references.

Yes, it is perfectly capable. My point was that if IRI to URI was a
one way trip then people would ignore the fact that the IRI could
easily be represented in XML; *for portability* and *to be robust* to
guard against any hex escaping that might happen later down the road
they would always fully escape the IRI to start with.

MD> For this example, let's just
MD> pretend that &#xfoo; is the numeric character reference for FOO.

Clearly this is possible. My point was that if the character was not
defined to be the same as the hex escape then people would not use
this possibility.

MD> The area where FOO <-> %ab%cd%ef round tripping is most beneficial
MD> and important for IRIs is *outside* XML and other technologies
MD> (including paper) that have no way to represent FOO.

Right. So, to guard against any hexifying that might happen,
technologies that can represent FOO would represent it as %AB%CD%EF.

MD> But even
MD> in those cases, e.g. to tell somebody with an old email client
MD> (like myself) how to use a particular namespace, it's always
MD> possible to say: "well, just use xmlns:foo='http://example.org/&#xfoo;'."

As long as I told you in an XML message and not, for example, plain
text.

MD> In contrast to %ab%cd%ef, &#xfoo; has the advantage that it is
MD> understood by every XML processor. So helping people to use
MD> &#xfoo; rather than %ab%cd%ef may actually have some benefits.

I think you miss my point completely here. Whether people have to use
an NCR or are able to type the character in directly is irrelevant.

>>MD> Currently, Namespaces in XML 1.1 (Candidate Rec) specifies that for
>>MD> purposes of namespace equivalence, '%7e', '%7E', and '~' are different
>>MD> (see http://www.w3.org/TR/xml-names11/#IRIComparison).
>>
>>Yes. This should change.

MD> Well, again, this may well be the case, but let's not jump to
MD> conclusions too quickly. I have put up some new versions of
MD> the examples from Max Froumentin that I sent yesterday at
MD> http://www.w3.org/2003/02/uriEquivTest/. test1.xsl is the
MD> stylesheet for test1.xml, and test2.xsl is the stylesheet
MD> for test2.xml. You can have a look at
MD> http://www.w3.org/2003/02/uriEquivTest/test1.xml
MD> and
MD> http://www.w3.org/2003/02/uriEquivTest/test2.xml
MD> directly in a XML/XSL-enabled browser, and then look at
MD> the source and the stylesheet to see what's going on.

MD> http://www.w3.org/2003/02/uriEquivTest/test2.xml is
MD> in particular very interesting because it shows a
MD> namespace IRI that is already working exactly as
MD> described at http://www.w3.org/TR/xml-names11/#IRIComparison
MD> in the following circumstances:

MD> - XSLT stylesheet invoked via stylesheet PI on Internet Explorer 6
MD> - XSLT stylesheet invoked via stylesheet PI on Netscape Navigator 6
MD> - Xalan Java 2
MD> - msxsl with version parameter set to '4.0'
MD> - msxsl with version parameter set to '3.0'

MD> I'm looking forward to receive other test results,
MD> and suggestions for improvements to the tests.

MD> (Thanks to Emmanuel Pietriga for some help)

The test results are very interesting.
However, they tell us what has been implemented so far, not what
should be specified.

MD> If we take the consistency of these test results as an indication,
MD> it may well be that the caution given at
MD> http://www.w3.org/TR/xml-names11/#IRIs ("Users defining namespaces
MD> are advised to restrict namespace names to URIs until software
MD> supporting IRIs is in common use."), at least for the purpose
MD> of pure namespace use (as opposed to any retrieval that one
MD> may want to do on the namespace URI), turns out to be unnecessary.
MD> IRIs already work, as per spec. What better could we wish for?

MD> As I hinted at in a previous mail, and again assuming the state
MD> of implementations with respect to IRIs suggested by the above tests,
MD> the danger to try and force every namespace implementation to
MD> compare FOO == %ab%cd%ef == %AB%CD%EF,... is not only a general
MD> chaos and incompatibilities among implementations, but also some
MD> serious damage to the image of IRIs. There is quite some chance
MD> that for namespaces, nobody will seriously rely on
MD> L == %4c == %4C
MD> because nobody really sees a point in bothering with %4c or %4C.
MD> But on the other hand, if told that they can expect it, people
MD> will try to rely on
MD> FOO == %ab%cd%ef == %AB%CD%EF
MD> (when they easily could use FOO == &#xfoo;), and will discover
MD> that it sometimes works and sometimes doesn't. That will easily
MD> lead them to conclude that "IRIs don't work.". Now *that*
MD> would indeed be very damaging to IRIs.

And this is exactly why, to obtain the desirable result for IRIs that
FOO == %ab%cd%ef == %AB%CD%EF, the result %4c == %4C, which is not
otherwise obviously needed, has to be implemented for URIs. Whether
that is implemented as L == %4c == %4C or (more likely) as L != %4c
but %4c == %4C (so IRI specifically does not affect the meaning of the
ASCII characters, but the currently poorly specified beyond-ASCII
characters compare equal to their hexified equivalents) is for
discussion.
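
The comparison policies being weighed here can be sketched in Python.
This is only a rough illustration under stated assumptions, not
anything taken from the thread or from a spec: the function names are
hypothetical, and beyond-ASCII characters are assumed to be escaped as
UTF-8 octets (the FOO / %ab%cd%ef placeholder does not fix an
encoding).

```python
import re
from urllib.parse import unquote

# Matches one %-escape such as %4c or %4C.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def strict_equal(a: str, b: str) -> bool:
    """Character-for-character comparison: '%7e', '%7E' and '~' all differ."""
    return a == b


def upper_escapes(s: str) -> str:
    """Uppercase the hex digits of existing %-escapes; touch nothing else."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), s)


def escape_non_ascii(s: str) -> str:
    """Escape beyond-ASCII characters as UTF-8 octets (assumed encoding)."""
    out = []
    for ch in s:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is left exactly as written
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def escaped_form_equal(a: str, b: str) -> bool:
    """L != %4c, but %4c == %4C, and a beyond-ASCII character
    compares equal to its escaped form."""
    return upper_escapes(escape_non_ascii(a)) == upper_escapes(escape_non_ascii(b))


def fully_decoded_equal(a: str, b: str) -> bool:
    """L == %4c == %4C: every escape is decoded before comparing."""
    return unquote(a) == unquote(b)


if __name__ == "__main__":
    print(strict_equal("%7e", "%7E"))                  # False
    print(escaped_form_equal("%4c", "%4C"))            # True
    print(escaped_form_equal("L", "%4c"))              # False
    print(escaped_form_equal("\u00e9", "%c3%a9"))      # True
    print(fully_decoded_equal("L", "%4c"))             # True
```

The middle policy leaves the meaning of ASCII characters alone, which
is why "L" and "%4c" still differ under it while a beyond-ASCII
character and its escaped form do not.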

MD> In summary:

MD> - For namespaces, there is no need to roundtrip URI <-> IRI
MD> (i.e. FOO <-> %ab%cd%ef) in XML, because we have &#xfoo;

Uh, that may be, but it has not been shown.

MD> - Current namespace implementations (as far as I was able to test)
MD> already support IRIs as described at
MD> http://www.w3.org/TR/xml-names11/#IRIComparison

MD> - There is a clear potential that by forcing namespace behavior
MD> to change, IRIs will be affected more negatively than URIs,
MD> and more negatively than in the current state.

MD> I hope that you can consider these arguments.

Of course. It may be that IRI to URI being a one way trip is the best
that we can do. But I would prefer to try and do better.

--
 Chris mailto:chris@w3.org
Received on Wednesday, 5 February 2003 21:57:37 UTC