- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 19 Nov 2002 07:50:25 +0900
- To: "Julian Reschke" <julian.reschke@gmx.de>, <www-tag@w3.org>
- Cc: www-international@w3.org
Hello Julian,

Many thanks for your comments. I have copied www-international, where comments about the IRI draft should be sent/copied.

At 22:48 02/11/13 +0100, Julian Reschke wrote:
>I'd like to add a few facts that seem to get overlooked again and again :-)
>
>1) Allowing the space character in IRIs makes it impossible to use the
>space character as delimiter between IRIs. Specs that as of now use
>white-space separated lists of URIs (such as XML Schema for
>namespaceLocation) *will* break if an IRI contains a space character.

Please note that the IRI spec doesn't advocate using spaces; it clearly warns against them. If you see a good way to make this warning clearer, please tell me.

>2) IRIs are not URIs (well, many of them). "Silently" replacing URI (refs)
>by IRI (refs) in spec revisions (such as XML namespaces) potentially
>breaks applications that assume URI-ness (such as that only ASCII
>characters are used).

The current discussion is not about a silent replacement, but about a new version of the namespaces spec.

>3) QName vs URI: if XML namespaces allow IRI refs as namespace names, the
>issue of mapping QNames to URIs will get even messier as it *already* is.

Because the conversion from IRIs to URIs is well-defined, I don't think it is appropriate to say that it gets messier. There may be an additional step, but this step is well-defined.

>For the record: I'm in favor of IRIs

Thanks!

>- if they stay focused on the issue of I18N -- allowing whitespace in the
>identifier does not really fit into this requirement (IMHO),

Well, yes. But then one could argue that there are a lot of other spaces in Unicode. So somebody who would like to use some spacing in an IRI could always use a non-breaking space, or something similar. Of course, this would be bad because the machine would notice the difference between the usual space and the non-breaking space, but users may not.

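
As an illustration of the last two points, here is a minimal Python sketch. It is not taken from the IRI draft: the function name iri_to_uri and the example.org addresses are hypothetical, and it assumes the conversion amounts to encoding disallowed characters as UTF-8 and %-escaping the resulting bytes, ignoring special handling of host names. It shows that an ordinary space and a non-breaking space, which look the same to a user, yield different URIs.

```python
from urllib.parse import quote

# Characters left untouched: URI reserved and unreserved punctuation,
# plus '%' so that existing %-escapes are not escaped a second time.
URI_SAFE = ":/?#[]@!$&'()*+,;=-._~%"


def iri_to_uri(iri: str) -> str:
    """Simplified, hypothetical IRI-to-URI step: %-escape the UTF-8 bytes
    of every character that is not allowed in a URI."""
    return quote(iri, safe=URI_SAFE, encoding="utf-8")


with_space = "http://example.org/my file"       # U+0020 SPACE
with_nbsp = "http://example.org/my\u00a0file"   # U+00A0 NO-BREAK SPACE

print(iri_to_uri(with_space))  # http://example.org/my%20file
print(iri_to_uri(with_nbsp))   # http://example.org/my%C2%A0file
```

The two results differ byte for byte, which is exactly the difference the machine notices and the user may not.
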
>- if it's always clear whether you're looking at a IRI or a URI (when
>specs "upgrade" from URI to IRI, this may require out-of-band information),

Well, that's always fairly easy to check, isn't it? The issue is knowing whether you may expect an IRI or not.

>- if they require full normalization

Can you please explain what you mean by that? If you mean normalization as in getting rid of alternative forms of codepoint sequences that the Unicode Standard defines as canonically equivalent, e.g. by using NFC, then the current design for IRIs is based on the following assumptions:

- There may be some specific needs to use non-NFC data in an IRI; the clearest example is that of a form that allows non-normalized input and sends back normalized output, which leads to an IRI with some non-normalized data after the '?'.
- Whenever there are no specific needs, use NFC. This in particular applies when converting from a legacy encoding (such as iso-8859-1) to a Unicode-based encoding.

Because of the extremely broad scope of URIs/IRIs, I'm not sure we can be more strict than that.

>Proposal/Question:
>
>- would it make sense to deprecate URIs that can not be transformed to
>IRIs (that is, those URIs that do contain %-escapes that do not map to
>UTF-8 representations of fully normalized Unicode strings)?

We can, and hopefully will, deprecate the creation of new URIs that contain such escapes. We cannot deprecate the use of such URIs where they already exist, because we don't want existing URIs to go away. Also, there may be some cases (kind of similar to the data: URI) that encode completely binary information, although I haven't seen any actual examples, and I don't think many will turn up (the data: scheme encodes binary data as base64). All this is of course up to the revision of RFC 2396, rather than the IRI spec.

Regards, Martin.
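
For readers who want to test a given URI against the criterion in the proposal quoted above (%-escapes that map to the UTF-8 representation of a normalized Unicode string), the following Python sketch may help. It is an illustration only, not part of either spec: the function name escapes_map_to_nfc_utf8 and the example.org URIs are hypothetical, and "fully normalized" is assumed here to mean NFC.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def escapes_map_to_nfc_utf8(uri: str) -> bool:
    """Return True if the URI, with its %-escapes undone, is valid UTF-8
    and the decoded text is already in NFC (assumed meaning of
    'fully normalized')."""
    raw = unquote_to_bytes(uri)      # undo %-escapes, giving raw bytes
    try:
        text = raw.decode("utf-8")   # strict decoding rejects invalid UTF-8
    except UnicodeDecodeError:
        return False
    return unicodedata.normalize("NFC", text) == text


# Precomposed e-acute (U+00E9) as UTF-8: valid and in NFC.
print(escapes_map_to_nfc_utf8("http://example.org/caf%C3%A9"))   # True
# 'e' followed by combining acute (U+0301): valid UTF-8 but not NFC.
print(escapes_map_to_nfc_utf8("http://example.org/cafe%CC%81"))  # False
# iso-8859-1 byte for e-acute: not valid UTF-8.
print(escapes_map_to_nfc_utf8("http://example.org/caf%E9"))      # False
```

The third case is the kind of legacy URI that already exists and therefore cannot simply be deprecated away.
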
Received on Monday, 18 November 2002 17:51:07 UTC