- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Fri, 22 Jun 2007 19:39:25 +0900
- To: Richard Tobin <richard@inf.ed.ac.uk>, Bjoern Hoehrmann <derhoermi@gmx.net>, "Grosso, Paul" <pgrosso@ptc.com>
- Cc: <public-iri@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, <public-i18n-core@w3.org>
Hello Richard,

At 00:59 07/06/21, Richard Tobin wrote:

>> You should simply drop this effort and use IRI References instead. There
>> is a high cost associated with yet another notion of resource identifier
>> technology
>
>This is not another notion of resource identifier. It is the existing
>notion used for XML system identifier, XLink href, and several other
>things. We are merely providing a name and a single place for a
>definition that already exists in multiple specs.

If these things are not resource identifiers, then what are they?

>> Simply prohibit anything but IRI references

That would constitute a normative change to several specs. In my opinion, that may be inappropriate for spaces and a few other characters, in particular in the context of XPointer, but it would definitely be highly appropriate for arbitrary control characters (if you have ever encountered a URI/IRI with an arbitrary control character, i.e. not TAB/CR/LF, I'd really like to know).

>> and,
>> if necessary, specify "utf-8-percent-escape all disallowed characters"
>> as error recovery method.

That would not, at least not if you consider observable behavior to be the relevant criterion.

At least for the XML spec itself, there may be a point of view that simply saying "it's an IRI" won't change anything. I'll try to explain this below.

Looking at the definition of a PubidLiteral for a moment (http://www.w3.org/TR/REC-xml/#NT-PubidLiteral), it just specifies a range of characters that can be used, and nothing more in terms of syntax. It could be argued, though, that some syntaxes (those with several // included) are much more likely, or even highly expected, to make a PubidLiteral usable in a wider context (which 'Public' suggests in the first place).

Likewise, the syntax for SystemLiteral is specified simply as a string of characters (from a much wider repertoire). To say that this is an IRI does not restrict this syntax.
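
To make the point about the two productions concrete, here is a minimal sketch in Python. The regular expressions transcribe the PubidLiteral, PubidChar and SystemLiteral productions of XML 1.0; the function names are made up for this illustration, and the sketch ignores the document-level restriction to legal XML characters.

```python
import re

# PubidChar in XML 1.0: space, CR, LF, ASCII letters and digits,
# and the punctuation  - ' ( ) + , . / : = ? ; ! * # @ $ _ %
PUBID_CHAR = r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]"
# Same set without the apostrophe, for the single-quoted form.
PUBID_CHAR_NO_APOS = r"[ \r\na-zA-Z0-9\-()+,./:=?;!*#@$_%]"

PUBID_LITERAL = re.compile(
    '"' + PUBID_CHAR + '*"' + "|'" + PUBID_CHAR_NO_APOS + "*'"
)

# SystemLiteral: any characters except the delimiting quote.
SYSTEM_LITERAL = re.compile(r'"[^"]*"' + "|" + r"'[^']*'")


def is_pubid_literal(text: str) -> bool:
    """Hypothetical helper: does text (quotes included) match PubidLiteral?"""
    return PUBID_LITERAL.fullmatch(text) is not None


def is_system_literal(text: str) -> bool:
    """Hypothetical helper: does text (quotes included) match SystemLiteral?"""
    return SYSTEM_LITERAL.fullmatch(text) is not None
```

With these, `is_system_literal('"my file {1}.dtd"')` holds even though the content is not a valid URI, while `is_pubid_literal('"no <angle> brackets"')` does not. Neither check says anything about IRI syntax, which is the point made above.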

It is a well-acknowledged fact that URI and IRI syntax are very difficult to check (because there are scheme-dependent restrictions, and so on) and that therefore, any strict checking (in the way e.g. the XML syntax is checked for well-formedness) is not appropriate for URIs or IRIs.

The rest (namely conversion of disallowed characters to %hh-encoding) seems to already be covered under the following paragraph from the IRI spec:

   Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters are found but are not converted, then the conversion SHOULD fail. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted. Protocols and formats that have used earlier definitions of IRIs including these characters MAY require percent-encoding of these characters as a preprocessing step to extract the actual IRI from a given field. This preprocessing MAY also be used by applications allowing the user to enter an IRI.

I'm not saying that this interpretation is the only one possible, and I'm not sure how it would apply to XLink and others, but I wanted to show it here as one point of view.

Regards, Martin.

>That would constitute a normative change to several specs.
>
>-- Richard

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
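
As an illustration of the conversion described in the paragraph quoted from the IRI spec, here is a minimal sketch in Python. The function name and the set name are invented for this example. It assumes that the listed US-ASCII characters and all non-ASCII characters are percent-encoded via UTF-8, and it deliberately leaves control characters aside.

```python
# The printable US-ASCII characters named in the quoted paragraph:
# < > " space { } | \ ^ `
DISALLOWED_ASCII = set('<>" {}|\\^`')


def escape_disallowed(text: str) -> str:
    """Hypothetical helper: percent-encode the listed ASCII characters
    and all non-ASCII characters as UTF-8 octets; copy the rest as-is.

    '#', '%', '[' and ']' are not in the list and are never converted.
    """
    out = []
    for ch in text:
        if ch in DISALLOWED_ASCII or ord(ch) > 0x7F:
            # One %HH triplet per octet of the character's UTF-8 encoding.
            out.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `escape_disallowed("my file {1}.dtd#frag")` gives `my%20file%20%7B1%7D.dtd#frag`, and an input that already contains `%20` is left alone because "%" is not converted.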
Received on Friday, 22 June 2007 10:40:23 UTC