- From: Norman Walsh <ndw@nwalsh.com>
- Date: Wed, 01 Aug 2007 11:26:42 -0400
- To: public-xml-core-wg@w3.org, public-iri@w3.org, Richard Ishida <ishida@w3.org>, Felix Sasaki <fsasaki@w3.org>, www-xml-linking-comments@w3.org, public-i18n-core@w3.org, Martin Duerst <duerst@it.aoyama.ac.jp>
- Message-ID: <87ps27l1hp.fsf@nwalsh.com>
Ping?

/ Norman Walsh <ndw@nwalsh.com> was heard to say:
| Hi,
|
| Sorry I was out of the loop for a bit. I see from the email threads
| that we've got some improved wording proposed for the list of
| characters that have to be escaped if they appear in HRRI and some
| improved wording for the security considerations section. I'll
| incorporate those as soon as I can.
|
| However, as far as I can tell, we still don't have a clear
| understanding about whether we need HRRI or not.
|
| Here's how I see it. Sorry if this is a little repetitive; I'm hoping
| that considering this issue from a higher level again will help.
|
| 1. The XML Recommendation says that a system identifier consists of a
| single or double quote followed by any characters followed by a
| matching quote:
|
|   SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
|
| Any attempt to limit the characters allowed in a system identifier
| would be a backwards incompatible change to XML. That is simply not an
| option.
|
| 2. Because we knew that system identifiers allowed characters that
| couldn't appear in URIs, we added some wording to clarify how
| processors must escape those characters if they needed URIs.
|
| Over time, this text was refined, using fragments taken from drafts of
| the IRI spec, and is now "cut-and-pasted" into several
| recommendations.
|
| It's become clear that this cut-and-paste approach is tedious and
| error-prone and does not scale. Asking future specs to continue this
| cut-and-paste process from one or another of the existing specs is
| just not helpful to the community.
|
| 3. The HRRI spec proposes to instantiate the very liberal repertoire
| of characters allowed in a system identifier (and all the other
| places) in a short, stand-alone specification. This specification will
| have a name and will be available for normative reference.
|
| I understand that perhaps the world would be a better place if we
| didn't need another name for another flavor of a string that serves
| the role of identifying a resource. But that's not an option, see
| point 1.
|
| Martin's message that quoted this paragraph from the IRI spec gave a
| glimmer of hope that perhaps we could avoid 3.
|
|   Systems accepting IRIs MAY also deal with the printable characters in
|   US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
|   "{", "}", "|", "\", "^", and "`", in step 2 above. If these
|   characters are found but are not converted, then the conversion
|   SHOULD fail. Please note that the number sign ("#"), the percent
|   sign ("%"), and the square bracket characters ("[", "]") are not part
|   of the above list and MUST NOT be converted. Protocols and formats
|   that have used earlier definitions of IRIs including these characters
|   MAY require percent-encoding of these characters as a preprocessing
|   step to extract the actual IRI from a given field. This
|   preprocessing MAY also be used by applications allowing the user to
|   enter an IRI.
|
| Unfortunately, our problem is that system identifiers can contain not
| just "printable characters in US-ASCII that are not allowed in URIs"
| but a wide range of characters from elsewhere in Unicode that are not
| allowed in URIs (or IRIs).
|
| Question: Is the paragraph from the IRI spec above intended to be
| broader than a literal reading would suggest? Is it the intent of the
| IRI spec that systems accepting IRIs MAY also deal with characters not
| allowed in URIs by converting them?
|
| If so, then perhaps we can simply say that system identifiers are IRIs
| and note this provision in the IRI spec for what I'll call "legacy"
| identifiers.
|
| If not, then I think we must proceed with the HRRI spec.
|
| Thoughts?
|
| Be seeing you,
| norm
|
| --
| Norman Walsh <ndw@nwalsh.com> | A great deal may be done by severity,
| http://nwalsh.com/            | more by love, but most by clear
|                               | discernment and impartial
|                               | justice.--Goethe
Received on Wednesday, 1 August 2007 15:27:46 UTC
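
Editorial illustration (not part of the original message): the preprocessing step described in the quoted IRI paragraph can be sketched roughly as below. This is a minimal sketch that assumes only the ten printable US-ASCII characters named in that paragraph are percent-encoded, and that "#", "%", "[" and "]" pass through unchanged. The names LEGACY_ASCII and escape_legacy_ascii are hypothetical and come from neither the IRI spec nor the HRRI draft. The sketch does nothing for the wider range of Unicode characters the message is concerned about.

```python
# Hypothetical helper for illustration only; not taken from any specification.
# The ten printable US-ASCII characters named in the quoted IRI paragraph:
# < > " space { } | \ ^ `
LEGACY_ASCII = '<>" {}|\\^`'


def escape_legacy_ascii(identifier: str) -> str:
    """Percent-encode only the characters in LEGACY_ASCII.

    "#", "%", "[" and "]" are deliberately left untouched, as the quoted
    paragraph says they MUST NOT be converted.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in LEGACY_ASCII else ch
        for ch in identifier
    )


# Example with a made-up system identifier.
print(escape_legacy_ascii("http://example.org/a b{1}.xml#frag[0]%20"))
# prints: http://example.org/a%20b%7B1%7D.xml#frag[0]%20
```

Because "%" is never re-encoded, applying the function twice gives the same result as applying it once, which matters when an identifier may already have been partly escaped.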