- From: Norman Walsh <ndw@nwalsh.com>
- Date: Wed, 01 Aug 2007 11:26:42 -0400
- To: public-xml-core-wg@w3.org, public-iri@w3.org, Richard Ishida <ishida@w3.org>, Felix Sasaki <fsasaki@w3.org>, www-xml-linking-comments@w3.org, public-i18n-core@w3.org, Martin Duerst <duerst@it.aoyama.ac.jp>
- Message-ID: <87ps27l1hp.fsf@nwalsh.com>
Ping?
/ Norman Walsh <ndw@nwalsh.com> was heard to say:
| Hi,
|
| Sorry I was out of the loop for a bit. I see from the email threads
| that we've got some improved wording proposed for the list of
| characters that have to be escaped if they appear in HRRI and some
| improved wording for the security considerations section. I'll
| incorporate those as soon as I can.
|
| However, as far as I can tell, we still don't have a clear
| understanding about whether we need HRRI or not.
|
| Here's how I see it. Sorry if this is a little repetitive; I'm hoping
| that considering this issue from a higher level again will help.
|
| 1. The XML Recommendation says that a system identifier consists of a
| single or double quote followed by any characters followed by a
| matching quote:
|
| SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
|
| Any attempt to limit the characters allowed in a system identifier
| would be a backwards-incompatible change to XML. That is simply not an
| option.
|
| 2. Because we knew that system identifiers allowed characters that
| couldn't appear in URIs, we added some wording to clarify how
| processors must escape those characters if they needed URIs.
|
| Over time, this text was refined, using fragments taken from drafts of
| the IRI spec, and is now "cut-and-pasted" into several
| recommendations.
|
| It's become clear that this cut-and-paste approach is tedious and
| error-prone and does not scale. Asking future specs to continue this
| cut-and-paste process from one or another of the existing specs is
| just not helpful to the community.
|
| 3. The HRRI spec proposes to instantiate the very liberal repertoire
| of characters allowed in a system identifier (and all the other
| places) in a short, stand-alone specification. This specification will
| have a name and will be available for normative reference.
|
| I understand that perhaps the world would be a better place if we
| didn't need another name for another flavor of a string that serves
| the role of identifying a resource. But that's not an option, see
| point 1.
|
| Martin's message, which quoted the following paragraph from the IRI
| spec, gave a glimmer of hope that perhaps we could avoid point 3:
|
| Systems accepting IRIs MAY also deal with the printable characters in
| US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
| "{", "}", "|", "\", "^", and "`", in step 2 above. If these
| characters are found but are not converted, then the conversion
| SHOULD fail. Please note that the number sign ("#"), the percent
| sign ("%"), and the square bracket characters ("[", "]") are not part
| of the above list and MUST NOT be converted. Protocols and formats
| that have used earlier definitions of IRIs including these characters
| MAY require percent-encoding of these characters as a preprocessing
| step to extract the actual IRI from a given field. This
| preprocessing MAY also be used by applications allowing the user to
| enter an IRI.
|
| Unfortunately, our problem is that system identifiers can contain not
| just "printable characters in US-ASCII that are not allowed in URIs"
| but a wide range of characters from elsewhere in Unicode that are not
| allowed in URIs (or IRIs).
|
| Question: Is the paragraph from the IRI spec above intended to be
| broader than a literal reading would suggest? Is it the intent of the
| IRI spec that systems accepting IRIs MAY also deal with characters not
| allowed in URIs by converting them?
|
| If so, then perhaps we can simply say that system identifiers are IRIs
| and note this provision in the IRI spec for what I'll call "legacy"
| identifiers.
|
| If not, then I think we must proceed with the HRRI spec.
|
| Thoughts?
|
| Be seeing you,
| norm
|
| --
| Norman Walsh <ndw@nwalsh.com> | A great deal may be done by severity,
| http://nwalsh.com/ | more by love, but most by clear
| | discernment and impartial
| | justice.--Goethe
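To make the gap described in the quoted message concrete, here is a minimal Python sketch. It is not text from the HRRI draft or the IRI spec, and the names SYSTEM_LITERAL, LEGACY_ASCII, is_system_literal and escape_legacy_ascii are made up for illustration. It assumes only what the message quotes: the SystemLiteral production from point 1 and the list of printable US-ASCII characters from the IRI paragraph.

```python
import re

# SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
# (production as quoted in point 1 of the message)
SYSTEM_LITERAL = re.compile('"[^"]*"' + "|" + "'[^']*'")

# Printable US-ASCII characters named in the quoted IRI paragraph.
# "#", "%", "[" and "]" are deliberately absent: the paragraph says
# they MUST NOT be converted.
LEGACY_ASCII = '<>" {}|\\^`'


def is_system_literal(text: str) -> bool:
    """Return True if text, quotes included, matches SystemLiteral."""
    return SYSTEM_LITERAL.fullmatch(text) is not None


def escape_legacy_ascii(identifier: str) -> str:
    """Percent-encode only the characters listed in LEGACY_ASCII.

    Assumption: this covers just the ASCII list from the quoted
    paragraph. Other characters, including non-ASCII ones that are
    not allowed in IRIs, pass through unchanged.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in LEGACY_ASCII else ch
        for ch in identifier
    )


if __name__ == "__main__":
    print(is_system_literal('"my file<1>.dtd"'))
    print(escape_legacy_ascii("my file<1>.dtd#frag%20[x]"))
```

The first function accepts almost anything between matching quotes, which is why point 1 rules out narrowing the repertoire. The second handles only the ASCII list, so any other character that is legal in a system identifier but not in an IRI is left as it was; that leftover case is the one the question about the IRI paragraph is asking about.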
Received on Wednesday, 1 August 2007 15:27:40 UTC