Re: HRRI vs IRI in XML

Ping?

/ Norman Walsh <ndw@nwalsh.com> was heard to say:
| Hi,
|
| Sorry I was out of the loop for a bit. I see from the email threads
| that we've got some improved wording proposed for the list of
| characters that have to be escaped if they appear in HRRI and some
| improved wording for the security considerations section. I'll
| incorporate those as soon as I can.
|
| However, as far as I can tell, we still don't have a clear
| understanding about whether we need HRRI or not.
|
| Here's how I see it. Sorry if this is a little repetitive; I'm hoping
| that considering this issue from a higher level again will help.
|
| 1. The XML Recommendation says that a system identifier consists of a
| single or double quote followed by any characters followed by a
| matching quote:
|
|   SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
|
| Any attempt to limit the characters allowed in a system identifier
| would be a backwards-incompatible change to XML. That is simply not an
| option.
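
To make point 1 concrete, here is a rough sketch in Python of just how
permissive that production is. The names SYSTEM_LITERAL and
is_system_literal are made up for illustration, and the sketch ignores
the separate restriction that the content must consist of legal XML
characters.

```python
import re

# SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
# Either quote style, with anything except the matching quote in between.
SYSTEM_LITERAL = re.compile("\"[^\"]*\"|'[^']*'")


def is_system_literal(text: str) -> bool:
    """Return True if text matches the SystemLiteral production as a whole."""
    return SYSTEM_LITERAL.fullmatch(text) is not None


# Spaces, angle brackets, braces and non-ASCII characters are all accepted.
print(is_system_literal('"http://example.org/a b<c>{d}"'))
print(is_system_literal("'caf\u00e9.dtd'"))
```

The only thing the grammar rejects is an unbalanced or embedded
matching quote, which is why the repertoire can't be narrowed without
invalidating existing documents.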
|
| 2. Because we knew that system identifiers allowed characters that
| couldn't appear in URIs, we added some wording to clarify how
| processors must escape those characters if they need URIs.
|
| Over time, this text was refined, using fragments taken from drafts of
| the IRI spec, and is now "cut-and-pasted" into several
| recommendations.
|
| It's become clear that this cut-and-paste approach is tedious and
| error-prone and does not scale. Asking future specs to continue this
| cut-and-paste process from one or another of the existing specs is
| just not helpful to the community.
|
| 3. The HRRI spec proposes to instantiate the very liberal repertoire
| of characters allowed in a system identifier (and all the other
| places) in a short, stand-alone specification. This specification will
| have a name and will be available for normative reference.
|
| I understand that perhaps the world would be a better place if we
| didn't need another name for another flavor of a string that serves
| the role of identifying a resource. But that's not an option; see
| point 1.
|
| Martin's message that quoted this paragraph from the IRI spec gave a
| glimmer of hope that perhaps we could avoid 3.
|
|    Systems accepting IRIs MAY also deal with the printable characters in
|    US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
|    "{", "}", "|", "\", "^", and "`", in step 2 above.  If these
|    characters are found but are not converted, then the conversion
|    SHOULD fail.  Please note that the number sign ("#"), the percent
|    sign ("%"), and the square bracket characters ("[", "]") are not part
|    of the above list and MUST NOT be converted.  Protocols and formats
|    that have used earlier definitions of IRIs including these characters
|    MAY require percent-encoding of these characters as a preprocessing
|    step to extract the actual IRI from a given field.  This
|    preprocessing MAY also be used by applications allowing the user to
|    enter an IRI.
|
| Unfortunately, our problem is that system identifiers can contain not
| just "printable characters in US-ASCII that are not allowed in URIs"
| but a wide range of characters from elsewhere in Unicode that are not
| allowed in URIs (or IRIs).
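
For illustration only, the kind of escaping at issue might be sketched
like this in Python. This is an assumption about the general shape of
the conversion, not the normative wording of any spec: the function
name escape_for_uri is hypothetical, the ASCII list is the one in the
IRI paragraph quoted above, and treating control characters and
everything outside US-ASCII as "escape the UTF-8 bytes" is a
simplification.

```python
# Printable US-ASCII characters not allowed in URIs, per the IRI
# paragraph quoted above: < > " space { } | \ ^ `
ASCII_EXCLUDED = set('<>" {}|\\^`')


def escape_for_uri(identifier: str) -> str:
    """Percent-encode characters that cannot appear literally in a URI.

    Assumption: control characters and all non-ASCII characters are
    escaped via their UTF-8 bytes. '#', '%', '[' and ']' are left
    untouched, as the quoted paragraph requires.
    """
    out = []
    for ch in identifier:
        code = ord(ch)
        if ch in ASCII_EXCLUDED or code < 0x20 or code >= 0x7F:
            # One %HH triplet per UTF-8 byte of the character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_for_uri("my doc.dtd"))
print(escape_for_uri("caf\u00e9.dtd"))
```

Note that the second call goes beyond what a literal reading of the
quoted paragraph covers, which is exactly the gap the question below
is asking about.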
|
| Question: Is the paragraph from the IRI spec above intended to be
| broader than a literal reading would suggest? Is it the intent of the
| IRI spec that systems accepting IRIs MAY also deal with characters not
| allowed in URIs by converting them?
|
| If so, then perhaps we can simply say that system identifiers are IRIs
| and note this provision in the IRI spec for what I'll call "legacy"
| identifiers.
|
| If not, then I think we must proceed with the HRRI spec.
|
| Thoughts?
|
|                                         Be seeing you,
|                                           norm
|
| -- 
| Norman Walsh <ndw@nwalsh.com> | A great deal may be done by severity,
| http://nwalsh.com/            | more by love, but most by clear
|                               | discernment and impartial
|                               | justice.--Goethe

Received on Wednesday, 1 August 2007 15:27:42 UTC