- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Thu, 20 Dec 2007 08:56:53 -0500
- To: public-iri@w3.org
Summary: suggest swapping two sentences to be clearer that %-encoding of
*any* problematic character is permitted during data entry.
Hi Martin,
I hadn't realised that there was a draft - I too have an editorial
problemette.
I was actually trying to use the IRI RFC to resolve a problem from one
of my IRI library users, concerning an xpointer frag-ID involving [ and ].
I found this paragraph, and then had some problems.
Protocols and formats that have used earlier
definitions of IRIs including these characters MAY require percent-
encoding of these characters as a preprocessing step to extract the
actual IRI from a given field. This preprocessing MAY also be used
by applications allowing the user to enter an IRI. Please note that
the number sign ("#"), the percent sign ("%"), and the square bracket
characters ("[", "]") are not part of the above list and MUST NOT be
converted.
This paragraph is in the section about IRI to URI conversion.
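To make the quoted rule concrete, here is a minimal Python sketch of the preprocessing step it describes. It assumes that "the above list" is the set of printable US-ASCII characters that RFC 3987 (section 3.1) lets legacy systems accept although they are not allowed in URIs (space, <, >, ", {, }, |, \, ^ and `). The name preprocess_legacy_iri is hypothetical. Note that "#", "%", "[" and "]" pass through untouched.

```python
# Assumption: "the above list" is the set of printable US-ASCII characters
# that RFC 3987 (section 3.1) lets legacy systems accept although they are
# not allowed in URIs. "#", "%", "[" and "]" are deliberately NOT in it.
LEGACY_CHARS = set(' <>"{}|\\^`')


def preprocess_legacy_iri(field: str) -> str:
    """Percent-encode legacy characters; never touch '#', '%', '[' or ']'."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in LEGACY_CHARS else ch
        for ch in field
    )


print(preprocess_legacy_iri("http://example.org/a b#xpointer(foo[3])"))
# http://example.org/a%20b#xpointer(foo[3])
```

So the space is escaped, but the brackets in the xpointer frag-ID survive unchanged.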
The last sentence quoted initially appears clear. IRI library software
MUST NOT do such a conversion, and so
http://example.org/x#xpointer(foo[3])
is not an IRI, since nothing gets converted and it is not a URI.
But then to use something like xpointer, someone, somewhere has to do
the conversion from [ and ] to the appropriate %-encodings (%5B and %5D).
At first blush, the MUST NOT in that last sentence seems to prohibit
everyone, everywhere from doing so, which would make such frag IDs unusable.
The MUST NOT is in the context of IRI-to-URI conversion, so it probably
does not have the universal scope suggested at first blush.
On the obvious reading of the text, however, it also seems to cover the
data entry scenario described in the preceding sentence. For that
scenario, it seems too strong.
So, in terms of effect, I would like to suggest that on data entry,
application-specific behaviour to percent-encode characters that would
otherwise be problematic is OK ... but during IRI processing the MUSTs
stand as is, with the MAY for backward compatibility.
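A rough Python sketch of the split I have in mind. Both functions are hypothetical illustrations, not library APIs. on_user_entry stands for application-specific data-entry behaviour that percent-encodes [ and ]; it assumes only the fragment should be touched, since brackets are legitimate in the authority (IPv6 literals). iri_to_uri is a deliberately minimal stand-in for the IRI-to-URI mapping that only escapes non-ASCII characters and so never converts "#", "%", "[" or "]".

```python
# Hypothetical data-entry hook, applied before any IRI processing.
# Assumption: only "[" and "]" in the fragment are treated as problematic;
# brackets elsewhere stay as-is because they are legitimate in the
# authority of an IRI (IPv6 literals such as http://[::1]/).
FRAGMENT_ESCAPES = {"[": "%5B", "]": "%5D"}


def on_user_entry(text: str) -> str:
    """Percent-encode otherwise problematic characters typed by a user."""
    base, sep, fragment = text.partition("#")
    if not sep:
        return text  # no fragment, nothing to do
    encoded = "".join(FRAGMENT_ESCAPES.get(ch, ch) for ch in fragment)
    return base + sep + encoded


def iri_to_uri(iri: str) -> str:
    """Minimal stand-in for IRI-to-URI conversion.

    Only non-ASCII characters are UTF-8 encoded and percent-escaped;
    "#", "%", "[" and "]" are never converted at this stage.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


entered = on_user_entry("http://example.org/x#xpointer(foo[3])")
print(entered)              # http://example.org/x#xpointer(foo%5B3%5D)
print(iri_to_uri(entered))  # http://example.org/x#xpointer(foo%5B3%5D) (unchanged)
```

With this split, the MUST NOT binds only iri_to_uri, while the data-entry hook remains free to make the xpointer frag-ID usable.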
Text-wise, I think it is then best to invert the last two sentences,
since the MUST NOT refers to IRI-to-URI conversion, not to data entry.
For example, replace the quoted text with:
Protocols and formats that have used earlier
definitions of IRIs including these characters MAY require percent-
encoding of these characters as a preprocessing step to extract the
actual IRI from a given field. Please note that
the number sign ("#"), the percent sign ("%"), and the square bracket
characters ("[", "]") are not part of the above list and MUST NOT be
converted.
Similarly, preprocessing involving percent encoding of otherwise
problematic characters MAY also be used
by applications allowing the user to enter an IRI.
I do not recall the full discussion concerning these issues. If, in
fact, there was a decision that frag IDs like #xpointer(foo[3]) are
simply too difficult to get right, were a mistake, and are not
supported, then the current text is OK, but it should possibly be
strengthened to:
... and MUST NOT be
converted, either during IRI-to-URI conversion, or by applications
allowing the user to enter an IRI.
(although that could be phrased better).
Jeremy
Received on Thursday, 20 December 2007 13:57:29 UTC