- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Thu, 20 Dec 2007 11:18:48 +0000
- To: public-iri@w3.org
Summary: suggest swapping two sentences to be clearer that %-encoding of *any* problematic character is permitted during data entry.

Hi Martin,

I hadn't realised that there was a draft - I too have an editorial problemette.

I was actually trying to use the IRI RFC to resolve a problem from one of my IRI library users, concerning an xpointer frag-ID involving [ and ]. I found this paragraph, and then had some problems.

   Protocols and formats that have used earlier definitions of IRIs including these characters MAY require percent-encoding of these characters as a preprocessing step to extract the actual IRI from a given field. This preprocessing MAY also be used by applications allowing the user to enter an IRI. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted.

This paragraph is in the section about IRI to URI conversion.

The last sentence quoted initially appears clear. IRI library software MUST NOT do such a conversion, and so http://example.org/x#xpointer(foo[3]) is not an IRI, since nothing gets converted and it is not a URI.

But then to use something like xpointer, someone, somewhere, has to do the conversion from [ ] to the appropriate % encodings. At first blush, the MUST NOT in that last sentence seems to prohibit everyone, everywhere, from doing so, which would then make such frag IDs unusable.

The MUST NOT is in the context of IRI to URI conversion, so it probably does not have the universal scope suggested at first blush. On the obvious reading of the text, however, it also seems to cover the data entry scenario covered in the preceding sentence. For such a scenario, it seems too strong.

So, in terms of effect, I would like to suggest that on data entry, application-specific behaviour to percent-encode characters that would otherwise be problematic is OK ... but during IRI processing the MUSTs stand as is, with the MAY for backward compatibility.
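
To make the data entry case concrete, here is a minimal sketch in Python of the kind of application-specific preprocessing I have in mind. The function name encode_on_entry is made up for illustration, and restricting the encoding to the fragment is my own assumption (square brackets are legitimate elsewhere, e.g. in IPv6 literals); none of this is taken from the RFC or the draft.

```python
def encode_on_entry(text: str) -> str:
    """Hypothetical data-entry step: percent-encode '[' and ']' in the fragment.

    This is an application-level choice made before the string is treated
    as an IRI; it is not the IRI-to-URI conversion itself.
    """
    # Split at the first '#'; everything after it is the fragment.
    base, sep, frag = text.partition("#")
    # Assumption: only the fragment is touched, so brackets in an
    # IPv6 host such as http://[::1]/ are left alone.
    frag = frag.replace("[", "%5B").replace("]", "%5D")
    return base + sep + frag


print(encode_on_entry("http://example.org/x#xpointer(foo[3])"))
# prints http://example.org/x#xpointer(foo%5B3%5D)
```

The result is a string that IRI library software can then process without ever converting "[" or "]" itself, which is the separation between data entry and IRI-to-URI conversion that I am arguing for.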

I think text-wise it is then best to invert the last two sentences, since the MUST NOT refers to IRI-to-URI conversion, not to data entry.

e.g. replace the quoted text with

   Protocols and formats that have used earlier definitions of IRIs including these characters MAY require percent-encoding of these characters as a preprocessing step to extract the actual IRI from a given field. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted. Similarly, preprocessing involving percent encoding of otherwise problematic characters MAY also be used by applications allowing the user to enter an IRI.

I do not recall the full discussion concerning these issues. If, in fact, there was a decision that frag IDs like #xpointer(foo[3]) are simply too difficult to get right, and were a mistake, and are not supported, then the current text is OK, but possibly should be strengthened to:

   ... and MUST NOT be converted, either during IRI-to-URI conversion, or by applications allowing the user to enter an IRI.

(although that could be phrased better).

Jeremy
Received on Thursday, 20 December 2007 11:19:20 UTC