Another comment on draft-duerst-iri-bis-02

Summary: suggest swapping two sentences to be clearer that %-encoding of
*any* problematic character is permitted during data entry.


Hi Martin,

I hadn't realised that there was a draft - I too have an editorial
problemette.
I was actually trying to use the IRI RFC to resolve a problem from one
of my IRI library users, concerning an xpointer frag-ID involving [ and ].
I found this paragraph, and then had some problems.


    Protocols and formats that have used earlier
    definitions of IRIs including these characters MAY require percent-
    encoding of these characters as a preprocessing step to extract the
    actual IRI from a given field.  This preprocessing MAY also be used
    by applications allowing the user to enter an IRI.  Please note that
    the number sign ("#"), the percent sign ("%"), and the square bracket
    characters ("[", "]") are not part of the above list and MUST NOT be
    converted.
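To make the backward-compatibility preprocessing concrete, here is a
minimal Python sketch of how I read it. The quoted paragraph does not
repeat "the above list", so I have assumed it is the printable US-ASCII
characters that RFC 3987 section 3.1 names as not allowed in URIs. The
function name is my own illustration, not something from the draft.

```python
# ASSUMPTION: "the above list" is taken to be the printable US-ASCII
# characters not allowed in URIs (RFC 3987 section 3.1); the draft's
# own list may differ.
LEGACY_CHARS = set(' <>"{}|\\^`')

# Per the draft, '#', '%', '[' and ']' are NOT in the list and MUST NOT
# be converted, so they simply fall through the loop below untouched.


def preprocess_legacy_field(field: str) -> str:
    """Hypothetical helper: percent-encode legacy characters in a field
    so that the actual IRI can be extracted from it."""
    out = []
    for ch in field:
        if ch in LEGACY_CHARS:
            out.append('%{:02X}'.format(ord(ch)))  # e.g. ' ' -> '%20'
        else:
            out.append(ch)
    return ''.join(out)


print(preprocess_legacy_field('http://example.org/a b#x[1]'))
# http://example.org/a%20b#x[1]
```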


This paragraph is in the section about IRI to URI conversion.

The last sentence quoted initially appears clear: IRI library software
MUST NOT do such a conversion, and so

http://example.org/x#xpointer(foo[3])

is not an IRI, since nothing gets converted and the unconverted string,
with its literal [ and ], is not a URI.
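A rough way to see why this fails is to check the fragment against the
characters RFC 3986 allows there (unreserved, sub-delims, ":", "@", "/",
"?", plus %HH escapes). This is only a sketch: the function name is
hypothetical, and it accepts every non-ASCII character, which
approximates but does not exactly match the IRI ucschar rules.

```python
import re
import string

# ASCII characters allowed literally in a URI fragment (RFC 3986):
# unreserved, sub-delims, ':', '@', '/', '?'.  '[' and ']' are absent.
FRAGMENT_ASCII = set(string.ascii_letters + string.digits
                     + "-._~" + "!$&'()*+,;=" + ":@/?")
PCT_ENCODED = re.compile(r'%[0-9A-Fa-f]{2}')


def fragment_is_valid(fragment: str) -> bool:
    """Hypothetical check; non-ASCII is accepted wholesale (simplification)."""
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        if ch == '%':
            # A '%' must start a complete %HH escape.
            if not PCT_ENCODED.match(fragment, i):
                return False
            i += 3
            continue
        if ord(ch) < 128 and ch not in FRAGMENT_ASCII:
            return False
        i += 1
    return True


print(fragment_is_valid('xpointer(foo[3])'))      # False
print(fragment_is_valid('xpointer(foo%5B3%5D)'))  # True
```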

But then, to use something like xpointer, someone, somewhere, has to do
the conversion from [ ] to the appropriate % encodings. At first blush,
the MUST NOT in that last sentence seems to prohibit everyone,
everywhere, from doing so, which would make such frag IDs unusable.

The MUST NOT is in the context of IRI-to-URI conversion, so it probably
does not have the universal scope suggested at first blush.

On the obvious reading of the text, however, it also seems to apply to
the data entry scenario described in the preceding sentence. For that
scenario, it seems too strong.

So, in terms of effect, I would like to suggest that on data entry,
application-specific behaviour to percent-encode characters that would
otherwise be problematic is OK, but that during IRI processing the MUSTs
stand as is, with the MAY for backward compatibility.
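To make the intended effect concrete, here is a hedged Python sketch of
the two stages as I would like them to behave. Both function names are
hypothetical. The data-entry step only touches [ and ] in the fragment
(brackets are legitimate in an IP-literal host), and the IRI-to-URI step
is simplified to percent-encode every non-ASCII character as UTF-8,
ignoring IDNA and the finer ucschar/iprivate distinctions.

```python
from urllib.parse import quote


def encode_user_input(text: str) -> str:
    """Hypothetical data-entry step (application-specific):
    percent-encode square brackets, but only in the fragment."""
    base, sep, frag = text.partition('#')
    # '[' and ']' remain legal in an IP-literal host, so leave base alone.
    frag = frag.replace('[', '%5B').replace(']', '%5D')
    return base + sep + frag


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping: UTF-8 percent-encode non-ASCII only.

    '#', '%', '[' and ']' are ASCII, so they are never converted here,
    which is where the draft's MUST NOT applies.
    """
    return ''.join(quote(ch, safe='') if ord(ch) > 127 else ch for ch in iri)


typed = 'http://example.org/x#xpointer(foo[3])'
print(iri_to_uri(typed))                     # brackets left as typed
print(iri_to_uri(encode_user_input(typed)))  # http://example.org/x#xpointer(foo%5B3%5D)
```

The point of the split is that the only place allowed to touch [ and ]
is the user-facing entry step; the conversion itself stays strict.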

Text-wise, I think it is then best to swap the last two sentences,
since the MUST NOT refers to IRI-to-URI conversion, not to data entry.

For example, replace the quoted text with:

    Protocols and formats that have used earlier
    definitions of IRIs including these characters MAY require percent-
    encoding of these characters as a preprocessing step to extract the
    actual IRI from a given field.  Please note that the number sign
    ("#"), the percent sign ("%"), and the square bracket characters
    ("[", "]") are not part of the above list and MUST NOT be converted.

    Similarly, preprocessing involving percent-encoding of otherwise
    problematic characters MAY also be used by applications allowing
    the user to enter an IRI.


I do not recall the full discussion concerning these issues. If, in
fact, it was decided that frag IDs like #xpointer(foo[3]) are simply
too difficult to get right, were a mistake, and are not supported, then
the current text is OK, but it should possibly be strengthened to:
                  ... and MUST NOT be
    converted, either during IRI-to-URI conversion, or by applications
    allowing the user to enter an IRI.

(although that could be phrased better).


Jeremy

Received on Thursday, 20 December 2007 11:19:20 UTC