explanation of I18N-ISSUE-188: special handling of % in IRI

* Martin J. Dürst <duerst@it.aoyama.ac.jp>  [2012-09-08 13:30+0900]
> On 2012/09/08 0:49, Internationalization Core Working Group Issue 
> Tracker wrote:
> > I18N-ISSUE-188: special handling of % in IRI [TURTLE]
> >
> > http://www.w3.org/International/track/issues/188
> >
> > Raised by: Addison Phillips
> > On product: TURTLE
> >
> > http://www.w3.org/2012/08/22-i18n-minutes.html#item05
> >
> > Section 6.4 contains this Note:
> >
> > --
> > %-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> in Turtle designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term written as ex:%66oo-bar with a prefix @prefix ex: <http://a.example/> also designates the IRI http://a.example/%66oo-bar.
> 
> > We don't understand why you do this. Can you clarify?
> 
> I'm not speaking for the RDF/TURTLE WG, but RDF (and therefore TURTLE) 
> are doing IRI comparisons strictly character-by-character (see e.g. 
> http://tools.ietf.org/html/rfc3987#section-5.3.1), the same as it is 
> done in XML Namespaces.
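A minimal Python sketch of the simple string comparison referred to above (RFC 3987, section 5.3.1). The function name iri_equal is illustrative and not taken from any RDF library; urllib.parse.unquote stands in for the percent-decoding step that RDF comparison does not perform.

```python
from urllib.parse import unquote


def iri_equal(a: str, b: str) -> bool:
    """Simple string comparison (RFC 3987, section 5.3.1): two IRIs
    identify the same resource only if their characters are identical."""
    return a == b


written = "http://a.example/%66oo-bar"  # %66 is the encoding of 'f'
decoded = "http://a.example/foo-bar"

# Character by character, these are two different IRIs.
print(iri_equal(written, decoded))           # False

# They only become equal after percent-decoding, which RDF comparison
# (and therefore a Turtle parser) does not do.
print(iri_equal(unquote(written), decoded))  # True
```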

The RDF model uses IRIs as identifiers; Turtle merely provides a
serialization of that model. IRIs can include %dd sequences, e.g.
<http://伝言.example/?user=أكرم&channel=R%26D>. (It is quite
reasonable for such an IRI to include a '%': its corresponding URL is
<http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D>,
and without the %26 escape the '&' in "R&D" would introduce an extra
form-urlencoded parameter.)
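To see what the %26 protects, here is a sketch using Python's urllib.parse. It assumes the query part of the IRI can be mapped to URI form by UTF-8 percent-encoding everything except '=', '&' and '%'; that is a simplification of the RFC 3987 section 3.1 mapping which happens to suffice for this one query, and the IDNA step for the host is left out.

```python
from urllib.parse import parse_qsl, quote

iri_query = "user=أكرم&channel=R%26D"

# Map the IRI query to URI form: encode non-ASCII characters as UTF-8
# %XX sequences, keeping '=', '&' and the existing '%26' untouched.
uri_query = quote(iri_query, safe="=&%")
print(uri_query)
# user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D

# With the %26 escape, the query carries exactly two parameters.
print(parse_qsl(uri_query))

# Without it, the '&' in "R&D" splits off a third, empty parameter "D".
print(parse_qsl("user=%D8%A3%D9%83%D8%B1%D9%85&channel=R&D",
                keep_blank_values=True))
```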

Turtle could de-escape one level of %-encoding, but that would be
rather arbitrary behavior, with the unfortunate effect of requiring
anyone composing Turtle to first %-escape RDF's IRIs, including the %
sequences for any characters in reserved | unreserved | escaped.
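A sketch of that double-escaping burden, assuming a hypothetical Turtle variant whose reader percent-decodes one level (modelled here with urllib.parse.unquote). Both function names are illustrative; no real Turtle parser decodes this way.

```python
from urllib.parse import unquote

# The IRI that should end up in the RDF graph.
rdf_iri = "http://a.example/?channel=R%26D"


def hypothetical_decoding_reader(term: str) -> str:
    """A hypothetical reader that strips one level of %-escaping.
    Real Turtle parsers do NOT do this."""
    return unquote(term)


def escape_for_hypothetical_reader(iri: str) -> str:
    """What an author would have to do first: escape every '%' as '%25'
    so that one level of decoding gives the original IRI back."""
    return iri.replace("%", "%25")


# Writing the IRI as-is silently produces a different IRI.
print(hypothetical_decoding_reader(rdf_iri))  # http://a.example/?channel=R&D

# The author has to write the doubly-escaped form instead.
written = escape_for_hypothetical_reader(rdf_iri)
print(written)                                           # http://a.example/?channel=R%2526D
print(hypothetical_decoding_reader(written) == rdf_iri)  # True
```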


> It would probably help if this was pointed out more explicitly in the 
> above text.

I think such an explanation would be long, and would amount to
teaching people general rules about designing languages with
appropriate escaping. We're best off saying "Turtle parsers don't do
anything with '%dd'."
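For contrast, a sketch of what "doing nothing with '%dd'" looks like in a reader. The functions are illustrative only and handle just two term forms, an IRIREF in angle brackets and a simple prefixed name. Inside <...> they decode only Turtle's own \u / \U numeric escapes; %dd passes through untouched.

```python
import re

# Turtle's own numeric escapes, allowed inside <...>: \uXXXX and \UXXXXXXXX.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def parse_iriref(token: str) -> str:
    """Turn an IRIREF token such as '<http://a.example/%66oo-bar>' into an
    IRI. Only \\u / \\U escapes are decoded; %dd is copied verbatim."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not an IRIREF: {token!r}")
    body = token[1:-1]
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), body)


def expand_prefixed_name(pname: str, prefixes: dict[str, str]) -> str:
    """Expand 'ex:%66oo-bar' by plain concatenation with the namespace IRI.
    %dd is copied verbatim; local-name backslash escapes are not handled."""
    prefix, _, local = pname.partition(":")
    return prefixes[prefix] + local


prefixes = {"ex": "http://a.example/"}
print(parse_iriref("<http://a.example/%66oo-bar>"))      # http://a.example/%66oo-bar
print(expand_prefixed_name("ex:%66oo-bar", prefixes))    # http://a.example/%66oo-bar
print(parse_iriref(r"<http://a.example/\u0066oo-bar>"))  # http://a.example/foo-bar
```

Both spellings with %66 yield the same IRI, and it differs from the one written with \u0066: the language's own escape is decoded, while %dd belongs to the IRI and is left alone.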

Please indicate whether this response addresses the issue.
-- 
-ericP

Received on Tuesday, 2 October 2012 05:04:57 UTC