Re: Aligning Turtle and SPARQL escape sequence processing. from Alex Hall on 2011-11-22 (public-rdf-wg@w3.org from November 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Tue, 22 Nov 2011 16:09:25 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, RDF-WG <public-rdf-wg@w3.org>
Message-ID: <CAFq2bixCeUZA8BfiN0eoP=YBGH_1ZrebivmrS4Ftg-52aUXcag@mail.gmail.com>

Hi Richard,

On Tue, Nov 22, 2011 at 2:48 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
>
> The current situation around escaping in RDF is already a glorious mess.
> Let me illustrate this with an example, let's say querying DBpedia:
>

What exactly are these strings/IRIs/prefixed names intended to represent?
Terms in a SPARQL query? Terms in a Turtle document?

If you're talking about terms in a Turtle document, some of your examples
below don't line up with my reading of the Turtle editor's draft.

>
>    // Special characters in literals…?
>
>    "Éire"      – Works!
>    "\u00C9ire" - Works!
>
>    // Ok, easy enough. What about IRIs?
>
>    <http://dbpedia.org/resource/Éire>      – Doesn't work :-(
>

Why not? É is a legal IRI character in Turtle.

>    <http://dbpedia.org/resource/\u00C9ire> – Doesn't work :-(
>

Why not? Turtle allows Unicode escapes in IRIs.
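
To make that concrete, here is a rough Python sketch of how a \uXXXX or
\UXXXXXXXX escape inside an IRI could be turned into the character it
stands for. The decode_uchar helper and the UCHAR pattern are hypothetical
names used for illustration only; they are not taken from any Turtle
parser, and the sketch assumes the escapes are well-formed hex sequences.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def decode_uchar(text):
    """Replace each numeric escape with the code point it names (illustrative helper)."""
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

# Raw string, so the backslash reaches decode_uchar unprocessed.
escaped = r"http://dbpedia.org/resource/\u00C9ire"
decoded = decode_uchar(escaped)

# After decoding, the IRI contains the literal character U+00C9 (É).
print(decoded == "http://dbpedia.org/resource/\u00C9ire")
```

Under that reading, the escaped form and the form written with a literal É
denote the same IRI.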

>    <http://dbpedia.org/resource/%C3%89ire> – Works!
>

Works in the sense that it's a legal IRI, but it's the IRI
"http://dbpedia.org/resource/%C3%89ire", which is not the same as the IRI
"http://dbpedia.org/resource/Éire" (although an application might normalize
it as such).
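
A minimal Python sketch of that distinction, assuming an application that
chooses to decode percent-escapes as UTF-8. That decoding step is the
application's own normalization, not something Turtle or SPARQL performs;
the variable names are illustrative.

```python
from urllib.parse import unquote

percent_encoded = "http://dbpedia.org/resource/%C3%89ire"
literal = "http://dbpedia.org/resource/\u00C9ire"  # contains the character É

# Compared character by character, the two IRIs differ.
same_as_written = percent_encoded == literal

# Only after the application decodes the percent-escapes do they match.
same_after_decoding = unquote(percent_encoded) == literal

print(same_as_written, same_after_decoding)
```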

>
>    // Strange… So what about prefixed names?
>
>    dbpedia:%C3%89ire       – Doesn't work :-(
>

I wouldn't expect that to work. I don't know of any format that supports
percent-encoding with prefixed names.

>    dbpedia:Éire            – Doesn't work :-(
>

Why not? É is a legal pname character in Turtle.

>    dbpedia:\u00C9ire       – Doesn't work :-(
>    dbpedia:\u00C3\u0089ire – Doesn't work :-(
>

I REALLY wouldn't expect this one to work.  Is there any format where
"\u00C3\u0089" will produce "É"?
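
For what it's worth, a short Python sketch of why the two-escape form
cannot be expected to yield É: \u escapes name code points, not bytes, so
"\u00C3\u0089" is two characters. It only turns into É if something
reinterprets those code points as the UTF-8 byte sequence C3 89, which is
an encoding mix-up rather than escape processing. The Latin-1 round trip
below is just one way to demonstrate that mix-up.

```python
two_escapes = "\u00C3\u0089"  # U+00C3 followed by U+0089: two code points
one_escape = "\u00C9"         # U+00C9: the single character É

# As escape sequences for code points, the two strings are different.
print(len(two_escapes), len(one_escape), two_escapes == one_escape)

# Treating each code point as a byte and decoding those bytes as UTF-8
# is the only route from the two-escape form to É.
reinterpreted = two_escapes.encode("latin-1").decode("utf-8")
print(reinterpreted == one_escape)
```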

>
>    // Oh well, back to IRIs I guess.
>
> Now the proposal adds to that mess by adding *another* way of writing
> things differently with *no* increase in expressivity. (The results for all
> the cases above are unaffected by the proposal – the DBpedia IRI simply
> cannot be written as a prefixed name.)
>
> As it stands, none of the following IRIs can be written as prefixed names
> – they all have to be written as full IRIs:
>
>   1. <%C3%89ire>
>   2. <search?q=eire>
>   3. <Galway,_Ireland>
>   4. <Éire> if you don't know how to type É but know that you can use
> \u00C9 instead
>   5. <U.S.>
>   6. <United%20Kingdom>
>
> The proposal adds a whole bunch of complexity to the story that one needs
> to tell to explain how the hell prefixed names work, and what we get in
> return is a solution for the case that matters least – number 4 – while all
> the others still don't work and require falling back to full IRIs.
>
> Escaping in IRIs and literals is necessary for backwards compatibility and
> for Oracle's ASCII-Triples. Adding escaping to prefixed names is *not*
> necessary as there is already a way of escaping them: expand to a full IRI
> and use unicode escapes there.
>

I agree that it adds complexity for questionable benefit, but I
don't particularly object to the presence of Unicode escapes in prefixed
names.  If users are finding prefixed names insufficient, I'd rather spend
the energy trying to figure out how to make CURIEs work in Turtle than
adding Unicode escapes to prefixed names.

I should also point out that the Turtle editor's draft DOES allow Unicode
escapes in prefixed names, so it's removing them that would be a change.

-Alex

>
> Best,
> Richard
>

Received on Tuesday, 22 November 2011 21:10:21 UTC