Re: Aligning Turtle and SPARQL escape sequence processing.

On 22 Nov 2011, at 21:09, Alex Hall wrote:
>> The current situation around escaping in RDF is already a glorious mess. Let me illustrate this with an example, let's say querying DBpedia:
> 
> What exactly are these strings/IRIs/prefixed names intended to represent? Terms in a SPARQL query? Terms in a Turtle document?

SPARQL.

> If you're talking about terms in a Turtle document, some of your examples below don't line up with my reading of the Turtle editor's draft.

The examples are actual data from DBpedia.

>>     // Special characters in literals…?
>> 
>>    "Éire"      – Works!
>>    "\u00C9ire" – Works!
>> 
>>    // Ok, easy enough. What about IRIs?
>> 
>>    <http://dbpedia.org/resource/Éire>      – Doesn't work :-(
> 
> Why not? É is a legal IRI character in Turtle.

Because it's not an actual DBpedia IRI. DBpedia uses %C3%89, not É.
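For reference, here is a short Python sketch of where the %C3%89 form comes from. Percent-encoding applies to the UTF-8 bytes of É (U+00C9), which are 0xC3 0x89. It uses only the standard library's urllib.parse.quote. The variable names are illustrative.

```python
from urllib.parse import quote

# É is U+00C9; its UTF-8 encoding is the two bytes 0xC3 0x89.
label = "\u00c9ire"  # "Éire"
utf8_bytes = label.encode("utf-8")

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters;
# the ASCII letters "ire" are unreserved and are left as they are.
local_name = quote(label)
dbpedia_iri = "http://dbpedia.org/resource/" + local_name

print(utf8_bytes)   # b'\xc3\x89ire'
print(dbpedia_iri)  # http://dbpedia.org/resource/%C3%89ire
```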

>>    <http://dbpedia.org/resource/\u00C9ire> – Doesn't work :-(
> 
> Why not? Turtle allows Unicode escapes in IRIs.

See above.
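Here is a minimal sketch of what decoding \uXXXX / \UXXXXXXXX escapes does to such an IRI. The helper decode_uchar is hypothetical and is not taken from any particular parser. The escape resolves to the character É itself, so the result is the "Éire" spelling, not the percent-encoded DBpedia IRI. The same decoding is why "\u00C9ire" and "Éire" are the same literal.

```python
import re

# Hypothetical helper: replace \uXXXX and \UXXXXXXXX escapes with the
# code point they denote.
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def decode_uchar(text: str) -> str:
    return _UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

escaped = r"http://dbpedia.org/resource/\u00C9ire"
decoded = decode_uchar(escaped)

print(decoded)                                             # http://dbpedia.org/resource/Éire
print(decoded == "http://dbpedia.org/resource/%C3%89ire")  # False
```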

>>     <http://dbpedia.org/resource/%C3%89ire> – Works!
> 
> Works in the sense that it's a legal IRI, but it's the IRI "http://dbpedia.org/resource/%C3%89ire"

Yes – which is an actual DBpedia IRI.

> which is not the same as the IRI "http://dbpedia.org/resource/Éire"

Which is not an actual DBpedia IRI.

> (although an application might normalize it as such).

That would violate several specs.
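To make the distinction concrete: RDF treats two IRIs as equal only when they compare equal character by character, so these two spellings name different resources. The sketch below uses only the Python standard library. It contrasts that plain comparison with a percent-decoding step of the kind an application might apply. The decoding step is an illustrative assumption and is not something the RDF specs prescribe.

```python
from urllib.parse import unquote

encoded_iri = "http://dbpedia.org/resource/%C3%89ire"
literal_iri = "http://dbpedia.org/resource/\u00c9ire"  # ...Éire

# RDF-style comparison: character by character, no normalization.
same_in_rdf = encoded_iri == literal_iri

# Hypothetical application-level normalization: percent-decode as UTF-8.
# After this step the two distinct IRIs collapse into one string.
same_after_decoding = unquote(encoded_iri) == unquote(literal_iri)

print(same_in_rdf)          # False
print(same_after_decoding)  # True
```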

>>     // Strange… So what about prefixed names?
>> 
>>    dbpedia:%C3%89ire       – Doesn't work :-(
> 
> I wouldn't expect that to work. I don't know of any format that supports percent-encoding with prefixed names.

This example would work in RDFa.
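In RDFa, a CURIE is expanded by concatenating the prefix mapping's IRI with the reference part. No escape processing happens, so the percent-encoded form survives intact. Below is a minimal sketch that assumes a prefix table with a single dbpedia mapping. expand_curie is a hypothetical helper and not part of any RDFa processor's API.

```python
# Hypothetical prefix table; in RDFa the mappings would come from the
# document's prefix declarations.
PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}

def expand_curie(curie: str) -> str:
    # Split at the first colon and concatenate; the reference is copied
    # verbatim, so percent-encoding is preserved.
    prefix, reference = curie.split(":", 1)
    return PREFIXES[prefix] + reference

print(expand_curie("dbpedia:%C3%89ire"))  # http://dbpedia.org/resource/%C3%89ire
```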

>>    dbpedia:Éire            – Doesn't work :-(
> 
> Why not? É is a legal pname character in Turtle.

Because DBpedia uses %C3%89, not É.

>>    dbpedia:\u00C9ire       – Doesn't work :-(
>>    dbpedia:\u00C3\u0089ire – Doesn't work :-(
> 
> I REALLY wouldn't expect this one to work.  Is there any format where "\u00C3\u0089" will produce "É"?

Hopefully not! The point is in the next sentence.
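For the record, \u00C3\u0089 denotes two code points, U+00C3 (Ã) and U+0089 (a C1 control character). It does not denote É. É only appears if those code points are mistaken for raw UTF-8 bytes, which is the familiar mojibake round trip. Here is a short Python illustration:

```python
# Two \u escapes: U+00C3 followed by U+0089, then "ire".
decoded = "\u00c3\u0089ire"

print(len(decoded))             # 5  (U+00C3, U+0089, i, r, e)
print(decoded == "\u00c9ire")   # False: not "Éire"

# Only by treating each code point as a byte (Latin-1) and then decoding
# those bytes as UTF-8 does É appear; this is mojibake repair, not escaping.
repaired = decoded.encode("latin-1").decode("utf-8")
print(repaired == "\u00c9ire")  # True
```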

>> Now the proposal adds to that mess by adding *another* way of writing things differently with *no* increase in expressivity. (The results for all the cases above are unaffected by the proposal – the DBpedia IRI simply cannot be written as a prefixed name.)
…
> I should also point out that the Turtle editor's draft DOES allow Unicode escapes in prefixed names, so it's removing them that would be a change.

So the editors departed from the Team Submission without WG consensus.

Best,
Richard

Received on Tuesday, 22 November 2011 23:00:40 UTC