Re: [TTL] Differences between SPARQL and Turtle. from Andy Seaborne on 2011-04-24 (public-rdf-wg@w3.org from April 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sun, 24 Apr 2011 17:40:49 +0100
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4DB45291.2090504@epimorphics.com>

On 23/04/11 20:27, Eric Prud'hommeaux wrote:
> * Andy Seaborne<andy.seaborne@epimorphics.com>  [2011-04-23 17:33+0100]
>> (resent with note of ISSUE-1 for trackbot)
>>
>> RDF-WG ISSUE-1
>> http://www.w3.org/2011/rdf-wg/track/issues/1
>>
>>
>> I've gathered the differences together into a live document
>>
>> http://www.w3.org/2011/rdf-wg/wiki/Diff_SPARQL_Turtle#Relevant_RDF_WG_Decisions
>>
>>
>> And added a new one: Turtle and SPARQL treat \u escape processing
>> differently because they happen at different times in the parsing process.
>
> +1
>
> I've had a hard time defending the fact that one can't simply escape
> characters in PNames (SPARQL's QNames). This comes up in DB dumps, e.g.
>
>    PREFIX p:<http://foo.example/db/People#>
>    SELECT ?who ?dept WHERE {
>      ?who p:deptName\u002CdeptCity ?dept
>    }
>
> SPARQL says \u002C is substituted with ',' *before* parsing (and ','
> isn't valid in local names).
>
>
> We could potentially simplify the story for Turtle users by adding
> unicode escape sequences (I called them UCHARs) to qnames. I hacked
> this up in a grammar called turtleEsc http://w3.org/brief/MjM0 . It
> validates strings like:
>
>    @prefix α:<http://foo.example/bar#>  .
>    <ab\u00E9xy>  \u03B1:p "ab\u0022cd" .
>
> and is, IMO, pretty easy to explain to users. The downside is that
> we lose grammar control over folks adding chars like [<>  ] to IRIs
> (i.e. left to semantic validation) but I believe it's still better
> than making PNames un-escapable.

Turtle already has a mechanism for in-parsing quoting using \, as in
"abc\"def". That form of \u adds another mechanism.

Surely it would be better to allow a style of \-escapes in prefixed
names if we want to escape characters in them? Or change the prefixed
name rules to allow an (internal) ","?

\u is a way to input characters that are not on the local keyboard, or
to input a codepoint when the charset in use does not have that
codepoint available.

This does not apply to UTF-8, but it does apply to "text/turtle" because
that's US-ASCII (please use "text/turtle;charset=utf-8"!). Reserving \u
for that purpose seems prudent.
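
To make the charset use case concrete, here is a minimal Python sketch of
a writer that has to emit US-ASCII and so rewrites every non-ASCII
character as a \u or \U escape. The function name escape_non_ascii is
made up for illustration, and the sketch assumes the text is going into a
position where such escapes are accepted (a string literal here).

```python
def escape_non_ascii(text: str) -> str:
    """Rewrite non-ASCII characters as \\uXXXX or \\UXXXXXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # 4 hex digits for BMP characters
        else:
            out.append(f"\\U{cp:08X}")    # 8 hex digits beyond the BMP
    return "".join(out)


# Hypothetical triple, used only to show the effect.
line = escape_non_ascii('<http://example/s> <http://example/p> "café" .')
print(line)
line.encode("ascii")  # no longer raises UnicodeEncodeError
```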

The \u mechanism is very general.

<ab\u0020xy>
<ab xy>
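
As an illustration of how general the mechanism is, here is a minimal
Python sketch of SPARQL-style processing, where every codepoint escape is
replaced before the tokenizer sees the text. The name
expand_codepoint_escapes is hypothetical, and the sketch does not handle
corner cases such as an escaped backslash in front of the "u".

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_codepoint_escapes(text: str) -> str:
    """Replace each codepoint escape with the character it names."""
    def _substitute(match):
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(_substitute, text)


# The escape turns into a raw space inside the IRI brackets.
print(expand_codepoint_escapes(r"<ab\u0020xy>"))
# The escape turns into a raw comma inside the prefixed name.
print(expand_codepoint_escapes(r"?who p:deptName\u002CdeptCity ?dept"))
```

Because the substitution is blind to context, whatever parser runs
afterwards sees the space or the comma exactly as if it had been typed.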

Making it easier to try to put spaces into IRIs seems to me to be a bad 
idea.  There is already confusion in this area and the RDF URI reference 
to IRI change isn't going to make it any easier.

You can't rely on the receiving parser to do complete IRI parsing,
which is complicated and expensive.  How many systems do full IRI checking?

Test your local parser with this N-Triples file:
---------
    <http://example/> <http://example/[]/g> "foo" .
    <http://example/> <http://example/ /g> "foo" .
---------
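
As a rough illustration of why cheap checks fall short, here is a minimal
Python sketch of a character-level test. It assumes the set of characters
that RFC 3987 excludes from IRIs (space, <, >, ", {, }, |, \, ^, ` and
control characters); the names are hypothetical. It flags the IRI with
the space, but not the one with "[]" in the path: rejecting that needs
real IRI syntax parsing, which is the expensive part.

```python
# Printable ASCII characters assumed never to appear raw in an IRI.
_FORBIDDEN = set(' <>"{}|\\^`')


def has_forbidden_char(iri: str) -> bool:
    """Cheap check: forbidden printable characters or control characters."""
    return any(
        ch in _FORBIDDEN or ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F
        for ch in iri
    )


# The two predicate IRIs from the N-Triples test file.
for iri in ("http://example/[]/g", "http://example/ /g"):
    print(repr(iri), has_forbidden_char(iri))
```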

Related:

I do think it's unfortunate that % is not allowed in the local part of
prefixed names.

The correct fix is to allow % in PN_LOCAL (in Turtle and SPARQL).

 Andy

Received on Sunday, 24 April 2011 16:41:15 UTC