Re: [TTL] Differences between SPARQL and Turtle. from Richard Cyganiak on 2011-04-27 (public-rdf-wg@w3.org from April 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 27 Apr 2011 18:38:53 +0100
To: Eric Prud'hommeaux <eric@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <C4528645-6FEF-446D-A8FB-3D7606F65654@cyganiak.de>

On 23 Apr 2011, at 20:27, Eric Prud'hommeaux wrote:
> SPARQL says \u002C is substituted with ',' *before* parsing (and ','
> isn't valid in local names).

I think this approach might be the right one for SPARQL, but it isn't very good for Turtle, for several reasons:

1. It makes content-type sniffing harder, because the telltale "@prefix" or "@base" near the beginning of a file could now be encoded as, say, \u0040prefix. (Sindice has to use content-type sniffing, because lots of web servers don't serve Turtle, N3, and RDF/XML with the correct media types.)

2. I was hoping that we could reuse a simplified version of the Turtle grammar to define N-Triples. N-Triples can be processed with line-based Unix tools such as grep, sed and awk. That would no longer be the case if line breaks could be encoded as \u000A and spaces as \u0020. So we either have to process Unicode escapes post-parse, or have different escape mechanisms for Turtle and N-Triples. The former is better, IMO.

3. This approach simplifies the lives of spec writers and parser implementers, but it is potentially very confusing for document authors and data consumers. In the long run, letting the former groups do some extra work for the benefit of the latter sounds like a good trade-off.

4. This just has a bit of a WTF taste to it. Why add a feature that enables writing of 100% obfuscated documents?
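
As a rough illustration of points 1 and 2, here is a minimal Python sketch. It is not any particular parser's or sniffer's behaviour: the function names substitute_escapes and looks_like_turtle and the sample data are made up for this example, and the sniffer is deliberately naive.

```python
import re

# Matches \uXXXX and \UXXXXXXXX escape sequences in the raw text.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def substitute_escapes(text):
    # Pre-parse substitution: replace every escape anywhere in the document.
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def looks_like_turtle(head):
    # Naive sniffer (assumption): look for a directive at the start of a line.
    return re.search(r"^\s*@(prefix|base)\b", head, re.MULTILINE) is not None


plain = "@prefix ex: <http://foo.example/bar#> .\n"
obfuscated = "\\u0040prefix ex: <http://foo.example/bar#> .\n"

# Point 1: the directive is invisible to the sniffer until escapes are replaced.
print(looks_like_turtle(plain))
print(looks_like_turtle(obfuscated))
print(looks_like_turtle(substitute_escapes(obfuscated)))

# Point 2: two triples on one physical line, separated by an escaped newline.
two_triples = (
    '<http://a.example/s1> <http://a.example/p> "x" .'
    "\\u000A"
    '<http://a.example/s2> <http://a.example/p> "y" .'
)
print(len(two_triples.splitlines()))
print(len(substitute_escapes(two_triples).splitlines()))
```

A line-based tool such as grep sees the last example as a single line, while a parser that substitutes escapes before parsing sees two statements.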

Hence I'd argue that Unicode escapes in Turtle should only be allowed in string literals and in IRIs.

> We could potentially simplify the story for Turtle users by adding
> unicode escape sequences (I called them UCHARs) to qnames.
> 
>  @prefix α: <http://foo.example/bar#> .
>  <ab\u00E9xy> \u03B1:p "ab\u0022cd" .

I don't find this compelling. Prefixed names are syntactic sugar. They already support full Unicode (via UTF-8). If, for some reason, that doesn't work in some environment, then one can fall back to normal unabbreviated IRIs, where Unicode escapes will be allowed in any case.
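
To make the "only in string literals and in IRIs" rule concrete, here is a small Python sketch that tokenizes first and decodes afterwards, applied to the second line of Eric's example. The token grammar is a heavily simplified assumption, not the real Turtle grammar, only \u and \U escapes are handled, and the names TOKEN, decode and tokenize_then_decode are hypothetical.

```python
import re

# Matches \uXXXX and \UXXXXXXXX escape sequences.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Simplified tokens (assumption): <...> is an IRI, "..." is a string literal,
# and any other run of non-space characters is "other".
TOKEN = re.compile(
    r'<(?P<iri>[^>\s]*)>|"(?P<string>(?:[^"\\]|\\.)*)"|(?P<other>\S+)'
)


def decode(text):
    # Replace each Unicode escape with the character it denotes.
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def tokenize_then_decode(line):
    # Post-parse handling: escapes are decoded only inside IRIs and strings.
    tokens = []
    for m in TOKEN.finditer(line):
        kind = m.lastgroup
        value = m.group(kind)
        if kind in ("iri", "string"):
            value = decode(value)
        tokens.append((kind, value))
    return tokens


line = r'<ab\u00E9xy> \u03B1:p "ab\u0022cd" .'

# Post-parse: the escaped quote ends up inside the string value.
for token in tokenize_then_decode(line):
    print(token)

# Pre-parse substitution: the decoded quote closes the string early.
for token in tokenize_then_decode(decode(line)):
    print(token)
```

With post-parse decoding, the \u03B1 in the prefixed name stays undecoded, so a real parser following this rule would reject it as a syntax error. With substitution before tokenizing, the \u0022 becomes a literal quote and splits the string into two tokens.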

Best,
Richard

Received on Wednesday, 27 April 2011 17:39:22 UTC