Re: [TTL] Differences between SPARQL and Turtle. from Eric Prud'hommeaux on 2011-05-04 (public-rdf-wg@w3.org from May 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 4 May 2011 09:27:27 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <20110504132725.GD25022@w3.org>

* Richard Cyganiak <richard@cyganiak.de> [2011-05-03 00:07+0100]
> On 2 May 2011, at 20:11, Andy Seaborne wrote:
> > # 4 RDF Collections as triple patterns
> > 
> > 3 choices:
> > 
> > A/ Remove from SPARQL.
> > B/ Add to Turtle
> > C/ Leave as is.  Discourage use
> 
> Happy to support whichever of A and B is easier for the editor.
> 
> > # 8 Escape Processing
> > Proposal: Adopt Turtle style / Change SPARQL.
> > 
> > \u escapes can only appear in strings and IRIs

Richard +1'd this on the basis that allowing \u in local names would
confuse users. I'm not convinced, and I suspect that the RDB2RDF WG
would want to give their users a way to algorithmically write
shorthand like:

  @prefix : <http://foo.example/DB/People/> .
  # triples for …People/ID=8 :
  :ID\u003d8 :fname "Bob" ; :lname "Smith" .
  # triples for …People/ID=9 :
  :ID\u003d9 :fname "Sue" ; :lname "Jones" .

Andy's counter proposal was to add allowable chars to the local name,
but I believe that allowing escape chars would be less controversial.
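
To make "algorithmically" concrete, here is a rough Python sketch of how a
generator might produce such shorthand. The names escape_local and shorthand
are hypothetical, and the set of characters left unescaped is a deliberately
conservative assumption, not the real local-name production:

```python
import re

# Assumption: only these characters are left as-is in a local name.
# The actual grammar permits more; this is intentionally conservative.
_SAFE = re.compile(r"[A-Za-z0-9_]")


def escape_local(local: str) -> str:
    """Write every character outside _SAFE as \\uXXXX or \\UXXXXXXXX."""
    out = []
    for ch in local:
        if _SAFE.fullmatch(ch):
            out.append(ch)
        elif ord(ch) <= 0xFFFF:
            out.append("\\u%04x" % ord(ch))
        else:
            out.append("\\U%08x" % ord(ch))
    return "".join(out)


def shorthand(key: str, value: object) -> str:
    """Build a prefixed name in the default namespace for a key=value row id."""
    return ":" + escape_local(f"{key}={value}")


if __name__ == "__main__":
    print(shorthand("ID", 8) + ' :fname "Bob" ; :lname "Smith" .')
```

This only works, of course, if \u is accepted in local names, which is the
point under discussion.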


> > Strict \u-escape in strings (STRING_LITERAL1,2 STRING_LITERAL_LONG1,2) and IRI_REF
> > 
> > \u do not appear in the grammar but are described separately as at present.  
> 
> +1 till here.

What's the motivation for having this grammatical construct outside of
the grammar? It's trivial to include:

[[
[29] <IRI_REF> ::= "<" (( ( [^<>\"{}|^`\\] - [#0000- ] ) | UCHAR ))* ">"
[46] <STRING_LITERAL1>      ::= "'" ( ( [^'\\\n\r] ) | ECHAR | UCHAR )* "'"
[47] <STRING_LITERAL2>      ::= '"' ( ( [^\"\\\n\r] ) | ECHAR | UCHAR )* '"'
[48] <STRING_LITERAL_LONG1> ::= "'''" ( ( "'" | "''" )? ( [^'\\] | ECHAR | UCHAR ) )* "'''"
[49] <STRING_LITERAL_LONG2> ::= '"""' ( ( '"' | '""' )? ( [^\"\\] | ECHAR | UCHAR ) )* '"""'
[50] <UCHAR>   ::= ( "\\u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] )
                 | ( "\\U" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] )
]] — http://www.w3.org/2005/01/yacker/uploads/turtleEsc?lang=perl&markup=html#term-turtleEsc-UCHAR

(You can be even less disruptive if you add it to ECHAR:
  [91] ECHAR ::= '\\' [tbnrf\\"'] | UCHAR
 but that seems misleading.)
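
For what it's worth, the UCHAR production above maps directly onto a regular
expression. The following Python sketch decodes \u and \U escapes in a string
body; decode_uchar is a hypothetical helper, not taken from any parser, and it
deliberately leaves every other backslash escape (the ECHAR cases) untouched:

```python
import re

# \uXXXX or \UXXXXXXXX as in the UCHAR production; the final alternative
# consumes any other backslash pair so that an escaped backslash followed
# by "u0041" is not misread as a Unicode escape.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})|\\.")


def decode_uchar(text: str) -> str:
    """Replace \\u and \\U escapes with the characters they denote."""

    def repl(match: "re.Match[str]") -> str:
        hexdigits = match.group(1) or match.group(2)
        if hexdigits is None:
            # Some other backslash escape; leave it for a later pass.
            return match.group(0)
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(hexdigits, 16))

    return _ESCAPE.sub(repl, text)


if __name__ == "__main__":
    print(decode_uchar("ID\\u003d8"))
```

Whether a parser should apply this before or during tokenization is exactly
the question of where the escapes live in the grammar.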


> > Their use is discouraged:
> > 
> > "4.3. String Escapes"
> > 
> > """
> > \u and \U escapes should be avoided in UTF-8 charset formats. They are retained in the grammar for compatibility with N-triples formats currently deployed with charset US-ASCII.
> > """
> 
> Unicode escapes can be a helpful fallback when some piece of the toolchain messes up the encoding; in such situations, they can be the only way to make things interoperate.
> 
> Suggested rephrasing that doesn't restrict acceptable uses to backwards compatibility, and uses the RFC2119 SHOULD to be precise:
> 
> """
> Unicode characters SHOULD be used directly instead of \u and \U escapes.
> """
> 
> And in the N-Triples spec (if/wherever we create such a thing):
> 
> """
> Note: Older versions of N-Triples required \u and \U escapes for all Unicode characters beyond the US-ASCII charset. Some older N-Triples parsers may still have that restriction and may not support UTF-8 encoded Unicode characters.
> """
> 
> Richard

-- 
-ericP

Received on Wednesday, 4 May 2011 13:27:57 UTC