- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 4 May 2011 09:27:27 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
* Richard Cyganiak <richard@cyganiak.de> [2011-05-03 00:07+0100]
> On 2 May 2011, at 20:11, Andy Seaborne wrote:
> > # 4 RDF Collections as triple patterns
> >
> > 3 choices:
> >
> > A/ Remove from SPARQL.
> > B/ Add to Turtle
> > C/ Leave as is. Discourage use
>
> Happy to support whichever of A and B is easier for the editor.
>
> > # 8 Escape Processing
> > Proposal: Adopt Turtle style / Change SPARQL.
> >
> > \u escapes can only appear in strings and IRIs

Richard +1'd this on the basis that allowing \u in local names would confuse users. I'm not convinced, and suspect that the RDB2RDF WG would want to give their users a way to algorithmically write shorthand like:

  @prefix : <http://foo.example/DB/People/> .
  # triples for …People/ID=8 :
  :ID\u003d8 :fname "Bob" ; :lname "Smith" .
  # triples for …People/ID=9 :
  :ID\u003d9 :fname "Sue" ; :lname "Jones" .

Andy's counter-proposal was to add allowable chars to the local name, but I believe that allowing escape chars would be less controversial.

> > Strict \u-escape in strings (STRING_LITERAL1,2 STRING_LITERAL_LONG1,2 and IRI_REF)
> >
> > \u do not appear in the grammar but are described separately as at present.
>
> +1 till here.

What's the motivation for having this grammatical construct outside of the grammar? It's trivial to include:

[[
[29] <IRI_REF> ::= "<" (( ( [^<>\"{}|^`\\] - [#0000- ] ) | UCHAR ))* ">"
[46] <STRING_LITERAL1> ::= "'" ( ( [^'\\\n\r] ) | ECHAR | UCHAR )* "'"
[47] <STRING_LITERAL2> ::= '"' ( ( [^\"\\\n\r] ) | ECHAR | UCHAR )* '"'
[48] <STRING_LITERAL_LONG1> ::= "'''" ( ( "'" | "''" )? ( [^'\\] | ECHAR | UCHAR ) )* "'''"
[49] <STRING_LITERAL_LONG2> ::= '"""' ( ( '"' | '""' )? ( [^\"\\] | ECHAR | UCHAR ) )* '"""'
[50] <UCHAR> ::= ( "\\u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] )
               | ( "\\U" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
                         [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] )
]] -- http://www.w3.org/2005/01/yacker/uploads/turtleEsc?lang=perl&markup=html#term-turtleEsc-UCHAR

(You can be even less disruptive if you add it to ECHAR:
  [91] ECHAR ::= '\\' [tbnrf\\"'] | UCHAR
but that seems misleading.)

> > Their use is discouraged:
> >
> > "4.3. String Escapes"
> >
> > """
> > \u and \U escapes should be avoided in UTF-8 charset formats. They are retained in the grammar for compatibility with N-triples formats currently deployed with charset US-ASCII.
> > """
>
> Unicode escapes can be a helpful fallback when some piece of the toolchain messes up the encoding; in such situations, they can be the only way to make things interoperate.
>
> Suggested rephrasing that doesn't restrict acceptable uses to backwards compatibility, and uses the RFC2119 SHOULD to be precise:
>
> """
> Unicode characters SHOULD be used directly instead of \u and \U escapes.
> """
>
> And in the N-Triples spec (if/wherever we create such a thing):
>
> """
> Note: Older versions of N-Triples required \u and \U escapes for all Unicode characters beyond the US-ASCII charset. Some older N-Triples parsers may still have that restriction and may not support UTF-8 encoded Unicode characters.
> """
>
> Richard

--
-ericP
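To make the UCHAR production [50] above concrete, here is a minimal Python sketch of how a tool might decode \u and \U escapes, and how it might write one for a character such as "=" in the :ID\u003d8 example. The function names decode_uchar and escape_char are hypothetical and come from neither spec, and the sketch assumes it is handed the body of a string literal or local name rather than a whole document.

```python
import re

# Mirrors production [50] UCHAR: "\u" + 4 hex digits or "\U" + 8 hex digits.
# The first alternative matches an escaped backslash (ECHAR "\\") so that the
# "u0041" in "\\u0041" is not mistaken for the start of a UCHAR.
_UCHAR = re.compile(r"\\\\|\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def decode_uchar(text: str) -> str:
    """Replace each UCHAR escape in text with the character it names.

    Hypothetical helper. Other ECHAR escapes (\\t, \\n, ...) are left alone.
    Raises ValueError if a \\U escape names a code point above U+10FFFF.
    """
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # Matched an escaped backslash: keep it exactly as written.
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _UCHAR.sub(repl, text)


def escape_char(ch: str) -> str:
    """Write a single character as a \\u or \\U escape (hypothetical helper)."""
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return f"\\u{code_point:04x}"
    return f"\\U{code_point:08x}"


if __name__ == "__main__":
    print(decode_uchar(r":ID\u003d8"))  # the local name from the example above
    print(escape_char("="))
```

Matching the escaped backslash first is a design choice of this sketch, not something either grammar requires; without it, a naive search for \u would corrupt strings that contain a literal backslash followed by "u".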
Received on Wednesday, 4 May 2011 13:27:57 UTC