Aligning Turtle and SPARQL escape sequence processing. from Andy Seaborne on 2011-11-22 (public-rdf-wg@w3.org from November 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 22 Nov 2011 15:56:20 +0000
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4ECBC624.1060105@epimorphics.com>

There are two kinds of escapes:

1/ character escapes -- \t \n \r \b \f \" \' \\

Each of these represents a single character and turns off any special
meaning that character would otherwise have, such as string delimiter
or newline.

2/ unicode escapes -- \u1234 and \U12345678

These represent a single codepoint.
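
To make the two kinds concrete, here is a minimal sketch of decoding
both inside the body of a string literal. The names CHAR_ESCAPES,
ESCAPE and unescape are made up for illustration; this is not taken
from any Turtle or SPARQL implementation.

```python
import re

# Character escapes listed above, each mapped to the one character it stands for.
CHAR_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# One escape: \uXXXX, \UXXXXXXXX, or a backslash plus (at most) one character.
ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL)

def unescape(body: str) -> str:
    """Decode escapes in the body of a string literal (delimiters removed)."""
    def repl(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits:
            # Unicode escape: exactly one codepoint.
            return chr(int(hex_digits, 16))
        ch = m.group(3)
        if ch not in CHAR_ESCAPES:
            # Covers unknown escapes, short \u forms, and a trailing backslash.
            raise ValueError(f"bad escape: \\{ch}")
        return CHAR_ESCAPES[ch]
    return ESCAPE.sub(repl, body)
```

Because the scan is left to right, an escaped backslash followed by
"u0041" stays as those six characters and is not read as a unicode
escape.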

== SPARQL

SPARQL will change to adopt the Turtle process model (previously, 
unicode escapes were processed in the character stream before reaching 
the parser or tokenizer).
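
The difference between the two process models shows up when a unicode
escape encodes a delimiter. A rough sketch, with a deliberately
simplified string token and hypothetical function names (old_model,
turtle_model); it ignores the interaction with escaped backslashes:

```python
import re

UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")
# Simplified token: double-quoted, a backslash escapes the next character.
STRING_TOKEN = re.compile(r'"(?:[^"\\\n]|\\.)*"')

def expand_unicode(text):
    """Replace each unicode escape with the codepoint it names."""
    return UNICODE_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

def old_model(source):
    """Expand unicode escapes in the character stream, then tokenize."""
    return STRING_TOKEN.findall(expand_unicode(source))

def turtle_model(source):
    """Tokenize first, then expand unicode escapes inside each token."""
    return [expand_unicode(tok) for tok in STRING_TOKEN.findall(source)]

source = r'"a\u0022b"'        # \u0022 is the double quote character
print(old_model(source))      # the escape ends the string early: ['"a"']
print(turtle_model(source))   # one token containing a quote: ['"a"b"']
```

In the old model the escape can change where a token ends; in the
Turtle model it can only contribute a character to a token that has
already been recognised.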

== Turtle

Comparing Turtle and SPARQL, a few minor differences remain; they look
more like oddities:

1/ Turtle does not have \f or \b character escapes. Turtle adds \>.
2/ Turtle has a bug for IRIs - \> can't be used!
3/ prefix names

There are special rules: \" is only allowed in strings.

On 2:
The spec text allows \> only in IRIs (where a bare ">" is illegal by IRI
rules), but the grammar production for IRIs does not allow any character
escape sequence:

"<" ( [^<>\"{}|^`\\] - [#0000-#0020] )* ">"
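
The production can be checked mechanically. Below is a sketch that
transcribes it into a Python regular expression (IRIREF is my name
for it, and the transcription is my own reading of the production as
quoted here):

```python
import re

# "<" ( [^<>\"{}|^`\\] - [#0000-#0020] )* ">"
# The excluded set is the listed punctuation plus U+0000..U+0020.
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')

def is_iri_token(text: str) -> bool:
    """True if the whole of text matches the production above."""
    return IRIREF.fullmatch(text) is not None

print(is_iri_token('<http://example/a>'))      # True
print(is_iri_token(r'<http://example/a\>b>'))  # False: backslash is excluded
```

Since the backslash is in the excluded set, \> can never appear inside
a token matched by this production, whatever the text says.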

== Changes

Suggested changes for Turtle:

T1/ Allow unicode escapes in prefixed names.

T2/ Only allow character escapes in strings, not in IRIs (or in prefixed
names, but they aren't allowed there at the moment anyway).

T3/ Add \f and \b character escapes.

T4/ Remove \> (side effect of T2).



 Andy

Received on Tuesday, 22 November 2011 15:57:02 UTC