Re: Aligning Turtle and SPARQL escape sequence processing.

On Tue, Nov 22, 2011 at 10:56 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

> There are two kinds of escapes:
>
> 1/ character escapes -- \t \n \r \b \f \" \' \\
>
> These represent a single character and turn off any special meaning like
> string delimiter or newline.
>
> 2/ unicode escapes -- \u1234 and \U12345678
>
> These represent a single codepoint.
>
> == SPARQL
>
> SPARQL will change to adopt the Turtle process model (previously, unicode
> escapes were processed in the character stream before reaching the parser
> or tokenizer).
>
> == Turtle
>
> Comparing Turtle and SPARQL, a few differences remain; they look more
> like minor oddities.
>
> 1/ Turtle does not have \f or \b character escapes. Turtle adds \>.
> 2/ Turtle has a bug for IRIs - \> can't be used!
> 3/ prefix names
>
> There are special rules: \" is only allowed in strings.
>
> On 2:
> The text allows \> only in IRIs (where > is illegal by IRI rules anyway),
> but the grammar production does not allow any character escape sequence:
>
> "<" ( [^<>\"{}|^`\\] - [#0000-#0020] )* ">"
>

Where did this come from? The IRI production in the Turtle editor's draft
does allow Unicode escapes:

"<" ( [^<>\"{}|^`\\] - [#0000-#0000] | UCHAR )* ">"

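For anyone who wants to check a token against that production, here is a rough Python sketch of it as a regular expression. It assumes UCHAR means exactly \uXXXX or \UXXXXXXXX with hex digits, and it excludes #0000-#0020 as in the production Andy quoted; the names UCHAR, IRI_CHAR, IRI_REF and is_iri_ref are mine, not from any draft.

```python
import re

# Assumption: UCHAR is \u + 4 hex digits or \U + 8 hex digits.
UCHAR = r"(?:\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})"

# Literal characters: anything except < > " { } | ^ ` \ and #x00-#x20.
IRI_CHAR = r"[^\x00-\x20<>\"{}|^`\\]"

# "<" ( IRI_CHAR | UCHAR )* ">"
IRI_REF = re.compile(r"<(?:" + IRI_CHAR + "|" + UCHAR + r")*>")


def is_iri_ref(token: str) -> bool:
    """Return True if the whole token matches the sketched production."""
    return IRI_REF.fullmatch(token) is not None
```

With this pattern a unicode escape inside the angle brackets is accepted, while a character escape such as \> is rejected, because a bare backslash is excluded and only UCHAR may start with one.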

>
> == Changes
>
> Suggested changes for Turtle:
>
> T1/ Allow unicode escapes in prefixed names.
>

+0 - I don't see the harm in adding it, but I don't see it as being all
that useful either.

Would you like to allow unrestricted use of unicode escapes in prefixed
names, or only unicode escapes for characters that are legal in the
position where the escape appears?
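
To make the two options concrete, here is a small Python sketch. The legality rule in legal_at is a deliberately simplified stand-in (letter or underscore first, then letters, digits, underscore or hyphen), not the Turtle grammar, and all function names are hypothetical.

```python
import re

# Assumption: unicode escapes are \uXXXX or \UXXXXXXXX with hex digits.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def split_chars(local):
    """Yield (character, was_escaped) pairs for a local name."""
    pos = 0
    while pos < len(local):
        m = UCHAR.match(local, pos)
        if m:
            # Decode the escape to the codepoint it stands for.
            yield chr(int(m.group(1) or m.group(2), 16)), True
            pos = m.end()
        else:
            yield local[pos], False
            pos += 1


def legal_at(ch, index):
    """Simplified stand-in for 'legal in this position' (not the real grammar)."""
    if index == 0:
        return ch.isalpha() or ch == "_"
    return ch.isalnum() or ch in "_-"


def is_legal_local_name(local, restrict_escapes):
    """Check a local name under either reading of T1.

    restrict_escapes=False: any unicode escape is accepted as-is.
    restrict_escapes=True: the escaped character must be legal in its position.
    """
    for index, (ch, escaped) in enumerate(split_chars(local)):
        if escaped and not restrict_escapes:
            continue
        if not legal_at(ch, index):
            return False
    return True
```

Under the unrestricted reading, a name like a\u0020b is accepted even though a literal space would not be; under the restricted reading it is rejected, and only escapes such as \u00E9 that stand for an otherwise legal character get through.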


>
> T2/ Only allow character escapes in strings, not in IRIs (nor in prefixed
> names, though they aren't allowed there at the moment anyway).
>

+1


>
> T3/ Add \f and \b character escapes.
>

+1


>
> T4/ Remove \> (side effect of T2).


+1

-Alex


>
>
>
>
>        Andy
>
>
>

Received on Tuesday, 22 November 2011 20:35:57 UTC