Re: [Fwd: SPARQL: Backslashes in string literals]

Dan Connolly wrote:
> On Thu, 2005-08-04 at 14:36 +0100, Seaborne, Andy wrote:
> [...]
> 
>>>Turtle follows N-Triples and picks just uppercase for hex \u & \U
>>>escapes (I think there was something in the older charmod drafts about
>>>having just one way to encode it).  I'd prefer to follow that [0-9]
>>>[A-F].
>>
>>Can't find anything in current charmod.
> 
> 
> There are several requirements/guidelines near
> http://www.w3.org/TR/2005/REC-charmod-20050215/#def-char-escape
> 
> e.g.
> 
> 
> C042  [S]  Specifications SHOULD NOT invent a new escaping mechanism if
> an appropriate one already exists.

\u is reasonably common (N-Triples, Java, Python).

Within an XML protocol request, &...; escaping applies anyway.

> 
> and here's one that might bite a little harder:
> 
> C046 ... In particular, if a character is acceptable in identifiers and
> comments, then its escaped form should also be acceptable.

> 
> so charmod says this should work:
> 
>  SELECT ?foo\x0045bar WHERE { ?foo\x0045bar dc:title ?xyz }.

We can do one of:
1/ Extend \u and \U to apply to variables
2/ Handle \u during input processing (before tokenizing),
    so it works everywhere, including comments, and is transparent to
    parsing proper

1/ is probably clearer; 2/ is probably easier to implement.

A test is "SELEC\u0054"

Another is the comment

     # \u escapes need thinking about.

which is illegal under 2/ (the "\u" is not followed by hex digits) but legal
under 1/.
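
To make 2/ concrete, here is a minimal sketch in Python of a pass that runs
before tokenizing. The name expand_unicode_escapes is made up for illustration,
and accepting hex digits in either case is an assumption, since that question is
still open in this thread. The sketch also ignores the question of whether an
escaped backslash in front of "u" should suppress the escape.

```python
import re

# Matches a well-formed \uXXXX or \UXXXXXXXX first, then any stray \u or \U.
# Assumption: both upper- and lower-case hex digits are accepted.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|\\[uU]")


def expand_unicode_escapes(text):
    """Replace Unicode escapes in the raw query text, before tokenizing."""

    def decode(match):
        digits = match.group(1) or match.group(2)
        if digits is None:
            # A \u or \U without the right number of hex digits is an error
            # wherever it occurs, comments included.
            raise ValueError("malformed escape at offset %d" % match.start())
        return chr(int(digits, 16))

    return _ESCAPE.sub(decode, text)


# The keyword test: the tokenizer would only ever see "SELECT".
print(expand_unicode_escapes(r"SELEC\u0054"))

# The comment test: rejected, because the pass knows nothing about comments.
try:
    expand_unicode_escapes(r"# \u escapes need thinking about.")
except ValueError as err:
    print("rejected:", err)
```

Because the pass sees only characters, not tokens, it cannot tell a comment or a
keyword from a string, which is exactly why it is simple and why both tests come
out the way they do.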


C046 also says "this does not preclude that syntax-significant characters, when
escaped, lose their significance in the syntax", so I think we don't have to do
it for keywords or things like "?", but 2/ would allow it.

\t etc. apply only in strings.  I guess comments are undefined, so in a comment
\t is ambiguous: it is either "\" followed by "t", or a tab.
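
For contrast, a minimal sketch of decoding escapes only inside a string literal,
after the tokenizer has already found the string. The function name
unescape_string_body and the table of single-character escapes are assumptions
for illustration, not the set fixed by any draft. Since the function is only
ever handed the body of a string token, a \t in a comment is never looked at.

```python
import re

# Assumed single-character escapes, for illustration only.
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}

# A backslash followed by uXXXX, UXXXXXXXX, or any one other character.
_STRING_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL
)


def unescape_string_body(body):
    """Decode escapes in the text between the quotes of one string token."""

    def decode(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        char = match.group(3)
        if char not in _SIMPLE:
            raise ValueError("unknown escape: backslash + %r" % char)
        return _SIMPLE[char]

    return _STRING_ESCAPE.sub(decode, body)


# Inside a string, backslash-t becomes a real tab character.
print(repr(unescape_string_body(r"a\tb")))
```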

 Andy

> 
> This is starting to look like a new issue... I was thinking of
> saying this is re-opening punctuationSyntax, but I think it's different.
> 
> 
>>Unless there is a single convention that will not catch people out, I prefer to 
>>leave both in - it's not clear to me that there is a convention (I write mine in 
>>upper case.)
> 
> 

Received on Thursday, 4 August 2005 15:58:14 UTC