
Re: [Fwd: SPARQL: Backslashes in string literals]

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Fri, 05 Aug 2005 11:41:03 +0100
To: andy.seaborne@hp.com
Cc: Dan Connolly <connolly@w3.org>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <1123238464.28895.10.camel@hoth.ilrt.bris.ac.uk>

On Thu, 2005-08-04 at 16:56 +0100, Seaborne, Andy wrote:
> 
> Dan Connolly wrote:
> > On Thu, 2005-08-04 at 14:36 +0100, Seaborne, Andy wrote:
> > [...]
> > 
> >>>Turtle follows N-Triples and picks just uppercase for hex \u & \U
> >>>escapes (I think there was something in the older charmod drafts about
> >>>having just one way to encode it).  I'd prefer to follow that [0-9]
> >>>[A-F].
> >>
> >>Can't find anything in current charmod.
> > 
> > 
> > There are several requirements/guidelines near
> > http://www.w3.org/TR/2005/REC-charmod-20050215/#def-char-escape
> > 
> > e.g.
> > 
> > 
> > C042  [S]  Specifications SHOULD NOT invent a new escaping mechanism if
> > an appropriate one already exists.
> 
> \u is reasonably common (N-TRIPLES, Java, Python).

When I added \u, along with \U, to N-Triples, I think I did so because
it matched the style of the language (other \-escapes already existed)
and because there was no other existing escape mechanism, such as the
&...; form mentioned here:

> When in a XML protocol request &...; applies anyway.
> 
> > 
> > and here's one that might bite a little harder:
> > 
> > C046 ... In particular, if a character is acceptable in identifiers and
> > comments, then its escaped form should also be acceptable.
> 
> > 
> > so charmod says this should work:
> > 
> >  SELECT ?foo\x0045bar WHERE { ?foo\x0045bar dc:title ?xyz }.
> 
> We can do one of:
> 1/ Extend \u and \U to apply to variables
> 2/ Make it so \u is on input processing (before tokenizing)
>     so it works everywhere including comments and is transparent to
>     parsing proper
> 
> 1/ is probably clearer; 2/ is probably easier to implement

I just replied that 1/ is much easier for me:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0199.html
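
For comparison, option 2/ could look roughly like the Python sketch
below: a pass over the raw query text that runs before any tokenizing.
The function name decode_before_tokenizing is invented for this
illustration, and the sketch assumes uppercase-only hex digits as in
N-Triples.  It makes no attempt to handle an escaped backslash in front
of a "u", so treat it as a rough illustration rather than a proposal.

```python
import re

# Assumption: only uppercase hex digits are accepted, as in N-Triples.
# The third alternative catches any \u or \U that is not followed by
# the required number of hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})|(\\[uU])")


def decode_before_tokenizing(query: str) -> str:
    # Hypothetical helper: decode escapes everywhere in the raw text,
    # including keywords and comments, before the tokenizer sees it.
    def replace(match: re.Match) -> str:
        if match.group(3) is not None:
            raise ValueError(
                f"malformed escape at offset {match.start()}"
            )
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    return _ESCAPE.sub(replace, query)


print(decode_before_tokenizing("SELEC\\u0054"))    # prints SELECT
print(decode_before_tokenizing("?foo\\u0045bar"))  # prints ?fooEbar
```

Under this approach the comment "# \u escapes need thinking about."
raises ValueError, because the pass cannot tell a comment from any
other text.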

> A test is "SELEC\u0054"
> 
> Another is the comment
> 
>      # \u escapes need thinking about.
> 
> which is illegal by 2 but legal by 1.
> 
> 
> C046 also says "this does not preclude that syntax-significant
> characters, when escaped, lose their significance in the syntax." so I
> think we don't have to do it for keywords or things like "?", but 2/
> would allow it.
> 
> For \t etc. it only applies in strings.  I guess that comments are
> undefined, so \t is ambiguous as to whether it is "\" and "t" or a tab.

\-escapes don't matter in comments, as we provide no interpretation of
them.  They are allowed because the backslash is among the characters
permitted in comments.
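
A rough Python sketch of the option 1/ style, where escapes are decoded
only inside particular tokens and comments pass through untouched.  The
token grammar here is a made-up, much-simplified stand-in (not the
SPARQL grammar), the name decode_in_tokens is hypothetical, and string
literals are included alongside variables as an assumption.

```python
import re

# Assumption: uppercase-only hex digits, as in N-Triples.
_HEX = r"\\u[0-9A-F]{4}|\\U[0-9A-F]{8}"
_ESCAPE = re.compile(_HEX)

# Hypothetical, simplified tokens: comments, double-quoted strings,
# ?variables, and everything else.
_TOKEN = re.compile(
    r"(?P<comment>#[^\n]*)"
    r'|(?P<string>"(?:[^"\\\n]|\\.)*")'
    rf"|(?P<variable>\?(?:\w|{_HEX})+)"
    r'|(?P<other>\s+|[^\s#"?]+|.)'
)


def _decode(text: str) -> str:
    # Simplification: an escaped backslash before "u" is not handled.
    return _ESCAPE.sub(lambda m: chr(int(m.group(0)[2:], 16)), text)


def decode_in_tokens(query: str) -> str:
    # Decode \u and \U only in strings and variables; comments and all
    # other text (keywords included) are copied through unchanged.
    parts = []
    for match in _TOKEN.finditer(query):
        text = match.group(0)
        if match.lastgroup in ("string", "variable"):
            text = _decode(text)
        parts.append(text)
    return "".join(parts)


print(decode_in_tokens("SELECT ?foo\\u0045bar WHERE { ?s ?p ?o }"))
print(decode_in_tokens("# \\u escapes need thinking about."))
```

Here the comment is returned exactly as written, and "SELEC\u0054" is
also left alone, so a real parser would not see it as a keyword.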

Dave
Received on Friday, 5 August 2005 10:41:09 GMT
