Re: [Fwd: SPARQL: Backslashes in string literals] from Dave Beckett on 2005-08-05 (public-rdf-dawg@w3.org from July to September 2005)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Fri, 05 Aug 2005 11:37:20 +0100
To: Andy Seaborne <andy.seaborne@hp.com>, Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <1123238240.28895.6.camel@hoth.ilrt.bris.ac.uk>

On Fri, 2005-08-05 at 10:03 +0100, Steve Harris wrote:
> On Thu, Aug 04, 2005 at 04:56:17 +0100, Andy Seaborne wrote:
> > >so charmod says this should work:
> > >
> > > SELECT ?foo\x0045bar WHERE { ?foo\x0045bar dc:title ?xyz }.
> > 
> > We can do one of:
> > 1/ Extend \u and \U to apply to variables
> > 2/ Make it so \u is on input processing (before tokenizing)
> >    so it works everywhere including comments and is transparent to
> >    parsing proper
> > 
> > 1/ is probably clearer; 2/ is probably easier to implement

2/ is much harder to implement for me.  The lexer and parser I've been
using do not operate on Unicode code points, so adding an extra layer in
there would substantially complicate things.  If 2/ were chosen, I would
likely ignore it for some time and mark any tests for it as
will-not-pass.
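
To make 2/ concrete, here is a minimal Python sketch of a pre-pass that
decodes \uXXXX and \UXXXXXXXX over the whole query text before any
tokenizing.  The name decode_escapes is hypothetical and the code is not
taken from any implementation discussed in this thread.  It assumes the
input is already a string of code points, which is exactly the layer a
byte-oriented lexer would have to add, and it does not treat a doubled
backslash specially.

```python
import string

_HEX = set(string.hexdigits)


def decode_escapes(text: str) -> str:
    """Hypothetical pre-tokenizer pass for option 2/.

    Replaces every \\uXXXX (4 hex digits) and \\UXXXXXXXX (8 hex digits)
    with the code point it names, everywhere in the input, comments
    included. A malformed escape raises ValueError.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "uU":
            width = 4 if text[i + 1] == "u" else 8
            digits = text[i + 2 : i + 2 + width]
            if len(digits) != width or any(c not in _HEX for c in digits):
                raise ValueError(f"malformed escape at offset {i}")
            # chr() itself raises ValueError above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this pass in front of the lexer, "SELEC\u0054" reaches the tokenizer
as the keyword SELECT, while the comment "# \u escapes need thinking
about." is rejected because "\u" is not followed by four hex digits.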

> > A test is "SELEC\u0054"
> > 
> > Another is the comment
> > 
> >     # \u escapes need thinking about.
> > 
> > which is illegal by 2 but legal by 1.
> 
> Unless comment processing is also done on input (ala the C pre-processor)
> I don't know if I like the idea of
> 	/* comment \u002A/
> being a valid comment

Since comments have no in-language interpretation, they can include any
printable byte that matches the grammar.  If UTF-8 sequences or \uXXXX
escapes appear in comments, software doesn't care.
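
As a rough illustration of that point, a lexer can drop '#' comments
without ever looking inside them.  This is a minimal Python sketch; the
name strip_line_comments is hypothetical, and it assumes that '#' never
appears inside a string literal or IRI in the input, which a real lexer
would have to handle.

```python
def strip_line_comments(query: str) -> str:
    """Hypothetical helper: discard '#' comments to end of line.

    The comment body is never inspected, so a stray \\u or any
    non-ASCII text inside it has no effect on the result.
    Assumes '#' does not occur inside string literals or IRIs.
    """
    kept = []
    for line in query.splitlines():
        code, _sep, _comment = line.partition("#")
        kept.append(code.rstrip())
    return "\n".join(kept)
```

Under this treatment the comment "# \u escapes need thinking about." is
simply thrown away, whatever follows the backslash.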

Dave

Received on Friday, 5 August 2005 10:37:40 UTC