- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 5 Aug 2005 15:28:03 -0400
- To: Dave Beckett <dave.beckett@bristol.ac.uk>
- Cc: Andy Seaborne <andy.seaborne@hp.com>, Steve Harris <S.W.Harris@ecs.soton.ac.uk>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Message-ID: <20050805192803.GA3162@w3.org>
On Fri, Aug 05, 2005 at 11:37:20AM +0100, Dave Beckett wrote:
>
> On Fri, 2005-08-05 at 10:03 +0100, Steve Harris wrote:
> > On Thu, Aug 04, 2005 at 04:56:17 +0100, Andy Seaborne wrote:
> > > >so charmod says this should work:
> > > >
> > > > SELECT ?foo\x0045bar WHERE { ?foo\x0045bar dc:title ?xyz }.
> > >
> > > We can do one of:
> > > 1/ Extend \u and \U to apply to variables
> > > 2/ Make it so \u is on input processing (before tokenizing)
> > > so it works everywhere including comments and is transparent to
> > > parsing proper
> > >
> > > 1/ is probably clearer; 2/ is probably easier to implement
>
> 2/ is way harder to implement for me. The lexer and parser I've been
> using do not work on unicode code points so adding an extra layer in
> there will substantially complicate things. I will likely ignore 2/ for
> some time if it was chosen and mark any tests for it as will-not-pass.
>
> > > A test is "SELEC\u0054"
> > >
> > > Another is the comment
> > >
> > > # \u escapes need thinking about.
> > >
> > > which is illegal by 2 but legal by 1.
> >
> > Unless comment processing is also done on input (à la the C pre-processor)
> > I don't know if I like the idea of
> > /* comment \u002A/
> > being a valid comment
>
> Since comments have no in-language interpretation, they can include any
> printable byte that matches the grammar. If utf8 items or \uxxx are in
> comments, software doesn't care.
I think Steve meant that one of the comment boundaries was formed by
the escape sequence. For example:

  SELECT * # get it all\nWHERE {?x\u0020foo:bar\u0020?y}

un-escapes to:

  SELECT * # get it all
  WHERE {?x foo:bar ?y}

This would be a possible side effect of taking Andy's option 2 (if escape
expansion is done prior to, rather than during, tokenizing).
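
To make the difference concrete, here is a rough Python sketch of the two
orderings. It is only an illustration under my own assumptions, not anything
from the grammar: the helper names (expand_escapes, strip_comment, option_1,
option_2) are made up, the comment rule is a toy one that treats every '#' as
starting a comment, and the line break is written as \u000A rather than \n so
that it goes through the same escape mechanism.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def expand_escapes(text):
    # Replace each escape with the character whose code point it names.
    return ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

def strip_comment(line):
    # Toy rule (assumption): '#' always starts a comment that runs to the
    # end of the line; '#' inside IRIs or strings is not handled.
    return line.split("#", 1)[0].rstrip()

def option_2(query):
    # Expand escapes on the raw input first, then look for comments.
    expanded = expand_escapes(query)
    return [strip_comment(line) for line in expanded.splitlines()]

def option_1(query):
    # Find comments first; escapes are only expanded in what remains.
    return [expand_escapes(strip_comment(line)) for line in query.splitlines()]

query = r"SELECT * # get it all\u000AWHERE {?x foo:bar ?y}"
print(option_2(query))  # ['SELECT *', 'WHERE {?x foo:bar ?y}']
print(option_1(query))  # ['SELECT *']
```

With expansion first, the escaped line break terminates the comment and the
WHERE clause survives; with comment handling first, the whole tail of the
line, WHERE clause included, is swallowed by the comment.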
--
-eric
office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
Shonan Fujisawa Campus, Keio University,
5322 Endo, Fujisawa, Kanagawa 252-8520
JAPAN
+1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +81.90.6533.3882
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 5 August 2005 19:28:06 UTC