- From: Dan Connolly <connolly@w3.org>
- Date: Fri, 18 Aug 2006 14:27:37 -0500
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: Jiri Dokulil <dokulil@gmail.com>, public-rdf-dawg-comments@w3.org
On Fri, 2006-08-18 at 19:33 +0100, Seaborne, Andy wrote:
>
> Jiri Dokulil wrote:
> >
> > I am not sure how a scanner for SPARQL should determine whether a '<'
> > character it encounters is the beginning of an IRI or a comparison
> > operator.
> >
> > Consider these queries:
> >
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b && ?c>?d) }
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b&&?c>?d) }
> >
> > The Yacker validator results look troubling to me:
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb+%26%26+%3Fc%3E%3Fd%29+%7D&action=validate+text
> >
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb%26%26%3Fc%3E%3Fd%29+%7D%0D%0A&action=validate+text
> >
> > The first query validates; the other does not.

I believe that's by design, as Andy explained below.
Andy, let's get this case into the test suite and ask
the WG to confirm. Is that convenient for you to do?

Thanks, Jiri, for the careful review.

> The rule "longest token wins" resolves the tokenizing problem (and is common
> practice in lexers because it also means 123 is a single number, not 3
> individual one-digit numbers), although it moves the problem to the grammar.
>
> It could be disambiguated, but that needs more than changes to the lexer. It
> needs a context-sensitive lexer (< and an IRI can't occur in the same place in
> a valid expression; after ?a, a < must be a comparison in a legal
> expression). The WG has chosen to cover the wider range of parser toolkits,
> rather than choose the more complicated context-sensitive approach.
>
> I'll look at adding an editorial note that highlights this better. It does
> already say:
>
> http://www.w3.org/TR/rdf-sparql-query/#whitespace
> """
> White space (production WS) is used to separate two terminals which would
> otherwise be (mis-)recognized as one terminal.
> """
> which already covers this case.
>
> I hope that this message addresses your comment. If it does, please let us
> know; if you put [CLOSED] in the subject line, it will help the scripts that
> help manage this list.
>
> Andy

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Received on Friday, 18 August 2006 19:27:48 UTC