- From: Jiri Dokulil <dokulil@gmail.com>
- Date: Fri, 18 Aug 2006 21:15:04 +0200
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: public-rdf-dawg-comments@w3.org
On 8/18/06, Seaborne, Andy <andy.seaborne@hp.com> wrote:
>
> Jiri Dokulil wrote:
> >
> > I am not sure how a scanner for SPARQL should determine whether a '<'
> > character it encounters is the beginning of an IRI or a comparison
> > operator.
> >
> > Consider these queries:
> >
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b && ?c>?d) }
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b&&?c>?d) }
> >
> > Yacker validator results look troubling to me:
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb+%26%26+%3Fc%3E%3Fd%29+%7D&action=validate+text
> >
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb%26%26%3Fc%3E%3Fd%29+%7D%0D%0A&action=validate+text
> >
> > The first query validates, the other does not.
> > My guess is that the validator uses some flex-like scanner that
> > prefers the longest tokens. In the first case "<?b && ?c>" can't be
> > parsed as an IRI because of the spaces, so the scanner falls back and
> > the 'less than' rule is picked.
> > On the other hand, "<?b&&?c>" is a valid (according to the grammar)
> > IRI. But 'variable iri variable' is not a valid FILTER condition and
> > the parser rejects the query.
> >
> > The problem is more obvious for scanners with one character
> > look-ahead, because they are completely unable to distinguish these
> > two cases.
> > They also have the same problem with () and [] tokens (NIL and ANON
> > terminals) but that can easily be solved by going from LL(1) to LL(2).
> >
> > Jiri Dokulil
>
> Because the characters < and > are overloaded for IRIs and for comparison
> operators there is a potential ambiguity. The SPARQL grammar handles IRI in
> two ways - the general grammar rule that is simple and covers any IRI scheme,
> but then relies on further validating by an IRI parser.

No objection about the simple rule. I don't expect the grammar to provide
advanced checks.

>
> For the http: scheme, <?b> is a valid IRI, as is <?b&&1>. ? and & are legal in
> an HTTP URL.
>
> For example:
>
>
> BASE <http://example/page>
> PREFIX : <http://example/ns#>
>
> ASK { <?b> :p <?b&&1> }
>
>
>
> 1 BASE <http://example/page>
> 2 PREFIX : <http://example/ns#>
> 3
> 4 ASK
> 5 WHERE
> 6 { <http://example/page?b>
> 7 :p <http://example/page?b&&1> .
> 8 }
>
> <?b> is a relative URL relative to base <http://example/page>
> That is <http://example/page?b>
>
> The rule "longest token wins" resolves the tokenizing problem (and is common
> practice in lexers because it also means 123 is a single number, not 3
> individual one digit numbers) although it moves the problem to the grammar.
>
> It could be disambiguated but it needs more than changes to the lexer. It
> needs a context sensitive lexer (< and an IRI can't occur in the same place in
> a valid expression, after ?a seeing < must be a comparison in a legal
> expression). The WG has chosen to cover the wider range of parser toolkits,
> rather than choose the more complicated context sensitive approach.

Again, no objection here. In fact, the '<' is an issue for me because the
lexer I used is too weak to handle even this.

>
> I'll look at adding an editorial note that highlights this better. It does
> already say:
>
> http://www.w3.org/TR/rdf-sparql-query/#whitespace
> """
> White space (production WS) is used to separate two terminals which would
> otherwise be (mis-)recognized as one terminal.
> """
> which already covers this case.
>
> I hope that this message addresses your comment. If it does, please let us know
> - if you put [CLOSED] in the subject line, it will help scripts that help
> manage this list.

Thanks for the explanation. It certainly clarified the way SPARQL queries
should be parsed. Still, I'm not happy with this solution because it makes
the otherwise simple language complicated and somewhat tricky. Using an
operator as a string delimiter seems highly unusual to me.

Unfortunately it is obviously way too late to do anything about this, so
I'll have to cope with the problem (I'm creating a SPARQL implementation to
experiment with).

Thanks again for the explanation.

Jiri Dokulil
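
P.S. For anyone following the thread, here is a minimal sketch of the
"longest token wins" behaviour discussed above, written in Python. It is
not the Yacker validator's scanner and not a full SPARQL lexer: the rule
names, the tokenize() helper and the reduced rule set are my own
assumptions for illustration. Only the IRI_REF pattern is modelled on the
shape of the grammar terminal (any characters between < and > except
<>"{}|^`\ and control characters or space).

```python
import re

# Hypothetical, heavily reduced rule set: just enough to tokenize the
# FILTER expressions from the two example queries.
TOKEN_RULES = [
    ("IRI_REF", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("VAR", re.compile(r'[?$][A-Za-z_][A-Za-z0-9_]*')),
    ("AND", re.compile(r'&&')),
    ("LT", re.compile(r'<')),
    ("GT", re.compile(r'>')),
]
WHITESPACE = re.compile(r'\s+')


def tokenize(text):
    """Split text into (rule, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            # White space only separates tokens; it is not a token itself.
            pos = ws.end()
            continue
        best = None
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Keep the candidate that consumes the most input (maximal munch).
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError("no token matches at position %d" % pos)
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

Calling tokenize("?a<?b && ?c>?d") gives VAR, LT, VAR, AND, VAR, GT, VAR,
because the space stops the IRI_REF rule from matching. Calling
tokenize("?a<?b&&?c>?d") gives VAR, IRI_REF, VAR, with "<?b&&?c>" taken
as a single IRI token, which is the sequence the parser then rejects.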
Received on Friday, 18 August 2006 19:15:21 UTC