[Fwd: Re: Determining whether '<' is a beginning of IRI or 'less than' operator [OK?]]

To do these tests, we need negative syntax tests:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0246

For reference: Eric's alternative (which I think is unnecessary because it 
changes existing test manifests):
http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0052.html

 Andy

-------- Original Message --------
Subject: Re: Determining whether '<' is a beginning of IRI or 'less than' 
operator [OK?]
Date: Fri, 18 Aug 2006 20:27:37 +0100
From: Dan Connolly <connolly@w3.org>
Organization: W3C (http://www.w3.org/)
To: Seaborne, Andy <andy.seaborne@hp.com>
CC: Jiri Dokulil <dokulil@gmail.com>, <public-rdf-dawg-comments@w3.org>
References: <6a8224ab0608180833r31f8c81flb4d0c3037286aab3@mail.gmail.com> 
<44E60817.10308@hp.com>

On Fri, 2006-08-18 at 19:33 +0100, Seaborne, Andy wrote:
> 
> 
> Jiri Dokulil wrote:
> > 
> > I am not sure how a scanner for SPARQL should determine whether a '<'
> > character it encounters is the beginning of an IRI or a comparison
> > operator.
> > 
> > Consider these queries:
> > 
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b && ?c>?d) }
> > SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b&&?c>?d) }
> > 
> > Yacker validator results look troubling to me:
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb+%26%26+%3Fc%3E%3Fd%29+%7D&action=validate+text
> > 
> > http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb%26%26%3Fc%3E%3Fd%29+%7D%0D%0A&action=validate+text
> > 
> > 
> > The first query validates, the other does not.

I believe that's by design, as Andy explained below.

Andy, let's get this case in the test suite and ask
the WG to confirm. Is that convenient for you to do?

Thanks, Jiri, for the careful review.

> The rule "longest token wins" resolves the tokenizing problem (and is common 
> practice in lexers because it also means 123 is a single number, not three 
> individual one-digit numbers), although it moves the problem to the grammar.
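
To make the "longest token wins" effect concrete, here is a minimal sketch
of a longest-match tokenizer run over the two FILTER expressions from
Jiri's queries. It is an illustration under stated assumptions, not the
Yacker implementation or the normative grammar: the token names, the
reduced token set, and the tokenize function are hypothetical, and the IRI
pattern only approximates the shape of the grammar's IRI terminal.

```python
import re

# Hypothetical, heavily reduced token set: just enough terminals to show
# the effect. The IRI pattern approximates the grammar's IRI terminal:
# '<', then any characters except <>"{}|^`\ and control/space characters,
# then '>'.
TOKEN_PATTERNS = [
    ("IRI_REF", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("VAR", re.compile(r"[?$][A-Za-z0-9_]+")),
    ("AND", re.compile(r"&&")),
    ("LT", re.compile(r"<")),
    ("GT", re.compile(r">")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Split text into (name, lexeme) pairs using longest-match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        # Try every pattern at the current offset and keep the longest match.
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        name, m = best
        if name != "WS":  # white space separates tokens but is not emitted
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


# Without white space, '<?b&&?c>' is a well-formed IRI token, so it wins.
print(tokenize("?a<?b&&?c>?d"))
# With white space, the space cannot appear inside an IRI, so '<' is LT.
print(tokenize("?a<?b && ?c>?d"))
```

In the first expression the lexer hands the parser a variable, an IRI, and
another variable, which no expression rule accepts, so the query is
rejected. In the second, the space after ?b stops the IRI pattern from
matching, and '<' and '>' come out as comparison operators.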
> 
> It could be disambiguated but it needs more than changes to the lexer.  It 
> needs a context-sensitive lexer (< and an IRI can't occur in the same place in 
> a valid expression; after ?a, a following < must be a comparison in a legal 
> expression).  The WG has chosen to cover the wider range of parser toolkits, 
> rather than choose the more complicated context-sensitive approach.
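
For comparison, the context-sensitive alternative that the WG did not
adopt could be sketched roughly as follows. This is a toy under stated
assumptions: the tokenize_with_context function and its expect_operand
flag are hypothetical, the patterns are simplified, and it handles only
flat "operand operator operand ..." sequences (no parentheses, unary
operators, or function calls), which a real SPARQL lexer would need.

```python
import re

# Simplified, hypothetical terminals; the IRI pattern only approximates
# the grammar's IRI terminal.
IRI_REF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')
VAR = re.compile(r"[?$][A-Za-z0-9_]+")
OPERATOR = re.compile(r"&&|<|>")
WS = re.compile(r"[ \t\r\n]+")


def tokenize_with_context(text):
    """Tokenize a flat expression, choosing patterns by grammatical position."""
    tokens = []
    pos = 0
    expect_operand = True  # an expression must start with an operand
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        # The lexer state decides which terminals are even considered:
        # an IRI is only tried where an operand is legal, and '<' is only
        # tried as an operator where an operator is legal.
        if expect_operand:
            candidates = [("IRI_REF", IRI_REF), ("VAR", VAR)]
        else:
            candidates = [("OP", OPERATOR)]
        for name, pattern in candidates:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                expect_operand = not expect_operand
                break
        else:
            raise ValueError(f"unexpected input at offset {pos}")
    return tokens


print(tokenize_with_context("?a<?b&&?c>?d"))
print(tokenize_with_context("?a < <http://example/x>"))
```

Here the unspaced expression tokenizes as comparisons because, after ?a,
only an operator is tried. The cost is that the lexer must track parser
state, which not every parser toolkit supports.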
> 
> I'll look at adding an editorial note that highlights this better. It does 
> already say:
> 
> http://www.w3.org/TR/rdf-sparql-query/#whitespace
> """
> White space (production WS) is used to separate two terminals which would 
> otherwise be (mis-)recognized as one terminal.
> """
> which already covers this case.
> 
> I hope that this message addresses your comment. If it does, please let us know 
> - if you put [CLOSED] in the subject line, it will help scripts that help 
> manage this list.
> 
>  Andy
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Monday, 21 August 2006 16:36:10 UTC