Re: [Fwd: Re: Determining whether '<' is a beginning of IRI or 'less than' operator [OK?]]

I have at least updated the (unapproved; see the README.txt) SyntaxDev tests
to include tests with IRIs like <?b> and negative tests for ?a<?b&&?c>?d
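The sketch below is a minimal illustration of the "longest token wins" rule discussed in the quoted thread. It shows why ?a<?b&&?c>?d is a negative test while the spaced form parses. It is hypothetical and is not the SPARQL grammar or any WG test harness. The terminal set is cut down to what the FILTER expression needs. IRI_REF only approximates the spec's production: '<', then no characters from <>"{}|^`\ or #x00-#x20, then '>'. The names tokenize and TOKEN_PATTERNS are made up for this example.

```python
import re

# Hypothetical, simplified terminal set: only what the FILTER example needs.
# IRI_REF approximates SPARQL's production: '<', then any characters except
# < > " { } | ^ ` \ and #x00-#x20 (so no spaces), then '>'.
TOKEN_PATTERNS = [
    ("IRI_REF", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("VAR", re.compile(r'\?[A-Za-z_][A-Za-z0-9_]*')),
    ("AND", re.compile(r'&&')),
    ("LT", re.compile(r'<')),
    ("GT", re.compile(r'>')),
    ("LPAREN", re.compile(r'\(')),
    ("RPAREN", re.compile(r'\)')),
]
WS = re.compile(r'[ \t\r\n]+')


def tokenize(text):
    """Longest-match ("maximal munch") tokenizer: at each position every
    pattern is tried and the longest match wins; ties go to the earlier
    entry in TOKEN_PATTERNS. Whitespace only separates tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("?a<?b&&?c>?d"))    # '<?b&&?c>' comes out as one IRI_REF
print(tokenize("?a<?b && ?c>?d"))  # '<' and '>' come out as LT and GT
```

Without spaces, the lexer turns <?b&&?c> into a single IRI_REF. The grammar then sees ?a followed directly by an IRI and must reject the expression. With spaces, the IRI pattern cannot match because whitespace is not allowed inside IRI_REF, so '<' falls back to the comparison operator.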

 Andy

Seaborne, Andy wrote:
> To do these tests, we need negative syntax tests:
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0246
> 
> For reference: Eric's alternative (which I think is unnecessary because it 
> changes existing test manifests):
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0052.html
> 
>  Andy
> 
> -------- Original Message --------
> Subject: Re: Determining whether '<' is a beginning of IRI or 'less than' 
> operator [OK?]
> Date: Fri, 18 Aug 2006 20:27:37 +0100
> From: Dan Connolly <connolly@w3.org>
> Organization: W3C (http://www.w3.org/)
> To: Seaborne, Andy <andy.seaborne@hp.com>
> CC: Jiri Dokulil <dokulil@gmail.com>, <public-rdf-dawg-comments@w3.org>
> References: <6a8224ab0608180833r31f8c81flb4d0c3037286aab3@mail.gmail.com> 
> <44E60817.10308@hp.com>
> 
> On Fri, 2006-08-18 at 19:33 +0100, Seaborne, Andy wrote:
>>
>> Jiri Dokulil wrote:
>>> I am not sure how a scanner for SPARQL should determine whether a '<'
>>> character it encounters is the beginning of an IRI or a comparison
>>> operator.
>>>
>>> Consider these queries:
>>>
>>> SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b && ?c>?d) }
>>> SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b&&?c>?d) }
>>>
>>> Yacker validator results look troubling to me:
>>> http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb+%26%26+%3Fc%3E%3Fd%29+%7D&action=validate+text
>>>
>>> http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb%26%26%3Fc%3E%3Fd%29+%7D%0D%0A&action=validate+text
>>>
>>>
>>> The first query validates, the other does not.
> 
> I believe that's by design, as Andy explained below.
> 
> Andy, let's get this case in the test suite and ask
> the WG to confirm. Is that convenient for you to do?
> 
> Thanks, Jiri, for the careful review.
> 
>> The rule "longest token wins" resolves the tokenizing problem (and is common
>> practice in lexers, because it also means 123 is a single number, not 3
>> individual one-digit numbers), although it moves the problem to the grammar.
>>
>> It could be disambiguated, but that needs more than changes to the lexer: it
>> needs a context-sensitive lexer. (A '<' and an IRI can't occur in the same
>> place in a valid expression; after ?a, a '<' must be a comparison in a legal
>> expression.) The WG has chosen to cover the wider range of parser toolkits,
>> rather than choose the more complicated context-sensitive approach.
>>
>> I'll look at adding an editorial note that highlights this better. It does 
>> already say:
>>
>> http://www.w3.org/TR/rdf-sparql-query/#whitespace
>> """
>> White space (production WS) is used to separate two terminals which would 
>> otherwise be (mis-)recognized as one terminal.
>> """
>> which covers this case.
>>
>> I hope that this message addresses your comment. If it does, please let us
>> know; if you put [CLOSED] in the subject line, it will help the scripts that
>> help manage this list.
>>
>>  Andy
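The thread mentions a context-sensitive lexer as the alternative the WG chose not to adopt. The sketch below is a minimal, hypothetical Python illustration of that idea and is not part of SPARQL. The lexer remembers whether an operand has just ended. In that state it never tries IRI_REF, so a '<' right after ?a is always a comparison. The terminal set is simplified to variables, IRIs, &&, <, > and parentheses. IRI_REF only approximates the spec's production. The names tokenize_in_context and ENDS_OPERAND are made up for this example.

```python
import re

# Hypothetical, simplified terminals. IRI_REF approximates SPARQL's production:
# '<', then any characters except < > " { } | ^ ` \ and #x00-#x20, then '>'.
IRI_REF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')
VAR = re.compile(r'\?[A-Za-z_][A-Za-z0-9_]*')
WS = re.compile(r'[ \t\r\n]+')
SIMPLE = [("AND", "&&"), ("LT", "<"), ("GT", ">"), ("LPAREN", "("), ("RPAREN", ")")]

# Tokens after which an operand has just ended; a following '<' can then only
# be the less-than operator, never the start of an IRI. (Full SPARQL would
# also need literals, numbers, etc. here.)
ENDS_OPERAND = {"VAR", "IRI_REF", "RPAREN"}


def tokenize_in_context(text):
    """Context-sensitive lexer: whether '<' may start an IRI depends on the
    kind of the previous token, not only on the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        after_operand = bool(tokens) and tokens[-1][0] in ENDS_OPERAND
        # Only try IRI_REF where an operand may start.
        iri = None if after_operand else IRI_REF.match(text, pos)
        var = VAR.match(text, pos)
        token = None
        if iri:
            token = ("IRI_REF", iri.group())
        elif var:
            token = ("VAR", var.group())
        else:
            for name, literal in SIMPLE:
                if text.startswith(literal, pos):
                    token = (name, literal)
                    break
        if token is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(token)
        pos += len(token[1])
    return tokens


print(tokenize_in_context("?a<?b&&?c>?d"))  # LT and GT, even without spaces
print(tokenize_in_context("(<?b>)"))        # '<?b>' is still an IRI here
```

The cost is that the lexer and grammar become coupled. Every token that can end an operand has to be listed in ENDS_OPERAND, which is the extra complexity the thread weighs against supporting a wider range of parser toolkits.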

Received on Monday, 21 August 2006 17:03:41 UTC