
Re: [Turtle] Grammar ambiguity?

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 28 Feb 2011 10:39:07 +0000
Message-ID: <4D6B7B4B.8050108@epimorphics.com>
To: nathan@webr3.org
CC: Yves Raimond <Yves.Raimond@bbc.co.uk>, public-rdf-wg@w3.org
Good example.  Make sure that's added to the test suite!


On 28/02/11 10:12, Nathan wrote:
> Hi Yves,
>
> Integer in your example, given that the EBNF should match the statement
> first thus removing the .
>
> statement ::= directive '.' | triples '.' | ws+
>
> and in this example:
>
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
> @prefix : <http://localhost#>.
>
> :Person
> :age 18.;
> rdf:type :Person.
>
> it'd match decimal
>
> decimal ::= ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ | ([0-9])+ )

This brings up the issue of tokens and grammar.

Many parser generators do not backtrack over tokenization and make 
tokenizing greedy (the longest possible token is returned).
Here, "18." is longer than "18", so it is tokenized as a decimal.
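
To make the longest-match point concrete, here is a minimal sketch in Python. The DECIMAL pattern follows the decimal production quoted above; the INTEGER and DOT patterns, the tie-breaking rule and the name longest_match are assumptions made for illustration, not how any particular parser is written.

```python
import re

# DECIMAL mirrors the quoted production:
#   ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ | ([0-9])+ )
# INTEGER and DOT are simplified assumptions for this sketch.
TOKEN_PATTERNS = [
    ("INTEGER", re.compile(r"[-+]?[0-9]+")),
    ("DECIMAL", re.compile(r"[-+]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")),
    ("DOT", re.compile(r"\.")),
]


def longest_match(text, pos=0):
    """Return (token name, lexeme) for the longest token starting at pos.

    Ties go to the pattern listed first (an assumption of this sketch),
    so a bare "18" is reported as INTEGER rather than DECIMAL.
    """
    best_name, best_end = None, pos
    for name, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        if m and m.end() > best_end:
            best_name, best_end = name, m.end()
    if best_name is None:
        raise ValueError(f"no token matches at offset {pos}")
    return best_name, text[pos:best_end]
```

With these patterns, longest_match("18.") picks DECIMAL because "18." is one character longer than the INTEGER match "18"; the tokenizer never goes back to try INTEGER followed by DOT.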


The current doc (2010/01/Turtle/) does touch on this indirectly:

"""
4.1 White Space

White space (production ws) is used to separate two tokens which would 
otherwise be (mis-)recognized as one token.

White space is significant in tokens IRI_REF and string.
"""

The example is then bad syntax: Turtle requires a trailing DOT, and 
there is none once "18." has been read as a single token.
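
A rough way to see this, and the effect of the white space rule quoted above, is to tokenize the last statement greedily and look at the final token. This is only a sketch: the PNAME pattern is a heavily simplified stand-in for Turtle's prefixed names, and tokenize and ends_with_dot are hypothetical helper names.

```python
import re

# DECIMAL mirrors the quoted production; the other patterns are
# simplified assumptions, just enough to tokenize the example.
PATTERNS = [
    ("INTEGER", re.compile(r"[-+]?[0-9]+")),
    ("DECIMAL", re.compile(r"[-+]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")),
    ("PNAME", re.compile(r"[A-Za-z]*:[A-Za-z]+")),
    ("SEMICOLON", re.compile(r";")),
    ("DOT", re.compile(r"\.")),
]
WS = re.compile(r"\s+")


def tokenize(text):
    """Greedy tokenizer: skip white space, then take the longest match.

    Ties go to the pattern listed first, so "18" alone is an INTEGER.
    """
    tokens, pos = [], 0
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_name, best_end = None, pos
        for name, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


def ends_with_dot(text):
    """True if the last token is a DOT, i.e. the statement is terminated."""
    tokens = tokenize(text)
    return bool(tokens) and tokens[-1][0] == "DOT"
```

Under these assumptions, ends_with_dot(":Person rdf:type :Person; :age 18.") is False because the final token is the decimal "18.", while writing ":age 18 ." separates the two tokens and the check passes.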

It is legal N3 where the final DOT is not required.

It is legal TriG for a block of triples, based on the first example in [1].

It is legal SPARQL for the same reason.

RIOT [2] parses it in lax mode and throws a syntax error in strict mode.

	Andy

[1] http://www4.wiwiss.fu-berlin.de/bizer/TriG/
[2] http://openjena.org/wiki/RIOT

>
> afaict,
>
> cheers nathan
>
> Yves Raimond wrote:
>> Hello!
>>
>> I just noticed a potentially ambiguous point in the Turtle grammar at [1].
>>
>> Considering the following document:
>>
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
>> @prefix : <http://localhost#>.
>>
>> :Person
>> rdf:type :Person;
>> :age 18.
>>
>> seems to be allowed by the grammar (resource resource integer.) But
>> there's a potential ambiguity there (is the value of 'age' supposed to
>> be parsed as a float or as an integer?)
>>
>> Some Turtle parsers seem to reject this document (e.g. SWI-Prolog),
>> rapper makes a best guess (parsing 18 as an xsd:decimal), but throws a
>> syntax error, and the SemWeb.NET library [2] parses it as an
>> xsd:integer without any errors or warnings.
>>
>> Best,
>> y
>>
>> [1] http://www.w3.org/TeamSubmission/turtle/#sec-grammar
>> [2] http://www.rdfabout.com/demo/validator/
>>
>>
>>
>
>
Received on Monday, 28 February 2011 10:39:46 GMT
