Re: Are spaces allowed between terms in N-Triples 1.1?

> On Jun 28, 2017, at 5:31 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> It is not possible for N-Triples parsers to be overly lenient, nor is it
> possible for Turtle parsers to be overly lenient.   The Turtle specification
> has a note in Section 5 on this point.

This note indicates that parsing of non-conforming documents is undefined, not that it is not possible. The presence of numerous tests which include extra whitespace would indicate that consuming it, at least, is not considered to be overly lenient. IMHO, the note is intended to indicate how parsers may or may not recover from parser/tokenizer errors, e.g., whether triples are produced up to the point the error is discovered. There certainly are parsers that attempt to perform error recovery and continue to generate triples, which is a real-world consideration for handling many large dumps (the previous Freebase dumps, for example).
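
As a rough illustration of that kind of recovery, here is a minimal Python sketch. It is not taken from any particular parser: the function name parse_with_recovery and the simplified regular expressions are assumptions made for the example, and they cover far less than the real N-Triples grammar (no escape validation, ASCII-ish blank node labels only). A line that fails to match is recorded and skipped, and triples keep being produced from the lines that follow.

```python
import re

# Simplified terminals, assumed for illustration; NOT the full N-Triples grammar.
IRI = r'<[^<>\s]*>'
BNODE = r'_:\w+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^<>\s]*>)?'

# subject predicate object '.', with optional spaces/tabs between terminals
# and an optional trailing comment.
TRIPLE = re.compile(
    rf'[ \t]*({IRI}|{BNODE})[ \t]*({IRI})[ \t]*({IRI}|{BNODE}|{LITERAL})'
    rf'[ \t]*\.[ \t]*(?:#.*)?$'
)


def parse_with_recovery(lines):
    """Return (triples, errors); a bad line is recorded and skipped."""
    triples, errors = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Blank lines and whole-line comments carry no triple.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match:
            triples.append(match.groups())
        else:
            # Recovery strategy: resynchronize at the next line.
            errors.append((number, line))
    return triples, errors
```

Because N-Triples is line-oriented, resynchronizing at the next line is the simplest recovery strategy; a Turtle parser would need something more involved.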

> However, even though everything you say below is true, it is still the case
> that the grammar sections in both the N-Triples document and the Turtle
> document are incorrect and need to be rewritten.

Perhaps an erratum would be sufficient. This might just clarify what “whitespace” means so that it can include sequences of multiple whitespace tokens and where it may be optional. As you note, in N-Triples, whitespace between terminals is always optional (other than within literals).
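
To make the "optional whitespace" reading concrete, here is a small Python sketch of a tokenizer that skips zero or more spaces or tabs before each terminal. The names TOKEN, WS and tokenize are made up for the example, and the terminal patterns are deliberately simplified assumptions rather than the productions from the specification. It expects a single line with the line terminator already removed.

```python
import re

# Simplified terminals, assumed for illustration; the real grammar is stricter.
TOKEN = re.compile(r'''
    (?P<iri><[^<>\s]*>)
  | (?P<bnode>_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)
  | (?P<literal>"(?:[^"\\\r\n]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^<>\s]*>)?)
  | (?P<dot>\.)
''', re.VERBOSE)

# Zero or more spaces or tabs: whitespace between terminals is optional.
WS = re.compile(r'[ \t]*')


def tokenize(line):
    """Return the terminals found in one line (no line terminator)."""
    tokens, pos = [], 0
    while True:
        pos = WS.match(line, pos).end()  # always matches, possibly empty
        if pos == len(line) or line[pos] == "#":
            return tokens  # end of line, or the rest is a comment
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected input at column {pos}: {line[pos:]!r}")
        tokens.append(match.group())
        pos = match.end()
```

With this approach, a line from the minimal whitespace test such as `_:s<http://example/p>_:bnode1.` and a line padded with runs of spaces and tabs both yield four terminals, since none of these terminals can run into the one that follows it.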

> It is also not clear that every valid N-Triples document is a valid Turtle
> document.

How is this not clear? N-Triples is certainly intended to be a strict subset of Turtle.

Gregg

> Peter F. Patel-Schneider
> Nuance Communications
> 
> 
> On 06/28/2017 04:48 PM, Gregg Kellogg wrote:
>> Whitespace is typically taken to mean zero or more characters of whitespace. Note in the Change Log [1]:
>> 
>>> White space rules defined outside of grammar, as in Turtle [2], although the N-Triples grammar restricts White space to tab or (tab U+0009 or space U+0020).
>> 
>> If N-Triples parsers are overly lenient in allowing multiple whitespace characters between terminals, then by that logic, so are Turtle parsers.
>> 
>> The restriction that terminals be separated by exactly a single whitespace is true for the Canonical form of N-Triples [3]. Tokenizers only require whitespace to distinguish two terminals that would otherwise be joined.
>> 
>> Furthermore, there is a minimal whitespace test [4] that verifies that this is the intention of the working group.
>> 
>>   <http://example/s><http://example/p><http://example/o>.
>>   <http://example/s><http://example/p>"Alice".
>>   <http://example/s><http://example/p>_:o.
>>   _:s<http://example/p><http://example/o>.
>>   _:s<http://example/p>"Alice".
>>   _:s<http://example/p>_:bnode1.
>> 
>> There is also the original N-Triples test [5] that contains many instances of terminals separated by multiple whitespace characters, for example:
>> 
>>    # spaces and tabs throughout:
>>         <http://example.org/resource3>          <http://example.org/property>   <http://example.org/resource2>         .        
>> 
>> Gregg Kellogg
>> gregg@greggkellogg.net
>> 
>> [1] https://www.w3.org/TR/n-triples/#changes-between-last-call-working-draft-and-publication-as-note
>> [2] https://www.w3.org/TR/turtle/#grammar-production-WS
>> [3] https://www.w3.org/TR/n-triples/#canonical-ntriples
>> [4] http://w3c.github.io/rdf-tests/ntriples/lantag_with_subtag.nt
>> 
>>> On Jun 28, 2017, at 8:58 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>> 
>>> This means that all existing N-Triples parsers are lenient in that they
>>> process documents that are not valid N-Triples documents.  This, however, does
>>> not make them too lenient as there is no requirement that an N-Triples
>>> processor reject inputs that are not N-Triples documents.
>>> 
>>> This does mean that Canonical N-Triples documents are not valid N-Triples
>>> documents.
>>> 
>>> peter
>>> 
>>> PS:  Of course what it really means is that the grammar section of the
>>> N-Triples document needs to be changed.
>>> 
>>> 
>>> On 06/28/2017 08:50 AM, Wouter Beek wrote:
>>>>> So it seems to me that spaces are not allowed anywhere in [1] in N-Triples, i.e.,
>>>>> 
>>>>> <x:y> <x:y> <x:y> .
>>>>> 
>>>>> is not a valid N-Triples triple.
>>>> 
>>>> I do follow your reasoning here, but this would mean that all existing
>>>> N-Triples parsers are too lenient.
>>>> 
>>> 
>> 
> 

Received on Thursday, 29 June 2017 02:31:34 UTC