Re: Proposed fixed version of N-Triples https://www.w3.org/TR/n-triples/ Section 7

On 06/29/2017 11:23 AM, Gregg Kellogg wrote:
>> On Jun 29, 2017, at 5:48 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>> One problem with providing these test cases is that it is not really possible
>> to determine exactly what the current grammar allows.
> 
> Turtle adds some clarity:
> 
>> White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Turtle parser.

I don't view this wording as clear at all.  For example,
:::.
can be recognized as a Turtle triple in only one way so white space shouldn't
be needed here.

The intent is probably something like using white space to separate tokens for
a left-to-right greedy tokenizer for the terminal tokens that appear in the
rules for non-terminals.  However, there are lots of other ways of writing
Turtle parsers.
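
To make that reading concrete, here is a minimal sketch of such a greedy
tokenizer. It is only an illustration: the regular expressions are simplified,
ASCII-only stand-ins for PNAME_NS and PNAME_LN (the real productions allow many
more characters), and the names TERMINALS, WS and tokenize are made up for this
example rather than taken from any specification or existing parser.

```python
import re

# Simplified, ASCII-only stand-ins for two Turtle terminals (assumption:
# the real PN_PREFIX / PN_LOCAL productions allow many more characters).
PN_LOCAL = r"[A-Za-z0-9_:](?:[A-Za-z0-9_\-.:]*[A-Za-z0-9_\-:])?"
TERMINALS = [
    ("PNAME_LN", re.compile(r"(?:[A-Za-z][A-Za-z0-9_\-]*)?:" + PN_LOCAL)),
    ("PNAME_NS", re.compile(r"(?:[A-Za-z][A-Za-z0-9_\-]*)?:")),
    ("DOT", re.compile(r"\.")),
]
# Any run of white space and '#' comments; may match the empty string.
WS = re.compile(r"(?:[ \t\r\n]+|#[^\r\n]*)*")


def tokenize(text):
    """Left-to-right, longest-match tokenizer; skips white space and comments."""
    tokens = []
    pos = WS.match(text).end()
    while pos < len(text):
        best = None
        for name, pattern in TERMINALS:
            m = pattern.match(text, pos)
            # Keep the longest match; earlier entries win ties.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = WS.match(text, best[1].end()).end()
    return tokens


print(tokenize(":::."))     # [('PNAME_LN', ':::'), ('DOT', '.')]
print(tokenize(": : : ."))  # three PNAME_NS tokens, then DOT
```

Under these assumptions the greedy tokenizer turns ":::." into a single
PNAME_LN followed by a dot, which no triple production accepts, while
": : : ." comes out as three PNAME_NS tokens and a dot. A parser that is not
built around greedy tokenization could still find the one valid reading of
":::." without any white space.
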
> This language comes from the SPARQL 1.1 grammar, so any change needed for N-Triples/Quads/Turtle/TriG would also need to be applied there.
> 
> What it doesn’t do is say that white space may be composed of one or more WS or comments, which it probably should. Comments are treated as white space.
> 
> There is no discussion of white space within terminals, except for the String production and ANON terminal.
> 
> Personally, I’m not a fan of adding explicit WS to the grammar.
> 
> My interpretation for all grammars is that any amount of white space is allowed between any terminals, not required at all in N-Triples/Quads, and necessary in SPARQL/Turtle/TriG to keep two terminals from being confused.
> 
>> For example, there are multiple readings of "terminal".  Is it any terminal,
>> in which case white space might be allowed within blank node labels?  Is it
>> any terminal mentioned in the productions for non-terminals, in which case
>> blank space might be allowed before language tags?  Is it only named terminals
>> mentioned in the productions for non-terminals?
> 
> The quote above qualifies the use of white space to be significant in rule names in capitals, which are the "Productions for terminals". Thus, WS is significant in the STRING_LITERAL_* terminals, and allowed within ANON. Perhaps it should be made clear that white space within other terminals is specifically not allowed.
> 
> It is somewhat problematic that note 6 specifically says that no white space is allowed between the sign and the number, which might imply that white space is allowed, say, between numbers, or following ‘@’. My parser implements terminals as regular expressions, where white space is explicit within the regular expression, if necessary. The tokenizer eats any amount of whitespace between matched terminals; I think this is probably a fairly common pattern.
> 
> Gregg
peter

Received on Thursday, 29 June 2017 20:17:59 UTC