Re: Proposed fixed version of N-Triples https://www.w3.org/TR/n-triples/ Section 7

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 29 Jun 2017 11:23:03 -0700
Cc: Dan Brickley <danbri@google.com>, Ivan Herman <ivan@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>
Message-Id: <B5C08C2F-8BB6-4A09-8DCC-07590C250852@greggkellogg.net>
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
> On Jun 29, 2017, at 5:48 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> One problem with providing these test cases is that it is not really possible
> to determine exactly what the current grammar allows.

Turtle adds some clarity:

> White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Turtle parser.

This language comes from the SPARQL 1.1 grammar, so any change needed for N-Triples/Quads/Turtle/TriG would also need to be applied there.

What it doesn’t do is say that white space may be composed of one or more WS or comments, which it probably should. Comments are treated as white space.

There is no discussion of white space within terminals, except for the String production and ANON terminal.

Personally, I’m not a fan of adding explicit WS to the grammar.

My interpretation for all grammars is that any amount of white space is allowed between any terminals, not required at all in N-Triples/Quads, and necessary in SPARQL/Turtle/TriG to keep two terminals from being confused.

> For example, there are multiple readings of "terminal".  Is it any terminal,
> in which case white space might be allowed within blank node labels?  Is it
> any terminal mentioned in the productions for non-terminals, in which case
> blank space might be allowed before language tags?  Is it only named terminals
> mentioned in the productions for non-terminals?

The quote above says that white space is significant in the rules named in capitals, which are the "Productions for terminals". Thus, WS is significant in the STRING_LITERAL_* terminals, and allowed within ANON. Perhaps it should be made clear that white space within other terminals is specifically not allowed.

It is somewhat problematic that note 6 specifically says that no white space is allowed between the sign and the number, which might imply that white space is allowed, say, between numbers, or following ‘@’. My parser implements terminals as regular expressions, where white space is explicit within the regular expression, if necessary. The tokenizer eats any amount of whitespace between matched terminals; I think this is probably a fairly common pattern.
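
To make the pattern concrete, here is a minimal sketch in Python. It is not the parser mentioned above: the terminal patterns are simplified assumptions (no UCHAR escapes, ASCII-only blank node labels), the names DATATYPE_MARK, DOT, SKIP_RE and tokenize are made up for illustration, and it only tokenizes, it does not check the triple production.

```python
import re

# Simplified terminal patterns. These are illustrative assumptions and do
# not cover the full N-Triples grammar (no UCHAR, ASCII-only labels).
TERMINALS = [
    ("IRIREF", r'<[^\x00-\x20<>"{}|^`\\]*>'),
    ("BLANK_NODE_LABEL", r'_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'),
    # White space is explicit here: it is allowed inside the quotes.
    ("STRING_LITERAL_QUOTE", r'"(?:[^"\\\n\r]|\\.)*"'),
    ("LANGTAG", r'@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*'),
    ("DATATYPE_MARK", r'\^\^'),  # hypothetical name for the '^^' literal
    ("DOT", r'\.'),              # hypothetical name for the '.' literal
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TERMINALS))

# Anything skipped between terminals: runs of white space and '#' comments.
SKIP_RE = re.compile(r'(?:[ \t\r\n]+|#[^\r\n]*)*')


def tokenize(text):
    """Return (terminal_name, lexeme) pairs, skipping white space and
    comments between terminals but never inside one."""
    pos = 0
    tokens = []
    while True:
        pos = SKIP_RE.match(text, pos).end()  # always matches, maybe empty
        if pos == len(text):
            return tokens
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
```

With this sketch, `<http://a/s><http://a/p>"x y"@en.` and the same line with spaces and a trailing comment produce the same token list, while `_:b 1` and `"x"@ en` are rejected because the space falls inside what would have to be a single terminal. Note that it accepts white space between a string and its LANGTAG, since those are separate terminals under this reading; whether the grammar intends that is exactly the kind of question raised here.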


> However, I could produce some interesting cases.
> peter
> On 06/29/2017 05:41 AM, Dan Brickley wrote:
>> On 29 Jun 2017 12:40 pm, "Ivan Herman" <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>> On 29 Jun 2017, at 13:01, Peter F. Patel-Schneider
>>    <pfpschneider@gmail.com <mailto:pfpschneider@gmail.com>> wrote:
>>> I was hoping that my message would (instead) trigger a broader
>>    examination of
>>> the grammars for N-Triples, N-Quads, and Turtle and result in
>>> community-approved revised grammars for each of them.  Each of these
>>    grammars
>>> has problems.  The problems with the N-Triples grammar are the easiest
>>    to fix.
>>    One does not include the other… I mean, you (in plural, seeing the short
>>    discussion on swig) did identify an erratum which must therefore be
>>    recorded. If there is a wider discussion that leads to more proposals, we
>>    just have to record those as well…
>>    (In my experience not many people read and/or active on
>>    public-rdf-comments, I do not think you will get a lot of discussion on
>>    this list…:-(
>> There are a few lurkers!
>> It would be good to have some testcases annotated as being unchanged,
>> previously-ok-now-illegal, previously-illegal-now-ok, etc.
>>    Ivan
>>> peter
>>> On 06/29/2017 03:17 AM, Ivan Herman wrote:
>>>> Peter,
>>>> I have added this to the official Errata list:
>>>> https://www.w3.org/2001/sw/wiki/RDF1.1_Errata
>>    <https://www.w3.org/2001/sw/wiki/RDF1.1_Errata>
>>>> Thanks
>>>> Ivan
>>    ----
>>    Ivan Herman, W3C
>>    Publishing@W3C Technical Lead
>>    Home: http://www.w3.org/People/Ivan/
>>    mobile: +31-641044153 <tel:%2B31-641044153>
>>    ORCID ID: http://orcid.org/0000-0003-0782-2704
>>    <http://orcid.org/0000-0003-0782-2704>
Received on Thursday, 29 June 2017 18:23:37 UTC
