Re: Proposed fixed version of N-Triples https://www.w3.org/TR/n-triples/ Section 7 from Peter F. Patel-Schneider on 2017-06-30 (public-rdf-comments@w3.org from June 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 29 Jun 2017 17:42:24 -0700
To: Eric Prud'hommeaux <eric@w3.org>, public-rdf-comments@w3.org
Cc: Andy Seaborne <andy@apache.org>
Message-ID: <ce6caf90-d533-a0a6-bb5e-a9e7e993ec0f@gmail.com>

On 06/29/2017 03:34 PM, Eric Prud'hommeaux wrote:
> * Andy Seaborne <andy@apache.org> [2017-06-29 21:11+0100]
>> I think that changing the grammar in this way has disadvantages:
>>
>> For larger languages, it adds a lot of clutter.
>>
>> It does not reflect the practical aspects of tools.
>>
>> Whitespace and comment processing is often done during tokenization and
>> tokenizers even have special facilities, or common idioms, for doing that.
>> Having the grammar reflect that helps implementers.
> 
> strong +1. It is the default behavior of almost every lexer [...] to
> break on whitespace.

Not lex, for starters.

> Arguably, we could have been clearer about that,
> though we were clear about matching the longest terminal (which
> requires sorting the directives in some lexers).

I don't find this clear at all.

I assume that you are referring to

"White space (tab U+0009 or space U+0020) is used to separate two terminals
which would otherwise be (mis-)recognized as one terminal."

In N-Triples, there is no such case.  In N-Quads, it is unstated what counts
as mis-recognition.  For example,

<http://example.org/a><http://example.org/b>_:a_:b.

can only be parsed in one way.  Of course, a parser that does initial
greedy-only tokenization of a particular kind will miss this parse.
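
To make the tokenization point concrete, here is a rough sketch of one kind
of greedy-only tokenizer run over the line above. The patterns are simplified,
ASCII-only stand-ins for the terminals, and the character classes (in
particular, letting '_' and ':' continue a blank node label) are assumptions
made for illustration, not a transcription of the grammar. The name
greedy_tokens is hypothetical.

```python
import re

# Simplified, ASCII-only approximations of the terminals. These are
# assumptions for illustration, not the full productions from the spec.
IRIREF = r'<[^\x00-\x20<>"{}|^`\\]*>'
LABEL_START = r'[A-Za-z0-9_:]'
LABEL_CHAR = r'[A-Za-z0-9_:\-]'
# A label may contain '.', but may not end with one.
BLANK_NODE_LABEL = rf'_:{LABEL_START}(?:(?:{LABEL_CHAR}|\.)*{LABEL_CHAR})?'

TOKEN = re.compile(
    rf'(?P<IRIREF>{IRIREF})'
    rf'|(?P<BNODE>{BLANK_NODE_LABEL})'
    r'|(?P<DOT>\.)'
    r'|(?P<WS>[ \t]+)'
)


def greedy_tokens(line):
    """Split a line into (kind, text) pairs, always taking the longest
    match for a terminal at the current position and dropping whitespace."""
    pos = 0
    tokens = []
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        if match.lastgroup != 'WS':
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in greedy_tokens(
        '<http://example.org/a><http://example.org/b>_:a_:b.'):
    print(kind, text)
```

Under these assumptions the tokenizer emits `_:a_:b` as a single blank node
label and never offers the parser the split into `_:a` followed by `_:b`.
With white space between the two labels, the same function returns them as
separate tokens.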

Peter F. Patel-Schneider
Nuance Communications

Received on Friday, 30 June 2017 00:43:00 UTC