- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 29 Jun 2017 17:42:24 -0700
- To: Eric Prud'hommeaux <eric@w3.org>, public-rdf-comments@w3.org
- Cc: Andy Seaborne <andy@apache.org>
On 06/29/2017 03:34 PM, Eric Prud'hommeaux wrote:
> * Andy Seaborne <andy@apache.org> [2017-06-29 21:11+0100]
>> I think that changing the grammar in this way has disadvantages:
>>
>> For larger languages, it adds a lot of clutter.
>>
>> It does not reflect the practical aspects of tools.
>>
>> Whitespace and comment processing is often done during tokenization and
>> tokenizers even have special facilities, or common idioms, for doing that.
>> Having the grammar reflect that helps implementers.
>
> strong +1. It is the default behavior of almost every lexer [...] to
> break on whitespace.

Not lex, for starters.

> Arguably, we could have been clearer about that,
> though we were clear about matching the longest terminal (which
> requires sorting the directives in some lexers).

I don't find this clear at all. I assume that you are referring to
"White space (tab U+0009 or space U+0020) is used to separate two terminals
which would otherwise be (mis-)recognized as one terminal."

In N-Triples, there is no such case. In N-Quads, it is unstated what counts
as mis-recognition. For example,

<http://example.org/a><http://example.org/b>_:a_:b.

can only be parsed in one way. Of course, a parser that does initial
greedy-only tokenization of a particular kind will miss this parse.

Peter F. Patel-Schneider
Nuance Communications
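
The greedy-tokenization point above can be illustrated with a minimal Python sketch. It is not taken from any existing parser: the terminal patterns are ASCII-only simplifications, the names `tokenize` and `TOKEN` are hypothetical, and it assumes that PN_CHARS_U admits ':' as well as '_', which is how the N-Quads grammar is understood here.

```python
import re

# Simplified, ASCII-only stand-ins for the N-Quads terminals (assumption:
# the real grammar allows far more characters and escape sequences).
IRIREF = r'<[^<>\s]*>'
PN_CHARS_U = r'[A-Za-z_:]'        # assumed to include ':' as well as '_'
PN_CHARS = r'[A-Za-z0-9_:-]'
BLANK_NODE_LABEL = (
    rf'_:(?:{PN_CHARS_U}|[0-9])(?:(?:{PN_CHARS}|\.)*{PN_CHARS})?'
)

TOKEN = re.compile(
    rf'(?P<IRIREF>{IRIREF})|(?P<BNODE>{BLANK_NODE_LABEL})|(?P<DOT>\.)'
)


def tokenize(line):
    """Split a line into (kind, text) pairs, taking the longest label each time."""
    pos, tokens = 0, []
    while pos < len(line):
        if line[pos] in ' \t':    # white space only separates terminals
            pos += 1
            continue
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f'no terminal matches at offset {pos}')
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


line = '<http://example.org/a><http://example.org/b>_:a_:b.'
for kind, text in tokenize(line):
    print(kind, text)
# IRIREF <http://example.org/a>
# IRIREF <http://example.org/b>
# BNODE _:a_:b
# DOT .
```

Under these assumptions the greedy label pattern swallows `_:a_:b` as a single blank node label, so the tokenizer never produces the four-term reading with `_:a` and `_:b` as separate terminals. Writing the input with a space between the two labels gives the four-term token stream.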
Received on Friday, 30 June 2017 00:43:00 UTC