- From: Andy Seaborne <andy@apache.org>
- Date: Fri, 30 Jun 2017 09:53:10 +0100
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Eric Prud'hommeaux <eric@w3.org>, public-rdf-comments@w3.org
On 30/06/17 01:42, Peter F. Patel-Schneider wrote:
> On 06/29/2017 03:34 PM, Eric Prud'hommeaux wrote:
>> * Andy Seaborne <andy@apache.org> [2017-06-29 21:11+0100]
>>> I think that changing the grammar in this way has disadvantages:
>>>
>>> For larger languages, it adds a lot of clutter.
>>>
>>> It does not reflect the practical aspects of tools.
>>>
>>> Whitespace and comment processing is often done during tokenization and
>>> tokenizers even have special facilities, or common idioms, for doing that.
>>> Having the grammar reflect that helps implementers.
>>
>> strong +1. It is the default behavior of almost every lexer [...] to
>> break on whitespace.
>
> Not lex, for starters.
The "common idioms" I was referring to include the way the lex family
handles this.
The idiom is to recognize whitespace but not generate a token for it, so
nothing is passed to the rule parser.
It's in the man page, the Wikipedia page and the yacc documentation.
flex.1:
[ \t\n]+ /* eat up whitespace */
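The same idiom can be sketched outside the lex family. This is only an
illustration in Python, assuming a made-up token set (WS, COMMENT, IRI, DOT)
for a tiny Turtle-like input; the names and patterns are hypothetical and not
taken from any spec grammar. Whitespace and comments are matched by the
tokenizer, but no token is produced for them, so the parser never sees them.

```python
import re

# Hypothetical token classes, for illustration only.
TOKEN_SPEC = [
    ("WS",      r"[ \t\n]+"),    # same pattern as the flex rule above
    ("COMMENT", r"#[^\n]*"),     # comment runs to end of line
    ("IRI",     r"<[^<>\s]*>"),
    ("DOT",     r"\."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Recognized by the tokenizer, but never turned into tokens.
SKIPPED = {"WS", "COMMENT"}


def tokenize(text):
    """Return (kind, lexeme) pairs, silently dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup in SKIPPED:
            continue  # "eat up" the match: no token reaches the parser
        tokens.append((m.lastgroup, m.group()))
    return tokens


if __name__ == "__main__":
    print(tokenize("<s> <p>\n  <o> . # done\n"))
```

With this arrangement the grammar rules only mention IRI and DOT, and need no
whitespace or comment productions of their own.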
Andy
Received on Friday, 30 June 2017 08:53:45 UTC