Re: Production 21 (and others)
> Indirection and hand-waving of the type recommended by Lee and Dave
> are a recipe for inconsistent and incompatible implementations.
I'm not sure that I am recommending hand-waving. Many other languages
do something similar in this sort of case -- C comments were specified
as follows:
The characters /* introduce a comment, which terminates with the
characters */. Comments do not nest.
Comments in C are white space (Lexical Conventions, para 1).
> a surprising number of
> intelligent people with an interest in parsing and grammars can *fail*
> to formalize them correctly when working from a natural-language
> description, and trying to express them without recourse to scanning
Some problems are very difficult to express with regular expressions.
I still don't have one that will match <!--xx-x--> correctly,
for example (I'm using x instead of * because it's easier to test).
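
For what it's worth, here is a rough sketch of how such an expression
could be built by hand, using only ordinary operators (no non-greedy
matching). It is written with Python's re module purely so that it is
easy to test; the names STRICT, X_BACK, X_FAIL and strip_comments are
my own, and I am assuming that the x in the opening "<!--x" may not
double as the x of the closing "x-->". The expression is derived from
a four-state automaton that tracks how much of "x-->" has been seen.

```python
import re

# "x" stands in for "*", as in the test string <!--xx-x-->.
# After an x: further x, "-x" or "--x" all leave us just past an x.
X_BACK = r"(?:x|-x|--x)*"
# After an x: these continuations abandon a partial "x--" and start over.
X_FAIL = r"(?:[^x-]|-[^x-]|--[^x>])"

# Opener, then a body that never contains "x-->", then the closer.
STRICT = re.compile(
    r"<!--x(?:[^x]|x" + X_BACK + X_FAIL + r")*x" + X_BACK + r"-->"
)

def strip_comments(text):
    """Replace each <!--x ... x--> comment with the marker COM."""
    return STRICT.sub("COM", text)

print(strip_comments("<!--xx-x-->"))  # COM
```

Even for this small case the expression is hard to read and easy to
get subtly wrong, which is rather the point.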
> I think having full regular expressions defining comments, etc., is in
> fact useful to implementors.
I think you are right if the expressions are simple enough.
Expressions this complex are unlikely to work the same way in enough
tools, although I suppose that giving expressions for lex, perl 4,
perl 5, yacc and sed would be sufficient in practice, at least for now.
(perl 4 and 5 have slightly different rules about nested brackets,
and also about whitespace in expressions, but that's not for this list!)
Most perl 5 scripts will work fine with <!--x.*?x-->
For example, try
perl -w -p -e 's/<!--\*.*?\*-->/COM/g;'
This works because the perl 5 *? operator is not greedy: the ".*?"
stops at the first occurrence of *-->, which then satisfies the
following literal, instead of being eaten by a greedy ".*" that runs
on to the last *--> on the line.
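
To make the difference concrete, here is a small sketch comparing the
greedy and non-greedy forms on a line holding two comments. It uses
Python's re module as a stand-in, on the assumption that its "*?"
behaves like perl 5's for this pattern; the variable names are mine.

```python
import re

line = "a <!--* one *--> b <!--* two *--> c"

# Greedy: ".*" runs to the last "*-->" on the line, swallowing " b ".
greedy = re.sub(r"<!--\*.*\*-->", "COM", line)

# Non-greedy: ".*?" stops at the first "*-->", so each comment
# is replaced separately.
lazy = re.sub(r"<!--\*.*?\*-->", "COM", line)

print(greedy)  # a COM c
print(lazy)    # a COM b COM c
```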
But not everyone is using perl 5 :-)