
Re: Production 21 (and others)



> Indirection and hand-waving of the type recommended by Lee and Dave
> are a recipe for inconsistent and incompatible implementations.

I'm not sure that I am recommending hand-waving.  Many other languages
do something similar in cases like this.  C comments, for example,
were specified as follows [1]:
    The characters /* introduce a comment, which terminates with the
    characters */.  Comments do not nest.

Comments in C are white space (Lexical Conventions, para 1).
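
A prose definition like that one is enough to write a scanner from.
Here is a minimal sketch in Python, not taken from any real compiler:
the function name strip_c_comments is made up for illustration, and it
deliberately ignores string and character literals, which a real C
lexer would have to handle.

```python
def strip_c_comments(src):
    """Replace each /* ... */ comment with one space (comments are white space).

    Illustrative sketch only: string and character literals are NOT handled.
    """
    out = []
    pos = 0
    while True:
        start = src.find("/*", pos)
        if start == -1:
            # No more comments: keep the rest of the text unchanged.
            out.append(src[pos:])
            break
        # The comment ends at the first */ after the opener, so comments
        # do not nest: an inner /* is just part of the comment text.
        end = src.find("*/", start + 2)
        if end == -1:
            raise ValueError("unterminated comment")
        out.append(src[pos:start])
        out.append(" ")
        pos = end + 2
    return "".join(out)


print(strip_c_comments("a/* x /* y */b"))
```

Searching for the terminator from start + 2 means that /*/ on its own
is treated as an unterminated comment rather than a complete one.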

> a surprising number of
> intelligent people with an interest in parsing and grammars can *fail*
> to formalize them correctly when working from a natural-language
> description, and trying to express them without recourse to scanning
> modes.

Some problems are very difficult to express with regular expressions.

I still don't have one that will match <!--xx-x--> correctly,
for example (I'm using x instead of * because it's easier to test).
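
For what it is worth, one candidate can be derived by hand from a
four-state automaton that tracks how much of the terminator has been
seen (nothing, *, *-, *--).  The Python sketch below is an assumption
on my part, not a tested proposal for the spec: it takes a comment to
be <!--* followed by the shortest text that ends in *-->, uses no
non-greedy operators, and the names STRICT and find_comments are
invented for the example.

```python
import re

# Hand-derived from a 4-state automaton for the terminator *-->.
# Only classic operators are used: no non-greedy quantifiers.
STRICT = re.compile(
    r"<!--\*"
    # Zero or more excursions: reach a *, possibly restart on further
    # *s, then fall back to "nothing seen" without completing *-->.
    r"(?:[^*]*\*(?:-{0,2}\*)*(?:-?[^*-]|--[^*>]))*"
    # Final run: reach a *, possibly restart, then complete *-->.
    r"[^*]*\*(?:-{0,2}\*)*-->"
)


def find_comments(text):
    """Return every comment matched by STRICT, in order of appearance."""
    return [m.group(0) for m in STRICT.finditer(text)]


print(find_comments("<!--**-*-->"))
print(find_comments("<!--* *-->hello<!--**-*-->"))
```

An expression of this size rather supports the point: it is easy to
get one alternative wrong, and porting it between tools means
rechecking every character class.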

> I think having full regular expressions defining comments, etc., is in
> fact useful to implementors.

I think you are right if the expressions are simple enough.

Expressions this complex are unlikely to work the same way in enough
tools, although I suppose that giving expressions for lex, perl 4,
perl 5, jacc and sed would be sufficient in practice, at least for now.
(perl 4 and 5 have slightly different rules about nested brackets,
and also about whitespace in expressions, but that's not for this list!)

In perl 5, though, the pattern <!--x.*?x--> works fine, at least when
the whole comment sits on one line (by default "." does not match a
newline).

For example, try
    perl -w -p -e 's/<!--\*.*?\*-->/COM/g;'

This works because the perl 5 *? quantifier is not greedy: it stops at
the first occurrence of *--> that satisfies the following literal,
where a greedy ".*" would run on to the last one on the line.
Hence,
    <!--* *-->hello<!--**-*-->
will produce
    COMhelloCOM
correctly.
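
The difference from a greedy match is easy to see side by side.  The
sketch below uses Python's re module instead of perl, on the
assumption that its *? quantifier behaves the same way for this
pattern; the names LAZY, GREEDY and replace_comments are made up for
the example.

```python
import re

LAZY = re.compile(r"<!--\*.*?\*-->")    # non-greedy: stops at the first *-->
GREEDY = re.compile(r"<!--\*.*\*-->")   # greedy: runs on to the last *-->


def replace_comments(pattern, text):
    """Replace every match of pattern in text with the marker COM."""
    return pattern.sub("COM", text)


line = "<!--* *-->hello<!--**-*-->"
print(replace_comments(LAZY, line))     # COMhelloCOM
print(replace_comments(GREEDY, line))   # COM
```

With the greedy form the text between the two comments is swallowed
along with them, which is the behaviour the *? operator avoids.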

But not everyone is using perl 5 :-)

Lee