
Re: Production 21 (and others)

From: <lee@sq.com>
Date: Sat, 1 Feb 97 01:30:54 EST
Message-Id: <9702010630.AA10963@sqrex.sq.com>
To: U35395@UICVM.UIC.EDU, w3c-sgml-wg@www10.w3.org
> Indirection and hand-waving of the type recommended by Lee and Dave
> are a recipe for inconsistent and incompatible implementations.

I'm not sure that I am recommending hand-waving.  Many other languages
do something similar in this sort of case; C comments, for example,
were specified as follows [1]:
    The characters /* introduce a comment, which terminates with the
    characters */.  Comments do not nest.

Comments in C are white space (Lexical Conventions, para 1).
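The two quoted sentences translate almost directly into a scanner. Below is a minimal sketch in Python. The function name `strip_c_comments` is hypothetical. The sketch applies only the quoted rules: "/*" opens, the next "*/" closes, comments do not nest, and each comment becomes white space. It deliberately ignores string and character literals, which a real C lexer must also handle.

```python
def strip_c_comments(src: str) -> str:
    """Replace each /* ... */ comment with a single space (hypothetical helper).

    Assumption: string and character literals are NOT recognised, so a "/*"
    inside a literal would be misread as the start of a comment.
    """
    out = []
    i = 0
    while i < len(src):
        if src.startswith("/*", i):
            # Comments do not nest: any further "/*" inside is ordinary text,
            # and the FIRST "*/" after the opener ends the comment.
            end = src.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            out.append(" ")  # the comment counts as white space
            i = end + 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


print(strip_c_comments("int/**/x;"))       # int x;
print(strip_c_comments("a/* x /* y */b"))  # a b
```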

> a surprising number of
> intelligent people with an interest in parsing and grammars can *fail*
> to formalize them correctly when working from a natural-language
> description, and trying to express them without recourse to scanning
> modes.

Some problems are very difficult to express with regular expressions.

I still don't have one that will match <!--xx-x--> correctly,
for example (I'm using x instead of * because it's easier to test).
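For illustration only, here is one hand-derived attempt in Python. It uses the real delimiters `<!--*` and `*-->` (as in the perl example later in this message) and only greedy operators, with no perl 5 `*?`. It was worked out from a small state machine that tracks how much of `*-->` has just been seen. The name `COMMENT_NO_LAZY` is hypothetical, and the sketch assumes the opening `*` cannot double as the start of the closing `*-->`. Its length rather supports the point about such expressions being hard to get right.

```python
import re

# Hypothetical name.  Matches "<!--*", then everything up to the FIRST
# following "*-->", using only greedy operators.
#   [^*]                  a character that cannot begin "*-->"
#   \*(?:-{0,2}\*)*(...)  a "*" (or a run like "*-*", "*--*") that is then
#                         broken off before ">" can complete "*-->"
#   \*(?:-{0,2}\*)*-->    the closing "*-->", possibly after such a run
COMMENT_NO_LAZY = re.compile(
    r"<!--\*"
    r"(?:[^*]|\*(?:-{0,2}\*)*(?:[^*-]|-[^*-]|--[^*>]))*"
    r"\*(?:-{0,2}\*)*-->"
)

if __name__ == "__main__":
    # The troublesome case from the text, with x written as *.
    print(COMMENT_NO_LAZY.fullmatch("<!--**-*-->") is not None)       # True
    print(COMMENT_NO_LAZY.sub("COM", "<!--* *-->hello<!--**-*-->"))  # COMhelloCOM
```

Unlike `.`, the class `[^*]` also matches newlines, so this form handles comments that span lines without any extra flag.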

> I think having full regular expressions defining comments, etc., is in
> fact useful to implementors.

I think you are right if the expressions are simple enough.

Expressions this complex are unlikely to work the same way in enough
tools, although I suppose that giving expressions for lex, perl 4,
perl 5, jacc and sed would be sufficient in practice, at least for now.
(perl 4 and 5 have slightly different rules about nested brackets,
and also about whitespace in expressions, but that's not for this list!)

Most perl 5 scripts will work fine with <!--x.*?x-->

For example, try
    perl -w -p -e 's/<!--\*.*?\*-->/COM/g;'

This works because perl's *? quantifier is non-greedy: the first
occurrence of *--> satisfies the literal that follows it, instead of
being eaten by the ".*".  So the input line
    <!--* *-->hello<!--**-*-->
will produce
    COMhelloCOM

But not everyone is using perl 5 :-)
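Python's `re` module is one non-perl tool with the same non-greedy `*?` quantifier. The sketch below reproduces the perl 5 substitution and runs the plain greedy `.*` beside it for comparison. The variable names are illustrative only. With the greedy form, the first match runs to the last `*-->` on the line and swallows the text between the two comments.

```python
import re

line = "<!--* *-->hello<!--**-*-->"

# Non-greedy ".*?", as in the perl 5 one-liner: each match stops at the
# first "*-->" after its "<!--*".
lazy = re.sub(r"<!--\*.*?\*-->", "COM", line)

# Greedy ".*": the match extends to the LAST "*-->" on the line,
# eating "hello" and the second comment along with the first.
greedy = re.sub(r"<!--\*.*\*-->", "COM", line)

print(lazy)    # COMhelloCOM
print(greedy)  # COM
```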

Received on Saturday, 1 February 1997 01:31:23 UTC
