- From: <lee@sq.com>
- Date: Sat, 1 Feb 97 01:30:54 EST
- To: U35395@UICVM.UIC.EDU, w3c-sgml-wg@www10.w3.org
> Indirection and hand-waving of the type recommended by Lee and Dave
> are a recipe for inconsistent and incompatible implementations.
I'm not sure that I am recommending hand waving. Many other languages
do something similar in this sort of case -- C comments were specified
as follows [1]:
The characters /* introduce a comment, which terminates with the
characters */. Comments do not nest.
Comments in C are white space (Lexical Conventions, para 1).
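As a rough illustration of how little machinery that description needs,
here is a sketch in Python (not C, and strip_c_comments is just a name
made up for the example) of a scanner with a single "inside a comment"
mode. It deliberately ignores string literals and character constants,
which a real C lexer has to handle, and it silently drops an
unterminated comment.

```python
def strip_c_comments(src: str) -> str:
    """Replace each /* ... */ comment with one space. Comments do not nest."""
    out = []
    i = 0
    in_comment = False
    while i < len(src):
        if in_comment:
            # Inside a comment only the closing "*/" matters.
            if src.startswith("*/", i):
                in_comment = False
                out.append(" ")  # a comment counts as white space
                i += 2
            else:
                i += 1
        elif src.startswith("/*", i):
            in_comment = True
            i += 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Because a second /* inside a comment is just skipped, the first */ ends
the comment, which is all that "do not nest" means here.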
> a surprising number of
> intelligent people with an interest in parsing and grammars can *fail*
> to formalize them correctly when working from a natural-language
> description, and trying to express them without recourse to scanning
> modes.
Some problems are very difficult to express with regular expressions.
I still don't have one that will match <!--xx-x--> correctly,
for example (I'm using x instead of * because it's easier to test).
> I think having full regular expressions defining comments, etc., is in
> fact useful to implementors.
I think you are right if the expressions are simple enough.
Expressions this complex are unlikely to work the same way in enough
tools, although I suppose that giving expressions for lex, perl 4,
perl 5, jacc and sed would be sufficient in practice, at least for now.
(perl 4 and 5 have slightly different rules about nested brackets,
and also about whitespace in expressions, but that's not for this list!)
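To make "this complex" concrete: without a non-greedy operator, the
expression has to keep track of how much of the terminator x--> it has
already seen. What follows is only a sketch, in Python because that is
easy to test, built by hand from a four-state matcher for x--> and using
nothing beyond character classes, grouping, ? and *. The names LAZY,
PLAIN and agree are illustrative, and agree() only compares the two
forms on short strings over a four-character alphabet, so it is a sanity
check and not a proof.

```python
import re
from itertools import product

# Non-greedy reference form (needs a perl 5 style "*?").
LAZY = re.compile(r"<!--x.*?x-->", re.DOTALL)

# A form without "*?". After the opening "<!--x":
#   [^x]*x              get to "just saw an x"
#   (-?-?x)*            x, -x or --x: still "just saw an x"
#   (-?[^x-]|--[^x>])   this attempt at x--> failed, start over
PLAIN = re.compile(
    r"<!--x"
    r"([^x]*x(-?-?x)*(-?[^x-]|--[^x>]))*"
    r"[^x]*x(-?-?x)*-->"
)


def agree(max_len: int) -> bool:
    """Compare both forms on every string over "x->a" up to max_len."""
    for n in range(max_len + 1):
        for chars in product("x->a", repeat=n):
            text = "<!--x" + "".join(chars)
            lazy, plain = LAZY.search(text), PLAIN.search(text)
            if (lazy is None) != (plain is None):
                return False
            if lazy is not None and lazy.span() != plain.span():
                return False
    return True
```

Whether an expression like this behaves identically in lex, sed and the
rest is a separate question, and one to check tool by tool.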
Most perl 5 scripts will work fine with <!--x.*?x-->.
For example, try:
perl -w -p -e 's/<!--\*.*?\*-->/COM/g;'
This works because the perl 5 *? operator is not greedy: it matches as
little as possible, so the first occurrence of *--> satisfies the
literal that follows and is not eaten by the ".*".
Hence,
<!--* *-->hello<!--**-*-->
will produce
COMhelloCOM
correctly.
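The same contrast shows up outside perl. A minimal sketch in Python,
whose re module has the same *? operator (the variable names are just
for illustration), runs the greedy and non-greedy forms over the sample
line above:

```python
import re

LAZY = re.compile(r"<!--\*.*?\*-->")    # non-greedy: stop at the first *-->
GREEDY = re.compile(r"<!--\*.*\*-->")   # greedy: run on to the last *-->

text = "<!--* *-->hello<!--**-*-->"

lazy_result = LAZY.sub("COM", text)      # "COMhelloCOM"
greedy_result = GREEDY.sub("COM", text)  # "COM": hello is swallowed too
```

As with the perl one-liner, "." does not match a newline here, so a
comment that spans several lines needs a different approach.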
But not everyone is using perl 5 :-)
Lee
Received on Saturday, 1 February 1997 01:31:23 UTC