- From: <lee@sq.com>
- Date: Sat, 1 Feb 97 01:30:54 EST
- To: U35395@UICVM.UIC.EDU, w3c-sgml-wg@www10.w3.org
> Indirection and hand-waving of the type recommended by Lee and Dave
> are a recipe for inconsistent and incompatible implementations.

I'm not sure that I am recommending hand-waving. Many other languages do something similar in this sort of case. C comments, for example, were specified as follows [1]:

    The characters /* introduce a comment, which terminates with the
    characters */. Comments do not nest.

Comments in C are white space (Lexical Conventions, para 1).

> a surprising number of
> intelligent people with an interest in parsing and grammars can *fail*
> to formalize them correctly when working from a natural-language
> description, and trying to express them without recourse to scanning
> modes.

Some problems are very difficult to express with regular expressions. I still don't have one that will match <!--xx-x--> correctly, for example (I'm using x instead of * because it's easier to test).

> I think having full regular expressions defining comments, etc., is in
> fact useful to implementors.

I think you are right if the expressions are simple enough. Expressions this complex are unlikely to work the same way in enough tools, although I suppose that giving expressions for lex, perl 4, perl 5, jacc and sed would be sufficient in practice, at least for now. (perl 4 and 5 have slightly different rules about nested brackets, and also about whitespace in expressions, but that's not for this list!)

Most perl 5 scripts will work fine with <!--x.*?x-->. For example, try

    perl -w -p -e 's/<!--\*.*?\*-->/COM/g;'

This works because the perl *? operator is not greedy, so the first occurrence of *--> satisfies the following literal, and is not eaten by the ".*". Hence,

    <!--* *-->hello<!--**-*-->

will produce COMhelloCOM correctly.

But not everyone is using perl 5 :-)

Lee
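
The greedy versus non-greedy difference described above can be sketched outside perl as well. What follows is only an illustration, not part of the original proposal: it assumes Python's `re` module, whose `*?` operator is lazy in the same way as perl 5's, and the names LAZY, GREEDY and strip_comments are made up for this example.

```python
import re

# Lazy pattern: a Python rendering of the regex in the perl 5 one-liner.
# ".*?" stops at the FIRST "*-->" it can reach.
LAZY = re.compile(r"<!--\*.*?\*-->")

# Greedy variant for contrast: ".*" runs to the LAST "*-->" on the line.
GREEDY = re.compile(r"<!--\*.*\*-->")


def strip_comments(pattern, text):
    """Replace every match of `pattern` in `text` with the marker COM."""
    return pattern.sub("COM", text)


sample = "<!--* *-->hello<!--**-*-->"

lazy_result = strip_comments(LAZY, sample)
greedy_result = strip_comments(GREEDY, sample)

print(lazy_result)    # COMhelloCOM
print(greedy_result)  # COM
```

The greedy form swallows "hello" along with both comments, which is the failure the non-greedy operator avoids. Like `perl -p`, this sketch works on one line at a time: "." does not match a newline here, so a comment spanning several lines would not be matched.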
Received on Saturday, 1 February 1997 01:31:23 UTC