Re: WWW-Validator Bug (response to private mail on other topic)

On June 17, 1999 at 00:13, Claus Färber wrote:

> Earl Hood <ehood@hydra.acs.uci.edu> wrote:
> > (I'm unclear why the document is split into an array).  If the data is
> > passed in as a single string, a comment stripping regex:
> >
> > 	s/<!--([^-]|-[^-])*--\s*>//go;
> 
> This is not true either. It would only be valid if there were no elements
> that could contain CDATA.

Since we are restricting ourselves to HTML and XML, there are no
CDATA elements.  If I remember correctly, XML does not support CDATA
elements.  Now, CDATA marked sections would be a more valid argument.
However, since the goal of the code in question is to just find the
doctype declarations, CDATA marked sections are a non-issue.

BTW, within the context of just trying to find the doctype declaration,
any CDATA elements would not matter either.

> And it won't catch legal comment syntax:
> 
> <!-- comment 1 --
>   -- comment 2 -->

That is what I get when I cut-n-haste from some code w/o checking the
context in which the regex was being used.  Here is probably a more
appropriate regex:

    s/<!(?:--(?:[^-]|-[^-])*--\s*)+>//go
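
For anyone who wants to check the difference between the two patterns, here is a rough sketch that ports both substitutions to Python's re module (the Perl /g behaviour is the default for re.sub, and /o has no equivalent that matters here). The function names, constant names, and sample strings are invented for illustration and are not taken from the validator code. It assumes the whole document is passed in as a single string.

```python
import re

# Python ports of the two Perl substitutions from this message.
# The constant and function names are made up for this sketch.
SINGLE_COMMENT = re.compile(r"<!--([^-]|-[^-])*--\s*>")
COMMENT_DECL = re.compile(r"<!(?:--(?:[^-]|-[^-])*--\s*)+>")
DOCTYPE = re.compile(r"<!DOCTYPE\s[^>]*>", re.IGNORECASE)


def strip_single_comments(text):
    """Earlier regex: handles only declarations holding one comment."""
    return SINGLE_COMMENT.sub("", text)


def strip_comment_decls(text):
    """Revised regex: one or more comments inside a single <! ... >."""
    return COMMENT_DECL.sub("", text)


def find_doctype(text):
    """Return the first doctype declaration left after stripping comments, or None."""
    match = DOCTYPE.search(strip_comment_decls(text))
    return match.group(0) if match else None


two_comments = "<!-- comment 1 --\n  -- comment 2 -->"
document = (
    "<!-- <!DOCTYPE commented out> --\n  -- comment 2 -->\n"
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">\n'
    "<html></html>"
)

print(repr(strip_single_comments(two_comments)))  # input comes back unchanged
print(repr(strip_comment_decls(two_comments)))    # empty string
print(find_doctype(document))
```

The earlier pattern cannot match the two-comment declaration because it insists on ">" right after the first closing "--", while the revised one repeats the comment group before the closing ">". Stripping first also keeps a commented-out doctype from being picked up by the search.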

--ewh

Received on Thursday, 17 June 1999 16:12:03 UTC