Re: WWW-Validator Bug (response to private mail on other topic)
From: Gerald Oskoboiny (gerald@w3.org)
Date: Mon, Aug 02 1999
Date: Mon, 2 Aug 1999 23:35:30 -0400
From: Gerald Oskoboiny <gerald@w3.org>
To: Earl Hood <ehood@hydra.acs.uci.edu>
Cc: www-validator@w3.org
Message-ID: <19990802233530.A1887@w3.org>
Hi,
Sorry for the slow reply, I've been away for the last while.
On Wed, Jun 16, 1999 at 02:06:03AM -0700, Earl Hood wrote:
> On June 15, 1999 at 23:57, someone wrote:
> > Wow, I would have never thought to question their validator.
>
> Not your fault. You need either a background in SGML or a very
> careful reading of the HTML 4.0 spec.
>
> I just checked the source (cgi-bin/check) of their validator, and I
> spotted their error. The bug is in the check_for_doctype() function. The
> check for a doctype declaration is not robust enough to deal with
> leading comment declarations that could contain "tag"-like data. They
> have the following statement:
>
> last if ( $line =~ /<[a-z]/i ); # found an element
>
> However, it does not take into account that the match could be
> inside a comment declaration.
I anticipated this problem (hence the comment in the code,
"@@ this needs to be fixed to handle commented-out markup which
appears before the doctype"), but hoped I'd be able to fix it
before someone actually stumbled onto it. I guess not...
> Dealing with comment declarations can be ugly since the program reads
> the data into an array instead of keeping it in a single scalar string
> (I'm unclear why the document is split into an array).
I keep it in an array because I need to output individual lines
of the file showing where the errors are in the HTML code.
> If the data is passed in as a single string, a comment-stripping
> regex:
>
> s/<!--([^-]|-[^-])*--\s*>//go;
>
> could first be applied before checking for a doctype declaration.
Thanks for the regex; I added it to the doctype-checking
function. Unfortunately, since the document is still checked one
line at a time, this won't handle multi-line comments, so I need
to figure out what to do about those.
(patches would be welcome ;)
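One possible direction, as a rough sketch only and not the code in
cgi-bin/check: join a copy of the line array into a single string just
for the doctype test, strip comments from that copy with the regex
above, and then see whether a doctype or an element comes first. The
name has_leading_doctype() and its calling convention are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not the validator's actual code): returns 1 if a
# doctype declaration appears before the first element once comment
# declarations have been removed, and 0 otherwise.
sub has_leading_doctype {
    my @lines = @_;

    # Work on a joined copy so that comments spanning several lines can
    # be matched; the caller's array is left untouched for error reports.
    my $text = join '', @lines;

    # The comment-stripping regex from this thread, applied to the whole
    # document rather than to one line at a time.
    $text =~ s/<!--([^-]|-[^-])*--\s*>//g;

    # The leftmost "<" followed by either "!doctype" or a letter decides
    # the result: doctype first is good, element first is not.
    if ( $text =~ /<(!doctype\b|[a-z])/i ) {
        return lc($1) eq '!doctype' ? 1 : 0;
    }
    return 0;    # neither a doctype nor an element was found
}
```

Called as has_leading_doctype(@lines), this leaves the per-line array
available for printing the lines where errors occur. It assumes the
whole document fits comfortably in memory twice, and it inherits
whatever limitations the regex has with unusual comment syntax.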
--
Gerald Oskoboiny <gerald@w3.org> +1 617 253 2920
System Administrator http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C) http://www.w3.org/