Re: WWW-Validator Bug (response to private mail on other topic)

Hi,

Sorry for the slow reply, I've been away for the last while.

On Wed, Jun 16, 1999 at 02:06:03AM -0700, Earl Hood wrote:
> On June 15, 1999 at 23:57, someone wrote:
> > Wow, I would have never thought to question their validator.
> 
> Not your fault. You either need a background in SGML or read
> the HTML 4.0 spec very carefully.
> 
> I just checked the source (cgi-bin/check) of their validator, and I
> spot their error.  The bug is in the check_for_doctype() function.  The
> check for a doctype declaration is not robust enough to deal with
> leading comment declarations that could contain "tag" like data.  They
> have the following statement:
> 
> 	last if ( $line =~ /<[a-z]/i );		 # found an element
> 
> However, it does not take into account that the match could be
> inside a comment declaration.

I anticipated this problem (hence the comment in the code,
"@@ this needs to be fixed to handle commented-out markup which
appears before the doctype"), but hoped I'd be able to fix it
before someone actually stumbled onto it. I guess not...

> Dealing with comment declarations can be ugly since the program reads
> the data into an array instead of keeping it in a single scalar string
> (I'm unclear why the document is split into an array).

I keep it in an array because I need to output individual lines
of the file showing where the errors are in the HTML code.

> If the data is passed in as a single string, a comment
> stripping regex:
> 
> 	s/<!--([^-]|-[^-])*--\s*>//go;
> 
> Could first be applied before checking for a doctype declaration.

Thanks for the regex; I added it to the doctype-checking
function. Unfortunately, because the document is held as an array
of lines, this still won't handle comments that span multiple
lines, so I need to figure out what to do about those.
(Patches would be welcome ;)
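
One possible way around the multi-line problem, as a rough sketch
only: join a copy of the lines into one string just for the
doctype check, strip comments from that copy, and leave the array
alone for error reporting. The function name
has_doctype_before_markup() and the sample document are
hypothetical, not taken from cgi-bin/check.

```perl
use strict;
use warnings;

# Hypothetical helper (not the validator's actual code): returns 1 if a
# doctype declaration appears before the first element, ignoring comments.
sub has_doctype_before_markup {
    my @lines = @_;

    # Work on a joined copy; the caller's array of lines is left
    # untouched so it can still be used to print error locations.
    my $text = join '', @lines;

    # Earl's comment-stripping regex. On a single string, [^-] also
    # matches newlines, so comments spanning several lines are removed.
    $text =~ s/<!--([^-]|-[^-])*--\s*>//g;

    # Find the first thing that looks like a doctype or an element.
    if ( $text =~ /<(!doctype\b|[a-z])/i ) {
        return lc($1) eq '!doctype' ? 1 : 0;
    }
    return 0;    # neither a doctype nor an element was found
}

# Usage: a multi-line comment containing "tag"-like data precedes
# the doctype declaration.
my @doc = (
    "<!-- <p>old\n",
    "markup</p> -->\n",
    "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\">\n",
    "<html>\n",
);
print has_doctype_before_markup(@doc)
    ? "doctype found\n"
    : "doctype missing\n";
```

The trade-off is one extra copy of the document in memory for the
duration of the check.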

-- 
Gerald Oskoboiny       <gerald@w3.org>  +1 617 253 2920
System Administrator   http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C)      http://www.w3.org/

Received on Monday, 2 August 1999 23:35:33 UTC