- From: Gerald Oskoboiny <gerald@w3.org>
- Date: Mon, 2 Aug 1999 23:35:30 -0400
- To: Earl Hood <ehood@hydra.acs.uci.edu>
- Cc: www-validator@w3.org
Hi,

Sorry for the slow reply, I've been away for the last while.

On Wed, Jun 16, 1999 at 02:06:03AM -0700, Earl Hood wrote:
> On June 15, 1999 at 23:57, someone wrote:
>
> > Wow, I would have never thought to question their validator.
>
> Not your fault. You either need a background in SGML or read
> the HTML 4.0 spec very carefully.
>
> I just checked the source (cgi-bin/check) of their validator, and I
> spot their error. The bug is in the check_for_doctype() function. The
> check for a doctype declaration is not robust enough to deal with
> leading comment declarations that could contain "tag" like data. They
> have the following statement:
>
>     last if ( $line =~ /<[a-z]/i ); # found an element
>
> However, it does not take in account that it could be inside
> of a comment declaration.

I anticipated this problem (hence the comment in the code, "@@ this
needs to be fixed to handle commented-out markup which appears before
the doctype"), but hoped I'd be able to fix it before someone actually
stumbled onto it. I guess not...

> Dealing with comment declarations can be ugly since the program reads
> the data into an array instead of keeping it in a single scalar string
> (I'm unclear why the document is split into an array).

I keep it in an array because I need to output individual lines of
the file showing where the errors are in the HTML code.

> If the data is passed in as a single string, a comment
> stripping regex:
>
>     s/<!--([^-]|-[^-])*--\s*>//go;
>
> Could first be applied before checking for a doctype declaration.

Thanks for the regex; I added it in the doctype-checking function.

Unfortunately this still won't handle multi-line comments, so I need
to figure out what to do about those. (patches would be welcome ;)

-- 
Gerald Oskoboiny <gerald@w3.org>  +1 617 253 2920
System Administrator              http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C)   http://www.w3.org/
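
The multi-line comment case mentioned above could perhaps be handled by
joining the lines into one string just for the doctype check, stripping
comments from that copy, and leaving the line array alone for error
reporting. What follows is only a minimal sketch of that idea, written in
Python rather than the validator's Perl. The function name and the
doctype/element patterns are assumptions made for illustration, not code
from cgi-bin/check.

```python
import re

# Same pattern as the comment-stripping regex quoted above. The negated
# class [^-] also matches newlines, so a comment that spans several lines
# is removed once the lines have been joined into one string.
COMMENT_RE = re.compile(r"<!--([^-]|-[^-])*--\s*>")

# Assumed patterns for this sketch; the real check may differ.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b", re.IGNORECASE)
ELEMENT_RE = re.compile(r"<[a-z]", re.IGNORECASE)


def has_doctype_before_first_element(lines):
    """Hypothetical helper: True if a doctype precedes the first element.

    `lines` is the per-line array of the document. It is only read here,
    so it can still be used to print the lines where errors occur.
    """
    text = COMMENT_RE.sub("", "".join(lines))  # work on a joined copy
    doctype = DOCTYPE_RE.search(text)
    if doctype is None:
        return False
    element = ELEMENT_RE.search(text)
    return element is None or doctype.start() < element.start()
```

With this approach, commented-out markup such as an `<html>` tag inside a
three-line comment ahead of the doctype no longer counts as the first
element. Like the original regex, the sketch handles only simple
`<!-- ... -->` comments and not every form of SGML comment declaration.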
Received on Monday, 2 August 1999 23:35:33 UTC