- From: Earl Hood <ehood@hydra.acs.uci.edu>
- Date: Wed, 16 Jun 1999 02:06:03 -0700
- To: www-validator@w3.org
On June 15, 1999 at 23:57, someone wrote:

> Wow, I would have never thought to question their
> validator.

Not your fault. You either need a background in SGML or need to read
the HTML 4.0 spec very carefully.

I just checked the source (cgi-bin/check) of their validator, and I
spotted their error. The bug is in the check_for_doctype() function.
The check for a doctype declaration is not robust enough to deal
with leading comment declarations that could contain "tag"-like data.
They have the following statement:

    last if ( $line =~ /<[a-z]/i );  # found an element

However, it does not take into account that the match could be inside
a comment declaration.

Dealing with comment declarations can be ugly since the program reads
the data into an array instead of keeping it in a single scalar string
(I'm unclear why the document is split into an array). If the data is
passed in as a single string, a comment-stripping regex:

    s/<!--([^-]|-[^-])*--\s*>//go;

could first be applied before checking for a doctype declaration.

Another possible solution is to call nsgmls first and see if it
complains about a missing document type. One has to be careful when
dealing with an XML document, since the XML SGML declaration needs
to be passed to nsgmls for parsing (to avoid invalid character and
other errors). However, a simple pattern match checking for
XML-specific markup could be used to determine if XML-related
arguments to nsgmls are needed.

	--ewh

----
Earl Hood                           | University of California: Irvine
ehood@medusa.acs.uci.edu            | Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME
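
A minimal sketch of the comment-stripping approach described above, not the validator's actual code. It assumes the document is available as one scalar string (for example by joining the array of lines); the helper name has_doctype_before_element() and the sample document are hypothetical. Note that the regex from the message only handles comment declarations that contain a single comment.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the validator source): returns 1 if a
# doctype declaration appears before the first element start tag,
# ignoring anything inside comment declarations; returns 0 otherwise.
sub has_doctype_before_element {
    my ($doc) = @_;    # assumption: whole document as one scalar string

    # Strip comment declarations first, using the regex from the message.
    $doc =~ s/<!--([^-]|-[^-])*--\s*>//g;

    # The first "<!DOCTYPE" or "<letter" left in the text decides.
    if ( $doc =~ /<(!doctype\b|[a-z])/i ) {
        return lc($1) eq '!doctype' ? 1 : 0;
    }
    return 0;          # neither a doctype nor an element was found
}

# Sample input: a leading comment containing "tag"-like data.
my $html = join '', (
    "<!-- <html> is not a tag here -->\n",
    qq{<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">\n},
    "<html><head><title>t</title></head><body></body></html>\n",
);

print has_doctype_before_element($html)
    ? "doctype found\n"
    : "doctype missing\n";
```

Because the comment is removed before any matching happens, the "<html>" inside it can no longer be mistaken for the first element, which is the failure mode of the line-by-line check.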
Received on Wednesday, 16 June 1999 05:06:14 UTC