- From: <pdf@bizfon.com>
- Date: Tue, 10 Oct 2000 10:39:51 -0400
- To: www-validator@w3.org
Hehe... sorry Terje! I'll try to keep my interesting problems to a minimum. :)

Thanks!
-Pete

Terje Bless <link@tss.no> on 10/10/2000 12:21:50 AM
To: W3C Validator <www-validator@w3.org>
cc: (bcc: Peter Foti)
Subject: [PATCH] DOCTYPE Override

Well, since Peter just couldn't leave well enough alone but just /had/ to get me started on this, here is a patch that adds a DOCTYPE override function to the Validator. :-)

A couple of points:

* This is seriously nasty code! Testing is essential before it is used in production. It's also a quick hack, and it will have to be completely rewritten at some point. You have been warned.
* It requires HTML::Parser. The corollary is that I dump all[0] the nitty details on HTML::Parser, so if HTML::Parser chokes on something, we blow up too.
* In addition to the patch, I've attached a necessary configuration file for mapping version names to DOCTYPE declarations.

The flip side is that HTML::Parser looks to work great (for HTML!), so we can probably soon[1] ditch all that skanky code that deals with content sniffing (charset, META elements, DOCTYPEs, you name it) and use HTML::Parser callbacks instead. We're fucked if we get fed XML, of course, but XML support isn't that great in any case. For starters, we can disable the stuff for XML and then look at using XML::Parser to provide equivalent functionality. XML support needs to be revised as soon as I can find a decent XML parser that groks Schemas, but that's a whole new ball game.

Anyways, gotta go. It's 6am up here on the North Pole and I'm supposed to be at work in two hours. I should know better than to start looking at interesting problems after 8pm... :-(

[0] - And when I say "all" I really mean *all*. :-)
[1] - For certain odd definitions of the word. :-)

-- 
By definition there is *no*way* any problem can be my fault. Any problems you think you can find in my code are all in your imagination.
If you continue with such deranged imaginings then I may be forced to perform corrective brain surgery... with an axe.
- Stephen Harris <sweh@spuddy.mew.co.uk> in asr.

```diff
diff -r -u /usr/local/validator/htdocs/index.html /tmp/validator/htdocs/index.html
--- /usr/local/validator/htdocs/index.html Fri Apr 28 11:05:35 2000
+++ /tmp/validator/htdocs/index.html Tue Oct 10 05:37:29 2000
@@ -59,7 +59,27 @@
 </p>
 
 <form method="get" action="/check">
-  Address: <input name="uri" size="50" />
+  Address: <input name="uri" size="50" /><br />
+  Doctype:
+  <select name="doctype">
+    <option>Inline</option>
+    <option>XHTML 1.0 Strict</option>
+    <option>XHTML 1.0 Transitional</option>
+    <option>XHTML 1.0 Frameset</option>
+    <option>HTML 4.01 Strict</option>
+    <option>HTML 4.01 Transitional</option>
+    <option>HTML 4.01 Frameset</option>
+    <option>HTML 2.0</option>
+    <option>HTML 3.0 (AdvaSoft version)</option>
+    <option>HTML 3.2</option>
+    <option>HTML 3.2 + Style</option>
+    <option>HTML Pro</option>
+    <option>Spyglass HTML 2.0 Extended</option>
+    <option>HTML Level Cougar</option>
+    <option>HTML 4.0 Strict</option>
+    <option>HTML 4.0 Transitional</option>
+    <option>HTML 4.0 Frameset</option>
+  </select>
   <table cellpadding="0" cellspacing="0">
 <!--
   <tr>
diff -r -u /usr/local/validator/httpd/cgi-bin/check /tmp/validator/httpd/cgi-bin/check
--- /usr/local/validator/httpd/cgi-bin/check Tue Oct 10 05:35:47 2000
+++ /tmp/validator/httpd/cgi-bin/check Tue Oct 10 05:40:25 2000
@@ -22,6 +22,7 @@
 use CGI::Carp;
 use CGI qw(:cgi -newstyle_urls -private_tempfiles);
 use Text::Wrap;
+use HTML::Parser;
 
 
 #############################################################################
@@ -38,7 +39,7 @@
 #
 # Define global variables
 use vars qw($VERSION $DATE $MAINTAINER); # Strings we need.
-use vars qw($frag $pub_ids $element_uri $file_type); # Cfg hashes.
+use vars qw($frag $pub_ids $element_uri $file_type $doctypes); # Cfg hashes.
 
 #
 # Paths and file locations
@@ -49,6 +50,7 @@
 my $fpis_db = $html_path . 'config/fpis.cfg';
 my $frag_db = $html_path . 'config/frag.cfg';
 my $type_db = $html_path . 'config/type.cfg';
+my $dtds_db = $html_path . 'config/doctypes.cfg';
 my $sgmlstuff = $html_path . 'sgml-lib';
 my $sgmldecl = $sgmlstuff . '/REC-html40-19980424/HTML4.decl';
 my $xhtmldecl = $sgmlstuff . '/REC-xhtml1-20000126/xhtml1.dcl';
@@ -110,6 +112,7 @@
 $pub_ids = &read_cfg($fpis_db); # Errors -> fragment identifier
 $element_uri = &read_cfg($elem_db); # Element -> URI fragment
 $file_type = &read_cfg($type_db); # Content -> File -type
+$doctypes = &read_cfg($dtds_db); # Name -> doctype
 
 #
 # Set up signal handlers.
@@ -251,7 +254,13 @@
 # 4. if there is an xmlns= attribute, check for XML well-formedness
 # 5. if there is no xmlns= attribute, validate as HTML using the doctype
 # inferred by the check_for_doctype function
+
 #
+# Override DOCTYPE.
+if (defined $q->param('doctype') and not $q->param('doctype') =~ /Inline/i) {
+  $File->{Content} = &supress_doctype($File->{Content});
+  unshift @{$File->{Content}}, $doctypes->{$q->param('doctype')};
+}
 
 #
 # Try to extract or guess the DOCTYPE for HTML and XHTML files.
@@ -1377,4 +1386,18 @@
 $file =~ s(\015) {\n}g; # Turn ASCII CR into native newline.
 
 return [split /\n/, $file];
+}
+
+#
+# Suppress any existing DOCTYPE by commenting it out.
+sub supress_doctype {
+  no strict 'vars';
+  my $file = shift;
+  local $HTML = '';
+
+  HTML::Parser->new(
+    default_h     => [sub {$HTML .= shift}, 'text'],
+    declaration_h => [sub {$HTML .= '<!-- ' . $_[0] . ' -->'}, 'text']
+  )->parse(join "\n", @{$file})->eof; # eof() flushes any buffered trailing text.
+  return [split /\n/, $HTML];
 }
```

```
HTML 0.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML Level 0//EN//2.0">
Strict HTML 0.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML Strict Level 0//EN//2.0">
HTML 1.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN">
Strict HTML 1.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">
Strict HTML 2.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
HTML 2.0 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
HTML 2.1E <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.1E//EN">
HTML 3.0 (AdvaSoft version) <!DOCTYPE html PUBLIC "-//AS//DTD HTML 3.0 asWedit + extensions//EN">
HTML 3.0 (Beta) <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 3.0//EN">
Strict HTML 3.0 (Beta) <!DOCTYPE html PUBLIC "-//W3O//DTD W3 HTML Strict 3.0//EN//">
Hotjava-HTML <!DOCTYPE html PUBLIC "-//Sun Microsystems Corp.//DTD HotJava HTML//EN">
Strict Hotjava-HTML <!DOCTYPE html PUBLIC "-//Sun Microsystems Corp.//DTD HotJava Strict HTML//EN">
Netscape-HTML <!DOCTYPE html PUBLIC "-//WebTechs//DTD Mozilla HTML 2.0//EN">
Strict Netscape-HTML <!DOCTYPE html PUBLIC "-//Netscape Comm. Corp. Strict//DTD HTML//EN">
MSIE-HTML <!DOCTYPE html PUBLIC "-//Microsoft//DTD Internet Explorer 2.0 HTML//EN">
Strict MSIE-HTML <!DOCTYPE html PUBLIC "-//Microsoft//DTD Internet Explorer 2.0 HTML Strict//EN">
MSIE 3.0 HTML <!DOCTYPE html PUBLIC "-//Microsoft//DTD Internet Explorer 3.0 HTML//EN">
Strict MSIE 3.0 HTML <!DOCTYPE html PUBLIC "-//Microsoft//DTD Internet Explorer 3.0 HTML Strict//EN">
ORA HTML Extended v1.0 <!DOCTYPE html PUBLIC "-//OReilly and Associates//DTD HTML Extended 1.0//EN">
ORA HTML Extended Relaxed v1.0 <!DOCTYPE html PUBLIC "-//OReilly and Associates//DTD HTML Extended Relaxed 1.0//EN">
HTML 2.2 <!DOCTYPE html PUBLIC "-//IETF//DTD HTML V2.2//EN">
HTML 1996-01 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 1996-01//EN">
HTML 3.2 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
HTML 3.2 + Style <!DOCTYPE html PUBLIC "-//W3C//DTD HTML Experimental 970421//EN">
HTML Pro <!DOCTYPE html PUBLIC "+//Silmaril//DTD HTML Pro v0r11 19970101//EN">
Spyglass HTML 2.0 Extended <!DOCTYPE html PUBLIC "-//Spyglass//DTD HTML 2.0 Extended//EN">
HTML Level Cougar <!DOCTYPE html PUBLIC "http://www.w3.org/MarkUp/Cougar/Cougar.dtd">
HTML 4.0 Strict <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
HTML 4.0 Transitional <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
HTML 4.0 Frameset <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">
HTML 4.01 Strict <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
HTML 4.01 Transitional <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
HTML 4.01 Frameset <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
XHTML 1.0 Strict <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Transitional <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML 1.0 Frameset <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```
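">
The override step in `check` boils down to: if the user picked something other than "Inline", comment out whatever DOCTYPE the document already has and put the declaration looked up by name (from doctypes.cfg) on the first line. Below is a minimal core-Perl sketch of that step, assuming the lines arrive as an array ref like `$File->{Content}`. The function name `override_doctype` and the inline `%doctypes` hash are hypothetical, the config-file parsing is left out because the `read_cfg` format isn't shown, and a plain regex is swapped in for the patch's HTML::Parser `declaration_h` callback. As a result, it only touches `<!DOCTYPE ...>` (not other markup declarations) and won't cope with a `>` inside a quoted literal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): comment out any existing
# <!DOCTYPE ...> and prepend the declaration registered under $name.
# A plain regex stands in for the patch's HTML::Parser declaration_h
# callback, so a DOCTYPE with '>' inside a quoted literal is not handled.
sub override_doctype {
    my ($lines, $name, $doctypes) = @_;

    # No choice, or "Inline": keep the document's own DOCTYPE untouched.
    return $lines if !defined $name || $name =~ /Inline/i;

    my $doctype = $doctypes->{$name};
    die "Unknown DOCTYPE name: $name\n" unless defined $doctype;

    # Join the lines so a DOCTYPE split across lines is still matched
    # ([^>] also matches newlines), then wrap each match in a comment.
    my $text = join "\n", @$lines;
    $text =~ s/(<!DOCTYPE[^>]*>)/<!-- $1 -->/gi;

    # Put the chosen DOCTYPE on the first line, like the patch's unshift.
    return [ $doctype, split /\n/, $text ];
}

# Example: force "HTML 4.01 Strict" on a document that declares HTML 3.2.
my %doctypes = (
    'HTML 4.01 Strict' => '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">',
);
my $doc = override_doctype(
    [ '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">',
      '<title>Test</title>' ],
    'HTML 4.01 Strict',
    \%doctypes,
);
print "$_\n" for @$doc;
```

Running it prints the HTML 4.01 DOCTYPE, then the old HTML 3.2 DOCTYPE wrapped in `<!-- ... -->`, then the `<title>` line. As in the patch, the new declaration is simply prepended, so it would land before an `<?xml ...?>` declaration if one is present.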
Received on Tuesday, 10 October 2000 10:40:16 UTC