W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2000

[BUGFIX] Tidy supplies obsolete DOCTYPE

From: J. David Bryan <jdbryan@acm.org>
Date: Wed, 23 Feb 2000 00:35:13 -0500
Message-Id: <200002230535.AAA11140@mail.bcpl.net>
To: HTML Tidy List <html-tidy@w3.org>
This report is for the Tidy version of 13th January 2000.

When Tidy is asked to supply a DOCTYPE (e.g., with the configuration option 
"doctype: strict"), it will supply one for HTML 4.0, which is obsolete.

The HTML 4.01 specification says, "This document obsoletes previous 
versions of HTML 4.0...W3C recommends that user agents and authors (and in 
particular, authoring tools) produce HTML 4.01 documents rather than HTML 
4.0 documents."  Therefore, Tidy should generate DOCTYPEs with the 4.01 
version.

For example, given a "bug.html" file containing:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <html>
  <head>
  <title>Bug test</title>
  </head>
  <body>
  <p>Test file.</p>
  </body>
  </html>

...then running:

  tidy --doctype strict bug.html

...will produce a file with the HTML 4.0 DTD.

The error is in lexer.c, lines 51-67):

  struct _vers
  {
      char *name;
      char *voyager_name;
      char *profile;
      int code;
  } W3C_Version[] =
  {
      {"HTML 2.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML20},
      {"HTML 3.2", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML32},
      {"HTML 4.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML40_STRICT},
      {"HTML 4.0 Transitional", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML40_LOOSE},
      {"HTML 4.0 Frameset", "XHTML 1.0 Frameset", voyager_frameset, VERS_FRAMES},
      {"HTML 4.01", "XHTML 1.0 Strict", voyager_strict, VERS_HTML40_STRICT},
      {"HTML 4.01 Transitional", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML40_LOOSE},
      {"HTML 4.01 Frameset", "XHTML 1.0 Frameset", voyager_frameset, VERS_FRAMES}
  };

Because the HTML 4.0 and 4.01 DOCTYPE strings carry the same internal 
version flags (e.g., VERS_HTML40_STRICT), Tidy uses the first string 
encountered with the desired version flag when generating the requested 
DOCTYPE.  As the HTML 4.0 strings are first, they are used in preference to 
the 4.01 strings.  Placing the 4.01 strings ahead of the 4.0 strings solves 
the problem.

                                      -- Dave Bryan
Received on Wednesday, 23 February 2000 00:35:23 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT