[BUGFIX] Tidy supplies obsolete DOCTYPE

This report is for the Tidy version of 13th January 2000.

When Tidy is asked to supply a DOCTYPE (e.g., with the configuration option 
"doctype: strict"), it will supply one for HTML 4.0, which is obsolete.

The HTML 4.01 specification says, "This document obsoletes previous 
versions of HTML 4.0...W3C recommends that user agents and authors (and in 
particular, authoring tools) produce HTML 4.01 documents rather than HTML 
4.0 documents."  Therefore, Tidy should generate DOCTYPEs with the 4.01 
version.

For example, given a "bug.html" file containing:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <html>
  <head>
  <title>Bug test</title>
  </head>
  <body>
  <p>Test file.</p>
  </body>
  </html>

...then running:

  tidy --doctype strict bug.html

...will produce a file with the HTML 4.0 DTD.

The error is in lexer.c, lines 51-67):

  struct _vers
  {
      char *name;
      char *voyager_name;
      char *profile;
      int code;
  } W3C_Version[] =
  {
      {"HTML 2.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML20},
      {"HTML 3.2", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML32},
      {"HTML 4.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML40_STRICT},
      {"HTML 4.0 Transitional", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML40_LOOSE},
      {"HTML 4.0 Frameset", "XHTML 1.0 Frameset", voyager_frameset, VERS_FRAMES},
      {"HTML 4.01", "XHTML 1.0 Strict", voyager_strict, VERS_HTML40_STRICT},
      {"HTML 4.01 Transitional", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML40_LOOSE},
      {"HTML 4.01 Frameset", "XHTML 1.0 Frameset", voyager_frameset, VERS_FRAMES}
  };

Because the HTML 4.0 and 4.01 DOCTYPE strings carry the same internal 
version flags (e.g., VERS_HTML40_STRICT), Tidy uses the first string 
encountered with the desired version flag when generating the requested 
DOCTYPE.  As the HTML 4.0 strings are first, they are used in preference to 
the 4.01 strings.  Placing the 4.01 strings ahead of the 4.0 strings solves 
the problem.

                                      -- Dave Bryan

Received on Wednesday, 23 February 2000 00:35:23 UTC