[BUGFIX] Tidy supplies obsolete DOCTYPE

This report is for the Tidy version of 13th January 2000.

When Tidy is asked to supply a DOCTYPE (e.g., with the configuration option
"doctype: strict"), it will supply one for HTML 4.0, which is obsolete.

The HTML 4.01 specification says, "This document obsoletes previous
versions of HTML 4.0...W3C recommends that user agents and authors (and in
particular, authoring tools) produce HTML 4.01 documents rather than HTML
4.0 documents."  Therefore, Tidy should generate DOCTYPEs with the 4.01
version.

For example, given a "bug.html" file containing:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <html>
  <head>
  <title>Bug test</title>
  <body>
  <p>Test file.</p>
  </html>

...then running:

  tidy --doctype strict bug.html

...will produce output whose DOCTYPE names the HTML 4.0 DTD rather than
the HTML 4.01 DTD.

The error is in lexer.c (lines 51-67):

  struct _vers
  {
      char *name;
      char *voyager_name;
      char *profile;
      int code;
  } W3C_Version[] =
  {
      {"HTML 2.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML20},
      {"HTML 3.2", "XHTML 1.0 Transitional", voyager_loose, VERS_HTML32},
      {"HTML 4.0", "XHTML 1.0 Strict", voyager_strict, VERS_HTML40_STRICT},
      {"HTML 4.0 Transitional", "XHTML 1.0 Transitional", voyager_loose,
       VERS_HTML40_LOOSE},
      {"HTML 4.0 Frameset", "XHTML 1.0 Frameset", voyager_frameset,
       VERS_FRAMES},
      {"HTML 4.01", "XHTML 1.0 Strict", voyager_strict,
       VERS_HTML40_STRICT},
      {"HTML 4.01 Transitional", "XHTML 1.0 Transitional", voyager_loose,
       VERS_HTML40_LOOSE},
      {"HTML 4.01 Frameset", "XHTML 1.0 Frameset", voyager_frameset,
       VERS_FRAMES}
  };

The HTML 4.0 and 4.01 entries carry the same internal version flags (e.g.,
VERS_HTML40_STRICT), and when generating the requested DOCTYPE, Tidy uses
the first entry it finds with the desired flag.  Because the HTML 4.0
entries come first in the table, they are used in preference to the 4.01
entries.  Placing the 4.01 entries ahead of the 4.0 entries solves the
problem.
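
The effect of table order on a first-match lookup can be illustrated with
a minimal sketch.  This is not Tidy's actual code: the flag values, the
reduced two-field struct, and the names first_name_for, old_order,
new_order and check_lookup_order are all hypothetical, and the real lookup
routine in lexer.c is not shown in this report.  The sketch only assumes
that the lookup returns the first entry whose code matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; Tidy defines the real ones in its headers. */
#define VERS_HTML40_STRICT 4
#define VERS_HTML40_LOOSE  8

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Reduced stand-in for struct _vers: only the fields the lookup needs. */
struct vers_entry
{
    const char *name;
    int code;
};

/* Order as reported: the 4.0 entries precede the 4.01 entries. */
static const struct vers_entry old_order[] =
{
    {"HTML 4.0", VERS_HTML40_STRICT},
    {"HTML 4.0 Transitional", VERS_HTML40_LOOSE},
    {"HTML 4.01", VERS_HTML40_STRICT},
    {"HTML 4.01 Transitional", VERS_HTML40_LOOSE}
};

/* Proposed order: the 4.01 entries come first. */
static const struct vers_entry new_order[] =
{
    {"HTML 4.01", VERS_HTML40_STRICT},
    {"HTML 4.01 Transitional", VERS_HTML40_LOOSE},
    {"HTML 4.0", VERS_HTML40_STRICT},
    {"HTML 4.0 Transitional", VERS_HTML40_LOOSE}
};

/* Return the name of the first entry whose code equals `code`,
   or NULL if no entry matches. */
static const char *first_name_for(const struct vers_entry *table,
                                  size_t count, int code)
{
    size_t i;

    for (i = 0; i < count; ++i)
    {
        if (table[i].code == code)
            return table[i].name;
    }
    return NULL;
}

/* Self-check: the same flag resolves to a different name depending
   only on the order of the table. */
void check_lookup_order(void)
{
    assert(strcmp(first_name_for(old_order, COUNT(old_order),
                                 VERS_HTML40_STRICT), "HTML 4.0") == 0);
    assert(strcmp(first_name_for(new_order, COUNT(new_order),
                                 VERS_HTML40_STRICT), "HTML 4.01") == 0);
}
```

Calling check_lookup_order() from a test program exercises both tables.
Under the stated assumption, reordering the entries is enough to change
which name is chosen, and the 4.0 entries remain in the table so that
existing 4.0 documents can still be matched by name.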

                                      -- Dave Bryan

Received on Friday, 24 March 2000 13:12:57 UTC