[BUGFIX] Tidy ignores config line after "doctype" and "alt-text"

From: J. David Bryan <jdbryan@acm.org>
Date: Wed, 23 Feb 2000 00:35:14 -0500
Message-Id: <200002230535.AAA11145@mail.bcpl.net>
To: HTML Tidy List <html-tidy@w3.org>

This report is for the Tidy version of 13th January 2000.

When a configuration file contains the "alt-text:" option or the "doctype: 
<fpi>" option, Tidy will ignore the line in the configuration file that 
follows either of these options.

For example, given a "config.txt" file consisting of these lines:

  doctype: "-//ACME//DTD HTML 3.14159//EN"
  logical-emphasis: yes

...and an HTML test file "bug.html" containing:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <html>
  <head>
  <title>Bug test</title>
  </head>
  <body>
  <p>This <b>paragraph</b> has <i>presentation</i> markup
  which should be replaced by structural markup.</p>
  </body>
  </html>

...then the following command:

  tidy -config config.txt bug.html

will *not* replace the <b> and <i> tags with <strong> and <em> tags.  A 
corresponding test using the "alt-text" option displays the same behavior.  
Note that if the order of the configuration options is reversed, or if a 
comment line is placed after the "doctype" option, replacement does occur.

The fault appears to be in the "ParseString" routine of config.c, line 717:

  NextProperty();

This line should be removed.

The "NextProperty" routine discards characters until the succeeding line is 
encountered.  However, the ParseString routine calls NextProperty *after* 
it has already consumed the entire option line (the "doctype" line, for 
example), so the character pointer is at the first character of the 
succeeding line (the "logical-emphasis" line in my example).  Therefore, 
NextProperty skips that line and returns with the pointer at the start of 
the *second* following line (EOF in my example).
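
The double skip described above can be reproduced with a small stand-alone 
model.  This is only a sketch, not Tidy's code: "Cursor", "next_property", 
"parse_string", "count_options_seen" and the "buggy" flag are hypothetical 
names, and every option that is not "doctype" or "alt-text" is assumed to 
leave the rest of its line for next_property to discard.

```c
#include <string.h>

/* Hypothetical stand-in for the parser's position in the config text. */
typedef struct {
    const char *p;
} Cursor;

/* Models the described NextProperty: discard the rest of the current line. */
void next_property(Cursor *cur)
{
    while (*cur->p != '\0' && *cur->p != '\n')
        cur->p++;
    if (*cur->p == '\n')
        cur->p++;
}

/* Models the described ParseString: it consumes the value *and* the
   newline that ends it.  With buggy != 0 it then calls next_property
   anyway, which throws away the following line. */
void parse_string(Cursor *cur, int buggy)
{
    while (*cur->p != '\0' && *cur->p != '\n')
        cur->p++;
    if (*cur->p == '\n')
        cur->p++;
    if (buggy)
        next_property(cur);
}

/* Counts how many lines the model actually treats as options. */
int count_options_seen(const char *text, int buggy)
{
    Cursor cur = { text };
    int seen = 0;

    while (*cur.p != '\0') {
        int is_string_opt = strncmp(cur.p, "doctype:", 8) == 0 ||
                            strncmp(cur.p, "alt-text:", 9) == 0;
        seen++;
        if (is_string_opt)
            parse_string(&cur, buggy);
        else
            next_property(&cur);   /* assumed path for other options */
    }
    return seen;
}
```

With the two-line config from the example, count_options_seen(text, 1) 
sees only the "doctype" line, while count_options_seen(text, 0) sees both 
lines.  Reversing the two lines makes both visible even with buggy set, 
which matches the observation that the order of the options matters.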


Note that there appears to be another (unrelated) problem in the 
ParseString routine.  Lines 691-702 of config.c are:

  if (IsWhite(c))
  {
    if (waswhite)
    {
      AdvanceChar();
      continue;
    }

    c = ' ';
  }
  else
    waswhite = no;

ParseString appears to attempt to compress white space in "doctype" and 
"alt-text" parameters.  However, there appears to be a flaw in the logic 
that prevents this from occurring ("waswhite" is never set true, so the 
compression doesn't work).  Regardless, I believe that compressing white 
space from the user-specified parameters is incorrect.  If I specify a 
"doctype" or "alt-text" parameter containing extra space, then Tidy 
shouldn't second-guess me by removing it (unless it's required by the HTML 
spec; is it?).
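
To make the flag problem concrete, here is a minimal sketch of the same 
loop shape outside of Tidy.  The function name "copy_value", its buffer 
parameters and the "set_flag" switch are hypothetical; isspace() stands in 
for IsWhite.  With set_flag == 0 the loop mirrors the quoted logic, where 
the flag is never raised; with set_flag == 1 it adds the one assignment 
that compression would need.  It illustrates the mechanism only and does 
not argue that compression is the right behavior.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies in to out (at most outsz - 1 characters plus a terminator),
   turning each white-space character into a plain space. */
void copy_value(const char *in, char *out, size_t outsz, int set_flag)
{
    int waswhite = 0;
    size_t n = 0;

    if (outsz == 0)
        return;

    for (; *in != '\0' && n + 1 < outsz; in++) {
        char c = *in;

        if (isspace((unsigned char)c)) {
            if (waswhite)
                continue;          /* drop a repeated white-space char */
            c = ' ';
            if (set_flag)
                waswhite = 1;      /* assignment absent from the quoted code */
        } else {
            waswhite = 0;
        }
        out[n++] = c;
    }
    out[n] = '\0';
}
```

For the input "a   b" (three spaces), set_flag == 0 copies all three 
spaces through unchanged, and set_flag == 1 produces "a b".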

                                      -- Dave Bryan
Received on Wednesday, 23 February 2000 00:35:25 GMT
