[BUGFIX] Tidy ignores config line after "doctype" and "alt-text"

From: J. David Bryan <jdbryan@acm.org>
Date: Fri, 24 Mar 2000 11:47:13 -0600
To: HTML Tidy List <html-tidy@w3.org>
Message-ID: <OFAC253D13.1275C72C-ON8625688E.001EC665@rfdinc.com>

This report is for the Tidy version of 13th January 2000.

When a configuration file contains the "alt-text:" option or the "doctype:
<fpi>" option, Tidy will ignore the line in the configuration file that
follows either of these options.

For example, given a "config.txt" file consisting of these lines:

  doctype: "-//ACME//DTD HTML 3.14159//EN"
  logical-emphasis: yes

...and an HTML test file "bug.html" containing:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <html>
  <head>
  <title>Bug test</title>
  <body>
  <p>This <b>paragraph</b> has <i>presentation</i> markup
  which should be replaced by structural markup.</p>
  </html>

...then the following command:

  tidy -config config.txt bug.html

will *not* replace the <i> tags with <em> tags or the <b> tags with <strong>
tags.  A corresponding test using the "alt-text" option displays the same
behavior.  Note that if the order of the configuration options is reversed,
or if a comment line is placed between the "doctype" option and the option
that follows it, replacement does occur.

The fault appears to be in the "ParseString" routine of config.c, line 717:

  NextProperty();

This line should be removed.

The "NextProperty" routine discards characters until the start of the
succeeding line is reached.  However, the ParseString routine calls
NextProperty *after* it has already consumed the entire option line (the
"doctype" line, for example), when the character pointer is at the first
character of the succeeding line (the "logical-emphasis" line in my
example).  Therefore, NextProperty skips that line and returns with the
pointer at the start of the *second* following line (EOF in my example).
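
The double skip described above can be reproduced with a small stand-alone
model.  This is only a sketch, not Tidy's code: the names "cursor",
"next_property", "parse_string" and "count_options_seen" are hypothetical,
and the model assumes a reader that handles one option per line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of a line-oriented config reader; NOT Tidy's actual code.
   Every name in this file is hypothetical. */

static const char *cursor;   /* next unread character of the config text */

/* Skip the rest of the current line; leave cursor at the start of the next. */
static void next_property(void)
{
    while (*cursor != '\0' && *cursor != '\n')
        cursor++;
    if (*cursor == '\n')
        cursor++;
}

/* Consume a string-valued option through the end of its line.
   When extra_skip is nonzero, mimic the reported fault by skipping again. */
static void parse_string(int extra_skip)
{
    next_property();          /* stands in for reading the value itself */
    if (extra_skip)
        next_property();      /* redundant call: swallows the following line */
}

/* Return how many option lines the reader actually sees in config. */
int count_options_seen(const char *config, int extra_skip)
{
    int seen = 0;

    assert(config != NULL);
    cursor = config;

    while (*cursor != '\0') {
        int is_string_option = strncmp(cursor, "doctype:", 8) == 0 ||
                               strncmp(cursor, "alt-text:", 9) == 0;
        seen++;
        if (is_string_option)
            parse_string(extra_skip);
        else
            next_property();  /* any other option: skip the rest of its line */
    }
    return seen;
}
```

With a two-line configuration ("doctype" first, "logical-emphasis" second),
the model sees one option when extra_skip is nonzero and two when it is
zero.  With the two lines reversed it sees both options either way, which
matches the behavior reported above.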


Note that there appears to be another (unrelated) problem in the
ParseString routine.  Lines 691-702 of config.c are:

  if (IsWhite(c))
  {
    if (waswhite)
    {
      AdvanceChar();
      continue;
    }

    c = ' ';
  }
  else
    waswhite = no;

ParseString appears to attempt to compress white space in "doctype" and
"alt-text" parameters.  However, there appears to be a flaw in the logic
that prevents this from occurring ("waswhite" is never set true, so the
compression doesn't work).  Regardless, I believe that compressing white
space in user-specified parameters is incorrect.  If I specify a "doctype"
or "alt-text" parameter containing extra space, then Tidy shouldn't
second-guess me by removing it (unless that is required by the HTML spec;
is it?).
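
To make the white-space observation concrete, here is a minimal stand-alone
sketch of the same loop shape.  It is not Tidy's code: "copy_value" and its
"set_flag" parameter are hypothetical, and isspace() stands in for IsWhite.
With set_flag zero the flag is never set, as in the lines quoted above; with
set_flag nonzero the missing assignment is present.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of the quoted loop; NOT Tidy's actual code.
   Copies src into dst (capacity cap, including the terminator) and
   returns the number of characters written, excluding the terminator. */
size_t copy_value(const char *src, char *dst, size_t cap, int set_flag)
{
    int waswhite = 0;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return 0;

    for (; *src != '\0' && n + 1 < cap; src++) {
        char c = *src;

        if (isspace((unsigned char)c)) {
            if (waswhite)
                continue;      /* drop the remainder of a white-space run */
            c = ' ';           /* map tabs and the like to a plain space */
            if (set_flag)
                waswhite = 1;  /* the assignment the quoted lines lack */
        } else {
            waswhite = 0;
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

Given the input "a", two spaces, a tab, a space, "b", the model writes six
characters when set_flag is zero (each white-space character becomes a
space, but none is dropped) and three characters, "a b", when it is nonzero.
Whether the compression is desirable at all is a separate question.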

                                      -- Dave Bryan
Received on Friday, 24 March 2000 13:12:58 GMT
