W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2000

Re: [BUGFIX] Tidy ignores config line after "doctype" and "alt-text"

From: Dave Raggett <dsr@w3.org>
Date: Fri, 24 Mar 2000 11:47:14 -0600
To: "J. David Bryan" <jdbryan@acm.org>
Cc: HTML Tidy List <html-tidy@w3.org>
Message-ID: <OF1C785380.C3A03BD5-ON86256894.00492F4C@rfdinc.com>

On Wed, 23 Feb 2000, J. David Bryan wrote:

> This report is for the Tidy version of 13th January 2000.
>
> When a configuration file contains the "alt-text:" option or the
> "doctype:  <fpi>" option, Tidy will ignore the line in the
> configuration file that follows either of these options.
>
> For example, given a "config.txt" file consisting of these lines:
>
>   doctype: "-//ACME//DTD HTML 3.14159//EN"
>   logical-emphasis: yes
>
> ...and an HTML test file "bug.html" containing:
>
>   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <html>
>   <head>
>   <title>Bug test</title>>   <body>
>   <p>This <b>paragraph</b> has <i>presentation</i> markup
>   which should be replaced by structural markup.</p>>   </html>
>
> ...then the following command:
>
>   tidy -config config.txt bug.html
>
> will *not* replace the <i> tags with <em> tags, etc.  A
> corresponding test using the "alt-text" option will display the
> same behavior.  Note that if the order of the configuration
> options is reversed, or if an intervening comment is placed
> after the "doctype" option, replacement will occur.
>
> The fault appears to be in the "ParseString" routine of
> config.c, line 717:
>
>   NextProperty();
>
> This line should be removed.
>
> The "NextProperty" routine discards characters until the
> succeeding line is encountered.  However, the ParseString
> routine calls NextProperty *after* it has consumed all of the
> "doctype" (e.g.) option line and the character pointer is
> pointing at the first character of the succeeding line (at the
> "logical-emphasis" line in my example).  Therefore, NextProperty
> skips that line and returns with the pointer at the start of the
> *second* following line (EOF in my example).

Well spotted.

> Note that there appears to be another (unrelated) problem in the
> ParseString routine.  Lines 691-702 of config.c are:
>
>   if (IsWhite(c))>     if (waswhite)
>     {
>       AdvanceChar();
>       continue;
>     }
>
>     c = ' ';
>   }
>   else
>     waswhite = no;
>
> ParseString appears to attempt to compress white space in
> "doctype" and "alt-text" parameters.  However, there appears to
> be a flaw in the logic that prevents this from occurring
> ("waswhite" is never set true, so the compression doesn't work).
> Regardless, I believe that compressing white space from the
> user-specified parameters is incorrect.  If I specify a
> "doctype" or "alt-text" parameter containing extra space, then
> Tidy shouldn't second-guess me by removing it (unless it's
> required by the HTML spec; is it?).

Tidy already applies such whitespace munging to attribute values
and it seems reasonable to apply the same to the alt-text values
and to doctype FPIs. ParseString is only used for these two
purposes.

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 385 320 444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Friday, 24 March 2000 13:13:07 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT