
RE: tidy problems on www.altavista.com

From: <susan_levine@peoplesoft.com>
Date: Thu, 22 Jun 2000 07:18:28 -0700
To: biron@ifh.de
cc: html-tidy@w3.org, ittay@vigiltech.com
Message-ID: <OFFDC65A70.6DD3EA0F-ON88256906.004E8C06@peoplesoft.com>

Alex- can you recommend the optimal tool for such a task - do you know of

Biron" <biron@ifh.de>, 06/22/00 01:58
To:      "Ittay Freiman" <ittay@vigiltech.com>
cc:      "'html-tidy@w3.org'" <html-tidy@w3.org>
Subject: RE: tidy problems on www.altavista.com

On Thu, 22 Jun 2000, Ittay Freiman wrote:

> you are, of course, right. however, i need to parse files as the regular
> browser parses them, that is, as the writer of the page intended on them
> be parsed.

First of all, these can be two different things.
Secondly, I understand your ambition, but fear tidy is not the optimal
tool for that task - it was designed for something else.

> more than that, i think tidy is wrong here. td is an inline tag,
> while form is a block, so <td> followed by <form> should be converted to
> <td></td><form>. the same thing goes to the /form.

Form is block level, correct. TD, however, may contain flow, and flow may
contain block-level elements. So td may contain form. Please search
http://www.w3.org/TR/REC-html40/sgml/dtd.html to see this:

<!ENTITY % block
     "P | %heading; | %list; | %preformatted; | DL | DIV | NOSCRIPT |
      BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">
i.e. <form> is block (as is table)

<!ELEMENT (TH|TD)  - O (%flow;)*       -- table header cell, table data cell-->
i.e. <td> may contain flow
<!ENTITY % flow "%block; | %inline;">
i.e. block is flow.
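
The chain of declarations above can be checked mechanically. What follows is a minimal Python sketch, not anything tidy itself does: it encodes an abbreviated version of the quoted entities and asks whether TD may directly contain FORM. The INLINE sample and the helper name may_contain are assumptions made purely for illustration.

```python
# Abbreviated content models from the HTML 4.0 DTD quoted above.
# %heading;, %list; and %preformatted; are left out, and INLINE is a
# small illustrative subset, so this is not the full DTD.
BLOCK = {"P", "DL", "DIV", "FORM", "TABLE"}   # part of %block;
INLINE = {"A", "EM", "INPUT"}                 # assumed sample of %inline;
FLOW = BLOCK | INLINE                         # %flow; = %block; | %inline;

# <!ELEMENT (TH|TD) - O (%flow;)*>
CONTENT_MODEL = {"TH": FLOW, "TD": FLOW}


def may_contain(parent: str, child: str) -> bool:
    """Return True if `child` may appear directly inside `parent`."""
    return child.upper() in CONTENT_MODEL.get(parent.upper(), set())


if __name__ == "__main__":
    print(may_contain("td", "form"))   # True: FORM is block, block is flow
    print(may_contain("td", "table"))  # True
    print(may_contain("td", "td"))     # False: TD is not part of %flow;
```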

So tidy's result is syntactically correct here:
<td><form></td></form> -> <td><form></form></td>
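
To make the problem input concrete: the overlap in <td><form></td></form> can be detected with a simple stack of open elements. The sketch below uses Python's standard html.parser module and is only an illustration of how such overlaps show up; it is not how tidy works internally, and the class and function names (OverlapFinder, find_overlaps) are made up for this example.

```python
from html.parser import HTMLParser


class OverlapFinder(HTMLParser):
    """Record end tags that do not close the most recently opened element.

    Simplification: void elements such as <input> are not special-cased.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.problems = []    # (end_tag_seen, element_expected_to_close)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        expected = self.open_tags[-1] if self.open_tags else None
        if tag == expected:
            self.open_tags.pop()
            return
        self.problems.append((tag, expected))
        # Drop the innermost open element with this name, if there is one,
        # so that later end tags can still be matched.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break


def find_overlaps(markup):
    """Return the list of mismatched end tags found in `markup`."""
    finder = OverlapFinder()
    finder.feed(markup)
    finder.close()
    return finder.problems


if __name__ == "__main__":
    print(find_overlaps("<td><form></td></form>"))   # [('td', 'form')]
    print(find_overlaps("<td><form></form></td>"))   # []
```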

A different question is whether a different correct syntax comes closer
to what the author intended:

<form><table></table></form>

This is to my understanding a very common syntax when using

<!ELEMENT FORM - - (%block;|SCRIPT)+ -(FORM) -- interactive form -->
i.e. <form> may contain block, so the above example is also legal.
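
The FORM declaration adds one wrinkle: the exclusion -(FORM) forbids a nested FORM at any depth, not only as a direct child. The following Python sketch, under the same abbreviations as the DTD excerpts in this message, checks a chain of nested elements against the content models and the exclusion. The names CONTENT, EXCLUSIONS and is_legal_chain are hypothetical, the INLINE set is an assumed sample, and the DIV entry is taken from the same DTD.

```python
# Abbreviated content models; not the full HTML 4.0 DTD.
BLOCK = {"P", "DL", "DIV", "FORM", "TABLE"}   # part of %block;
INLINE = {"A", "EM", "INPUT"}                 # assumed sample of %inline;
FLOW = BLOCK | INLINE                         # %flow; = %block; | %inline;

CONTENT = {
    "TD": FLOW,                   # <!ELEMENT (TH|TD) - O (%flow;)*>
    "DIV": FLOW,                  # <!ELEMENT DIV - - (%flow;)*>
    "FORM": BLOCK | {"SCRIPT"},   # <!ELEMENT FORM - - (%block;|SCRIPT)+ ...>
}
EXCLUSIONS = {"FORM": {"FORM"}}   # the -(FORM) part of the declaration


def is_legal_chain(chain):
    """Check a nesting path, outermost element first.

    Exclusions accumulate while descending, so an element banned by an
    ancestor stays banned further down the chain.
    """
    names = [name.upper() for name in chain]
    banned = set()
    for parent, child in zip(names, names[1:]):
        banned |= EXCLUSIONS.get(parent, set())
        if child in banned or child not in CONTENT.get(parent, set()):
            return False
    return True


if __name__ == "__main__":
    print(is_legal_chain(["td", "form"]))           # True
    print(is_legal_chain(["form", "table"]))        # True
    print(is_legal_chain(["form", "div", "form"]))  # False: excluded by -(FORM)
```

Both nestings discussed in this thread pass the check, which is the point: the DTD alone cannot say which one the author meant.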

So tidy has to choose which of the two legal syntaxes is the one the
author intended. I am not sure that the second one is _always_
the better one (in your case it certainly seems so). But maybe the
default should be
<td><form></td></form> -> <form><table></table></form> instead of
<td><form></td></form> -> <td><form></form></td>

Cheers alex          Alexander Biron

Support the ban of Dihydrogen Monoxide: http://www.dhmo.org/

work:     http://www.ifh.de/~biron/
          Tel (+49)33762-77-483
          mailto:biron@ifh.de
private:  Tel(+49)30-4948857
          mailto:biron@frohnau-flamingos.de
Received on Thursday, 22 June 2000 10:18:44 UTC
