RE: tidy problems on www.altavista.com

Alex, can you recommend the optimal tool for such a task? Do you know of
one?



                                                                                                                    
From:     "Alexander Biron" <biron@ifh.de>
Sent by:  html-tidy-request@w3.org
Date:     06/22/00 01:58 AM
To:       "Ittay Freiman" <ittay@vigiltech.com>
cc:       "'html-tidy@w3.org'" <html-tidy@w3.org>
Subject:  RE: tidy problems on www.altavista.com





On Thu, 22 Jun 2000, Ittay Freiman wrote:

> you are, of course, right. however, i need to parse files as the regular
> browser parses them, that is, as the writer of the page intended on them
> to be parsed.

First of all, these can be two different things.
Secondly, I understand your ambition, but fear tidy is not the optimal
tool for that task - it was designed for something else.

> more than that, i think tidy is wrong here. td is an inline tag,
> while form is a block, so <td> followed by <form> should be converted to
> <td></td><form>. the same thing goes to the /form.

FORM is block level, correct. TD, however, may contain flow, and flow
includes block level. So TD may contain FORM. Please search
http://www.w3.org/TR/REC-html40/sgml/dtd.html to see this:


<!ENTITY % block
                "P | %heading; | %list; | %preformatted; | DL | DIV |
     NOSCRIPT | BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">
i.e. <form> is block (as is table)

<!ELEMENT (TH|TD)  - O (%flow;)*       -- table header cell, table data
                         cell-->
i.e. <td> may contain flow
<!ENTITY % flow "%block; | %inline;">
i.e. block is flow.

So tidy's result is syntactically correct here:
<td><form></td></form> -> <td><form></form></td>

A different question is whether a different correct syntax comes closer
to what the author intended:

<form><table></table></form>

This is to my understanding a very common syntax when using
forms.

<!ELEMENT FORM - - (%block;|SCRIPT)+ -(FORM) -- interactive form -->
i.e. <form> may contain block, so the above example is also legal.
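
The DTD reasoning above can be checked mechanically. The following is only
a minimal sketch: the table is a hand-copied, abbreviated excerpt of the
declarations quoted in this mail (%heading;, %list; and %preformatted; are
left out, and the inline set is an illustrative subset), and the names
CONTENT_MODEL, EXCLUSIONS and may_contain are made up for the example.

```python
# Abbreviated copy of the content models quoted above (HTML 4.0 DTD).
# %heading;, %list; and %preformatted; are omitted for brevity.
BLOCK = {"P", "DL", "DIV", "NOSCRIPT", "BLOCKQUOTE", "FORM", "HR",
         "TABLE", "FIELDSET", "ADDRESS"}
INLINE = {"#PCDATA", "A", "EM", "STRONG", "INPUT"}  # illustrative subset only
FLOW = BLOCK | INLINE  # <!ENTITY % flow "%block; | %inline;">

CONTENT_MODEL = {
    "TD": FLOW,                   # (TH|TD) - O (%flow;)*
    "TH": FLOW,
    "FORM": BLOCK | {"SCRIPT"},   # (%block;|SCRIPT)+
}

# -(FORM): a FORM may not contain another FORM.
# (In SGML the exclusion applies at any depth; this sketch only
# checks direct children.)
EXCLUSIONS = {"FORM": {"FORM"}}


def may_contain(parent, child):
    """Return True if `child` is allowed directly inside `parent`."""
    parent, child = parent.upper(), child.upper()
    if child in EXCLUSIONS.get(parent, set()):
        return False
    return child in CONTENT_MODEL.get(parent, set())


print(may_contain("td", "form"))     # td may contain form
print(may_contain("form", "table"))  # form may contain table
print(may_contain("form", "form"))   # excluded by -(FORM)
```

Both nestings discussed here pass the check, which is why the DTD alone
cannot tell tidy which one the author meant.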


So tidy has to choose which one of the two legal syntaxes is the one the
author intended. I am not sure that the second one is _always_
the better one (in your case it certainly seems so). But maybe the
default should be
<td><form></td></form> -> <form><table></table></form> instead of
<td><form></td></form> -> <td><form></form></td>
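
To make the second mapping concrete, here is a rough sketch of a
stack-based repair that closes inner elements before their parent. This is
NOT tidy's actual algorithm, just an assumption about how such a result can
come about; repair_nesting is a hypothetical helper, and it ignores
attributes and any text between the tags.

```python
import re

# Matches a start or end tag; attributes are skipped, text is ignored.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def repair_nesting(markup):
    """Re-nest tags so that every element is closed inside its parent."""
    out = []
    stack = []  # names of currently open elements, innermost last
    for match in TAG.finditer(markup):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if not closing:
            stack.append(name)
            out.append("<%s>" % name)
        elif name in stack:
            # Close everything opened after `name`, then `name` itself.
            while True:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Otherwise: a stray end tag with no matching open element; drop it.
    # Close whatever is still open at the end of input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)


print(repair_nesting("<td><form></td></form>"))
```

With this approach the </td> forces the open form to be closed first, and
the later </form> is dropped as a stray end tag. Producing the
<form><table></table></form> variant instead would need a rule that moves
the form outside the table, which this sketch does not attempt.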


--
Cheers alex          Alexander Biron

Support the ban of Dihydrogen Monoxide: http://www.dhmo.org/

work:     http://www.ifh.de/~biron/      private:
     Tel (+49)33762-77-483    Tel(+49)30-4948857
     mailto:biron@ifh.de           mailto:biron@frohnau-flamingos.de

Received on Thursday, 22 June 2000 10:18:44 UTC