- From: Reza Ferrydiansyah <ref11+@pitt.edu>
- Date: Thu, 20 Nov 2003 19:07:57 -0500
- To: html-tidy@w3.org
I have had some problems with JTidy's (and I believe Tidy's in general)
handling of forms inside of table:
For example the html snippet of:
Form testing
<table>
<form action='pageA'>
<tr><td><form>Input 1 Form 1: <input name='v'/></td><td>Input 2 Form 1:
<input name='w'/></form><form action='pageB'>Input 1 Form 2: <input
name='x'></td></tr>
<tr><td>Input 2 Form 2: <input name='y'></td></tr></table>Input 3 Form 2:
<input name='z'></form>
will be turned into
Form testing
<form action='pageA'></form>
<table>
<tr>
<td>
<form>Input 1 Form 1: <input name='v' /></form>
</td>
<td>Input 2 Form 1: <input name='w' />
<form action='pageB'>Input 1 Form 2: <input name='x' /></form>
</td>
</tr>
<tr>
<td>Input 2 Form 2: <input name='y' /></td>
</tr>
</table>
Input 3 Form 2: <input name='z' /> Hello World
Notice that input w, y, z are now input without a form.
I have tried using another HTML cleaner, TagSoup (unfortunately, still a
lot of errors) which returns:
<form enctype = "application/x-www-form-urlencoded" method = "get" action =
"pageA">
<table><tbody><tr><td colspan = "1" rowspan = "1">
<form enctype = "application/x-www-form-urlencoded" method = "get">Input
1 Form 1: <input type = "text" name = "v"></input></form></td>
<td colspan = "1" rowspan = "1">Input 2 Form 1: <input type = "text"
name = "w"></input></td></tr></tbody>
</table>
</form>
<form enctype = "application/x-www-form-urlencoded" method = "get" action
= "pageB">
Input 1 Form 2: <input type = "text" name = "x"></input>
<table><tbody><tr><td colspan = "1" rowspan = "1">Input 2 Form 2:
<input type = "text" name = "y"></input></td></tr></tbody></table>
Input 3 Form 2: <input type = "text" name = "z"></input>
</form>
In here, all inputs are inside a form. although the input v is inside a
form which is inside a form.
The way I see it, tidy usually deletes erroneous tags.
Therefore
<table>
<form><tr><td>...</form> ...
will be changed to
<form>
<table><tr><td>...</form>...
because a form cannot be inside a table. It is then corrected to
<form>
</form>
<table><tr><td>...
While in tagsoup
<table>
<form><tr><td>...</form> ...
is changed into
<table>
</table>
<form><tr><td>...</form> ...
and then
<table>
</table>
<form><table><tr><td>...</form> ...
in most cases I think the tag soup version is much more preferable than the
tidy version. Anybody know how to do this with jtidy?
Relevantly, one of the task I am doing currently is cleaning up HTML code
from popular websites. Yahoo, Lycos all have trouble with their form in the
form of <form> tags located under <table> tags (<table><form><tr><td>...)
So I am currently having a lot of trouble with this...
--
Reza Ferrydiansyah
Received on Thursday, 20 November 2003 19:12:46 UTC