Re: Problem tidying.

From: Matthew Stanfield <mattstan@blueyonder.co.uk>
Date: Wed, 11 Jun 2003 18:53:05 +0100
Message-ID: <3EE76C81.1050506@blueyonder.co.uk>
To: Dave Raggett <dsr@w3.org>
CC: jany.quintard@free.fr, Fred.Bone@dial.pipex.com, html-tidy@w3.org

Dave Raggett wrote:
> Tidy does its best, but it is too dangerous for it to move the form
> start and end tags arbitrarily. To do that would require a much
> more in-depth understanding of how a given page works.
> This problem is why it makes sense to allow form fields to
> directly reference the form rather than rely on being within
> a form element. This is one of the benefits of XForms. We just
> didn't realize the issue back in 1993 :-(
>  Dave Raggett <dsr@w3.org>  W3C lead for voice and multimodal.
>  http://www.w3.org/People/Raggett +44 1225 866240 (or 867351)

Thanks for the info, Dave.

One possible way around this problem for me would be to remove all form
tags. Is there a way to tell Tidy to do this? All I really need is to get
valid XML output. So if I could simply tell Tidy to drop the
<form...> and </form> tags, then I'd be sorted, even if the <input> tags
remained and made the output invalid HTML -- I just want valid XML. Is this,
or something like it, possible?

If not, I'm going to have to write my own supplementary tidy routine to do
this. :-(
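
A supplementary routine of that kind can be quite small. What follows is
only a minimal sketch, not anything Tidy itself provides: it assumes Python,
a regular expression rather than a real HTML parser, and that no attribute
value contains a literal '>'. The name strip_form_tags is made up for
illustration. It deletes the form start and end tags and leaves everything
between them, including the <input> elements, untouched.

```python
import re

# Matches <form ...> and </form> in any letter case. Attributes are skipped
# up to the first '>' (assumption: no '>' appears inside an attribute value).
FORM_TAG = re.compile(r"</?form\b[^>]*>", re.IGNORECASE)


def strip_form_tags(html: str) -> str:
    """Remove form start and end tags, leaving their contents in place."""
    return FORM_TAG.sub("", html)


if __name__ == "__main__":
    # A form opened between <tr> and <td>, as described in this thread.
    sample = (
        '<table><tr><form action="/go">'
        '<td><input name="q"></td></form></tr></table>'
    )
    print(strip_form_tags(sample))
```

The idea would be to run this over the raw HTML before it is handed to Tidy,
so that Tidy never sees the misplaced form tags at all.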

Many thanks for your help everyone, regards,


PS. Dave -- Thanks for creating Tidy. It is very useful to me, and I
greatly appreciate your placing it in the public domain.

> On Wed, 11 Jun 2003, Matthew Stanfield wrote:
>>Jany Quintard wrote:
>>>* Matthew Stanfield [Wed, 11/06/2003 at 15:41 +0100]
>>>>>>The problem is that a form is started in a table after the first <tr> but
>>>>>>before that <tr>'s first <td>. Hence the following errors and warning.
>>>>>>How can I get this tidied? I am stuck!
>>>>>Move <form> to precede <table> and </form> to follow </table>
>>>>Thanks. I just worked this out a few minutes before your email arrived, but
>>>>my problem is that I need to Tidy a lot of these pages - too many to do
>>>>manually. Does anyone know how I can get HTML Tidy to tidy this without
>>>>having to change it manually? What options could I use? Or, since I don't
>>>>actually need any of the data in the forms, how I can remove them? Any
>>>>other ideas?
>>>If your tags are on the same line, use sed:
>>>sed 's!<table><form>!<form><table>!g'
>>>Otherwise, use awk.
>>>You can even remove the line break between the end and begin tags and then
>>>use sed. It is a bit "brute force", but it should work.
>>Thanks. I believe sed and awk are UNIX commands. I use HTML Tidy in some
>>screen-scraping software I've written. The HTML is downloaded and tidied
>>into XML using Charles Reitzel's .NET wrapper for HTML Tidy,
>>then I run some XSLT on the XML to extract the useful information. I don't
>>want to have to run an extra non-Tidy routine to tidy my data. Is there no
>>way to achieve what I want from within Tidy?
>>Thanks guys and regards,
Received on Wednesday, 11 June 2003 13:53:21 UTC
