
Re: Problem tidying.

From: Matthew Stanfield <mattstan@blueyonder.co.uk>
Date: Wed, 11 Jun 2003 16:32:59 +0100
Message-ID: <3EE74BAB.6070302@blueyonder.co.uk>
To: jany.quintard@free.fr
CC: Fred.Bone@dial.pipex.com, html-tidy@w3.org

Jany Quintard wrote:
> * Matthew Stanfield [Wed, 11/06/2003 at 15:41 +0100]
> 
>>>>The problem is that a form is started in a table after the first <tr> but 
>>>>before that <tr>'s first <td>. Hence the following errors and warning.
>>>
>>>
>>>>How can I make this get tidied. -- I am stuck!!
>>>
>>>
>>>Move <form> to precede <table> and </form> to follow </table>
>>
>>Thanks. I just worked this out a few minutes before your email arrived, but 
>>my problem is that I need to Tidy a lot of these pages - too many to do 
>>manually. Does anyone know how I can get HTML Tidy to tidy this without 
>>having to change it manually? What options could I use? Or, since I don't 
>>actually need any of the data in the forms, how I can remove them? Any 
>>other ideas?
> 
> If your tags are on the same line, use sed:
> sed 's!<table><form>!<form><table>!g'
> Otherwise use awk.
> You can even remove the line break between the end and begin tags and use
> sed.
> It is a bit "brute force", but it should work.
> 
> Jany
> 

Thanks. I believe sed and awk are UNIX commands. I use HTML Tidy in some 
screen scraping software I've written. The HTML is downloaded and tidied 
into XML using Charles Reitzel's .NET wrapper for HTML Tidy, 
then I run some XSLT on the XML to extract the useful information. I don't 
want to have to run an extra non-tidy routine to tidy my data. Is there no 
way to achieve what I want from within Tidy?
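
For reference, since I don't need the form data, the kind of pre-processing 
pass suggested above could simply drop the form tags before the markup 
reaches Tidy. A rough sketch in Python, purely as an illustration: it is not 
a Tidy option and not part of the .NET wrapper, and the name strip_form_tags 
is made up for the example.

```python
import re

# Matches opening <form ...> and closing </form> tags, case-insensitively.
# [^>]* also spans newlines, so tags broken across lines are covered.
# Assumption: no attribute value inside a form tag contains a literal ">".
FORM_TAG = re.compile(r"</?form\b[^>]*>", re.IGNORECASE)


def strip_form_tags(html: str) -> str:
    """Remove <form> and </form> tags, keeping everything between them."""
    return FORM_TAG.sub("", html)


if __name__ == "__main__":
    # A form opened after <tr> but before the first <td>, as in my pages.
    sample = '<table><tr><form action="x.cgi"><td>cell</td></form></tr></table>'
    print(strip_form_tags(sample))
```

Unlike the sed one-liner, this does not depend on the tags being on the same 
line, but it is still an extra step outside Tidy.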

Thanks guys and regards,

..matthew
Received on Wednesday, 11 June 2003 11:33:13 GMT
