- From: Dave Raggett <dsr@w3.org>
- Date: Wed, 11 Jun 2003 20:11:25 +0100 (BST)
- To: Matthew Stanfield <mattstan@blueyonder.co.uk>
- cc: jany.quintard@free.fr, Fred.Bone@dial.pipex.com, html-tidy@w3.org
You could fairly easily modify Tidy to strip the form start and end tags, but this isn't a feature of the current code base as is. Dave Raggett <dsr@w3.org> W3C lead for voice and multimodal. http://www.w3.org/People/Raggett +44 1225 866240 (or 867351) On Wed, 11 Jun 2003, Matthew Stanfield wrote: > Dave Raggett wrote: > > Tidy does its best but it is too dangerous for it move the form > > start and end tags arbitrarily. To do that would require a much > > more in depth understanding of how a given page works. > > > > This problem is why it makes sense to allow form fields to > > directly reference the form rather than rely on being within > > a form element. This is one of the benefits of XForms. We just > > didn't realize the issue back in 1993 :-( > > > > Dave Raggett <dsr@w3.org> W3C lead for voice and multimodal. > > http://www.w3.org/People/Raggett +44 1225 866240 (or 867351) > > Thanks for the info. Dave. > > One possible way around this problem for me would be to remove all form > tags, is there a way to tell Tidy to do this? All I really need is to get > valid xml outputted. So if I could simply tell Tidy not to process > <form...> and </form> tags then I'd be sorted even if the <input> tags > remained making the output invalid html -- I just want valid xml. Is this > or something like it possible? > > If not I'm going to have to write my own supplementary tidy routine to do > this. :-( > > Many thanks for your help everyone, regards, > > ..matthew > > PS. Dave -- Thanks for creating Tidy, it is very useful to me, and I > appreciate your placing it in the public domain a lot. > > > > > > > > > > On Wed, 11 Jun 2003, Matthew Stanfield wrote: > > > > > >>Jany Quintard wrote: > >> > >>>* Matthew Stanfield [Wed, 11/06/2003 at 15:41 +0100] > >>> > >>> > >>>>>>The problem is that a form is started in a table after the first <tr> but > >>>>>>before that <tr>'s first <td>. Hence the following errors and warning. > >>>>> > >>>>> > >>>>>>How can I make this get tidied. -- I am stuck!! > >>>>> > >>>>> > >>>>>Move <form> to precede <table> and </form> to follow </table> > >>>> > >>>>Thanks. I just worked this out a few minutes before your email arrived, but > >>>>my problem is that I need to Tidy a lot of these pages - too many to do > >>>>manually. Does anyone know how I can get HTML Tidy to tidy this without > >>>>having to change it manually? What options could I use? Or, since I don't > >>>>actually need any of the data in the forms, how I can remove them? Any > >>>>other ideas? > >>> > >>>If your tags are on the same line, use sed > >>>sed s!<table><form>!<form><table>!g > >>>else use awk. > >>>You can even remove the breakline between end and begin tags and use > >>>sed. > >>>It is a bit "brute force", but it should work. > >>> > >>>Jany > >>> > >> > >>Thanks. I believe sed and awk are UNIX commands. I use HTML Tidy in some > >>screen scraping software I've written. The html is downloaded and tidied > >>into xml using tidied using Charles Reitzel's .net Wrapper for HTML Tidy, > >>then I run some xslt on the xml to extract the useful information. I don't > >>want to have to run an extra non-tidy routine to tidy my data. Is there no > >>way to achieve what I want from within Tidy? > >> > >>Thanks guys and regards, > >> > >>..matthew > >> > >> > > > > > >
Received on Wednesday, 11 June 2003 15:11:09 UTC