
Re: Problem tidying.

From: Dave Raggett <dsr@w3.org>
Date: Wed, 11 Jun 2003 20:11:25 +0100 (BST)
To: Matthew Stanfield <mattstan@blueyonder.co.uk>
cc: jany.quintard@free.fr, Fred.Bone@dial.pipex.com, html-tidy@w3.org
Message-ID: <Pine.LNX.4.53.0306112010370.1056@localhost.localdomain>

You could fairly easily modify Tidy to strip the form start and end
tags, but this isn't a feature of the current code base as is.

 Dave Raggett <dsr@w3.org>  W3C lead for voice and multimodal.
 http://www.w3.org/People/Raggett +44 1225 866240 (or 867351)
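
For readers who want the pre-processing workaround discussed in this thread, here is a minimal sketch in Python. It is not part of Tidy and not Dave's modification; it simply removes form start and end tags from the markup before it is handed to Tidy, leaving the enclosed fields in place. The names FORM_TAG and strip_form_tags and the sample markup are hypothetical, and the pattern assumes that no attribute value contains a literal ">" character.

```python
import re

# Matches <form ...> start tags and </form> end tags, case-insensitively.
# Assumption: attribute values never contain a literal '>' character.
# The \b keeps the pattern from matching other tag names that merely
# begin with "form".
FORM_TAG = re.compile(r"</?form\b[^>]*>", re.IGNORECASE)


def strip_form_tags(html: str) -> str:
    """Remove form start and end tags, keeping everything between them."""
    return FORM_TAG.sub("", html)


# Hypothetical input: a form opened after <tr> but before its first <td>.
sample = (
    '<table><tr><form action="/go" method="post">'
    '<td><input name="q"></td></form></tr></table>'
)
print(strip_form_tags(sample))
```

The same regular expression should port to other regex engines, so a step like this could run inside the scraping program itself rather than as a separate sed or awk pass. The result still contains the <input> elements, so it would not be valid HTML, only a cleaner input for Tidy's XML output.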


On Wed, 11 Jun 2003, Matthew Stanfield wrote:

> Dave Raggett wrote:
> > Tidy does its best but it is too dangerous for it to move the form
> > start and end tags arbitrarily. To do that would require a much
> > more in depth understanding of how a given page works.
> >
> > This problem is why it makes sense to allow form fields to
> > directly reference the form rather than rely on being within
> > a form element. This is one of the benefits of XForms. We just
> > didn't realize the issue back in 1993 :-(
> >
> >  Dave Raggett <dsr@w3.org>  W3C lead for voice and multimodal.
> >  http://www.w3.org/People/Raggett +44 1225 866240 (or 867351)
>
> Thanks for the info, Dave.
>
> One possible way around this problem for me would be to remove all form
> tags, is there a way to tell Tidy to do this? All I really need is to get
> valid xml outputted. So if I could simply tell Tidy not to process
> <form...> and </form> tags then I'd be sorted even if the <input> tags
> remained making the output invalid html -- I just want valid xml. Is this
> or something like it possible?
>
> If not I'm going to have to write my own supplementary tidy routine to do
> this. :-(
>
> Many thanks for your help everyone, regards,
>
> ..matthew
>
> PS. Dave -- Thanks for creating Tidy, it is very useful to me, and I
> appreciate your placing it in the public domain a lot.
>
>
>
>
> >
> >
> > On Wed, 11 Jun 2003, Matthew Stanfield wrote:
> >
> >
> >>Jany Quintard wrote:
> >>
> >>>* Matthew Stanfield [Wed, 11/06/2003 at 15:41 +0100]
> >>>
> >>>
> >>>>>>The problem is that a form is started in a table after the first <tr> but
> >>>>>>before that <tr>'s first <td>. Hence the following errors and warning.
> >>>>>
> >>>>>
> >>>>>>How can I make this get tidied. -- I am stuck!!
> >>>>>
> >>>>>
> >>>>>Move <form> to precede <table> and </form> to follow </table>
> >>>>
> >>>>Thanks. I just worked this out a few minutes before your email arrived, but
> >>>>my problem is that I need to Tidy a lot of these pages - too many to do
> >>>>manually. Does anyone know how I can get HTML Tidy to tidy this without
> >>>>having to change it manually? What options could I use? Or, since I don't
> >>>>actually need any of the data in the forms, how I can remove them? Any
> >>>>other ideas?
> >>>
> >>>If your tags are on the same line, use sed
> >>>sed 's!<table><form>!<form><table>!g'
> >>>else use awk.
> >>>You can even remove the line break between the end and begin tags and use
> >>>sed.
> >>>It is a bit "brute force", but it should work.
> >>>
> >>>Jany
> >>>
> >>
> >>Thanks. I believe sed and awk are UNIX commands. I use HTML Tidy in some
> >>screen scraping software I've written. The html is downloaded and tidied
> >>into xml using Charles Reitzel's .net Wrapper for HTML Tidy,
> >>then I run some xslt on the xml to extract the useful information. I don't
> >>want to have to run an extra non-tidy routine to tidy my data. Is there no
> >>way to achieve what I want from within Tidy?
> >>
> >>Thanks guys and regards,
> >>
> >>..matthew
> >>
> >>
> >
> >
>
>
Received on Wednesday, 11 June 2003 15:11:09 GMT
