W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2003

RE: Validate only the body text of a HTML document

From: Ben Noblet <ben@lateralsystems.com.au>
Date: Fri, 4 Apr 2003 23:16:01 +1000
To: <html-tidy@w3.org>
Message-ID: <007101c2faac$5a8bde70$0202a8c0@SKYWALKER>

I think show-body-only should do what you want.  I use it for exactly
the same purpose.  If you put the contents of the textarea inside
<body></body> it should not add the header information (and not return
the actual <body> tags either)

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0  
This option specifies if Tidy should print only the contents of the body
tag as an HTML fragment. Useful for incorporating existing whole pages
as a portion of another page. 


-----Original Message-----
From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org] On
Behalf Of Jesus Angel del Pozo
Sent: Friday, 4 April 2003 9:20 PM
To: html-tidy@w3.org
Subject: Validate only the body text of a HTML document


I'm using tidy executable, version 1st March 2002 in Debian Woody. I'm
trying to parse some html text that inputs a user in a TEXTAREA (well,
I'm using a WYSIWYG editor, htmlarea). I want to filter out those Word

But the tidy command thinks it has a complete HTML page and adds the
header information.

Lets see an example:

Supose that the user writes this in the TEXTAREA: "<table><tr><td>Firs
cell</td><td>Second cell</tr></table>"

Then I call tidy from a script with this text (throught the standar
input). The output is something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1st March 2002), see www.w3.org">
<title></title> </head> <body> <table> <tr> <td>Firs cell</td>
<td>Second cell</td> </tr> </table> </body> </html>

And I want only this part:
<td>Firs cell</td>
<td>Second cell</td>

Is that posible?

Thanks in advance.
Received on Friday, 4 April 2003 08:18:17 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 17:06:49 UTC