Validate only the body text of an HTML document

Hello.

I'm using the tidy executable, version 1st March 2002, on Debian Woody.
I'm trying to parse some HTML text that a user enters in a TEXTAREA
(actually, I'm using a WYSIWYG editor, htmlarea). I want to filter out
the Word-specific tags.

But the tidy command assumes it has been given a complete HTML page and
adds the header information.

Let's see an example:

Suppose that the user writes this in the TEXTAREA:
"<table><tr><td>Firs cell</td><td>Second cell</tr></table>"

Then I call tidy from a script with this text (through standard
input). The output is something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1st March 2002), see www.w3.org">
<title></title>
</head>
<body>
<table>
<tr>
<td>Firs cell</td>
<td>Second cell</td>
</tr>
</table>
</body>
</html>

And I want only this part:
<table>
<tr>
<td>Firs cell</td>
<td>Second cell</td>
</tr>
</table>
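
One workaround I can think of, independent of tidy itself, would be to
post-process the output and keep only what lies between <body> and
</body>. A minimal sketch in Python, assuming the output always contains
a single body element; the function name extract_body is just my own
illustration, not part of tidy:

```python
import re


def extract_body(html: str) -> str:
    """Return the markup between <body ...> and </body>.

    If no body element is found, the input is returned unchanged.
    This is a plain regex match, not a real HTML parser, so it
    assumes exactly one body element in well-formed tidy output.
    """
    match = re.search(
        r"<body[^>]*>(.*?)</body\s*>",
        html,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return html
    # Drop the newlines tidy puts right after <body> and before </body>.
    return match.group(1).strip()
```

Feeding it the tidy output shown above should give back just the table
markup, but I would prefer an option in tidy itself if one exists.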


Is that possible?

Thanks in advance.

Received on Friday, 4 April 2003 06:27:06 UTC