Re: XML Tidy? from Ignacio Vazquez-Abrams on 2001-06-19 (html-tidy@w3.org from April to June 2001)

From: Ignacio Vazquez-Abrams <ignacio@openservices.net>
Date: Tue, 19 Jun 2001 10:02:22 -0400 (EDT)
To: <html-tidy@w3.org>
Message-ID: <Pine.LNX.4.33.0106190953570.16384-100000@terbidium.openservices.net>

On Mon, 18 Jun 2001, Klaus Johannes Rusch wrote:

> In <Pine.LNX.4.33.0106181025230.30759-100000@terbidium.openservices.net>, Ignacio Vazquez-Abrams <ignacio@openservices.net> writes:
> > I was wondering if there exists any version or variant or configuration of
> > Tidy which could deal with an XML/HTML hybrid? More specifically I need to
> > just deal with the stuff that would appear inside the BODY tag, without adding
> > the HTML, HEAD, and TITLE tags. I have tried a lot of configuration options
> > for HTML Tidy, but have had no success so far.
>
> You can use the -xml option to process the input as an XML fragment;
> however, this will not do any of the usual HTML cleanup.

The problem is that I need to do the HTML cleanup; I need to clean up a
pseudo-HTML document entered by the user, and this document will contain only
a fragment of an HTML page.

> Or, run the fragment through tidy using the -asxml option, then extract
> everything between <body> and </body>.

While that works for the output stage (Oh no! select="html/body"! The horror!
:P ), I would also like to provide entry-time verification and cleanup of
code. Having to search for /line ([0-9]+) / and subtract an offset when
displaying errors to the user, while not difficult, is something I'd like to
avoid.
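
For reference, the workaround I'd like to avoid would look roughly like the
sketch below. It assumes the user's fragment is wrapped in a fixed header
before being handed to Tidy, so every reported line number is too high by the
number of header lines. WRAPPER_HEAD and adjust_line_numbers are names made up
for this illustration, and the sample message in the usage note is invented;
only the /line ([0-9]+) / pattern comes from the text above.

```python
import re

# Hypothetical header prepended to the user's fragment before running Tidy.
# Its line count is the offset to subtract from reported line numbers.
WRAPPER_HEAD = "<html>\n<head>\n<title></title>\n</head>\n<body>\n"
OFFSET = WRAPPER_HEAD.count("\n")

# Same pattern as /line ([0-9]+) / mentioned above.
LINE_RE = re.compile(r"line ([0-9]+) ")


def adjust_line_numbers(messages, offset=OFFSET):
    """Return copies of the messages with line numbers shifted back by offset.

    Numbers that would fall inside the wrapper itself are clamped to 1.
    Messages without a "line N " reference are returned unchanged.
    """
    def shift(match):
        original = int(match.group(1))
        return "line %d " % max(original - offset, 1)

    return [LINE_RE.sub(shift, message) for message in messages]
```

With the five-line header above, an (invented) message reading
"line 7 column 1 - Warning: missing </b>" would be shown to the user as
"line 2 column 1 - Warning: missing </b>".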

-- 
Ignacio Vazquez-Abrams  <ignacio@openservices.net>

Received on Tuesday, 19 June 2001 10:02:25 UTC