- From: Ignacio Vazquez-Abrams <ignacio@openservices.net>
- Date: Tue, 19 Jun 2001 17:45:20 -0400 (EDT)
- To: <html-tidy@w3.org>
- Message-ID: <Pine.LNX.4.33.0106191733450.26671-200000@terbidium.openservices.net>
On Tue, 19 Jun 2001, Reitzel, Charlie wrote:

> I have the same type of thing in mind. It is the kind of thing you should
> be able to accomplish with a library version of Tidy.
>
> In the meantime, if you preserve the original input, you should be able to
> simply subtract N lines from the _reported_ line numbers - that is, I am
> assuming, the N lines of header you put in front of the user's input.
>
> take it easy,
> Charlie
>
> -----Original Message-----
> From: Ignacio Vazquez-Abrams [mailto:ignacio@openservices.net]
> Sent: Tuesday, June 19, 2001 10:02 AM
> To: html-tidy@w3.org
> Subject: Re: XML Tidy?
>
>
> On Mon, 18 Jun 2001, Klaus Johannes Rusch wrote:
>
> > In <Pine.LNX.4.33.0106181025230.30759-100000@terbidium.openservices.net>,
> > Ignacio Vazquez-Abrams <ignacio@openservices.net> writes:
> > > I was wondering if there exists any version or variant
> > > or configuration of Tidy which could deal with an XML/HTML
> > > hybrid? More specifically I need to just deal with the
> > > stuff that would appear inside the BODY tag, without adding
> > > the HTML, HEAD, and TITLE tags. I have tried a lot of
> > > configuration options for HTML Tidy, but have had no
> > > success so far.
> >
> > You can either use the -xml option to only process the
> > fragment as an XML fragment, however this will not do
> > any of the usual HTML cleanup.
>
> The problem is that I need to do the HTML cleanup; I need to clean up a
> pseudoHTML document entered by the user, and this document will only contain
> a piece of an HTML page.
>
> > Or, run the fragment through tidy using the -asxml
> > option, then extract everything between <body> and </body>.
>
> While that works for the output stage (Oh no! select="html/body"! The
> horror!:P ), I would also like to provide entry-time verification and
> cleanup of code. Having to search for /line ([0-9]+) / and subtracting when
> displaying errors to the user, while not difficult, is something I'd like to
> avoid.
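For reference, the line-number workaround discussed in the quoted messages (put N wrapper lines in front of the user's input, then subtract N from every reported line number) could be sketched roughly as below. This is only an illustration: the function name `shift_line_numbers`, the `header_lines` parameter, and the sample message text are made up for the example, and the only thing taken from the thread is the /line ([0-9]+) / pattern.

```python
import re

# Pattern from the thread: "line <number> " in a diagnostic message.
# The leading \b is an added assumption so words such as "outline" are not matched.
LINE_RE = re.compile(r"\bline ([0-9]+) ")


def shift_line_numbers(report: str, header_lines: int) -> str:
    """Subtract header_lines from every 'line N ' found in report.

    header_lines is the number of wrapper lines placed in front of the
    user's input before it was checked (hypothetical parameter name).
    """
    def fix(match: re.Match) -> str:
        adjusted = int(match.group(1)) - header_lines
        # Diagnostics that point into the wrapper itself are clamped to
        # line 1 so that no zero or negative numbers are shown to the user.
        return f"line {max(adjusted, 1)} "

    return LINE_RE.sub(fix, report)


if __name__ == "__main__":
    # Illustrative message text, not copied from real Tidy output.
    sample = "line 12 column 5 - Warning: something is wrong here\n"
    print(shift_line_numbers(sample, 4), end="")
```

The clamp to line 1 is a design choice for the sketch; an application might prefer to hide diagnostics that fall inside the wrapper lines altogether.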
Submitted for refinement is a patch that does almost exactly what the original requirements call for. It defines another configuration option ("body-only") that skips generation of the <html> node and all header tags in parser.c.

The only problem seems to be that if it actually encounters <html> and/or header tags, well, it dies a horrible death. If somebody could take a look at the patch and clean it up, that would be much appreciated.

Me take it easy? Riiight... :)

-- 
Ignacio Vazquez-Abrams <ignacio@openservices.net>
Attachments
- TEXT/PLAIN attachment: tidy4aug00-bodyonly.patch
Received on Tuesday, 19 June 2001 18:14:52 UTC