- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Tue, 19 Jun 2001 19:58:47 -0400
- To: "'Ignacio Vazquez-Abrams'" <ignacio@openservices.net>, html-tidy@w3.org
Great idea. I'd try a somewhat different tact, tho. Rather than mess w/ the parser _or_ your input. I'd just feed the prospective <body> contents to Tidy as is. Let the parser merrily add <head><title></title></head><body> your stuff! </body>. Then, if the --body-only option is set, call a new PPrintContent() function in pprint.c. This function will call PPrintTree() for each member of the body->content node list. The important point here is you can safely add functionality without changing w/ the inner workings of the parser. Even better, it will report the line numbers correctly for any errors/warnings it emits. take it as easy as you can stand it (which is qualified by noting that I am an uptight east coaster. It's one those "do as I say, not as I do" kind of things), Charlie -----Original Message----- From: Ignacio Vazquez-Abrams [mailto:ignacio@openservices.net] Sent: Tuesday, June 19, 2001 5:45 PM To: html-tidy@w3.org Subject: RE: XML Tidy? On Tue, 19 Jun 2001, Reitzel, Charlie wrote: > I have the same type of thing in mind. It is the kind > of thing you should be able to accomplish with a > library version of Tidy. > > In the meantime, if you preserve the original input, > you should be able to simply subtract N lines from the > _reported_ line numbers - that is, I am assuming, the > N lines of header you put in front of the user's > input. > > take it easy, > Charlie > > -----Original Message----- > From: Ignacio Vazquez-Abrams [mailto:ignacio@openservices.net] > Sent: Tuesday, June 19, 2001 10:02 AM > To: html-tidy@w3.org > Subject: Re: XML Tidy? > > > On Mon, 18 Jun 2001, Klaus Johannes Rusch wrote: > > > In <Pine.LNX.4.33.0106181025230.30759-100000@terbidium.openservices.net>, > > Ignacio Vazquez-Abrams <ignacio@openservices.net> writes: > > > I was wondering if there exists any version or variant > > > or configuration of Tidy which could deal with an > > > XML/HTML hybrid? More specifically I need to just > > > deal with the stuff that would appear inside the > > > BODY tag, without adding the HTML, HEAD, and TITLE > > > tags. I have tried a lot of configuration options > > > for HTML Tidy, but have had no success so far. > > > > You can either use the -xml option to only process the > > fragment as an XML fragment, however this will not do > > any of the usual HTML cleanup. > > The problem is that I need to do the HTML cleanup; I > need to clean up a pseudoHTML document entered by the > user, and this document will only contain a piece of > an HTML page. > > > Or, run the fragment through tidy using the -asxml > > option, then extract everything between <body> and > > </body>. > > While that works for the output stage (Oh no! > select="html/body"! The horror!:P ), I would also > like to provide entry-time verification and > cleanup of code. Having to search for /line ([0-9]+) > / and subtracting when displaying errors to the user, > while not difficult, is something I'd like to avoid. Submitted for refinement is a patch for doing almost exactly the original requirements. It defines another configuration option ("body-only") that skips generation of the <html> node and all header tags in parser.c. The only problem seems to be that if it actually encounters <html> and/or header tags, well, it dies a horrible death. If somebody could take a look at the patch and clean it up that would be much appreciated. Me take it easy? Riiight... :) -- Ignacio Vazquez-Abrams <ignacio@openservices.net>
Received on Tuesday, 19 June 2001 19:58:10 UTC