- From: Dave Raggett <dsr@w3.org>
- Date: Mon, 24 Jul 2000 11:59:32 +0100 (GMT Daylight Time)
- To: "Dickey, Will" <wdickey@gettuit.com>
- cc: "'html-tidy@w3.org'" <html-tidy@w3.org>, gerald@w3.org
On Fri, 21 Jul 2000, Dickey, Will wrote: > Hello. I would like to parse the results of a tidy operation > into a DOM. I'm not sure if this is possible, and it apparently > is not with MSXML, as it raises numerous errors on any HTML > document I tidy and then try to parse. > > Is my premise wrong - parsing HTML into an XML DOM can't be > done, or am I using the wrong parser? Any help would be greatly > appreciated. The simplest thing is to use Tidy to clean up the markup and convert it into well formed XML, and follow this up with an off-the-shelf XML tool, e.g. the IBM java tool kit for XML. You could alternatively add code into Tidy to do what you want. Tidy provides a simple interface for walking markup trees, although it doesn't conform the the DOM, but this is hardly surprising given that work on Tidy started before the DOM. Regards, -- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile) World Wide Web Consortium (on assignment from HP Labs)
Received on Monday, 24 July 2000 06:59:39 UTC