Re: XML Parse

On Fri, 21 Jul 2000, Dickey, Will wrote:

> Hello.  I would like to parse the results of a tidy operation
> into a DOM. I'm not sure if this is possible, and it apparently
> is not with MSXML, as it raises numerous errors on any HTML
> document I tidy and then try to parse.
> 
> Is my premise wrong - parsing HTML into an XML DOM can't be
> done, or am I using the wrong parser?  Any help would be greatly
> appreciated.

The simplest thing is to use Tidy to clean up the markup and
convert it into well-formed XML, and then follow this up with an
off-the-shelf XML tool, e.g. the IBM Java toolkit for XML.
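
As a rough sketch of that two-step approach (not a tested recipe),
the following Python runs the tidy command-line tool and hands its
output to the standard library's minidom parser. It assumes a tidy
executable on the PATH that accepts the -asxml, -numeric, -utf8 and
-quiet options. The function names tidy_to_xml and parse_to_dom are
made up for illustration. The -numeric option is there because plain
XML parsers usually reject named HTML entities such as &nbsp; unless
they load the DTD.

```python
import shutil
import subprocess
from xml.dom import minidom


def tidy_to_xml(html: str) -> str:
    """Run tidy (assumed to be on PATH) and return well-formed XHTML text."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        # -asxml: emit well-formed XML; -numeric: numeric character
        # references instead of named entities; -utf8: UTF-8 in and out.
        ["tidy", "-quiet", "-asxml", "-numeric", "-utf8"],
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # Tidy exits with 0 when clean, 1 on warnings and 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError("tidy reported errors:\n" + result.stderr)
    return result.stdout


def parse_to_dom(xml_text: str) -> minidom.Document:
    """Parse well-formed XML text into a DOM document."""
    return minidom.parseString(xml_text)


if __name__ == "__main__":
    messy = "<title>demo</title><p>first<p>second"
    dom = parse_to_dom(tidy_to_xml(messy))
    for p in dom.getElementsByTagName("p"):
        print(p.toxml())
```

If the parser still raises errors on Tidy's output, the usual
suspects are named entities and the character encoding, so those are
worth checking first.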

Alternatively, you could add code to Tidy to do what you want.
Tidy provides a simple interface for walking markup trees, although
it doesn't conform to the DOM. This is hardly surprising, given
that work on Tidy started before the DOM.

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)

Received on Monday, 24 July 2000 06:59:39 UTC