Re: MSHTML as an alternative to Tidy

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 14 Feb 2003 14:46:31 +0100
To: "Lucas W. Fletcher" <lucas@dealersinnotions.com>
Cc: html-tidy@w3.org
Message-ID: <3e59f230.68305327@smtp.bjoern.hoehrmann.de>

* Lucas W. Fletcher wrote:
>Is anyone aware of a publicly available program that
>uses the MSHTML API to convert an HTML file into XHTML?

I wrote a small Perl script that converts the Internet Explorer DOM to
a SAX (Simple API for XML) event stream; search the archives of
tidy-develop@lists.sourceforge.net and perl-xml@lists.activestate.com.
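
To illustrate the general idea of turning a DOM into a SAX event stream,
here is a minimal sketch. It is not the Perl script mentioned above: it is
written in Python, and the standard library's xml.dom.minidom stands in for
the MSHTML DOM, which is an assumption made purely for illustration. The
helper names dom_to_sax and serialize are hypothetical.

```python
import io
from xml.dom import minidom, Node
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Walk a DOM subtree depth-first and replay it as SAX events."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # Copy the DOM attribute map into a plain dict for the SAX handler.
        attrs = {}
        if node.attributes is not None:
            for i in range(node.attributes.length):
                attr = node.attributes.item(i)
                attrs[attr.name] = attr.value
        handler.startElement(node.tagName, AttributesImpl(attrs))
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    # Comments, processing instructions and doctypes are skipped in this sketch.


def serialize(xml_text):
    """Parse with minidom (a stand-in for the browser DOM) and re-emit via SAX."""
    doc = minidom.parseString(xml_text)
    out = io.StringIO()
    dom_to_sax(doc, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
```

Any SAX ContentHandler can be passed in place of XMLGenerator, so the same
walk could feed a filter or validator instead of a serializer.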

>If one assumes that the ultimate version of Tidy is one
>where it can parse pages in as fault-tolerant a manner as
>the popular browsers such as IE, then wouldn't it make sense
>to actually utilize the DOM exposed by the browser itself
>in order to create the XHTML?

The DOM created by Internet Explorer from broken documents is rather
useless. For example


In the MSHTML DOM this is represented as being


while the expected result (what IE renders) is either



Received on Friday, 14 February 2003 08:46:04 UTC