Re: MSHTML as an alternative to Tidy

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 14 Feb 2003 14:46:31 +0100
To: "Lucas W. Fletcher" <lucas@dealersinnotions.com>
Cc: html-tidy@w3.org
Message-ID: <3e59f230.68305327@smtp.bjoern.hoehrmann.de>

* Lucas W. Fletcher wrote:
>Is anyone aware of a publicly available program that
>uses the MSHTML API to convert an HTML file into XHTML?

I wrote a small Perl script that converts the Internet Explorer DOM to
a SAX (Simple API for XML) event stream; search the archives of
tidy-develop@lists.sourceforge.net / perl-xml@lists.activestate.com.

>If one assumes that the ultimate version of Tidy is one
>where it can parse pages in as fault-tolerant a manner as
>the popular browsers such as IE, then wouldn't it make sense
>to actually utilize the DOM exposed by the browser itself
>in order to create the XHTML?

The DOM created by Internet Explorer from broken documents is rather
useless. For example:

  <p>1<em>2<strong>3</em>4</strong>5<p>6

In the MSHTML DOM this is represented as

  <p>1<em>2<strong>34</strong></em>45</p>
  <p>6</p>

while the expected result (what IE renders) is either

  <p>1<em>2</em><strong><em>3</em>4</strong>5</p>
  <p>6</p>

or

  <p>1<em>2<strong>3</strong></em><strong>4</strong>5</p>
  <p>6</p>
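
A minimal sketch of how the second form above could be produced from the
broken input: keep a stack of open elements, and when an end tag arrives
for an element that is not innermost, close the elements nested inside it
and reopen them afterwards. This is not what Tidy or MSHTML actually do;
the names repair and TOKEN are hypothetical, the tokenizer only handles
attribute-less tags, and <p> is the only element given implicit-close
handling.

```python
import re

# Hypothetical tokenizer: matches <tag>, </tag>, or a run of text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def repair(markup):
    """Return well-formed markup by closing and reopening misnested tags."""
    out = []    # pieces of the repaired output
    stack = []  # names of the currently open elements, outermost first
    for slash, name, text in TOKEN.findall(markup):
        if text:
            out.append(text)
        elif not slash:
            if name == "p" and "p" in stack:
                # A new <p> implicitly closes the open paragraph and
                # everything still open inside it.
                while stack:
                    closed = stack.pop()
                    out.append(f"</{closed}>")
                    if closed == "p":
                        break
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack:
            # Close the elements nested inside the one being ended ...
            reopen = []
            while stack[-1] != name:
                reopen.append(stack.pop())
                out.append(f"</{reopen[-1]}>")
            stack.pop()
            out.append(f"</{name}>")
            # ... then reopen them in their original nesting order.
            for pending in reversed(reopen):
                stack.append(pending)
                out.append(f"<{pending}>")
        # End tags for elements that are not open are ignored.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(repair("<p>1<em>2<strong>3</em>4</strong>5<p>6"))
# prints <p>1<em>2<strong>3</strong></em><strong>4</strong>5</p><p>6</p>
```

Apart from the missing line break between the two paragraphs, the output
matches the second expected result, which is the kind of tree a
DOM-to-XHTML converter would need to receive.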
Received on Friday, 14 February 2003 08:46:04 GMT
