Re: MSHTML as an alternative to Tidy from Nikolay Georgiev on 2003-02-17 (html-tidy@w3.org from January to March 2003)

From: Nikolay Georgiev <nikolay.georgiev@netvalue.com>
Date: Mon, 17 Feb 2003 11:43:05 +0100
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: "Lucas W. Fletcher" <lucas@dealersinnotions.com>, html-tidy@w3.org
Message-ID: <3E50BCB9.7030506@netvalue.com>

Hello,

If you want, you can try the Mozilla parser (the Gecko engine). I use
the XPFE of the Mozilla platform; it parses the HTML and creates the
DOM structure as expected.

After parsing the previous example (the bad HTML), it creates the following DOM:

<HTML>
    <HEAD/>
    <BODY>
        <P>
            1
            <EM>2
                <STRONG>3</STRONG>
            </EM>
            <STRONG>4</STRONG>
            5
        </P>
        <P>6</P>
    </BODY>
</HTML>

I have actually been using it for some time now to create XML from HTML
(to make it valid).
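
To illustrate the HTML-to-XML step, here is a minimal sketch in Python.
It does not use the Mozilla parser: it assumes the repaired tree is
already available and rebuilds the DOM listed above by hand with the
standard xml.etree.ElementTree module. The helper name build_tree is
hypothetical, and the element names are written in lowercase, as XHTML
requires. The sketch also checks that the original broken markup is
rejected by an XML parser while the serialized tree parses cleanly.

```python
import xml.etree.ElementTree as ET

# The broken example from the quoted message.
BROKEN = "<p>1<em>2<strong>3</em>4</strong>5<p>6"


def build_tree():
    """Hand-built copy of the DOM listed above (hypothetical helper).

    A real conversion would walk the tree produced by the browser
    engine instead of constructing the nodes manually.
    """
    html = ET.Element("html")
    ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")

    p1 = ET.SubElement(body, "p")
    p1.text = "1"
    em = ET.SubElement(p1, "em")
    em.text = "2"
    inner_strong = ET.SubElement(em, "strong")
    inner_strong.text = "3"
    outer_strong = ET.SubElement(p1, "strong")
    outer_strong.text = "4"
    outer_strong.tail = "5"  # text that follows </strong> inside <p>

    p2 = ET.SubElement(body, "p")
    p2.text = "6"
    return html


# The broken markup is not well-formed XML, so an XML parser rejects it.
try:
    ET.fromstring(BROKEN)
    broken_is_well_formed = True
except ET.ParseError:
    broken_is_well_formed = False

# Serialize the repaired tree and parse it again as a round-trip check.
xml_text = ET.tostring(build_tree(), encoding="unicode")
reparsed = ET.fromstring(xml_text)

print(broken_is_well_formed)
print(xml_text)
print("".join(reparsed.itertext()))
```

The last line prints the text content in document order, which is a
quick way to check that a repaired tree neither lost nor duplicated any
of the characters 1 to 6 from the original markup.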

Regards,
Nikolay

Bjoern Hoehrmann wrote:

>* Lucas W. Fletcher wrote:
>>Is anyone aware of a publicly available program that
>>uses the MSHTML API to convert an HTML file into XHTML?
>
>I wrote a little Perl script that converts the Internet Explorer DOM to
>a SAX (simple API for XML) event stream (search the archives of
>tidy-develop@lists.sourceforge.net / perl-xml@lists.activestate.com).
>
>>If one assumes that the ultimate version of Tidy is one
>>where it can parse pages in as fault-tolerant a manner as
>>the popular browsers such as IE, then wouldn't it make sense
>>to actually utilize the DOM exposed by the browser itself
>>in order to create the XHTML?
>
>The DOM created by Internet Explorer from broken documents is rather
>useless. For example
>
>  <p>1<em>2<strong>3</em>4</strong>5<p>6
>
>In the MSHTML DOM this is represented as being
>
>  <p>1<em>2<strong>34</strong></em>45</p>
>  <p>6</p>
>
>while the expected result (what IE renders) is either
>
>  <p>1<em>2</em><strong><em>3</em>4</strong>5</p>
>  <p>6</p>
>
>or
>
>  <p>1<em>2<strong>3</strong></em><strong>4</strong>5</p>
>  <p>6</p>
>

-- 
Nikolay Georgiev

Recherche et Développement

NetValue (http://www.netvalue.com)
67, rue Anatole France
92 300 Levallois-Perret, France
tel : +33.1.47.59.57.42

Received on Monday, 17 February 2003 05:45:15 UTC