- From: Nikolay Georgiev <nikolay.georgiev@netvalue.com>
- Date: Mon, 17 Feb 2003 11:43:05 +0100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: "Lucas W. Fletcher" <lucas@dealersinnotions.com>, html-tidy@w3.org
Hello, You can try if you want with the Mozilla Parser (the gecko engine). I use the XPFE of mozilla platform and it parses the HTML and creates the DOM structure as expected. After parsing the previous example (bad html) it creates the following DOM: <HTML> <HEAD/> <BODY> <P> 1 <EM>2 <STRONG>3</STRONG> </EM> <STRONG>4</STRONG> 5 </P> <P>6</P> </BODY> </HTML> I realy use it from some time now to create XML from HTML (to make it valid) Regards Nikolay Bjoern Hoehrmann wrote: >* Lucas W. Fletcher wrote: > > >>Is anyone aware of a publicly available program that >>uses the MSHTML API to convert an HTML file into XHTML? >> >> > >I wrote a little Perl script that converts the Internet Explorer DOM to >a SAX (simple API for XML) event stream (search the archives of >tidy-develop@lists.sourceforge.net / perl-xml@lists.activestate.com). > > > >>If one assumes that the ultimate version of Tidy is one >>where it can parse pages in as fault-tolerant a manner as >>the popular browsers such as IE, then wouldn't it make sense >>to actually utilize the DOM exposed by the browser itself >>in order to create the XHTML? >> >> > >The DOM created by Internet Explorer from broken documents is rather >useless. For example > > <p>1<em>2<strong>3</em>4</strong>5<p>6 > >In the MSHTML DOM this is represented as beeing > > <p>1<em>2<strong>34</strong></em>45</p> > <p>6</p> > >while the expected result (what IE renders) is either > > <p>1<em>2</em><strong><em>3</em>4</strong>5</p> > <p>6</p> > >or > > <p>1<em>2<strong>3</strong></em><strong>4</strong>5</p> > <p>6</p> > > > > -- Nikolay Georgiev Recherche et Développement NetValue (http://www.netvalue.com) 67, rue Anatole France 92 300 Levallois-Perret, France tel : +33.1.47.59.57.42
Received on Monday, 17 February 2003 05:45:15 UTC