- From: Chris Reid <Chris.Reid@oracle.com>
- Date: Mon, 3 Nov 2003 15:56:51 -0000
- To: <html-tidy@w3.org>
Charlie,

Here is a sample HTML file I am having similar problems with using HTMLTidy.

The config file I used was as follows:

word-2000: yes
clean: yes
output-xhtml: yes
drop-proprietary-attributes: yes

The input file I used was as follows:

<html>
<head>
<meta name="Version" content="8.0.3410">
<meta name="Date" content="10/11/96">
<meta name="Template" content="C:\Program Files\Microsoft Office\Office\HTML.DOT">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>test case</title>
</head>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Arial"> At the root of the problem in Microsoft Word or Microsoft frontpage <strong>They produce bloated HTML <span style="font-weight:normal">Designed to confuse</span></strong> and blind people with code<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Arial">More wibble about Microsoft</span> <span style="font-size:10.0pt;font-family:Arial; mso-fareast-font-family:"Times New Roman";color:black;mso-ansi-language:EN-GB; mso-fareast-language:EN-US;mso-bidi-language:AR-SA">Stuff fitted<br> <br> </span><strong><u>Support<o:p></o:p></u></strong> </p>
</html>

The output from HTMLTidy was as follows:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 1st October 2003), see www.w3.org" />
<meta name="Version" content="8.0.3410" />
<meta name="Date" content="10/11/96" />
<meta name="Template" content=
"C:\Program Files\Microsoft Office\Office\HTML.DOT" />
<meta name="GENERATOR" content="Microsoft FrontPage 3.0" />
<title>test case</title>
<style type="text/css">
/*<![CDATA[*/
span.c3 {font-size:10.0pt;font-family:Arial; mso-fareast-font-family:"Times New Roman";color:black;mso-ansi-language:EN-GB; mso-fareast-language:EN-US;mso-bidi-language:AR-SA}
span.c2 {font-size:10.0pt;font-family:Arial}
span.c1 {font-weight:normal}
/*]]>*/
</style>
</head>
<body>
<p class="MsoNormal"><span class="c2">At the root of the problem in Microsoft Word or Microsoft frontpage <strong>They produce bloated HTML <span class="c1">Designed to confuse</span></strong> and blind people with code<o:p></o:p></span></p>
<p class="MsoNormal"><span class="c2">More wibble about Microsoft</span> <span class="c3">Stuff fitted<br /> <br /></span> <strong><u>Support<o:p></o:p></u></strong></p>
</body>
</html>

As you can see, the <o:p></o:p> tags are still there :-(

I have managed to get rid of the <o:p></o:p> tags using the Microsoft Office Filter, which you can download from

http://office.microsoft.com/assistance/preview.aspx?AssetID=HA010549981033&CTT=98

If you call it with no parameters, it removes the offending <o:p></o:p> tags.

Hope this helps

Chris
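
As a stopgap, the leftover <o:p></o:p> pairs could also be stripped in a separate pass over Tidy's output. The following is only a minimal sketch in Python, not part of HTMLTidy or the Office Filter; the function name strip_office_tags is hypothetical, and it assumes that removing every tag in the "o:" namespace (while keeping any text between the tags) is acceptable for the document in question.

```python
import re

# Matches an opening, closing, or self-closing tag in the Office "o:" namespace,
# for example <o:p>, </o:p>, or <o:p />. Text between the tags is not matched.
O_TAG = re.compile(r"</?o:[A-Za-z][\w.-]*(?:\s[^<>]*)?/?>", re.IGNORECASE)


def strip_office_tags(html: str) -> str:
    """Return html with all <o:...> tags removed, keeping surrounding content."""
    return O_TAG.sub("", html)


if __name__ == "__main__":
    sample = "<strong><u>Support<o:p></o:p></u></strong>"
    print(strip_office_tags(sample))  # prints <strong><u>Support</u></strong>
```

A regular expression is a blunt tool for markup, so this is best applied only to the specific Word/FrontPage residue shown above rather than as a general HTML cleaner.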
Received on Monday, 3 November 2003 11:10:37 UTC