- From: Chris Reid <Chris.Reid@oracle.com>
- Date: Mon, 3 Nov 2003 15:56:51 -0000
- To: <html-tidy@w3.org>
Charlie,
Here is a sample HTML file I am having similar problems with when using
HTMLTidy.
The config file I used was as follows:
word-2000: yes
clean: yes
output-xhtml: yes
drop-proprietary-attributes: yes
The input file I used was as follows:
<html>
<head>
<meta name="Version" content="8.0.3410">
<meta name="Date" content="10/11/96">
<meta name="Template" content="C:\Program Files\Microsoft
Office\Office\HTML.DOT">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>test case</title>
</head>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Arial">
At the root of the problem in Microsoft Word or Microsoft frontpage
<strong>They produce bloated HTML
<span style="font-weight:normal">Designed to confuse</span></strong>
and blind people with code<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:Arial">More wibble about
Microsoft</span>
<span style="font-size:10.0pt;font-family:Arial;
mso-fareast-font-family:"Times New
Roman";color:black;mso-ansi-language:EN-GB;
mso-fareast-language:EN-US;mso-bidi-language:AR-SA">Stuff fitted<br>
<br>
</span><strong><u>Support<o:p></o:p></u></strong>
</p>
</html>
The output from HTMLTidy was as follows:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 1st October 2003), see www.w3.org" />
<meta name="Version" content="8.0.3410" />
<meta name="Date" content="10/11/96" />
<meta name="Template" content=
"C:\Program Files\Microsoft Office\Office\HTML.DOT" />
<meta name="GENERATOR" content="Microsoft FrontPage 3.0" />
<title>test case</title>
<style type="text/css">
/*<![CDATA[*/
span.c3 {font-size:10.0pt;font-family:Arial; mso-fareast-font-family:"Times
New Roman";color:black;mso-ansi-language:EN-GB;
mso-fareast-language:EN-US;mso-bidi-language:AR-SA}
span.c2 {font-size:10.0pt;font-family:Arial}
span.c1 {font-weight:normal}
/*]]>*/
</style>
</head>
<body>
<p class="MsoNormal"><span class="c2">At the root of the problem in
Microsoft Word or Microsoft frontpage <strong>They produce bloated
HTML <span class="c1">Designed to confuse</span></strong> and blind
people with code<o:p></o:p></span></p>
<p class="MsoNormal"><span class="c2">More wibble about
Microsoft</span> <span class="c3">Stuff fitted<br />
<br /></span> <strong><u>Support<o:p></o:p></u></strong></p>
</body>
</html>
As you can see, the <o:p></o:p> tags are still there :-(
I have managed to get rid of the <o:p></o:p> tags using the Microsoft Office
Filter, which you can download from
http://office.microsoft.com/assistance/preview.aspx?AssetID=HA010549981033&CTT=98
If you call it with no parameters, it removes the offending <o:p></o:p>
tags.
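If installing a separate filter is not convenient, the leftover empty <o:p></o:p> tags could also be stripped from the Tidy output in a small post-processing step. The following is only a rough sketch in Python, not anything HTMLTidy itself provides; the function name strip_empty_o_p is made up for illustration, and it assumes the tags are always empty (or contain only whitespace), as they are in the output above.

```python
import re

# Matches an empty Office paragraph element such as <o:p></o:p>.
# Only whitespace is allowed between the opening and closing tag,
# so an <o:p> element that carries real text is left untouched.
EMPTY_O_P = re.compile(r"<o:p>\s*</o:p>", re.IGNORECASE)


def strip_empty_o_p(html: str) -> str:
    """Return html with every empty <o:p></o:p> pair removed (hypothetical helper)."""
    return EMPTY_O_P.sub("", html)


if __name__ == "__main__":
    # Fragment taken from the Tidy output shown earlier in this message.
    sample = "<strong><u>Support<o:p></o:p></u></strong></p>"
    print(strip_empty_o_p(sample))
```

A regular expression is enough here only because the pattern is a fixed, empty tag pair; anything more general would be better handled by a real HTML parser.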
Hope this helps
Chris
Received on Monday, 3 November 2003 11:10:37 UTC