Re: Tidy Question

From: Chris Reid <Chris.Reid@oracle.com>
Date: Mon, 3 Nov 2003 15:56:51 -0000
To: <html-tidy@w3.org>
Message-ID: <PHENJGDFMJPEBOBLGEMOGEIDCBAA.Chris.Reid@oracle.com>

Charlie,

Here is a sample HTML file that gives me similar problems when I run it
through HTMLTidy.

The config file I used was as follows :
word-2000: yes
clean: yes
output-xhtml: yes
drop-proprietary-attributes: yes


The input file I used was as follows :
<html>
<head>
<meta name="Version" content="8.0.3410">
<meta name="Date" content="10/11/96">
<meta name="Template" content="C:\Program Files\Microsoft
Office\Office\HTML.DOT">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>test case</title>
</head>
	<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Arial">
		At the root of the problem in Microsoft Word or Microsoft frontpage
		<strong>They produce bloated HTML
		<span style="font-weight:normal">Designed to confuse</span></strong>
		and blind people with code<o:p></o:p></span></p>
    	<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:Arial">More wibble about
Microsoft</span>
		<span style="font-size:10.0pt;font-family:Arial;
		mso-fareast-font-family:&quot;Times New
Roman&quot;;color:black;mso-ansi-language:EN-GB;
		mso-fareast-language:EN-US;mso-bidi-language:AR-SA">Stuff fitted<br>
    		<br>
    		</span><strong><u>Support<o:p></o:p></u></strong>
	</p>
</html>

The output from HTMLTidy was as follows :
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 1st October 2003), see www.w3.org" />
<meta name="Version" content="8.0.3410" />
<meta name="Date" content="10/11/96" />
<meta name="Template" content=
"C:\Program Files\Microsoft Office\Office\HTML.DOT" />
<meta name="GENERATOR" content="Microsoft FrontPage 3.0" />
<title>test case</title>
<style type="text/css">
/*<![CDATA[*/
 span.c3 {font-size:10.0pt;font-family:Arial; mso-fareast-font-family:"Times
New Roman";color:black;mso-ansi-language:EN-GB;
mso-fareast-language:EN-US;mso-bidi-language:AR-SA}
 span.c2 {font-size:10.0pt;font-family:Arial}
 span.c1 {font-weight:normal}
/*]]>*/
</style>
</head>
<body>
<p class="MsoNormal"><span class="c2">At the root of the problem in
Microsoft Word or Microsoft frontpage <strong>They produce bloated
HTML <span class="c1">Designed to confuse</span></strong> and blind
people with code<o:p></o:p></span></p>
<p class="MsoNormal"><span class="c2">More wibble about
Microsoft</span> <span class="c3">Stuff fitted<br />
<br /></span> <strong><u>Support<o:p></o:p></u></strong></p>
</body>
</html>


As you can see, the <o:p></o:p> tags are still there :-(


I have managed to get rid of the <o:p></o:p> tags using the Microsoft Office
Filter which you can download from
http://office.microsoft.com/assistance/preview.aspx?AssetID=HA010549981033&CTT=98

If you call it with no parameters, it removes the offending <o:p></o:p>
tags.
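
If the filter is not available, a small post-processing step over Tidy's
output might do the same job. The following is only a rough sketch in
Python, not something Tidy or the Office filter provides: the function
name strip_empty_op is made up for illustration, and it assumes the
markers are always empty (or whitespace-only) as in the sample above.

```python
import re

# Matches an empty Office paragraph marker: <o:p></o:p>, <o:p> </o:p>
# or the self-closed form <o:p/>. Markers that contain text are left alone.
EMPTY_OP = re.compile(
    r"<o:p\b[^>]*>\s*</o:p\s*>|<o:p\b[^>]*/>",
    re.IGNORECASE,
)


def strip_empty_op(html: str) -> str:
    """Return html with empty <o:p> elements removed (hypothetical helper)."""
    return EMPTY_OP.sub("", html)


if __name__ == "__main__":
    # Fragment taken from the Tidy output quoted above.
    sample = "<strong><u>Support<o:p></o:p></u></strong>"
    print(strip_empty_op(sample))  # prints <strong><u>Support</u></strong>
```

A regular expression is enough here because the markers are empty and
never nested; anything more involved would be better handled by a real
parser.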

Hope this helps

Chris
Received on Monday, 3 November 2003 11:10:37 UTC