Re: tidy is not happy tidying up html generated by Microsoft word

From: Fred <fred@gloryofgod.com>
Date: Fri, 10 Jan 2003 09:03:50 -0800 (PST)
To: Erwin Rollauer <erwin.rollauer@mcgill.ca>
cc: html-tidy@w3.org
Message-ID: <Pine.LNX.4.44.0301100854090.24471-100000@drizzle.com>

Hi Erwin,

It turns out Microsoft Word produces HTML that is sub-standard in many 
ways.  There are some configuration options in Tidy that may help you 
out, though. Take a look at these:

word-2000  http://tidy.sourceforge.net/docs/quickref.html#word-2000
force-output  http://tidy.sourceforge.net/docs/quickref.html#force-output
bare  http://tidy.sourceforge.net/docs/quickref.html#bare
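
As a rough sketch of how those three options could be applied together, 
the Python below writes them to a Tidy config file and passes it to the 
command-line tool. The helper names (build_config, build_command, 
tidy_word_html) are made up for illustration, and it assumes a tidy 
binary on the PATH that accepts -config and -o; older builds may not 
have -o, so check yours.

```python
import os
import shutil
import subprocess
import tempfile

# The three options named above; "yes" switches each one on.
TIDY_OPTIONS = {
    "word-2000": "yes",
    "force-output": "yes",
    "bare": "yes",
}


def build_config(options):
    """Render options in Tidy's "name: value" config-file format."""
    return "".join(f"{name}: {value}\n" for name, value in options.items())


def build_command(config_path, input_path, output_path):
    """Assemble the tidy command line (assumes -config and -o are supported)."""
    return ["tidy", "-config", config_path, "-o", output_path, input_path]


def tidy_word_html(input_path, output_path):
    """Run tidy on a Word-generated HTML file; returns (exit code, messages)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on PATH")
    with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as cfg:
        cfg.write(build_config(TIDY_OPTIONS))
    try:
        # Tidy exits with 1 on warnings and 2 on errors, so a non-zero
        # status is expected for Word output and is not raised here.
        result = subprocess.run(
            build_command(cfg.name, input_path, output_path),
            capture_output=True,
            text=True,
        )
    finally:
        os.unlink(cfg.name)
    return result.returncode, result.stderr
```

Calling tidy_word_html("report.htm", "report-clean.htm") (hypothetical 
file names) would return Tidy's exit code and its diagnostics, which is 
a quick way to see how much each option helps.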

Also look at Microsoft's own cleaning app:
http://office.microsoft.com/downloads/2000/Msohtmf2.aspx


I have been working on a custom app to convert Word output to XHTML and 
learning a lot about what it takes to clean up the junk and leave the 
useful info behind.

Cheers
Fred

On Fri, 10 Jan 2003, Erwin Rollauer wrote:

> 
> I am currently evaluating UltraEdit and noticed the TIDY that came with
> it. I tried it against a simple Microsoft Word 2002 "save as html" file
> and got lots of errors. This is just a heads-up notice on the chance that
> you have not tried it against Microsoft-generated code.
> 
> 
> Erwin Rollauer 
> Senior Systems Analyst
> Information Systems Resources
> McGill University
> 688 Sherbrooke St. West, Suite 500
> Montreal, QC   H3A 3R1
> Tel:   514 398-5023 ex 00626
> Fax:   514 398-8252
> Email: erwin.rollauer@mcgill.ca
> 
> 
Received on Friday, 10 January 2003 12:04:22 GMT