
Re: Problem with Microsoft Word and Tidy

From: David Wilczynski <dwilczyn@usc.edu>
Date: Sat, 06 Aug 2005 08:43:49 -0700
To: John Campbell <jdc.rpv@cox.net>
Cc: html-tidy@w3.org

Thanks for your reply. Several of my colleagues STRONGLY suggested never 
using Microsoft Word again for html documents. Another proposed that I use 
Composer from the Mozilla suite. I just downloaded it and will try it for a 
few weeks. In the few trials I've made I found that it doesn't do the 
strange things Word does and it seems to accept all the Word-generated HTML 
without complaint. Too bad Composer isn't a free-running executable; it 
can only be started from the mozilla.exe browser. Oh well...

If that few-week experiment works, I will try to de-Wordify my current html 
(probably using emacs keyboard macros) and revisit Tidy.
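
For what it's worth, the de-Wordify pass could also be scripted rather than done 
with keyboard macros. What follows is only a rough sketch, assuming the offending 
markup always looks like the snippet quoted below; the function name dewordify 
and both patterns are my own illustration, not anything Tidy or Word provides. 
It drops the conditional markers and the empty o:p pairs but leaves the &nbsp; 
between the markers alone.

```python
import re

# Word's conditional markers, e.g. <![if !supportEmptyParas]> and <![endif]>.
# Assumption: the condition text never contains a "]" character.
CONDITIONAL_MARKER = re.compile(r"<!\[(?:if\s[^\]]*|endif)\]>", re.IGNORECASE)

# Empty Office paragraph elements such as <o:p></o:p>.
EMPTY_O_P = re.compile(r"<o:p>\s*</o:p>", re.IGNORECASE)


def dewordify(html: str) -> str:
    """Remove Word's conditional markers and empty <o:p> pairs (hypothetical helper)."""
    html = CONDITIONAL_MARKER.sub("", html)
    html = EMPTY_O_P.sub("", html)
    return html


if __name__ == "__main__":
    cell = "<td><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></td>"
    print(dewordify(cell))
```

Regular expressions are a blunt tool for HTML, so this is best tried on a copy 
of the file before running Tidy over the result.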


At 03:22 AM 8/6/2005, John Campbell wrote:
>David Wilczynski wrote:
>>Don't know if this is a bug, but it stops me from using Tidy. Microsoft Word 
>>does many things to HTML documents, many of which Tidy does fix, 
>>including changing "\" to "/" in URLs. However, in most of my tables, 
>>Word insists on putting in the following:
>><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p>
>>I don't know how it gets there. When I remove it, it puts it back. Tidy 
>>calls these errors that must be fixed before it generates new cleaned up 
>>output. That negates the value of Tidy.
>I know what you mean.  What I ended up doing was to add the following line 
>to my tidy.conf:
>new-blocklevel-tags: st1:date, st1:city, st1:country-region, st1:place, 
>st1:time, o:p, o:smarttagtype, st1:placename, st1:placetype, st1:street, 
>st1:address, st1:state, st2:place, st2:placename, st2:placetype, st2:city, 
>st2:street, st2:address, st2:time, st2:state, st2:country-region, quote, dt, dd
>And add new tags every time I hit another one...  I wish tidy would just 
>strip the "st?:" and "o:" McTags when "word-2000: yes" is chosen...
>I also wish there was a "strip javascript" option... and maybe a "strip 
>everything except the following..." option.
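
Rather than adding new tags by hand every time another one turns up, the 
new-blocklevel-tags line could be generated from the document itself. The sketch 
below is only an illustration under my own assumptions: the helper name 
blocklevel_tags_line is made up, and it only looks for the "o:" and "st<digit>:" 
prefixes mentioned above. Whether each tag really belongs under 
new-blocklevel-tags rather than Tidy's new-inline-tags is a separate judgment 
the script does not make.

```python
import re

# Opening tags whose names start with "o:" or "st<digits>:", e.g. <o:p>, <st1:City>.
# Closing tags are skipped because "</" does not match this pattern.
PREFIXED_TAG = re.compile(r"<\s*((?:o|st\d+):[A-Za-z][\w-]*)", re.IGNORECASE)


def blocklevel_tags_line(html: str) -> str:
    """Build a new-blocklevel-tags config line from the tags found (hypothetical helper)."""
    names = sorted({name.lower() for name in PREFIXED_TAG.findall(html)})
    return "new-blocklevel-tags: " + ", ".join(names)


if __name__ == "__main__":
    sample = "<p><st1:City>LA</st1:City> <o:p></o:p> <st2:place>x</st2:place></p>"
    print(blocklevel_tags_line(sample))
```

The printed line can be pasted into tidy.conf, together with any plain tags 
(such as quote) that the pattern does not pick up.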
Received on Saturday, 6 August 2005 15:44:10 UTC
