Re[2]: converting Word2000 and Word97 documents

From: Charles Reitzel <creitzel@rcn.com>
Date: Fri, 19 Jul 2002 10:46:12 -0400
Message-Id: <4.3.2.7.2.20020719103006.029a9da8@pop.rcn.com>
To: Nadezhda <tnv@rnivc.kis.ru>
Cc: html-tidy@w3.org

Hi Nadezhda,

The original reason I started using Tidy was to do Word-To-HTML 
conversion.   MS has published an improved filter, which I believe Fred 
uses.  For information on how to get and use the MS filter, see:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q291325

Maintaining files in Word is convenient if you need both PDF and HTML 
output.  We found that a bit of prep work on the Word side helped a great 
deal, in particular getting folks to use Word styles like Heading 1, 2, 3 
and to apply styles for bulleted and numbered lists.  These come through 
to HTML as class attributes, which you can use to apply fonts, colors, 
etc.  The only post-processing we needed to do was to insert a link to our 
real CSS stylesheet.  A very simple sed script did the trick.
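
For anyone who would rather not use sed, the same post-processing step can 
be sketched in Python.  This is not the script we used, just a minimal 
illustration: the function name and the default "site.css" are made up for 
the example, and it assumes the filtered HTML contains a closing </head> 
tag and that the href needs no escaping.

```python
import re


def insert_stylesheet_link(html: str, href: str = "site.css") -> str:
    """Insert a <link> to an external stylesheet just before </head>.

    Hypothetical helper; href is assumed to be a plain, already-safe URL.
    """
    link = f'<link rel="stylesheet" type="text/css" href="{href}">\n'
    # Match case-insensitively, since the markup may use </HEAD> or </head>.
    # A function replacement avoids backslash handling in the link text.
    new_html, count = re.subn(
        r"(?i)</head>", lambda m: link + m.group(0), html, count=1
    )
    if count == 0:
        raise ValueError("no </head> tag found")
    return new_html
```

Read the Tidy output into a string, pass it through the function, and 
write the result back out.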

As for size, Tidy is an in-memory program.  But with most computers having 
tens of MB of RAM available, it would take a huge document to exhaust 
it.  To guesstimate the RAM needed, take the HTML file size and double or 
triple it.
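
As a quick worked version of that rule of thumb, here is a small Python 
sketch.  The function name is invented for the example, and the 2x to 3x 
factors are only the rough guess above, not a measured figure.

```python
from typing import Tuple


def estimate_tidy_ram(html_size_bytes: int) -> Tuple[int, int]:
    """Return a rough (low, high) RAM estimate in bytes.

    Applies the rule of thumb of two to three times the HTML file size.
    """
    if html_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    return 2 * html_size_bytes, 3 * html_size_bytes
```

For a 5,000,000-byte HTML file this gives a range of 10,000,000 to 
15,000,000 bytes, i.e. roughly 10 to 15 MB.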

hope this helps,
Charlie
Received on Friday, 19 July 2002 10:40:32 GMT
