Re[2]: converting Word2000 and Word97 document

I am converting Word to HTML in a few steps:
Save the Word document as HTML.
Make some manual changes to the document to support fields.
Run Microsoft's "filter.exe" on the document.
Run Tidy on the document to output XHTML, but without the option designed for Word HTML.
Make some more manual changes.
Do a transform using XSLT.
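The Tidy step could be scripted roughly as below. This is a minimal Python sketch, assuming Tidy is installed and on the PATH as `tidy`. The function names `tidy_command` and `run_tidy` are hypothetical. The manual steps and "filter.exe" are not covered, and Tidy's Word-specific `word-2000` option is deliberately not passed.

```python
import subprocess


def tidy_command(src, dst):
    # -q: suppress nonessential output; -asxhtml: convert to XHTML;
    # -o: write the result to dst. The word-2000 option is NOT passed.
    return ["tidy", "-q", "-asxhtml", "-o", dst, src]


def run_tidy(src, dst):
    result = subprocess.run(tidy_command(src, dst), capture_output=True, text=True)
    # Tidy exits with 0 (no problems), 1 (warnings only) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stderr  # Tidy reports its warnings on stderr
```

Treating exit status 1 as success matters here, because Word-generated HTML almost always produces some warnings.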

I have some specific needs, so I found I had to use all these steps.
I am doing this on some "large" Word documents, about 200 KB each.

Tidy is working very consistently, so I am happy.
The only oddity I have found is that non-ASCII text inside comments gets
mangled. As a workaround, I cut the comments out, run Tidy, then paste the
comment text back in, no longer commented out. My XSLT transform then
handles it.
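The cut-and-paste workaround could be automated along these lines. This is a minimal Python sketch: the helper names and the `TIDYKEEP<n>X` placeholder format are hypothetical, and it assumes Tidy passes a plain-ASCII placeholder through as ordinary text (it may wrap it in a paragraph element).

```python
import re

# Matches an HTML comment; DOTALL lets a comment span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_non_ascii_comments(html):
    """Swap comments containing non-ASCII text for placeholder tokens.

    Returns (html_with_placeholders, saved_comment_bodies)."""
    saved = []

    def repl(match):
        body = match.group(1)
        if body.isascii():
            return match.group(0)  # plain-ASCII comments are left for Tidy
        saved.append(body)
        return f"TIDYKEEP{len(saved) - 1}X"

    return COMMENT_RE.sub(repl, html), saved


def restore_comments(html, saved):
    """Put the saved comment text back, no longer commented out."""
    for i, body in enumerate(saved):
        html = html.replace(f"TIDYKEEP{i}X", body)
    return html
```

Usage: call `extract_non_ascii_comments` before running Tidy and `restore_comments` on Tidy's output. The trailing `X` keeps `TIDYKEEP1X` from matching inside `TIDYKEEP10X`.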

On Thu, 18 Jul 2002, Nadezhda wrote:

> 
> Hello Terry,
> 
> TT> At 10:49 AM -0400 7/16/02, Nadezhda wrote:
> >>On Word documents of what size and style
> >>was HTML Tidy tested?
> 
> TT> I'm not personally a Word-2000 user, but we do include various Word
> TT> documents in our regression tests for Tidy. The files probably range in
> TT> size from 20 K to 100 K. I couldn't tell you how complex the documents are,
> TT> but they are usually from real-world cases.
> Not more than 100 K?
> 
> TT> The Word-2000 support in Tidy is improving all the time.
> 
> TT> Do you have a specific task in mind or concern? i.e. have you tried current
> TT> versions of Tidy, and if you are unhappy with the results, please report
> TT> any bugs and provide the Word-2000 files etc.
> I have a real task: converting .doc to .html.
> I want to know some things beforehand because, although I have another way
> to convert .doc to .html, the results so far are not as desired.
> 
> TT> I'm sure users who Tidy Word-2000 documents regularly will speak up.
> I hope so, and I ask users who use Tidy to clean Word 2000 documents to
> answer.
> 
> --
> Best regards,
>  Nadezhda                            mailto:tnv@rnivc.kis.ru
> 
> 

Received on Friday, 19 July 2002 03:42:36 UTC