Best options for converting Word 2000 => X/HTML => XML?

Hi all,

I've been experimenting with the output of Word 2000, when using the
"export to compact HTML" and "save as web page" features.

What I'd like is to end up with well-formed XML, but the Tidy options
I've been using don't always give me the output I'd expect.

Tidy makes a heroic effort on the giant mess Word produces, but I need
all attributes to be quoted and no repeated attributes.  For example,
Word seems to produce a lot of markup like:

        <p class=foo1 ... class=foo2> ... </p>

What I need instead is:

        <p class="foo1" class2="foo2"> ... </p>
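
To make the goal concrete, here is a minimal Python sketch of that
rewrite. It is only an illustration, not something Tidy does itself:
the names dedupe_attrs, AttrFixer and fix_attributes are made up for
this example, and the "append a counter to repeated names" rule is an
assumption based on the class/class2 example above. It quotes every
attribute value and renames repeats, but it does not close unclosed
tags, so the result is not guaranteed to be well-formed on its own.

```python
from html import escape
from html.parser import HTMLParser


def dedupe_attrs(attrs):
    """Rename repeated attribute names: class, class2, class3, ...

    Assumption: the document does not already use names like 'class2'.
    """
    seen = {}
    result = []
    for name, value in attrs:
        seen[name] = seen.get(name, 0) + 1
        new_name = name if seen[name] == 1 else f"{name}{seen[name]}"
        result.append((new_name, value))
    return result


class AttrFixer(HTMLParser):
    """Re-emit markup with quoted, de-duplicated attributes."""

    def __init__(self):
        # Keep entity and character references as written in the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, closing):
        parts = [tag]
        for name, value in dedupe_attrs(attrs):
            # Valueless attributes (value is None) reuse their own name.
            value = name if value is None else value
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def fix_attributes(html):
    """Return html with every attribute quoted and repeats renamed."""
    fixer = AttrFixer()
    fixer.feed(html)
    fixer.close()
    return "".join(fixer.out)


print(fix_attributes("<p class=foo1 class=foo2>text</p>"))
```

Note that Python's HTMLParser lowercases tag and attribute names as it
goes, which is probably what you want on the way to XHTML anyway.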

Has anybody else had any experiences they could share?

Stu

Received on Thursday, 17 February 2000 00:59:42 UTC