Best options for converting Word 2000 => X/HTML => XML?

Hi all,

I've been experimenting with the output of Word 2000, when using the
"export to compact HTML" and "save as web page" features.

What I'd like is to end up with well-formed XML, but the Tidy options
I've been using don't always give me what I'd expect.

Tidy makes a heroic effort on the giant mess Word produces, but I need
all attributes to be quoted and no repeated attributes.  For example,
Word seems to produce a lot of:

        <p class=foo1 ... class=foo2> ... </p>

Which I need as:

        <p class="foo1" class2="foo2"> ... </p>
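To make the rewrite concrete, here is a minimal sketch of that one
transformation, assuming Python and only its standard library
(html.parser and xml.sax.saxutils).  The names rename_repeats,
AttributeFixer and fix_attributes are made up for illustration; this is
not something Tidy or Word provides.  It only quotes attributes and
renames repeats; it drops comments and declarations, does not check
whether a renamed attribute collides with an existing one, and does not
fix unclosed or badly nested tags, so it is not a full route to
well-formed XML.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


def rename_repeats(attrs):
    """Suffix the 2nd, 3rd, ... occurrence of an attribute name with its count."""
    seen = {}
    result = []
    for name, value in attrs:
        seen[name] = seen.get(name, 0) + 1
        new_name = name if seen[name] == 1 else f"{name}{seen[name]}"
        # A valueless attribute (e.g. <td nowrap>) gets its own name as value.
        result.append((new_name, name if value is None else value))
    return result


class AttributeFixer(HTMLParser):
    """Re-emit tags with every attribute quoted and no repeated names."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag] + [f"{n}={quoteattr(v)}" for n, v in rename_repeats(attrs)]
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(escape(data))


def fix_attributes(html):
    fixer = AttributeFixer()
    fixer.feed(html)
    fixer.close()
    return "".join(fixer.out)


print(fix_attributes("<p class=foo1 class=foo2>text</p>"))
```

Renaming by count keeps both values, which matches the output above;
dropping all but the first or last occurrence would be the simpler
alternative if the extra values are not needed.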

Has anybody else had any experiences they could share?

Stu

Received on Friday, 24 March 2000 12:47:34 UTC