Re: Best options for converting Word 2000 => X/HTML => XML?

Dave Raggett wrote:

> [...]
>
> Tidy's word-2000 option is draconian and strips out the class, lang
> and style attributes, see PurgeAttributes(). It also strips out
> width attributes from th and td. This was based upon an inspection
> of the markup produced by the save as web page export filter from
> Word2000. I figured it would be more cost effective to strip these
> out and to later add back in class attributes manually.
>
> I would be interested to get suggestions for improvements.

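   How about an option to remove duplicate attributes from the XML output?
   XML doesn't allow an attribute to be repeated on the same element, so
   this matters. The X/HTML that Word produces seems to have a lot of: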

               <... class="classname" class="c43" ...>

   If these repeats could be optionally removed, we'd be close to having
   a very useful Word 2000 ==> X/HTML ==> XML workflow!
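   Until Tidy can do this itself, a post-processing pass could handle it.
   Here is a rough sketch in Python. It is my own workaround, not anything
   Tidy provides; the names dedupe_attrs and dedupe_markup are made up for
   illustration. It uses the standard library's html.parser, which reports
   every attribute on a tag, repeats included. It writes each tag back out
   keeping either the first or the last occurrence of each attribute name.
   It doesn't attempt a full round-trip of every construct Word emits.

```python
from html import escape
from html.parser import HTMLParser


def dedupe_attrs(attrs, keep="first"):
    """Drop repeated attribute names from a list of (name, value) pairs.

    keep="first" keeps the earliest occurrence of each name;
    keep="last" keeps the final occurrence, at its own position.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    seq = attrs if keep == "first" else list(reversed(attrs))
    seen = set()
    result = []
    for name, value in seq:
        key = name.lower()
        if key in seen:
            continue  # a repeat of a name we've already kept
        seen.add(key)
        result.append((name, value))
    return result if keep == "first" else list(reversed(result))


class AttrDeduper(HTMLParser):
    """Re-serialize markup, removing duplicate attributes (hypothetical helper)."""

    def __init__(self, keep="first"):
        # convert_charrefs=False so entity references in text pass through as-is
        super().__init__(convert_charrefs=False)
        self.keep = keep
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in dedupe_attrs(attrs, self.keep):
            if value is None:
                parts.append(name)  # attribute written without a value
            else:
                # html.parser unescapes attribute values, so escape them again
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        # data already includes the trailing '?' of an XML-style PI
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        # e.g. CDATA sections or Word's <![if ...]> conditionals
        self.out.append(f"<![{data}]>")


def dedupe_markup(markup, keep="first"):
    parser = AttrDeduper(keep)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p class="classname" class="c43">text</p>'
    print(dedupe_markup(sample))               # <p class="classname">text</p>
    print(dedupe_markup(sample, keep="last"))  # <p class="c43">text</p>
```

   Whether to keep the first or the last occurrence is a judgement call. I
   don't know which of Word's two class values is the one worth keeping, so
   the sketch lets you choose.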


Stu

Received on Friday, 24 March 2000 12:47:35 UTC