Re: Best options for converting Word 2000 => X/HTML => XML?

On Fri, 24 Mar 2000, Stuart Hungerford wrote:

> 
> Dave Raggett wrote:
> 
> > [...]
> >
> > Tidy's word-2000 option is draconian and strips out the class, lang
> > and style attributes, see PurgeAttributes(). It also strips out
> > width attributes from th and td. This was based upon an inspection
> > of the markup produced by the "Save as Web Page" export filter from
> > Word 2000. I figured it would be more cost effective to strip these
> > out and to later add back in class attributes manually.
> >
> > I would be interested to get suggestions for improvements.
> 
>    How about a way to remove duplicate attributes in XML output?
>    The Word X/HTML seems to have a lot of:
> 
>                <... class="classname" class="c43" ...>
> 
>    If these repeats could be optionally removed, we'd be close to having
>    a very useful Word 2000 ==> X/HTML ==> XML workflow!

This will be fixed in the next release. Thanks for the input.
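The duplicate-attribute cleanup Stuart asks for can be sketched outside Tidy. The following is a minimal sketch, assuming Python 3's standard `html.parser` module. It is not how Tidy implements the fix, and the names `DedupAttrs` and `dedup_attributes` are hypothetical. It re-emits the markup and keeps only the first occurrence of each attribute name on every tag, so `class="classname" class="c43"` becomes `class="classname"`.

```python
from html import escape
from html.parser import HTMLParser


class DedupAttrs(HTMLParser):
    """Hypothetical helper: re-emits markup, keeping the first copy of each attribute."""

    def __init__(self):
        # Keep entity/char refs separate so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        seen = set()
        parts = []
        for name, value in attrs:  # html.parser preserves repeated attributes
            if name in seen:
                continue  # drop repeats such as a second class="..."
            seen.add(name)
            if value is None:
                parts.append(f" {name}")  # bare attribute, e.g. nowrap
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        # Processing instructions, e.g. Word's <?xml:namespace ...>
        self.out.append(f"<?{data}>")


def dedup_attributes(markup):
    """Return markup with repeated attributes removed (first value wins)."""
    parser = DedupAttrs()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(dedup_attributes('<p class="classname" class="c43" lang=EN-GB>Hi &amp; bye</p>'))
# prints: <p class="classname" lang="EN-GB">Hi &amp; bye</p>
```

Keeping the first value rather than the last is a design choice. A real cleanup pass might instead merge repeated `class` values into one space-separated list. Note that `html.parser` lowercases tag and attribute names, and this sketch does not handle every construct (e.g. CDATA or Word's conditional `<![if ...]>` sections).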

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 385 320 444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)

Received on Tuesday, 28 March 2000 10:55:47 UTC