Re: Best options for converting Word 2000 => X/HTML => XML?

From: Dave Raggett <dsr@w3.org>
Date: Tue, 28 Mar 2000 16:55:30 +0100 (GMT Daylight Time)
To: Stuart Hungerford <stuart.hungerford@webone.com.au>
cc: html-tidy@w3.org
Message-ID: <Pine.WNT.4.10.10003281655020.-462379@hazel.hpl.hp.com>
On Fri, 24 Mar 2000, Stuart Hungerford wrote:

> 
> Dave Raggett wrote:
> 
> > [...]
> >
> > Tidy's word-2000 option is draconian and strips out the class, lang
> > and style attributes, see PurgeAttributes(). It also strips out
> > width attributes from th and td. This was based upon an inspection
> > of the markup produced by the save as web page export filter from
> > Word2000. I figured it would be more cost effective to strip these
> > out and to later add back in class attributes manually.
> >
> > I would be interested to get suggestions for improvements.
> 
>    How about a way to remove duplicate attributes in XML output?
>    The Word X/HTML seems to have a lot of:
> 
>                <... class="classname" class="c43" ...>
> 
>    If these repeats could be optionally removed, we'd be close to having
>    a very useful Word 2000 ==> X/HTML ==> XML workflow!

This will be fixed in the next release. Thanks for the input.
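
For readers who need to post-process such output before a fixed release is available, the duplicate-attribute cleanup requested above can be sketched in a few lines of Python. This is only an illustration, not Tidy's implementation: the function name dedupe_attributes is hypothetical, the choice to keep the first occurrence of a repeated attribute is an assumption (the thread does not say which one should win), and the regex-based tag matching does not handle comments, CDATA sections, or script content.

```python
import re

# One attribute: a name, optionally followed by "=" and a quoted or bare value.
ATTR_RE = re.compile(
    r"""\s+([^\s=<>/"']+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>"']+))?"""
)

# A start tag: element name, zero or more attributes, optional "/" before ">".
TAG_RE = re.compile(
    r"""<([A-Za-z][\w:.-]*)"""
    r"""((?:\s+[^\s=<>/"']+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']+))?)*)"""
    r"""\s*(/?)>"""
)


def dedupe_attributes(markup: str) -> str:
    """Drop repeated attributes in each start tag, keeping the first one.

    Assumption: attribute names are compared case-insensitively, as in HTML.
    """

    def fix_tag(match: re.Match) -> str:
        name, attrs, slash = match.group(1), match.group(2), match.group(3)
        seen = set()
        kept = []
        dropped = False
        for attr in ATTR_RE.finditer(attrs):
            key = attr.group(1).lower()
            if key in seen:
                dropped = True  # a later duplicate: skip it
                continue
            seen.add(key)
            kept.append(attr.group(0).strip())
        if not dropped:
            return match.group(0)  # leave untouched tags byte-for-byte intact
        attr_text = "".join(" " + a for a in kept)
        return "<" + name + attr_text + (" /" if slash else "") + ">"

    return TAG_RE.sub(fix_tag, markup)


if __name__ == "__main__":
    sample = '<p class="classname" class="c43" lang="en">text</p>'
    print(dedupe_attributes(sample))
```

Keeping the first occurrence preserves the more descriptive class name in the example quoted above; reversing the policy would mean remembering the last value seen for each name instead.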

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 385 320 444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Tuesday, 28 March 2000 10:55:47 GMT
