Re: Best options for converting Word 2000 => X/HTML => XML?

From: Stuart Hungerford <stuart.hungerford@webone.com.au>
Date: Fri, 18 Feb 2000 18:56:02 +1100
Message-ID: <38ACFB12.B36DD3C4@webone.com.au>
To: Dave Raggett <dsr@w3.org>, html-tidy@w3.org

Dave Raggett wrote:

> [...]
>
> Tidy's word-2000 option is draconian and strips out the class, lang
> and style attributes, see PurgeAttributes(). It also strips out
> width attributes from th and td. This was based upon an inspection
> of the markup produced by the save as web page export filter from
> Word2000. I figured it would be more cost effective to strip these
> out and to later add back in class attributes manually.
>
> I would be interested to get suggestions for improvements.

   How about a way to remove duplicate attributes in XML output?
   The Word X/HTML seems to have a lot of:

               <... class="classname" class="c43" ...>

   If these repeats could be optionally removed, we'd be close to having
   a very useful Word 2000 ==> X/HTML ==> XML workflow!
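
   To make the request concrete, here is a rough sketch of the kind of
   clean-up meant here, written as a standalone Python filter rather than
   as anything Tidy itself provides. The function name dedupe_attributes
   and its keep parameter are hypothetical, and the regular expressions
   only approximate real tag syntax (they will also rewrite tag-like text
   inside comments or CDATA), so treat it as an illustration only:

```python
import re

# One attribute: a name, optionally followed by = and a quoted or bare value.
ATTR_RE = re.compile(
    r"""\s+([^\s=<>/"']+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]+))?"""
)

# One start tag: <name, zero or more attributes, optional "/", then ">".
START_TAG_RE = re.compile(
    r"""<([A-Za-z][^\s<>/]*)"""
    r"""((?:\s+[^\s=<>/"']+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>]+))?)*)"""
    r"""\s*(/?)>"""
)


def dedupe_attributes(markup, keep="first"):
    """Drop repeated attributes from every start tag in markup.

    keep="first" retains the first occurrence of each attribute name,
    keep="last" retains the last value (at the first occurrence's position).
    Hypothetical helper; not part of Tidy.
    """

    def fix_tag(match):
        name, attr_text, slash = match.groups()
        attrs = ATTR_RE.findall(attr_text)
        kept = {}
        for attr_name, value in attrs:
            key = attr_name.lower()
            if key not in kept or keep == "last":
                kept[key] = (attr_name, value)
        if len(kept) == len(attrs):
            # No duplicates: leave the tag exactly as it was written.
            return match.group(0)
        parts = [name]
        for attr_name, value in kept.values():
            parts.append(f"{attr_name}={value}" if value else attr_name)
        return "<" + " ".join(parts) + (" /" if slash else "") + ">"

    return START_TAG_RE.sub(fix_tag, markup)


sample = '<p class="classname" class="c43" lang="en">text</p>'
print(dedupe_attributes(sample))
print(dedupe_attributes(sample, keep="last"))
```

   Whether the first or the last value should win is a judgment call;
   that is why the sketch makes it a parameter instead of picking one.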


Stu
Received on Friday, 18 February 2000 02:55:08 GMT