Re: Best options for converting Word 2000 => X/HTML => XML?

From: Stuart Hungerford <stuart.hungerford@webone.com.au>
Date: Fri, 24 Mar 2000 11:45:58 -0600
To: Dave Raggett <dsr@w3.org>, html-tidy@w3.org
Message-ID: <OF918F5356.F9D10570-ON86256889.002B8E02@rfdinc.com>

Dave Raggett wrote:

> [...]
>
> Tidy's word-2000 option is draconian and strips out the class, lang
> and style attributes, see PurgeAttributes(). It also strips out
> width attributes from th and td. This was based upon an inspection
> of the markup produced by the save as web page export filter from
> Word2000. I figured it would be more cost effective to strip these
> out and to later add back in class attributes manually.
>
> I would be interested to get suggestions for improvements.

   How about a way to remove duplicate attributes in XML output?
   The Word X/HTML seems to contain a lot of tags like this:

               <... class="classname" class="c43" ...>

   If these repeats could be optionally removed, we'd be close to having
   a very useful Word 2000 ==> X/HTML ==> XML workflow!
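
   To illustrate the kind of clean-up I mean, here is a minimal Python
   sketch. It is not Tidy code: the function name dedupe_attributes and
   its keep parameter are hypothetical, and it assumes reasonably
   regular start tags. It keeps either the first or the last occurrence
   of each repeated attribute and normalizes the spacing between
   attributes.

```python
import re

# One attribute: a name, optionally followed by = and a quoted or bare value.
ATTR = r'[\w:.-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?'
# One start tag: element name, raw attribute text, optional trailing slash.
TAG_RE = re.compile(r'<([A-Za-z][\w:.-]*)((?:\s+' + ATTR + r')*)\s*(/?)>')
ATTR_RE = re.compile(r'([\w:.-]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+))?')


def dedupe_attributes(markup, keep="first"):
    """Drop repeated attributes from every start tag in markup.

    keep="first" retains the first value seen for an attribute name;
    any other value of keep retains the last one. Names are compared
    case-insensitively, which is an assumption suited to HTML input.
    """
    def fix_tag(match):
        name, attr_text, slash = match.groups()
        seen = {}  # lowercased name -> (original name, raw value)
        for attr, value in ATTR_RE.findall(attr_text):
            key = attr.lower()
            if key in seen and keep == "first":
                continue  # a later duplicate loses
            seen[key] = (attr, value)  # a later duplicate wins
        parts = [name]
        for attr, value in seen.values():
            parts.append(attr + "=" + value if value else attr)
        return "<" + " ".join(parts) + (" /" if slash else "") + ">"

    return TAG_RE.sub(fix_tag, markup)


sample = '<p class="classname" class="c43">text</p>'
print(dedupe_attributes(sample))               # <p class="classname">text</p>
print(dedupe_attributes(sample, keep="last"))  # <p class="c43">text</p>
```

   Which occurrence to keep is the real design question: the first
   class above looks like a meaningful style name and the second like
   a generated one, so an option to choose between them would be handy.
   A regex pass like this also rewrites tag-like text inside comments
   or scripts, so a proper fix belongs in the parser itself.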


Stu
Received on Friday, 24 March 2000 12:47:35 GMT