
Re: XHTML from Microsoft Word

From: Barry McMullin <mcmullin@eeng.dcu.ie>
Date: Wed, 28 May 2003 09:34:50 +0100 (IST)
To: w3c-wai-ig@w3.org
cc: mcmullin@eeng.dcu.ie
Message-ID: <Pine.LNX.4.44.0305280908150.19854-100000@narnia.dcu.ie>

On Tue, 27 May 2003, Matthew Smith wrote:

> Is anyone aware of a tool (preferably something that will run under a 
> Unix-ish operating system) that can take the HTML created by Microsoft 
> Word and turn it into clean, Accessible XHTML?

I generally ignore the native MS-HTML, and instead use wvWare on
Linux to work directly on the original .doc file:

  http://www.wvware.com/

I believe its translation behaviour is highly configurable;
however, I usually also run the output through tidy and a Perl
script to remove anything I really don't want (generally purely
presentational markup).  tidy can also produce XHTML, of course.
This chain works OK on simple documents, and may well work in
your application.  Of course, it'll work best if authors use
appropriate Word styles (headings etc.).
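
To give a rough idea of that final clean-up step, here is a minimal
sketch (in Python rather than Perl, purely for illustration; it is
not my actual script).  The lists of tags and attributes treated as
"presentational" are assumptions you would adjust to taste, and the
function name strip_presentational is made up for this example.  It
is meant to run on output that tidy has already made well-formed,
and a regex approach like this is only reasonable for simple
documents:

```python
import re

# Assumed set of presentational elements to unwrap (their text is kept).
PRESENTATIONAL_TAGS = ("font", "center", "u")
# Assumed set of presentational attributes to drop from any tag.
PRESENTATIONAL_ATTRS = ("style", "align", "bgcolor", "color", "face")

# Matches an opening or closing tag for any of the elements above.
TAG_RE = re.compile(
    r"</?(?:%s)\b[^>]*>" % "|".join(PRESENTATIONAL_TAGS),
    re.IGNORECASE,
)
# Matches name=value for any of the attributes above (quoted or bare).
ATTR_RE = re.compile(
    r"""\s+(?:%s)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)"""
    % "|".join(PRESENTATIONAL_ATTRS),
    re.IGNORECASE,
)
# Matches any tag, so attributes are only ever removed inside markup.
ANY_TAG_RE = re.compile(r"<[^>]+>")


def strip_presentational(html: str) -> str:
    """Remove presentational tags and attributes, keeping the content."""
    html = TAG_RE.sub("", html)
    return ANY_TAG_RE.sub(lambda m: ATTR_RE.sub("", m.group(0)), html)


if __name__ == "__main__":
    sample = '<p style="margin:0"><font face="Arial">Hello</font></p>'
    print(strip_presentational(sample))
```

Structural markup such as headings, lists and paragraphs passes
through untouched, which is why it matters that authors use real
Word heading styles in the first place.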

A local company here in Dublin doing nice work in this area is
XML Workshop (no, this is not a paid announcement!); they build
their own tools for this purpose, but also maintain a list of
tools available elsewhere:

  http://www.xmlw.ie/aboutxml/word2xml.htm

They also have a recent discussion of the XML support in Word
2003:

  http://www.xmlw.ie/aboutxml/word2003.htm

(Mind you, even though the company offers "accessibility"
consultancy, I would not suggest that their own site is a model
of best practice; it certainly appears to use a fixed-width
multi-column layout, at least as viewed in Opera 7.)

Best,

- Barry.

-- 
Barry McMullin
http://www.eeng.dcu.ie/~mcmullin/
Received on Wednesday, 28 May 2003 04:34:53 GMT
