- From: philip steven lanier <planier@u.washington.edu>
- Date: Tue, 27 May 2003 09:04:55 -0700 (PDT)
- cc: WAI Interest Group <w3c-wai-ig@w3.org>
Bob Boiko (author of Content Management Bible) taught a Content Management course where we managed content in Word documents and published them to XML, HTML, and other Word documents. The technique he uses to create XML from Word documents is to write custom conversions using VBA and Word macros. This works well for simple Word docs and is relatively easy to do. However, things get trickier if the document has embedded images or complex tables in it.

Other products to look at would be those by Stellent, who owns over 90% of the conversion technologies in the market. (Or so they claim.) One of their products that I have used, Stellent Site Builder (which is part of their Content Publisher package), is designed to let individuals convert from many different file formats, including Word, Excel, PowerPoint, Notes, and others, into HTML. The output formatting is somewhat customizable, but I'm not certain that it can produce valid XHTML. Nonetheless, it might be worth a look. (Note: the Site Builder product uses their proprietary Outside In conversion technology, and it may be possible to obtain just the conversion engine. Publisher and Site Builder are intended to complement their content management system, but can be used just for conversion if desired.)

www.stellent.com

Unfortunately, both of these solutions require MS Word to be present on the machine doing the conversion, which rules out any *nix OS. One alternative, though I don't think it would be very practical, would be to save the Word docs as RTF and use something like Perl to convert them to HTML.

On a brighter note, the next version of Word is supposed to fully integrate XML, meaning that you could save to XML, which is much easier to work with. That ought to be a lifesaver for many people!
Philip Lanier
Senior, Informatics
University of Washington

On Tue, 27 May 2003, Jon Hanna wrote:

> > Is anyone aware of a tool (preferably something that will run under a
> > Unix-ish operating system) that can take the HTML created by Microsoft
> > Word and turn it into clean, Accessible XHTML?
> >
> > My application is for a forum where agendae and minutes of meetings are
> > recorded and posted to an otherwise-Accessible site.
>
> I put a challenge before another list some time ago where I promised to
> donate to the charity of choice of the person who managed to get valid HTML
> out of Word. Nobody claimed the bet (although the most recent Mac version of
> Word came pretty close) but some were able to get very good results by
> putting their output through HTMLTidy
> <http://www.w3.org/People/Raggett/tidy/>. If the original Word doc is pretty
> simple then there should be little or no accessibility problems remaining to
> deal with by hand.
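For anyone who wants to experiment on a Unix-ish box without Word installed, here is a rough sketch of a pre-cleaning pass that could be run over Word's "Save as HTML" output before handing it to HTMLTidy. It is only an illustration: the function name `strip_word_markup` and the sample markup are made up for this example, the patterns cover only a few common kinds of Word residue (conditional comments, Office-namespaced tags such as `<o:p>`, `Mso*` classes, inline styles), and regular expressions are not a real HTML parser, so the result should still be validated and checked by hand.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove a few common kinds of Word-specific markup from HTML text.

    This is a heuristic, regex-based pass (an assumption of this sketch),
    not a full HTML parser. Validate the output afterwards.
    """
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Drop Office-namespaced tags such as <o:p> and </o:p>, keeping any text
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class attributes whose value starts with "Mso", e.g. class=MsoNormal
    html = re.sub(r"""\s+class=(["']?)Mso\w*\1""", "", html,
                  flags=re.IGNORECASE)
    # Drop quoted inline style attributes, which Word fills with mso-* rules
    html = re.sub(r"""\s+style=(["']).*?\1""", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    return html


# Hypothetical fragment in the style of Word's HTML export
sample = ('<p class=MsoNormal style="mso-fareast-font-family:Arial">'
          'Minutes<o:p></o:p></p>')
print(strip_word_markup(sample))
```

Removing every inline style is a blunt choice that suits plain agendas and minutes; a document that relies on styling for meaning would need a more careful mapping to proper structural markup.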
Received on Tuesday, 27 May 2003 12:04:58 UTC