Re: Microsoft Word from Office 2000 `HTML' fails to validate

scripsit Philip Riebold:
> > > Surely this is a FAQ, but I've just found that the `HTML' output of
> > > Microsoft Word doesn't validate with either the W3C or WDG validators:
> > 
> > You are correct.  Microsoft Word does not output valid HTML, nor does
> > any Microsoft product of which I am aware.
> > 
> > There used to be a program called the "demoronizer" which would clean up
> > MSHTML to create something approximating valid HTML, but I don't know if
> > it has kept up with recent versions of MS Office.  The best way to get
> > valid HTML from MS Word files is to save as plain text (ASCII or
> > Unicode) and add the markup by hand.
> 
> You could also try the superb HTML Tidy program which has a 'word-2000'
> option for stripping out all the extraneous rubbish put in by MS. 
> 
> The program is described at,
> 
> 	http://www.w3.org/People/Raggett/tidy/

Tidy is apparently now a SourceForge project [1], and Raggett's page
points you there (I believe the move is quite recent).  The latest
version of Tidy will be found on SourceForge.
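
For anyone who wants to script the cleanup, here is a rough sketch of
driving Tidy from Python with the 'word-2000' option mentioned above.
It assumes a `tidy` binary on your PATH that accepts configuration
options on the command line in the `--name value` form; the function
names and file names are my own, purely for illustration.

```python
import subprocess


def tidy_word_command(src, tidy="tidy"):
    """Build a Tidy command line for Word 2000 HTML (hypothetical helper)."""
    return [
        tidy,
        "-q",                   # suppress non-essential output
        "--word-2000", "yes",   # strip the extra markup Word 2000 adds
        src,
    ]


def clean_word_html(src, dest, tidy="tidy"):
    """Run Tidy on src and write the cleaned markup to dest.

    Assumes Tidy writes the cleaned document to stdout and exits with
    0 (clean), 1 (warnings) or 2 (errors).
    """
    result = subprocess.run(tidy_word_command(src, tidy), capture_output=True)
    if result.returncode >= 2:
        # Tidy reported errors; surface its diagnostics instead of output.
        raise RuntimeError(result.stderr.decode(errors="replace"))
    with open(dest, "wb") as out:
        out.write(result.stdout)
    return result.returncode
```

Something like clean_word_html("report.htm", "report-clean.html") would
then do the job, but run the result through a validator afterwards,
since Tidy's output is not guaranteed to be valid.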


References

1. <http://tidy.sourceforge.net>

-- 
Thanasis Kinias
Web Developer, Information Technology
Graduate Student, Department of History
Arizona State University
Tempe, Arizona, U.S.A.

Ash nazg durbatulūk, ash nazg gimbatul,
Ash nazg thrakatulūk agh burzum-ishi krimpatul

Received on Thursday, 23 May 2002 09:58:15 UTC