Re: Microsoft Word from Office 2000 `HTML' fails to validate

From: Thanasis Kinias <tkinias@optimalco.com>
Date: Thu, 23 May 2002 06:56:59 -0700
To: Philip Riebold <philip@livenet.ac.uk>
Cc: www-validator@w3.org
Message-ID: <20020523065659.A8539@glaux.ph.cox.net>

scripsit Philip Riebold:
> > > Surely this is a FAQ, but I've just found that the `HTML' output of
> > > Microsoft Word doesn't validate with either the W3C or WDG validators:
> > 
> > You are correct.  Microsoft Word does not output valid HTML, nor does
> > any Microsoft product of which I am aware.
> > 
> > There used to be a program called the "demoronizer" which would clean up
> > MSHTML to create something approximating valid HTML, but I don't know if
> > it has kept up with recent versions of MS Office.  The best way to get
> > valid HTML from MS Word files is to save as plain text (ASCII or
> > Unicode) and add the markup by hand.
> You could also try the superb HTML Tidy program which has a 'word-2000'
> option for stripping out all the extraneous rubbish put in by MS. 
> The program is described at,
> 	http://www.w3.org/People/Raggett/tidy/

Tidy now appears to be a SourceForge project [1]; Raggett's page
points there (I believe the move is quite recent).  The latest
version of Tidy can be found on SourceForge.
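
For anyone who wants to script the cleanup, here is a minimal sketch of
driving Tidy's word-2000 option from Python.  It assumes a `tidy'
executable is on your PATH and that it accepts the `--word-2000',
`--clean' and `-o' options; the function names and file names are my
own and purely illustrative.  Treat it as a starting point, not as the
recommended way to run Tidy.

```python
import shutil
import subprocess


def tidy_word_command(src, dest):
    """Build the argument list for a Tidy run that cleans Word 2000 HTML."""
    return [
        "tidy",
        "--word-2000", "yes",  # strip the Word-specific markup
        "--clean", "yes",      # replace presentational tags (e.g. FONT) with style rules
        "-o", dest,            # write the cleaned document to this file
        src,
    ]


def clean_word_html(src, dest):
    """Run Tidy on src, writing to dest, and return Tidy's exit status."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("no tidy executable found on PATH")
    # Tidy exits with 0 on success, 1 if it issued warnings, 2 on errors.
    result = subprocess.run(tidy_word_command(src, dest))
    return result.returncode
```

Calling clean_word_html("report.htm", "report-clean.html") would then
leave the original file untouched and write the cleaned copy alongside
it.  The result still needs to be run through the validator, since
Tidy's output is not guaranteed to validate.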


1. <http://tidy.sourceforge.net>

Thanasis Kinias
Web Developer, Information Technology
Graduate Student, Department of History
Arizona State University
Tempe, Arizona, U.S.A.

Ash nazg durbatulūk, ash nazg gimbatul,
Ash nazg thrakatulūk agh burzum-ishi krimpatul
Received on Thursday, 23 May 2002 09:58:15 UTC
