
Re: Microsoft Word from Office 2000 `HTML' fails to validate

From: Philip Riebold <philip@livenet.ac.uk>
Date: Thu, 23 May 2002 11:09:31 +0100 (BST)
To: www-validator@w3.org
Message-Id: <Pine.SUN.3.90.1020523110124.13797A-100000@livenet.ac.uk>
> > Surely this is a FAQ, but I've just found that the `HTML' output of
> > Microsoft Word doesn't validate with either the W3C or WDG validators:
> 
> You are correct.  Microsoft Word does not output valid HTML, nor does
> any Microsoft product of which I am aware.
> 
> There used to be a program called the "demoronizer" which would clean up
> MSHTML to create something approximating valid HTML, but I don't know if
> it has kept up with recent versions of MS Office.  The best way to get
> valid HTML from MS Word files is to save as plain text (ASCII or
> Unicode) and add the markup by hand.

You could also try the superb HTML Tidy program, which has a 'word-2000'
option for stripping out all the extraneous rubbish that MS Word puts in.

The program is described at:

	http://www.w3.org/People/Raggett/tidy/
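If you have a lot of Word files to convert, Tidy is easy to script. The following is a minimal Python sketch. It assumes a Tidy build that accepts its configuration options on the command line as `--option-name value`, and that the build is installed as `tidy` on your PATH. The helper names `tidy_word_command` and `clean_word_html` are hypothetical, and so are the file names in the usage line. The script only switches on the 'word-2000' option mentioned above and writes the result with `-o`.

```python
import shutil
import subprocess
from typing import List


def tidy_word_command(src: str, dst: str, tidy: str = "tidy") -> List[str]:
    """Build a Tidy command line that cleans Word 2000 'Save as Web Page' output.

    Assumption: Tidy accepts config options as --option-name value, so the
    'word-2000' option is passed as "--word-2000 yes"; -o names the output file.
    """
    return [tidy, "--word-2000", "yes", "-o", dst, src]


def clean_word_html(src: str, dst: str) -> int:
    """Run Tidy on src and write the cleaned document to dst.

    Assumes a 'tidy' executable is on PATH. Tidy exits with 0 (no problems),
    1 (warnings only) or 2 (errors), so only status 2 or above is a failure.
    """
    tidy = shutil.which("tidy")
    if tidy is None:
        raise FileNotFoundError("HTML Tidy ('tidy') was not found on PATH")
    result = subprocess.run(tidy_word_command(src, dst, tidy))
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors while cleaning {src}")
    return result.returncode
```

For example, `clean_word_html("report.htm", "report-clean.htm")` would leave the original Word output untouched and write the cleaned copy alongside it. Warnings (exit status 1) are tolerated because Word output almost always triggers some. It is still worth running the cleaned file through the validator afterwards.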


TTFN,

   Philip Riebold                                /"\
   Media Resources                               \ /
   University College London                      X  ASCII Ribbon Campaign
   Windeyer Building, 46 Cleveland Street        / \ Against HTML Mail
   London, W1T 4JF
   +44 (0)20 7580 9872
   http://www.ucl.ac.uk/mediares/vconf
Received on Thursday, 23 May 2002 06:09:46 GMT
