- From: John Murdie <john@cs.york.ac.uk>
- Date: Wed, 22 May 2002 17:34:13 +0100 (BST)
- To: www-validator@w3.org
- cc: John Murdie <john@cs.york.ac.uk>
On 22 May, Thanasis Kinias wrote:
> scripsit John Murdie:
>> Surely this is a FAQ, but I've just found that the `HTML' output of
>> Microsoft Word doesn't validate with either the W3C or WDG validators:
>
> You are correct. Microsoft Word does not output valid HTML, nor does
> any Microsoft product of which I am aware.
>
> There used to be a program called the "demoronizer" which would clean up
> MSHTML to create something approximating valid HTML, but I don't know if
> it has kept up with recent versions of MS Office. The best way to get
> valid HTML from MS Word files is to save as plain text (ASCII or
> Unicode) and add the markup by hand.
>

Thanks, Thanasis. Yes, I'd already found the `Demoroniser'
(http://www.fourmilab.ch/webtools/demoroniser/) but haven't yet tried it
out; its web page mentions several small-scale fixes it applies to
Microsoft `HTML', but does it also cope with the apparent non-conformity
of the document declarations? After all, such files commence:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
...

which isn't anything I recognise.

-- 
John A. Murdie
Experimental Officer (Software)
Department of Computer Science
University of York
England
Received on Wednesday, 22 May 2002 12:38:00 UTC