- From: Huw Wyn Jones <huw@pioden.net>
- Date: Thu, 07 Apr 2005 13:37:34 +0100
- To: html-tidy@w3.org
Hi everyone,
I have Tidy set up to 'tidy' HTML entered by clients. More often than
not the clients paste in 'HTML' generated by Word, which I'm trying to
strip the junk out of. Example below:
<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 36pt\"><B
style=\"mso-bidi-font-weight: normal\"><SPAN lang=EN-GB
style=\"FONT-SIZE: 11pt; COLOR: #006600; FONT-FAMILY: \'Trebuchet MS\';
mso-bidi-font-size: 10.0pt\">MENTER IAITH ABERTAWE<?xml:namespace prefix
= o ns = \"urn:schemas-microsoft-com:office:office\"
/><o:p></o:p></SPAN></B></P>
<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT:
36pt\"><SPAN lang=EN-GB style=\"FONT-SIZE: 11pt; FONT-FAMILY:
\'Trebuchet MS\'; mso-bidi-font-weight: bold; mso-bidi-font-size:
10.0pt\">SIWAN THOMAS, Field Officer<o:p></o:p></SPAN></P>
My Tidy config file is as follows:
bare: yes
clean: yes
doctype: omit
drop-empty-paras: yes
drop-proprietary-attributes: yes
enclose-text: yes
escape-cdata: yes
fix-backslash: yes
join-styles: yes
logical-emphasis: yes
lower-literals: yes
output-xhtml: yes
show-body-only: yes
word-2000: yes
indent: yes
output-encoding: utf8
force-output: yes
quiet: yes
write-back: yes
Is there anything else I can add to strip out the Word cr*p? I thought
that Tidy would have a greater impact than it's having now :(
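Failing a config option, one fallback would be a crude pre-filter run
over the markup before it reaches Tidy. A minimal sketch in Python,
purely for illustration: the function name prefilter_word_html and the
regexes are my own assumptions, not Tidy features, and it assumes the
backslashes before the quotes in the example above are an escaping
artifact rather than something Word produced.

```python
import re


def prefilter_word_html(html: str) -> str:
    """Hypothetical pre-filter: strip common Word residue before Tidy runs."""
    # Undo slash-escaped quotes (assumption: the backslashes are an artifact).
    html = html.replace('\\"', '"').replace("\\'", "'")
    # Drop <?xml:namespace ... /> processing instructions.
    html = re.sub(r"<\?xml:namespace[^>]*>", "", html, flags=re.I)
    # Drop Office-namespaced tags such as <o:p> and </o:p> (tags only).
    html = re.sub(r"</?o:[a-z]+[^>]*>", "", html, flags=re.I)
    # Remove mso-* declarations; crude, as it is not limited to style attributes.
    html = re.sub(r"\s*mso-[a-z-]+\s*:[^;\"]*;?", "", html, flags=re.I)
    # Remove class=MsoNormal and similar Word class names.
    html = re.sub(r"\s+class=(\"?)Mso\w+\1", "", html, flags=re.I)
    # Remove style attributes left empty by the steps above.
    html = re.sub(r"\s+style=\"\"", "", html, flags=re.I)
    return html


if __name__ == "__main__":
    sample = r'<B style=\"mso-bidi-font-weight: normal\">MENTER IAITH ABERTAWE<o:p></o:p></B>'
    print(prefilter_word_html(sample))
```

The remaining inline styles (MARGIN, FONT-SIZE and so on) are left
alone here, on the assumption that Tidy's clean option deals with those.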
TIA
Huw
--
===============================
Huw Wyn Jones
Technical Director
Pioden Rhyngweithiol
106-108 Stryd Fawr
Bangor
Gwynedd
LL57 1NS
Phone: 01248 364970 or
01248 354626
E-mail: huw@pioden.net
WWW: http://www.pioden.net
===============================
Received on Thursday, 7 April 2005 12:37:41 UTC