Configuration question

From: Huw Wyn Jones <huw@pioden.net>
Date: Thu, 07 Apr 2005 13:37:34 +0100
Message-ID: <4255298E.3040004@pioden.net>
To: html-tidy@w3.org

Hi everyone,

I have Tidy set up to 'tidy' HTML entered by clients. More often than 
not the clients paste in 'HTML' generated by Word, which I'm trying to 
strip the junk out of. Example below:

<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 36pt\"><B 
style=\"mso-bidi-font-weight: normal\"><SPAN lang=EN-GB 
style=\"FONT-SIZE: 11pt; COLOR: #006600; FONT-FAMILY: \'Trebuchet MS\'; 
mso-bidi-font-size: 10.0pt\">MENTER IAITH ABERTAWE<?xml:namespace prefix 
= o ns = \"urn:schemas-microsoft-com:office:office\" 
/><o:p></o:p></SPAN></B></P>
<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 
36pt\"><SPAN lang=EN-GB style=\"FONT-SIZE: 11pt; FONT-FAMILY: 
\'Trebuchet MS\'; mso-bidi-font-weight: bold; mso-bidi-font-size: 
10.0pt\">SIWAN THOMAS, Field Officer<o:p></o:p></SPAN></P>
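SIWAN THOMAS">

To make the residue in the sample above concrete, here is a minimal sketch of a pre-cleaning pass that could run before the markup is handed to Tidy. It is not part of Tidy. The function name preclean_word_html is hypothetical, and the sketch assumes the backslash-escaped quotes (\" and \') are an artifact of how the input was captured rather than something the clients typed. The regular expressions are deliberately crude: they would also hit matching text in the page body, and they do not handle quoted values inside mso-* declarations.

```python
import re


def preclean_word_html(html: str) -> str:
    """Hypothetical pre-cleaning pass to run before Tidy (not a Tidy feature)."""
    # Undo backslash-escaped quotes; assumes they are capture artifacts.
    html = html.replace('\\"', '"').replace("\\'", "'")
    # Drop Word's namespace processing instructions, e.g. <?xml:namespace ... />.
    html = re.sub(r"<\?xml[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop Office namespace tags such as <o:p> and </o:p>, keeping any text between them.
    html = re.sub(r"</?o:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop mso-* declarations (the kind found in style attributes).
    html = re.sub(r"\s*mso-[a-z-]+\s*:[^;\"']*;?", "", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    sample = (
        '<SPAN lang=EN-GB style=\\"FONT-SIZE: 11pt; mso-bidi-font-size: 10.0pt\\">'
        "SIWAN THOMAS, Field Officer<o:p></o:p></SPAN>"
    )
    print(preclean_word_html(sample))
```

The output of such a pass would still need to go through Tidy with the configuration below, since the sketch leaves unquoted attributes, upper-case tags and empty style attributes untouched.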

My Tidy config file is as follows:

bare: yes
clean: yes
doctype: omit
drop-empty-paras: yes
drop-proprietary-attributes: yes
enclose-text: yes
escape-cdata: yes
fix-backslash: yes
join-styles: yes
logical-emphasis: yes
lower-literals: yes
output-xhtml: yes
show-body-only: yes
word-2000: yes
indent: yes
output-encoding: utf8
force-output: yes
quiet: yes
write-back: yes

Is there anything else I can add to strip out the Word cr*p? I thought 
that Tidy would have a greater impact than it's having now :(

TIA

Huw

-- 
===============================
Huw Wyn Jones
Technical Director
Pioden Rhyngweithiol
106-108 Stryd Fawr
Bangor
Gwynedd
LL57 1NS

Phone:  01248 364970 or
        01248 354626
E-mail: huw@pioden.net
WWW:    http://www.pioden.net
===============================
Received on Thursday, 7 April 2005 12:37:41 UTC
