Configuration question (repost)

From: Huw Wyn Jones <huw@pioden.net>
Date: Mon, 11 Apr 2005 12:44:20 +0100
Message-ID: <425A6314.4060403@pioden.net>
To: html-tidy@w3.org

** This is a repost to see if I get a response !! :) **
Hi everyone,

I have Tidy set up to 'tidy' HTML entered by clients. More often than 
not the clients paste in 'HTML' generated by Word - which I'm trying to 
strip the junk out of. Example below:

<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 36pt\"><B 
style=\"mso-bidi-font-weight: normal\"><SPAN lang=EN-GB 
style=\"FONT-SIZE: 11pt; COLOR: #006600; FONT-FAMILY: \'Trebuchet MS\'; 
mso-bidi-font-size: 10.0pt\">MENTER IAITH ABERTAWE<?xml:namespace prefix 
= o ns = \"urn:schemas-microsoft-com:office:office\" 
<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 
36pt\"><SPAN lang=EN-GB style=\"FONT-SIZE: 11pt; FONT-FAMILY: 
\'Trebuchet MS\'; mso-bidi-font-weight: bold; mso-bidi-font-size: 
10.0pt\">SIWAN THOMAS, Field Officer<o:p></o:p></SPAN></P>

My Tidy config file is as follows:

bare: yes
clean: yes
doctype: omit
drop-empty-paras: yes
drop-proprietary-attributes: yes
enclose-text: yes
escape-cdata: yes
fix-backslash: yes
join-styles: yes
logical-emphasis: yes
lower-literals: yes
output-xhtml: yes
show-body-only: yes
word-2000: yes
indent: yes
output-encoding: utf8
force-output: yes
quiet: yes
write-back: yes

Is there anything else I can add to strip out the Word cr*p? I thought 
that Tidy would have a greater impact than it is having now :(
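
For illustration only, here is a minimal sketch of a pre-cleaning pass that 
could run on the pasted markup before it is handed to Tidy. It is not part 
of the setup described above. The function name strip_word_markup and all 
of the regular expressions are hypothetical, and the unescaping step assumes 
that the backslashes before the quotes in the example really are present in 
the input rather than being an artifact of how the example was pasted. 
Regex-based cleaning is approximate and may also touch matching text 
outside of tags.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove common Word-specific residue from pasted HTML (illustrative only)."""
    # Assumption: quotes arrive backslash-escaped (\" and \'); undo that first.
    html = html.replace('\\"', '"').replace("\\'", "'")

    # Drop Word's XML namespace declarations, e.g. <?xml:namespace prefix = o ... />
    html = re.sub(r"<\?xml[^>]*>", "", html, flags=re.IGNORECASE)

    # Drop Office namespace tags such as <o:p> and </o:p>.
    html = re.sub(r"</?o:[^>]*>", "", html, flags=re.IGNORECASE)

    # Drop class attributes whose value starts with "Mso" (quoted or unquoted).
    html = re.sub(
        r'''\s+class\s*=\s*("Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)''',
        "",
        html,
        flags=re.IGNORECASE,
    )

    # Remove mso-* declarations; other style declarations are left alone.
    html = re.sub(
        r'''\s*(?<![\w-])mso-[a-z-]+\s*:\s*[^;"']*;?''',
        "",
        html,
        flags=re.IGNORECASE,
    )

    # Remove style attributes that are now empty.
    html = re.sub(
        r'''\s+style\s*=\s*("\s*"|'\s*')''',
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html


if __name__ == "__main__":
    sample = r'<P class=MsoNormal style=\"MARGIN: 0cm 0cm 0pt\">Text<o:p></o:p></P>'
    print(strip_word_markup(sample))
```

The cleaned string would then be passed to Tidy with the config above. 
Presentational leftovers such as the MARGIN and FONT-SIZE declarations are 
deliberately kept here, since deciding which of those to drop is a separate 
choice.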



Huw Wyn Jones
Cyfarwyddwr Technegol
Pioden Rhyngweithiol
106-108 Stryd Fawr
LL57 1NS

Ffon:   01248 364970 neu
       01248 354626
E-bost: huw@pioden.net
WWW:    http://www.pioden.net
Received on Monday, 11 April 2005 11:44:58 UTC
