Re: Stipping classes from HTML from patricka@mkdoc.com on 2003-11-07 (html-tidy@w3.org from October to December 2003)

From: <patricka@mkdoc.com>
Date: Fri, 07 Nov 2003 09:37:19 +0000
To: html-tidy@w3.org
Message-Id: <20031107093720.047EE28511@mail.webarchitects.co.uk>

Cristian Balan writes: 

> I have been using Tidy to clean Word 2000 documents and get them ready for the
> Web.
> Tidy seems to be doing a great job, the only tags that are left that I still
> want to get rid of are the class attributes: 
> 
> <body class='c10'>
>   <div class="Section1"> 
> 
> <li class="c4"> 
> 
> How can I do this either in the UI for Win32 or command line Tidy?

i don't think this is possible[1]. :( 

try either: 

 - textism's word html cleaner[2], or
 - roll your own perl solution with MKDoc::XML::Stripper[3] 

warning: the perl solution requires xml input, so if you're feeding it 
html you'll need to run it through tidy first with the output-xhtml 
option. 
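
for illustration only, here is a rough sketch of the same idea (strip 
every class attribute from well-formed xhtml). it is not the 
MKDoc::XML::Stripper api; it swaps in python's standard 
xml.etree.ElementTree, and the function name and sample markup are made 
up for the example. it assumes the input has already been converted to 
xhtml (e.g. with tidy -asxhtml) and contains no named entities such as 
&nbsp;, which this parser would reject. 

```python
import xml.etree.ElementTree as ET


def strip_class_attributes(xhtml: str) -> str:
    """Return the markup with every class attribute removed.

    Hypothetical helper: assumes well-formed XML input with no
    undeclared named entities.
    """
    root = ET.fromstring(xhtml)
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        # pop() with a default is a no-op when the attribute is absent.
        element.attrib.pop("class", None)
    return ET.tostring(root, encoding="unicode")


# Sample input modelled on the fragments quoted in the question.
sample = (
    '<body class="c10"><div class="Section1">'
    '<ul><li class="c4">item</li></ul></div></body>'
)
cleaned = strip_class_attributes(sample)
print(cleaned)
```

on a real tidy-generated document the root element carries the xhtml 
namespace, so the serialised output will have namespace prefixes unless 
you register the namespace first; treat this as a starting point only. 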

hth, 

 - p 

1. http://tidy.sourceforge.net/docs/quickref.html
2. http://www.textism.com/resources/cleanwordhtml/
3. http://search.cpan.org/~jhiver/MKDoc-XML/lib/MKDoc/XML/Stripper.pm

Received on Friday, 7 November 2003 04:37:29 UTC