RE: Stripping classes from HTML

From: Ben Noblet <ben@lateralsystems.com.au>
Date: Sat, 8 Nov 2003 00:38:00 +1100
To: <html-tidy@w3.org>
Message-Id: <20031107133324.12DA3136EF@dr-nick.w3.org>

A roll-your-own solution using regular expressions could be something as
simple as this (example in JavaScript):


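function stripClass(content)
{
  // Match a tag that contains a class attribute. The value may be
  // double-quoted, single-quoted or unquoted. One class attribute per tag.
  var oReg = /(<[^>]+?)\s+class\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)([^>]*>)/ig;
  return content.replace(oReg, "$1$2");
}
var content = stripClass('This is some HTML code <p align="center" class="Rubbish">Text</p>');
// content is now: This is some HTML code <p align="center">Text</p>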
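
A variant that may be worth considering is to match each start tag first and then remove every class attribute found inside it. The sketch below is only an illustration: the name stripClassAttributes is made up here, it assumes attribute values never contain a ">" character, and any text inside another attribute value that happens to look like " class=..." would be removed as well.

```javascript
// Hypothetical helper (not from the original post): removes every class
// attribute from every start tag. Assumes attribute values contain no ">".
function stripClassAttributes(html) {
  // Step 1: find each start tag ("<" followed by a letter, up to the next ">").
  return html.replace(/<[a-z][^>]*>/gi, function (tag) {
    // Step 2: inside the tag, drop class="...", class='...' or class=value.
    return tag.replace(/\s+class\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, "");
  });
}

// Sample input modelled on the Word 2000 markup quoted in this thread.
var sample = "<body class='c10'><div class=\"Section1\"><li class=c4>Item</li></div></body>";
var cleaned = stripClassAttributes(sample);
console.log(cleaned); // <body><div><li>Item</li></div></body>
```

Doing the work in two passes keeps each expression short, and the inner replace runs globally, so a tag that carries more than one class attribute is cleaned in a single call.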


Cheers
Ben

> -----Original Message-----
> From: html-tidy-request@w3.org 
> [mailto:html-tidy-request@w3.org] On Behalf Of patricka@mkdoc.com
> Sent: Friday, 7 November 2003 8:37 PM
> To: html-tidy@w3.org
> Subject: Re: Stripping classes from HTML
> 
> 
> Cristian Balan writes: 
> 
> > I have been using Tidy to clean Word 2000 documents and get them ready
> > for the Web.
> > Tidy seems to be doing a great job; the only things left that I still
> > want to get rid of are the class attributes: 
> > 
> > <body class='c10'>
> >   <div class="Section1"> 
> > 
> > <li class="c4"> 
> > 
> > How can I do this either in the UI for Win32 or command line Tidy?
> 
> i don't think this is possible[1]. :( 
> 
> try either: 
> 
>  - textism's word html cleaner[2], or
>  - roll your own perl solution with MKDoc::XML::Stripper[3] 
> 
> warning: the perl solution requires xml input, so you'll need to run it
> through tidy first with the output-xhtml option (if you're throwing it
> html). 
> 
> hth, 
> 
>  - p 
> 
> 1. http://tidy.sourceforge.net/docs/quickref.html
> 2. http://www.textism.com/resources/cleanwordhtml/
> 3. http://search.cpan.org/~jhiver/MKDoc-XML/lib/MKDoc/XML/Stripper.pm 
Received on Friday, 7 November 2003 08:33:30 UTC
