Re: HTML 2 Text

From: Dave Raggett <dsr@w3.org>
Date: Fri, 24 Mar 2000 11:47:56 -0600
To: Spencer Marks <smarks@digisolutions.com>
Cc: html-tidy@w3.org
Message-ID: <OF182725CB.0D436C0A-ON862568A6.006A2DFB@rfdinc.com>

On 18 Mar 2000, Spencer Marks wrote:

>
> Hi, I was wondering if there's a way to use Tidy to remove all
> HTML from a page and just get the text.
>
> In other words, I'd like to use Tidy as an HTML to Text conversion
> utility that I can call programmatically.
>
> Actually, I am planning on using JTidy so that I can do this
> conversion as part of an application I am working on.


W3C's open source line mode browser already supports this
feature. However, you could adapt Tidy to do it by adding a
new routine, modelled on the pretty printer, that walks the
parse tree and emits only the text.
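
As a rough sketch of what such a routine could look like on the Java
side: the code below walks a W3C DOM tree and collects only the text
nodes. It assumes you can get an org.w3c.dom.Document from your parser
(JTidy is documented as offering this through parseDOM, but check the
version you use). The class name TextExtractor, its methods, and the
parseXhtml stand-in, which uses the JDK's XML parser on well-formed
input instead of Tidy, are hypothetical names for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class TextExtractor {

    /** Appends the text found under node to out, skipping script and style. */
    static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Collapse runs of whitespace inside the node, then trim the ends.
            String text = node.getNodeValue().replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                // Simplification: nodes are always joined with one space,
                // so "<b>te</b>xt" would come out as "te xt".
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
            return;
        }
        if (type == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase();
            if (name.equals("script") || name.equals("style")) {
                return; // not visible text
            }
        }
        // Depth-first walk over the children, in document order.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    /** Returns the visible text under root as a single line. */
    static String extractText(Node root) {
        StringBuilder out = new StringBuilder();
        collectText(root, out);
        return out.toString();
    }

    /** Stand-in for Tidy: parses well-formed XHTML with the JDK's XML parser. */
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml(
                "<html><head><title>Demo</title></head>"
                + "<body><h1>Hello</h1><p>plain <b>text</b> only</p></body></html>");
        System.out.println(extractText(doc));
    }
}
```

With a real page you would replace parseXhtml with the document your
Tidy port hands back and keep extractText unchanged. A fuller version
would probably also insert line breaks after block elements such as p
and h1, which this sketch does not attempt.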

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 385 320 444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Friday, 24 March 2000 13:13:35 GMT
