
Re: HTML 2 Text

From: Dave Raggett <dsr@w3.org>
Date: Sat, 18 Mar 2000 19:19:40 +0000 (GMT Standard Time)
To: Spencer Marks <smarks@digisolutions.com>
cc: html-tidy@w3.org
Message-ID: <Pine.WNT.4.10.10003181918270.-433143@hazel.hpl.hp.com>
On 18 Mar 2000, Spencer Marks wrote:

> 
> Hi, I was wondering if there's a way to use Tidy to remove all
> HTML from a page and just get the text.
> 
> In other words, I'd like to use Tidy as an HTML-to-text conversion
> utility that I can call programmatically.
> 
> Actually, I am planning on using JTidy so that I can do this
> conversion as part of an application I am working on.


W3C's open source Line Mode Browser already supports this
feature. Alternatively, you could adapt Tidy to do it by writing
a new pretty-printing routine that walks the parse tree and
outputs only the text.
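
The tree-walking idea can be sketched roughly as follows. This is
only an illustration, not Tidy's or JTidy's own code: the class
name TextExtractor, its methods, and the lists of block-level and
skipped elements are all hypothetical. It assumes the parse tree
is available as a standard org.w3c.dom.Document (JTidy is assumed
to be able to supply one through its DOM interface); the demo
below uses the JDK's XML parser on well-formed XHTML instead, so
that it runs without JTidy.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: prints a DOM parse tree as plain text. */
final class TextExtractor {
    // Assumed set of elements that end a line in the text output.
    private static final Set<String> BLOCK = new HashSet<>(Arrays.asList(
        "p", "div", "br", "li", "tr", "title", "pre", "blockquote",
        "h1", "h2", "h3", "h4", "h5", "h6"));
    // Elements whose content is not page text.
    private static final Set<String> SKIP = new HashSet<>(Arrays.asList(
        "script", "style"));

    /** Walks the tree below root and returns only its text. */
    static String toText(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, out);
        // Drop spaces around line breaks and collapse blank lines.
        return out.toString().replaceAll(" *\\n[ \\n]*", "\n").trim();
    }

    private static void walk(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Collapse runs of whitespace, as a browser would.
            out.append(node.getNodeValue().replaceAll("\\s+", " "));
            return;
        }
        String name = node.getNodeName().toLowerCase();
        if (SKIP.contains(name)) {
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out);
        }
        if (BLOCK.contains(name)) {
            out.append('\n');
        }
    }

    /** Demo-only stand-in for Tidy: parses well-formed XHTML with the JDK. */
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml(
            "<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>");
        System.out.println(toText(doc));
    }
}
```

With real-world HTML, the Document would come from Tidy's cleaned-up
parse tree rather than from parseXhtml, since the JDK parser rejects
markup that is not well-formed.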

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 385 320 444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Saturday, 18 March 2000 14:19:43 GMT
