Re: HTMLToPlain and libwww 5.2

From: Kent Vander Velden <graphix@iastate.edu>
Date: Sat, 21 Nov 1998 12:43:49 CST
Message-Id: <199811211843.MAA21784@isua3.iastate.edu>
To: Henrik Frystyk Nielsen <frystyk@w3.org>
cc: www-lib@w3.org

>>  In short, 
>>    this works:
>>      ./w3c -to "text/latex" http://www.w3.org/ -o w3home.txt
>>    this does not:
>>      ./w3c -to "text/plain" http://www.w3.org/ -o w3home.txt
>
>I don't think that any of these work - the command line tool [1] doesn't
>have an HTML parser integrated - I only added the HTML parser to the webbot
>[2] (which needs it for finding links) and the line mode browser (because
>it's a browser!) [3].

  I added the following code to main() in HTLine.c:

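    /* Register converters so the tool can handle text/html itself */
    HTList * converters = HTList_new();
    HTConverterInit(converters);         /* default stream converters */
    HTMLInit(converters);                /* HTML converters */
    HTFormat_setConversion(converters);  /* install as the global list */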

  This seems to register the converters, and with debug enabled I can see
that the converter is being found and used.  However, the output seems to
disappear when converting to text/plain.  Did I miss something?
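
  For readers unfamiliar with the mechanism, the lookup step that shows up
in the debug output can be pictured with the toy model below.  It is only a
sketch and not libwww code: every name in it (ToyRegistry, toy_register,
toy_find, toy_html_to_plain) is hypothetical, and the tag stripper is far
cruder than a real HTML parser.  It illustrates only how a (from, to) pair
of media types selects a converter; it says nothing about why the
text/plain output goes missing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical converter signature (not libwww's): writes a converted,
   NUL-terminated copy of `in` into `out` and returns its length. */
typedef size_t (*ToyConverter)(const char *in, char *out, size_t cap);

typedef struct {
    const char *from;      /* source media type, e.g. "text/html" */
    const char *to;        /* target media type, e.g. "text/plain" */
    ToyConverter convert;
} ToyEntry;

#define TOY_MAX 16

typedef struct {
    ToyEntry entries[TOY_MAX];
    size_t count;
} ToyRegistry;

/* Add a converter for the (from, to) pair; returns -1 if the table is full. */
int toy_register(ToyRegistry *r, const char *from, const char *to,
                 ToyConverter c)
{
    if (r->count >= TOY_MAX)
        return -1;
    r->entries[r->count++] = (ToyEntry){ from, to, c };
    return 0;
}

/* Return the first converter registered for (from, to), or NULL. */
ToyConverter toy_find(const ToyRegistry *r, const char *from, const char *to)
{
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->entries[i].from, from) == 0 &&
            strcmp(r->entries[i].to, to) == 0)
            return r->entries[i].convert;
    }
    return NULL;
}

/* Crude stand-in for an HTML-to-plain converter: drops '<', '>' and
   everything between them, copies the rest, truncating to fit `cap`. */
size_t toy_html_to_plain(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    int in_tag = 0;

    if (cap == 0)
        return 0;
    for (; *in != '\0' && n + 1 < cap; in++) {
        if (*in == '<')
            in_tag = 1;
        else if (*in == '>')
            in_tag = 0;
        else if (!in_tag)
            out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

  In this model a caller registers toy_html_to_plain for
("text/html", "text/plain"), looks it up with toy_find, and must still
write the returned buffer somewhere itself; finding a converter and
delivering its output are separate steps.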

>The following should work as intended:
>
>	./www -to "text/latex" http://www.w3.org/ -o w3home.tex
>
>	./www -to "text/plain" http://www.w3.org/ -o w3home.txt
>
>(it may not generate fully compliant tex, though). You can remove the [n]
>link references by using the "-na" command line option.

  These work great!  I had seen these mentioned in the mailing
list archive but somehow got the idea that www had become w3c.

  Thanks!

---
Kent Vander Velden
kent@iastate.edu
Received on Saturday, 21 November 1998 13:43:50 EST
