- From: Henrik Frystyk Nielsen <frystyk@w3.org>
- Date: Sat, 21 Nov 1998 13:32:30 -0500
- To: kent@iastate.edu, www-lib@w3.org
At 00:07 11/21/98 CST, Kent Vander Velden wrote:
>
>  I have been trying to convert the returned html to plain text.  So
>far I have not been able to do this.  Using the w3c program I can
>retrieve the remote file in "text/latex" format but not in
>"text/x-c" or "text/plain".  I have added the extra converters
>with a call to HTMLInit() and can see that when maximum debug
>is enabled that the converter is found.  It is also clear from
>the parser output that the converter is running; there just is
>no output.
>
>  In short,
>    this works:
>      ./w3c -to "text/latex" http://www.w3.org/ -o w3home.txt
>    this does not:
>      ./w3c -to "text/plain" http://www.w3.org/ -o w3home.txt

I don't think that either of these works: the command line tool [1]
doesn't have an HTML parser integrated. I only added the HTML parser to
the webbot [2] (which needs it for finding links) and to the line mode
browser [3] (because it's a browser!).

The following should work as intended:

    ./www -to "text/latex" http://www.w3.org/ -o w3home.tex
    ./www -to "text/plain" http://www.w3.org/ -o w3home.txt

(it may not generate fully compliant TeX, though). You can remove the
[n] link references by using the "-na" command line option.

Henrik

[1] http://www.w3.org/ComLine/
[2] http://www.w3.org/Robot/
[3] http://www.w3.org/LineMode/
--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk
Received on Saturday, 21 November 1998 13:32:32 UTC