Recursive URL retriever
Subject: Recursive URL retriever
From: "Giovanni Vigna" <firstname.lastname@example.org>
Date: Fri, 01 Dec 1995 17:33:13 +0100
From email@example.com Fri Dec 1 11:33:51 1995
X-Mailer: exmh version 1.6.4 10/10/95
Dear WWW Library developers,
I am writing an application using your (great) WWW library.
The application should recursively retrieve the documents referenced from a starting URL, under some constraints, in order to avoid downloading the entire Web :).
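To make the intended behaviour concrete, here is a minimal sketch of such a constrained traversal. It is written in Python rather than against the WWW Library, and it does no network access: `crawl`, `get_links`, `max_depth`, and `max_pages` are hypothetical names, and the constraints shown (depth limit, page limit, same host only) are just one assumed set.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=2, max_pages=100):
    """Breadth-first traversal from start_url.

    get_links is a caller-supplied function (hypothetical) that returns
    the absolute URLs referenced by a given document.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}                  # URLs already queued, to avoid loops
    queue = deque([(start_url, 0)])     # (url, depth) pairs still to visit
    visited = []                        # URLs in the order they were visited

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue                    # constraint: do not go any deeper
        for link in get_links(url):
            # constraint: stay on the starting host, never revisit a URL
            if link not in seen and urlparse(link).netloc == start_host:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

In a real tool, `get_links` would fetch the document and extract its references; here it can be any function, for example a lookup in an in-memory dictionary while testing.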
I carefully read the Library documentation and examined your LineMode textual browser in order to understand how to implement my tool.
Unfortunately, due to MY limits I was not able to understand how to retrieve all the subparts of a document once I got the starting URL anchor.
Question: how can I retrieve all the document-related parts, including inline images, icons used as bullets in bulleted lists, dingbats in TITLE tags, etc.? And how can I retrieve referenced documents, e.g. HREF="http://www.test.com/page.html"?
Can I retrieve this data starting from the anchor (maybe using child anchors and links), or must I parse the SGML tags myself?
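For illustration, the "parse the tags myself" alternative could look like the sketch below. It uses Python's standard `html.parser` module, not the WWW Library, and handles only IMG SRC and A HREF; `LinkCollector` and `collect_links` are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects embedded resources (IMG SRC) and referenced documents (A HREF)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.embedded = []      # parts of the document itself, e.g. images
        self.referenced = []    # other documents the page points to

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.embedded.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.referenced.append(urljoin(self.base_url, attrs["href"]))


def collect_links(html_text, base_url):
    """Return (embedded, referenced) absolute URLs found in html_text."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.embedded, parser.referenced
```

Relative URLs are resolved against the document's own URL with `urljoin`, so a bullet icon given as `bullet.gif` comes back as an absolute URL that can be fetched directly.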
Question: once I have the anchor of a text/html document, how can I access the actual HTML data?
Question: what is the actual meaning of anchors and links? Could you please provide some simple examples?
Thanks VERY MUCH for your time and attention. I think you are doing GREAT work! I know that version 4 of the Library is "in fieri", and I am looking forward to your December release!
Many thanks again