
Recursive URL retriever




Dear WWW Library developers,
	I am writing an application using your (great) WWW library.
The application should recursively retrieve the documents referenced from a starting URL, under some constraints, in order to avoid downloading the entire Web :).
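To make concrete what I am after, here is a rough sketch of the traversal I have in mind. This is plain Python, not Library code: the `fetch` and `extract_links` steps are placeholders for whatever the Library provides, and the constraints shown (same host, maximum link depth) are just examples of the limits I want to impose.

```python
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch, extract_links, max_depth=2):
    """Breadth-first retrieval starting from start_url, restricted to
    the starting host and to max_depth link hops, so that the crawl
    does not download the entire Web.

    fetch(url) -> document body        (placeholder for an HTTP GET)
    extract_links(body) -> iterable of URLs found in the document
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [(start_url, 0)]   # (url, link depth) pairs to visit
    pages = {}                    # url -> retrieved body
    while frontier:
        url, depth = frontier.pop(0)
        body = fetch(url)
        pages[url] = body
        if depth == max_depth:
            continue              # constraint: do not follow further
        for link in extract_links(body):
            absolute = urljoin(url, link)   # resolve relative URLs
            if urlparse(absolute).netloc != start_host:
                continue          # constraint: stay on one host
            if absolute not in seen:
                seen.add(absolute)
                frontier.append((absolute, depth + 1))
    return pages
```

What I do not know is which Library calls play the roles of `fetch` and `extract_links` here, which is what my questions below are about.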

I read carefully the Library documentation, and examined your LineMode textual browser in order to understand how to implement my tool.

Unfortunately, due to my own limitations, I was not able to understand how to retrieve all the subparts of a document once I have the starting URL's anchor.

Question: how can I retrieve all the parts belonging to a document, including inline images, icons used as bullets in lists, dingbats of TITLE tags, etc.? And how can I retrieve referenced documents, e.g. HREF="http://www.test.com/page.html"?
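To illustrate the distinction I am drawing, here is a small sketch of what I would do if I parsed the markup myself (again, not Library code, and the tag/attribute choices are only examples): collect HREF targets, which are separate documents, separately from SRC targets, which are data needed to render the page itself.

```python
from html.parser import HTMLParser

class RefCollector(HTMLParser):
    """Collect outgoing references from an HTML document, split into
    hypertext links (<A HREF=...>) and inline data (<IMG SRC=...)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []    # referenced documents
        self.inline = []   # images/icons needed to render this page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.hrefs.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.inline.append(attrs["src"])

collector = RefCollector()
collector.feed('<img src="bullet.gif">'
               '<a href="http://www.test.com/page.html">a page</a>')
```

My hope is that the Library's anchor objects already give me both kinds of reference, so that I do not need to write a parser like this at all.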

Can I retrieve this data starting from the anchor (perhaps using child anchors and links), or must I parse the SGML tags myself?

Question: once I have the anchor of a text/html document, how can I access the actual HTML data?

Question: what exactly do anchors and links mean in the Library's model? Could you please provide some simple examples?



Thanks VERY MUCH for your time and attention. I think you are doing GREAT work! I know that version 4 of the Library is "in fieri", and I am looking forward to your December release!

Many thanks again

Giovanni Vigna
vigna@elet.polimi.it