
Recursive URL retriever

From: Giovanni Vigna <vigna@ipmel2.elet.polimi.it>
Date: Fri, 01 Dec 1995 17:33:13 +0100
Message-Id: <9512011633.AA05051@ipmel2.elet.polimi.it>
To: www-lib@w3.org

Dear WWW Library developers,
	I am writing an application using your (great) WWW library.
The application should recursively retrieve the documents referenced from a starting URL, subject to some constraints so that it does not end up downloading the entire Web :).
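
To make "under some constraints" concrete, here is a minimal sketch of a bounded breadth-first traversal. It is written in Python rather than C and does not use the WWW Library at all. The function name `crawl`, the limits `max_depth` and `max_pages`, the same-host rule, and the `get_links` callback (which is assumed to return the links found in a document) are all hypothetical choices made only for illustration.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_depth=2, max_pages=50):
    """Visit documents breadth-first from start_url under three constraints:
    a depth limit, a page-count limit, and staying on the starting host.

    get_links(url) is a caller-supplied (hypothetical) function returning
    the possibly relative links found in the document at url.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links out of the deepest level
        for link in get_links(url):
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc != start_host:
                continue  # constraint: never leave the starting host
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return visited
```

Passing the link extraction in as a callback keeps the traversal policy separate from however the links are actually obtained, whether from anchors or from parsing the markup.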

I carefully read the Library documentation and examined your LineMode textual browser in order to understand how to implement my tool.

Unfortunately, owing to MY limits, I was not able to work out how to retrieve all the subparts of a document once I have the anchor for the starting URL.

Question: how can I retrieve all the parts that belong to a document, including inline images, icons used as bullets in bulleted lists, dingbats of TITLE tags etc.? How can I retrieve referenced documents, e.g. HREF="http://www.test.com/page.html"?

Can I retrieve this data starting from the anchor (maybe using child anchors and links), or must I parse the SGML tags myself?
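
For comparison, the second alternative (parsing the tags directly) might look roughly like the sketch below. It uses Python's standard `html.parser` module, not the WWW Library, and says nothing about how the Library's anchors and links behave. The names `LinkCollector` and `collect_links` are hypothetical, and only `A HREF` and `IMG SRC` are handled.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect referenced documents (A HREF) and inline images (IMG SRC)."""

    def __init__(self):
        super().__init__()
        self.references = []  # targets of <A HREF="...">
        self.inlined = []     # sources of <IMG SRC="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.references.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.inlined.append(attributes["src"])


def collect_links(html_text):
    """Return (referenced documents, inline images) found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.references, collector.inlined
```

An anchor that only names a location, such as `<A NAME="top">`, carries no HREF and is therefore skipped.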

Question: once I have the anchor of a text/html document, how can I access the actual HTML data?

Question: what is the actual meaning of anchors and links? Could you please provide some simple examples?

Thanks VERY MUCH for your time and attention. I think you are doing GREAT work! I know that version 4 of the Library is "in fieri", and I am looking forward to your December release!

Many thanks again

Giovanni Vigna
Received on Friday, 1 December 1995 11:33:51 UTC
