Gecse Roland writes:
> In my robot I use the HTParse with PARSE_HOST to extract the
> starting hostname from a command line query. The robot takes everything
> under this URL. But if a host has more than one name, and the HTML contains
> references to both of them, how can I know whether they are the same host?
Good question. In the 4.0D version, the HTDNS module kept its own cache of DNS
entries in order to do the timing and to save DNS queries. After intense
discussion on the HTTP working group mailing list, the requirement is now that _if_
you cache DNS entries then you _must_ honor the TTL for the DNS records.
Unfortunately, gethostbyname doesn't provide this information, so the current
version of HTDNS does not conform to this. However, as you can manually set
the timeout for entries and it automatically flushes the cache if an error
occurs, it does better than most other Web applications that I have seen.
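To make the caching behaviour described above concrete, here is a minimal sketch of a cache with a manually set timeout and a flush-on-error hook. This is not the HTDNS code: the names dns_cache, cache_init, cache_put, cache_get and cache_flush, the fixed table size, and the single IPv4 address per entry are all assumptions made for illustration. Because gethostbyname gives no TTL, the one timeout value stands in for it.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CACHE_SIZE 8     /* hypothetical fixed table size */
#define HOST_MAX   256

struct dns_entry {
    char           host[HOST_MAX];
    struct in_addr addr;
    time_t         expires;   /* absolute time at which the entry goes stale */
    int            used;
};

struct dns_cache {
    struct dns_entry slot[CACHE_SIZE];
    time_t           timeout;  /* manually chosen lifetime in seconds */
};

void cache_init(struct dns_cache *c, time_t timeout)
{
    memset(c, 0, sizeof *c);
    c->timeout = timeout;
}

/* Store or refresh an entry; `now` is passed in so callers control the clock. */
void cache_put(struct dns_cache *c, const char *host, struct in_addr addr, time_t now)
{
    struct dns_entry *e = NULL;
    int i;

    for (i = 0; i < CACHE_SIZE; i++) {
        struct dns_entry *s = &c->slot[i];
        if (s->used && strcmp(s->host, host) == 0) { e = s; break; }  /* refresh */
        if (e == NULL && (!s->used || s->expires <= now)) e = s;      /* reusable */
    }
    if (e == NULL)
        e = &c->slot[0];          /* table full: overwrite the first slot */

    strncpy(e->host, host, HOST_MAX - 1);
    e->host[HOST_MAX - 1] = '\0';
    e->addr = addr;
    e->expires = now + c->timeout;
    e->used = 1;
}

/* Return 1 and fill *out if a fresh entry exists, otherwise 0. */
int cache_get(const struct dns_cache *c, const char *host, time_t now, struct in_addr *out)
{
    int i;
    for (i = 0; i < CACHE_SIZE; i++) {
        const struct dns_entry *s = &c->slot[i];
        if (s->used && s->expires > now && strcmp(s->host, host) == 0) {
            *out = s->addr;
            return 1;
        }
    }
    return 0;
}

/* Drop everything, e.g. after a connect error suggests the data is stale. */
void cache_flush(struct dns_cache *c)
{
    int i;
    for (i = 0; i < CACHE_SIZE; i++)
        c->slot[i].used = 0;
}
```

A caller would try cache_get first, fall back to the resolver and cache_put on a miss, and call cache_flush whenever a connection to a cached address fails.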
In the next version, the DNS code will be separated out, and eventually we
will have to write our own DNS resolver that gives better information about:
- canonical names
- connect times
> 1) How can I get the IP address of a host from the hostname using the
Currently there is no way (because of limitations in gethostbyname) to know
that www12.w3.org is the same as www.w3.org.
> 2) How can I get all the different names of a host?
Again, there is no way to do this. The best you can get is the IP addresses of
a multihomed host - not the alias names.
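Since the IP addresses are the most you can get, a robot can at least use them as a heuristic: resolve both names and see whether the address lists overlap. The sketch below assumes a POSIX system that provides getaddrinfo (used here instead of gethostbyname) and handles IPv4 only. The function names resolve_ipv4, share_address and probably_same_host are made up for this example and are not part of the library. A shared address does not prove that two names serve the same documents, and two names for one host need not resolve to the same address, so treat the result as a hint only.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_ADDRS 16   /* hypothetical cap on addresses kept per host */

/* Resolve `host` to at most `max` IPv4 addresses.
   Returns the number of addresses stored, or -1 if resolution fails. */
int resolve_ipv4(const char *host, struct in_addr *out, int max)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    assert(host != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL && n < max; p = p->ai_next)
        out[n++] = ((struct sockaddr_in *)p->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return n;
}

/* Return 1 if the two address lists have at least one address in common. */
int share_address(const struct in_addr *a, int na,
                  const struct in_addr *b, int nb)
{
    int i, j;
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            if (a[i].s_addr == b[j].s_addr)
                return 1;
    return 0;
}

/* Heuristic only: 1 if both names resolve and share an address, else 0. */
int probably_same_host(const char *h1, const char *h2)
{
    struct in_addr a[MAX_ADDRS], b[MAX_ADDRS];
    int na = resolve_ipv4(h1, a, MAX_ADDRS);
    int nb = resolve_ipv4(h2, b, MAX_ADDRS);

    if (na <= 0 || nb <= 0)
        return 0;
    return share_address(a, na, b, nb);
}
```

A robot could call probably_same_host(start_host, link_host) before deciding whether a link leaves the starting site, and fall back to a plain string comparison of the hostnames when either lookup fails.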
Henrik Frystyk Nielsen, <email@example.com>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
- From: Gecse Roland <firstname.lastname@example.org>