HTTP HEAD not HTML <HEAD>...</HEAD>

I believe that the HTTP HEAD request only gets you the RFC-822-ish
HTTP message head, not the HTML <HEAD>...</HEAD> block.  On the
other hand, Brian Behlendorf did coin the following tempting dodge:

----- Forwarded message from Brian Behlendorf -----

Date: Sun, 11 May 1997 17:25:44 -0700 (PDT)
From: Brian Behlendorf <brian@hyperreal.com>

[...]
Another response I'd give is that if I was writing the client, one thing I
could do would be to use the "byterange" part of the HTTP/1.1 spec and
only access the first, say 500 bytes of a document or object, on the
premise that any <TITLE> is likely to be there 90% of the time.  So long
as all the HTTP headers and the first chunk of text can fit inside 1000
bytes, then it'll just be one TCP/IP packet; the majority of delay when
connections start up come from waiting for a second or third packet, it
doesn't make much difference if the first packet is 5 bytes or 1000.  But
that's a nit I guess.

On a larger scale, the question is, is there a framework for general
metainformation about internet objects.  Some people are starting to make
noise that PICS could be used as a framework: http://www.w3.org/PICS/

------------------------------------------------------------------------
By sending a GET with a byte-range restriction (the HTTP/1.1 Range
header) one could get a single-packet sample, and test for <HEAD>
closure in the fragment returned (similarly for PNG if the description
is in the head of the file).

One could iterate in cases where the fragment returned did not
contain the complete <HEAD>.  That would get all the <TITLE>, <META>
and <LINK> information present.
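
A rough sketch of that iteration in Python, to make the idea concrete.
The function names, the 500-byte chunk size, the 8000-byte cap and the
regular expressions are all my own assumptions for illustration, not
anything taken from the HTTP spec or from Brian's message; the only
protocol feature relied on is the HTTP/1.1 Range header.

```python
import re
import urllib.error
import urllib.request

# Assumed heuristics: the head is "closed" once </HEAD> or <BODY> appears.
HEAD_CLOSE = re.compile(rb"</head\s*>|<body[\s>]", re.IGNORECASE)
TITLE = re.compile(rb"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def fetch_range(url, start, end):
    """GET bytes start..end (inclusive) using the HTTP/1.1 Range header.

    Returns (body, partial); partial is True only for a 206 response.
    """
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    try:
        with urllib.request.urlopen(req) as resp:
            # 200 means the server ignored Range and sent the whole document.
            return resp.read(), resp.status == 206
    except urllib.error.HTTPError as err:
        if err.code == 416:  # requested range starts past the end
            return b"", False
        raise


def fetch_head(url, chunk=500, max_bytes=8000, fetch=fetch_range):
    """Request chunk-sized ranges until <HEAD> is closed or a limit is hit."""
    data = b""
    while len(data) < max_bytes:
        body, partial = fetch(url, len(data), len(data) + chunk - 1)
        data += body
        # Stop on head closure, on a full (non-ranged) reply, or at end of file.
        if HEAD_CLOSE.search(data) or not partial or len(body) < chunk:
            break
    return data


def extract_title(fragment):
    """Return the <TITLE> text from a fragment, or None if there is none."""
    match = TITLE.search(fragment)
    return match.group(1).decode("latin-1").strip() if match else None
```

The fetch parameter is there only so the loop can be tried against a
canned document without touching the network.  A server that does not
honor Range simply returns the whole document on the first request, in
which case the loop stops after one round trip.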

So not all the meta-data based methods require new filing conventions
or the registration of MIME headers.

--
Al Gilman

Received on Thursday, 10 July 1997 12:16:06 UTC