Re: HTTP HEAD request

Martian (abigail@mars.ic.iaf.nl)
Sun, 9 Apr 1995 02:49:42 +0200 (MET DST)


Message-Id: <m0rxlCC-0002FuC@mars.ic.iaf.nl>
From: abigail@mars.ic.iaf.nl (Martian)
Subject: Re: HTTP HEAD request
To: hurleyj@arachnaut.org (Jim Hurley)
Date: Sun, 9 Apr 1995 02:49:42 +0200 (MET DST)
Cc: www-html@www10.w3.org
In-Reply-To: <199504080744.AA10504@foxtrot.rahul.net> from "Jim Hurley" at Apr 8, 95 00:44:55 am

Once upon a time you, Jim Hurley, wrote:
--> I wrote:
--> >And according to the DTD:
--> >
--> ><!ELEMENT HEAD O O  (%head.content)>
--> ><!ELEMENT BODY O O  %body.content>
--> >
--> ><!ENTITY % html.content "HEAD, BODY+">
--> >
--> >The first  O indicates  the opening  tag is  optional, the  second one
--> >indicates the closing tag is optional.
--> 
--> Sorry.
--> 
--> >Every HTML document must have a head, and I did not say it should not.
--> >All  I said  is that  the <head>,  </head> *tags*  do not  have to  be
--> >present, as  confirmed by  the DTD. (Similar  for the  <body>, </body>
--> >tags.)  Apparently,  HTML parsers  are  smart  enough to  decide  for
--> >themselves what is the head and what is the body.
--> >
--> >--> >Returning an  error if  it encounters  EOF before  </head> would  be a
--> >--> >major design bug.
--> >--> 
--> >--> A major design bug of the HTML document, yes - but these are so
--> >--> commonly encountered.
--> >
--> >Nope, just like </p>, </li>, etc some tags are not required.
--> >
--> >
--> >Abigail
--> 
--> But this last part was about encountering a <head> but not getting
--> a matching </head>. Are you saying the <head> is terminated by
--> <body> or some body part?

All I said is that the tags <head>, </head>, <body> and </body> are
all optional. One could have a document with just the </head> tag, or
with only <head> and </body>. It is all legal according to the DTD.
So if you want to grab the head *section* (not the <head> *tag*), you
have to be a little smarter. However, since only a few elements are
allowed in the head section, this is not difficult: as soon as you
encounter anything that is not one of the valid head section elements
(like <title>...</title>), you have reached the body.
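
As a rough illustration of that rule, here is a minimal sketch in
Python. It is not taken from any existing server or parser. The name
head_end and the set HEAD_EMPTY are made up for this example, and the
list of head elements is an assumption based on the HTML 2.0 DTD; a
real SGML parser would do considerably more.

```python
import re

# Assumption: the empty head-section elements of the HTML 2.0 DTD.
HEAD_EMPTY = {"isindex", "base", "meta", "link", "nextid"}

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")
# Whitespace, comments and declarations such as <!DOCTYPE ...> are skipped.
SKIP = re.compile(r"\s+|<!--.*?-->|<![^>]*>", re.DOTALL)
END_TITLE = re.compile(r"</title\s*>", re.IGNORECASE)


def head_end(doc):
    """Return the offset in doc at which the head section ends."""
    pos = 0
    while pos < len(doc):
        m = SKIP.match(doc, pos)
        if m:
            pos = m.end()
            continue
        m = TAG.match(doc, pos)
        if not m:
            return pos  # plain text: the body has started
        closing, name = m.group(1), m.group(2).lower()
        if closing and name == "head":
            return m.end()  # explicit </head>: head ends right after it
        if not closing and name in ("html", "head"):
            pos = m.end()  # optional opening tags, still in the head
        elif not closing and name == "title":
            end = END_TITLE.search(doc, m.end())
            # An unterminated title swallows the rest of the document.
            pos = end.end() if end else len(doc)
        elif not closing and name in HEAD_EMPTY:
            pos = m.end()  # empty head element, no content to skip
        else:
            return pos  # any other tag (e.g. <body>, <h1>) starts the body
    return pos
```

For example, head_end("<title>T</title><h1>Hi</h1>") gives the offset
of the <h1> tag, even though the document contains no <head>, </head>
or <body> tag at all.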

However, the original question asked for a way to get only the head
of a document. This means the server has to parse the document itself,
which makes servers more complex and, more importantly, slower.


Abigail