Comments on HTTP/1.0 Draft 3

Hi,

Here are some comments on HTTP/1.0 Draft 3, referencing the PostScript
version. I've tried to sort them into 5 categories: <Problem>, <Why>,
<Edit>, <Comment> and <Pedantic> to help you browse through them.
Feedback is welcome!

## 1 ##
<Comment>
As a general rule, this spec should use fewer "should"s and more "must"s.
Otherwise, backward compatibility will be a nightmare with the advent of
HTTP/1.1 and HTTP/1.2.
We should also stick to the usual {must, should, may} hierarchy found in
RFCs, and avoid expressions like "are strongly encouraged to" wherever
possible.

## 2 ##
<Why>
Page 4, section 1.1, first paragraph:
"This specification is not intended to become an Internet standard".
Why? FTP is a standard and Telnet is a standard, so why shouldn't HTTP
become one?

## 3 ##
<Pedantic>
Page 4, section 1.2, first paragraph:
"metainformation" should read "meta-information". The same correction
should be made on page 5 (definition of entity), page 19 (section 5.2.2,
lines 2 and 4), page 22 (last line before section 6.2.1), page 23
(second-to-last line) and page 27 (section 7.1, line 1).

## 4 ##
<Edit>
Page 5, definition of server:
"A program that accepts connections in order to service requests by
sending back responses". We should add after this sentence: "A server
can be an origin server, a proxy or both".

## 5 ##
<Edit>
Page 5, definition of proxy:
We should remove the last sentence: "Some proxy servers also act as origin
servers". It's confusing in this context, and #4 makes this point clear.

## 6 ##
<Pedantic>
Page 6, definition of #rule, line 3:
"and optional linear whitespace (LWS)". To be consistent with the rest of
section 2.1, this should read "and optional linear whitespace characters
(LWS)". The same correction applies on page 7, section implied *LWS,
line 2.

## 7 ##
<Edit>
Page 7, section 2.2, line 2:
We should add something like this at the end of the first paragraph:
"Throughout this document, BNF definitions are not given in the usual top
to bottom manner, i.e. definitions are not only based on reserved words
previously defined, but also use reserved words defined further on in the
text. This was deemed to improve the clarity of the whole document".

## 8 ##
<Why>
Page 7, section 2.2, definition of LWS:
Why do we have [CRLF]? Do more than 80% of WWW applications use CRLF as
a linear whitespace character? I doubt it. Page 16, section 4.2 says:
"Header fields can be extended over multiple lines by preceding extra
line with at least one LWS, though this is not recommended". Rather than
being merely not recommended, this should be forbidden by the protocol,
and the issue mentioned instead on page 43, section B ("Tolerant
applications"). What is the rationale for allowing it?
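
To show what the current rule costs implementors, here is a minimal
Python sketch of the unfolding a tolerant recipient has to perform. It
assumes the header lines have already been split on CRLF; the function
name is hypothetical.

```python
def unfold_header_lines(lines):
    """Join continuation lines (those starting with SP or HT) onto the
    previous header line, replacing the fold with a single space.
    Assumption: lines have already been split and stripped of CRLF."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            if not unfolded:
                raise ValueError("continuation line before any header field")
            unfolded[-1] = unfolded[-1].rstrip() + " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded


print(unfold_header_lines([
    "Content-Type: text/html;",
    "\tcharset=ISO-8859-1",
    "Server: Example/1.0",
]))
# ['Content-Type: text/html; charset=ISO-8859-1', 'Server: Example/1.0']
```

If folding were forbidden, the first branch would simply become an error.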

## 9 ##
<Why>
Page 7, section 2.2, definition of tspecials:
What does the "t" stand for in the reserved word "tspecials"?

## 10 ##
<Pedantic>
Page 7, section 2.2, definition of tspecials:
Rather than 4 lines, this could be compressed into 2 lines. The same
comment applies to many other BNF definitions, and doing so would make
the whole document more concise.
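
As an illustration of how compact the set really is, here is a Python
sketch of the token rule. It assumes tspecials is the usual list
( ) < > @ , ; : \ " / [ ] ? = { } SP HT; is_token is a hypothetical
helper name.

```python
# Assumed tspecials set: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')

def is_token(s):
    """token = 1*<any CHAR except CTLs or tspecials>
    (CHAR is 0-127, CTL is 0-31 and 127, so printable 33-126 remain)."""
    return bool(s) and all(
        32 < ord(c) < 127 and c not in TSPECIALS for c in s
    )
```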

## 11 ##
<Pedantic>
Page 8, line 5:
"any text" is confusing: "text" is a reserved word here, and refers to the
BNF definition further on. It's not the common word "text". This could be
made more explicit by using a different typography for reserved words,
e.g. italics.

## 12 ##
<Edit>
Page 8, last but one line:
"Recipients... may assume that they represent ISO-8859-1 characters".
It's not "may", it's either "should" or "must". I'd vote for "must".

## 13 ##
<Edit>
Page 10, section 3.2.1, definition of scheme:
Do we really need to allow "+", "-" and "." in a scheme name?
I don't think so: ALPHA and DIGIT are enough.
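
The difference between the two rules, expressed as regular expressions in
a hypothetical Python validator (the names are illustrative only):

```python
import re

# Draft 3 rule as quoted above: ALPHA, DIGIT, "+", "-" and "."
DRAFT_SCHEME = re.compile(r"[A-Za-z0-9+.\-]+")
# Stricter rule proposed here: ALPHA and DIGIT only
PROPOSED_SCHEME = re.compile(r"[A-Za-z0-9]+")

def scheme_ok(scheme, rule):
    """True if the whole scheme name matches the given rule."""
    return rule.fullmatch(scheme) is not None

print(scheme_ok("x-foo", DRAFT_SCHEME), scheme_ok("x-foo", PROPOSED_SCHEME))
# True False
```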

## 14 ##
<Edit>
Page 10, last but one line:
Typo: in "and HTTP proxies may receive requests for URLs", "proxies"
should clearly read "servers".

## 15 ##
<Edit>
Page 11, section 3.2.2:
The trailing slash to indicate the default page of a server (e.g.
http://www.w3.org/ as opposed to http://www.w3.org) should be mandatory
for clients, and servers should apply the robustness principle, i.e.
understand URIs without a trailing slash and add one. The current wording
is a bit vague on this issue.
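
On the receiving side, the robustness rule could look like this minimal
Python sketch using the standard urllib.parse module. It only handles
absolute http URLs (as a proxy would receive them), and
normalize_http_url is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_http_url(url):
    """Treat an empty abs_path as "/", so that http://www.w3.org and
    http://www.w3.org/ name the same default page."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)

print(normalize_http_url("http://www.w3.org"))
# http://www.w3.org/
```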

## 16 ##
<Edit>
Page 11, section 3.3, second paragraph:
It should be stated more clearly that the 3rd format (asctime) is obsolete,
that neither clients nor servers should generate a date in asctime, but
both should be able to understand it (again, principle of robustness).
We should also add that the second format (RFC 850/1036) uses 2 digits for
the year, and with the year 2000 getting close, this format is likely to
be obsoleted by the next release of HTTP.
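
"Accept all three, generate only RFC 1123" can be sketched in Python
with datetime.strptime and strftime. This assumes the default C locale
for day and month names and returns naive datetimes meaning GMT; the
helper names are hypothetical.

```python
from datetime import datetime

# The three formats, with example values in the comments.
RFC1123 = "%a, %d %b %Y %H:%M:%S GMT"   # Sun, 06 Nov 1994 08:49:37 GMT
RFC850 = "%A, %d-%b-%y %H:%M:%S GMT"    # Sunday, 06-Nov-94 08:49:37 GMT
ASCTIME = "%a %b %d %H:%M:%S %Y"        # Sun Nov  6 08:49:37 1994

def parse_http_date(value):
    """Robustness principle: understand all three formats."""
    for fmt in (RFC1123, RFC850, ASCTIME):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError("unrecognized HTTP date: %r" % value)

def format_http_date(dt):
    """Only ever generate the RFC 1123 format."""
    return dt.strftime(RFC1123)

print(format_http_date(parse_http_date("Sun Nov  6 08:49:37 1994")))
# Sun, 06 Nov 1994 08:49:37 GMT
```

Note that %y has to guess the century for the RFC 850 format: Python
maps 69-99 to 19xx and 00-68 to 20xx, which is exactly the year-2000
ambiguity mentioned above.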

## 17 ##
<Edit>
Page 11, definitions of rfc1123-date and rfc850-date:
To be consistent, we should name them either rfc1123-date and rfc1036-date
or rfc822-date and rfc850-date. The current names are inconsistent.

## 18 ##
<Edit>
Page 12, last line before the definition of charset:
The line "and other names specifically recommended for use within MIME
charset parameters" should be deleted. The list is not an exhaustive list
of all permitted charsets, but just "the preferred names for those
character sets most likely to be used".

## 19 ##
<Edit>
Page 12, definition of charset:
"token" should be replaced with "<IANA character set>". It is a bad idea
not to use the IANA character sets, and I can't see the rationale for it.
The 2 sentences following the definition of charset should be replaced with:
"Applications are encouraged to use the preferred character set names
listed above, and required to use a character set defined by the IANA
registry".

## 20 ##
<Edit>
Page 13, section 3.5, first line:
Typo: "that has been or can be applied to a resource" should be replaced
with "that has been applied to a resource". A content coding that "can be"
applied to a resource is meaningless.

## 21 ##
<Why>
Page 13, section 3.6, line 6-7:
"because it does not restrict itself to the official IANA and x-token
types". Why? What's the rationale?

## 22 ##
<Edit>
Page 13, section 3.6, after definition of subtype:
We should add the following before "Parameters may follow...":
"HTTP/1.0 uses media-type values in the Content-Type (Section 8.5) header
field".
This makes it consistent with section 3.5 and its reference to section 8.3.

## 23 ##
<Why>
Page 15, section 3.6.2, line 2:
"The multipart types registered by IANA [15] do not have any special
meaning for HTTP/1.0".
Why? This section says that HTTP multipart messages are possible, and
says further on that "multipart body-parts may contain HTTP header fields
which are significant to the meaning of this part". But the definition
of Full-Request on page 18 allows a single {Entity-Header, CRLF,
Entity-Body}, not multiple ones, so there seems to be a contradiction.
The multipart issue probably needs further clarification. Is this an
item for HTTP/1.1?

## 24 ##
<Pedantic>
Page 16, section 4.1, line 7:
Typo: "a.k.a." should be expanded to "also known as": this is a formal
spec!

## 25 ##
<Edit>
Page 16, section 4.2:
Several points are not covered here. We should add the following to this
section:
"No header field has a default value, except Date: (Section 8.6). If a
field-value is specified without a field-content, it should be ignored.
The field-name is case-insensitive. If a field-name appears in more
than one header field, then the whole message should be discarded and
a 4xx or 5xx error returned".
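
The rules proposed above are simple to implement. A Python sketch,
assuming the header lines have already been unfolded and stripped of
CRLF; parse_header_fields is a hypothetical name, and a server would map
the ValueError to a 4xx status such as 400 (bad request).

```python
def parse_header_fields(lines):
    """Apply the proposed rules: case-insensitive field-names, fields
    without field-content ignored, any repeated field-name rejected."""
    fields = {}
    seen = set()
    for line in lines:
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError("malformed header line: %r" % line)
        key = name.lower()       # field-names are case-insensitive
        if key in seen:          # same field-name twice: reject the message
            raise ValueError("duplicate header field: %s" % name)
        seen.add(key)
        value = value.strip()
        if value:                # no field-content: ignore the field
            fields[key] = value
    return fields

print(parse_header_fields(["Content-Type: text/html", "X-Empty:"]))
# {'content-type': 'text/html'}
```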

## 26 ##
<Edit>
Page 17, first line:
Typo: "The order in which header fields are received is not significant":
"received" should read "sent", cf next sentence.

## 27 ##
<Edit>
Page 17, lines 5-6:
If we trust section 3.6.2 that "multipart body-parts may contain HTTP
header fields", then this sentence is wrong: in each part of a
multipart message, we could have the same HTTP header field.

## 28 ##
<Problem>
Page 17, section 4.3:
"There are a few headers... which do not apply to the communicating
parties or the content being transferred". MIME-Version is surely
concerned with the content being transferred, so there's a problem here!

## 29 ##
<Problem>
Page 17:
Should a maximum length, say 4KB, be defined for the HTTP header, i.e.
{Request-Line|Status-Line} + General-Header + {Request-Header|Response-Header}
+ Entity-Header? That would ensure that a server cannot get stuck reading
an infinite HTTP header from a bogus client.
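
Enforcing such a limit is cheap. A minimal Python sketch, assuming a
binary file-like stream (e.g. one obtained with socket.makefile("rb"));
the 4096-byte default is just the figure suggested above, and
read_header_block is a hypothetical name. The limit covers the
Request-Line or Status-Line as well.

```python
import io

MAX_HEADER_BYTES = 4096  # assumption: the 4KB figure suggested above

def read_header_block(stream, limit=MAX_HEADER_BYTES):
    """Read header lines up to the empty line, never consuming more than
    `limit` bytes, so an endless header cannot stall the reader."""
    lines = []
    total = 0
    while True:
        # Ask for one byte more than the remaining budget: getting it back
        # means the limit has been exceeded.
        line = stream.readline(limit - total + 1)
        total += len(line)
        if total > limit:
            raise ValueError("header exceeds %d bytes" % limit)
        if not line:
            raise ValueError("connection closed before end of header")
        if line in (b"\r\n", b"\n"):
            return lines
        lines.append(line.rstrip(b"\r\n"))

print(read_header_block(io.BytesIO(b"GET / HTTP/1.0\r\nUser-Agent: x\r\n\r\n")))
# [b'GET / HTTP/1.0', b'User-Agent: x']
```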

## 30 ##
<Why>
Page 18, section 5.2, line 2:
"The method is case-sensitive". Why? Why couldn't we accept Get, Head
and Post, for instance? Almost everything else is case-insensitive, so
why be more restrictive here?

## 31 ##
<Edit>
Page 21, section 6.1, definition of Status-Line:
The Reason-Phrase should really be optional, i.e. the BNF should read:
Status-Line    = HTTP-Version SP Status-Code [SP Reason-Phrase] CRLF
This is even implied on the next page: "since that entity is *likely* to include
human-readable information".
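
With the Reason-Phrase optional, a client-side parser would look roughly
like this Python sketch of the BNF proposed above; parse_status_line is
a hypothetical name.

```python
import re

# Status-Line = HTTP-Version SP Status-Code [SP Reason-Phrase] CRLF
STATUS_LINE = re.compile(r"HTTP/(\d+)\.(\d+) (\d{3})(?: ([^\r\n]*))?\r\n")

def parse_status_line(line):
    """Return ((major, minor), status_code, reason_phrase_or_empty)."""
    m = STATUS_LINE.fullmatch(line)
    if m is None:
        raise ValueError("bad Status-Line: %r" % line)
    major, minor, code, reason = m.groups()
    return (int(major), int(minor)), int(code), reason or ""

print(parse_status_line("HTTP/1.0 304\r\n"))
# ((1, 0), 304, '')
```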

## 32 ##
<Pedantic>
Pages 22-26:
Sections 6.2.1 through 6.2.5 should be moved to chapter 8: chapters 4
to 7 are not in-depth, whereas chapter 8 is.

## 33 ##
<Edit>
Page 23, definition of 201, line 4:
"or within a clearly defined timeframe": how can the client learn from
the server what this "clearly defined timeframe" is? This is wishful
thinking: even if it looks like a good idea initially, it's not feasible
in practice with HTTP/1.0. This should be removed from the spec. This
may be put back in a later version of HTTP if we add a header field
for the server to tell the client what this "clearly defined timeframe"
is at the same time it returns 201.

## 34 ##
<Edit>
Page 24, definition of 300, last line:
"user agents may use the Location value for automatic redirection".
Read this sentence twice: it's ambiguous and has 2 meanings. What you
mean is that the client may get a Location field from the server. What
one could also understand is that the client may use the Location field
in its request, which is wrong. I'll let you native English speakers
devise a new, unambiguous sentence to replace this one!

## 35 ##
<Problem>
Page 24, definition of 301, second paragraph:
The new URL is given twice, once in the Location header field and once
in the Entity-Body. This is redundant and a waste of bandwidth. IMHO, it
shouldn't appear in the Entity-Body. Ditto for 302.

## 36 ##
<Why>
Page 24, definition of 301, third paragraph:
"If the 301 status code is received in response to a request using the
POST method, the user agent must not automatically redirect the request
unless it can be confirmed by the user, since this might change the
conditions under which the request was issued."
What is the rationale? What practical case do you have in mind?
Ditto for 302.

## 37 ##
<Edit>
Page 25, section 6.2.4:
Line 2, in "should immediately cease", replace "should" with "must".
Line 3, in "the server is encouraged to include", replace "is
encouraged to" with "should".
In the definition of 400, in "The client is discouraged from
repeating", replace "is discouraged from" with "should not".

## 38 ##
<Edit>
Page 25, definition of 401, line 3:
Just after "suitable Authorization", we should add "(Section 8.2)".

## 39 ##
<Comment>
Page 25, definition of 403:
We lack a status code whereby the server refuses to honor a request,
but is willing to say why, e.g. "this page is available to internal
users only". Should we create a new status code for this in HTTP/1.1?

## 40 ##
<Edit>
Page 26, line 4-5:
In "it should immediately cease", replace "should" with "must".
In "the server is encouraged to include an entity", replace "is
encouraged to" with "should".

## 41 ##
<Comment>
Page 27, definition of Entity-Header:
Allow is classified as an Entity-Header. Isn't it server-specific rather
than URI-specific? In other words, shouldn't it be a Response-Header
instead?

## 42 ##
<Edit>
Page 27, section 7.1, last line:
Add "unmodified" as follows:
"Unknown header fields should be ignored by the recipient and forwarded
unmodified by proxies".

## 43 ##
<Pedantic>
Page 27, section 7.2, second paragraph:
Line 2, "in general" should be deleted, because it's always the case:
cf line 4 "must include".
Line 4, there's a typo: it should read "request message header", singular.
Line 4 again, "HTTP/1.0 requests containing content" doesn't sound
great: "containing an entity-body" sounds better to me.

## 44 ##
<Edit>
Page 27, section 7.2, last line:
"The responses 204 (no content) and 304 (not modified)". We should also
add "and 403 (forbidden)".

## 45 ##
<Pedantic>
Page 28, line 4:
Delete "(i.e. the identity function)", it's dead wood and adds nothing.

## 46 ##
<Edit>
Page 28, lines 5-6:
Cf #25 for the default value. Replace lines 5-6 with:
"All HTTP/1.0 messages with an entity-body must have a Content-Type
header field. If and only if this header field is not specified, as is
always the case for HTTP/0.9 messages, the recipient may attempt to
guess the media type...".
Line 8, replace "the receiver should" with "the recipient must".
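
A recipient applying this rule might look like the following Python
sketch. The standard mimetypes module does the guessing; the function
name is hypothetical, header names are assumed lower-cased, and falling
back to application/octet-stream when the guess fails is an assumption
of this sketch.

```python
import mimetypes

def entity_media_type(headers, url):
    """Trust Content-Type when present; only guess when it is absent
    (as is always the case for HTTP/0.9 messages)."""
    declared = headers.get("content-type")  # assumption: lower-cased names
    if declared is not None:
        return declared
    guessed, _encoding = mimetypes.guess_type(url)
    # Assumption: fall back to application/octet-stream if guessing fails.
    return guessed or "application/octet-stream"
```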

## 47 ##
<Pedantic>
Page 28, section 7.2.2, paragraph 2, line 2-3:
"containing content" should become "containing an entity-body".
End of line 3, "entity body" should become "entity-body" to be consistent
with the rest of the document.

## 48 ##
<Edit>
Page 28, last 2 lines:
"Unless the client knows that it is receiving a response from a compliant
server, it should not depend on the Content-Length value being correct".
Argh! These servers aren't HTTP/1.0 compliant, so let's not ask clients
to break the protocol to accommodate them! This sentence should be
deleted. It could perhaps be replaced with a reminder that clients should
be robust, but I doubt that's needed, as the rest of the note makes the
point clear.

## 49 ##
<Edit>
Page 28, last line:
Add a second note:
"Note: The Content-Length header field must not be specified if there is
no Entity-Body in the message; in other words, 'Content-Length: 0' is
invalid."

## 50 ##
<Comment>
Pages 29-35:
It would be a good idea to add a line at the beginning of all 8.x sections
specifying whether this header field may be found in a request or a
response, e.g.:
Request: YES     Response: NO
For the moment, we need to edit the first sentence of most sections 8.x
to give this information (a few already have it).

## 51 ##
<Edit>
Page 29, section 8.1:
Line 1, add "response" after "Allow".
Line 4, in "thus should be ignored", replace "should" with "must".
Second paragraph, line 2, delete "This field has no default value"
(cf #25).
Third paragraph, replace "the allow header" with "the Allow header
field", to remain consistent in terms of terminology.

## 52 ##
<Edit>
Page 29, section 8.2:
Beginning of line 1, add: "This is a request header field".
Last line, replace "Proxies must not cache the response to a request"
with "Proxies must not cache any HTTP/1.0 message".

## 53 ##
<Edit>
Page 29, section 8.3:
Line 1, add "(request or response)" after "header field".

## 54 ##
<Edit>
Page 30, lines 2-3:
Delete "or analogous usage". The sentence starts with "Typically", so
you don't need it.

## 55 ##
<Edit>
Page 30, section 8.4:
Line 1, add "(request or response)" after "header field".
The paragraph after the example is wrong, cf page 20 line 1 (mandatory
for POST) and page 28 line 17. This paragraph should read instead:
"Applications must use this field to indicate the size of the Entity-Body
to be transferred, regardless of the media type of the entity".
Next line, "greater than or equal to zero" should become "greater than
zero", cf #49.
In the Note, line 3, replace "should" with "must". The rationale is that
it's mandatory, but applications should be robust and not crash if it's
not there.
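
Taken together, #49 and the wording above give a check a sender could
run on its own outgoing messages. A Python sketch only: it assumes
lower-cased header names and a bytes body, and check_content_length is a
hypothetical name. Recipients should stay robust rather than reject
incoming messages this strictly.

```python
def check_content_length(headers, body):
    """Sender-side check of the proposed rules (assumption: header names
    are lower-cased and body is bytes)."""
    value = headers.get("content-length")
    if not body:
        # #49: no Entity-Body means no Content-Length, not even "0".
        if value is not None:
            raise ValueError("Content-Length without an Entity-Body")
        return
    if value is None:
        raise ValueError("Entity-Body without Content-Length")
    # Greater than zero, and equal to the Entity-Body size.
    if not (value.isascii() and value.isdigit()) or int(value) <= 0:
        raise ValueError("Content-Length must be a positive integer")
    if int(value) != len(body):
        raise ValueError("Content-Length does not match the Entity-Body")
```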

## 56 ##
<Edit>
Page 30, section 8.5:
Line 1, add "(request or response)" after "header field".
After the example, delete "The Content-Type header field has no default
value", cf #25.

## 57 ##
<Edit>
Page 30, section 8.6:
Line 1, replace "The Date header" with "The Date header field (request
or response)".
After the example, it states:
"If a message is received via direct connection with the user agent (in
the case of requests) or the origin server (in the case of responses),
then the default date can be assumed to be the current date at the
receiving end".
The presence of proxies is irrelevant here. This sentence should be
replaced with:
"If a message has no Date header field, then the recipient may assume
that the default date is the current date at the time the message is
received".
Last line of the page, in "origin servers should always include a Date
header", replace "should" with "must".

## 58 ##
<Edit>
Page 31, section 8.6:
Line 5, delete first sentence: "Only one Date header field is allowed
per message" (cf #25).

## 59 ##
<Edit>
Page 31, section 8.7:
Line 1, add "response header" after "Expires".
Lines 2-3, replace "Caching clients, including proxies" with "Caching
clients and proxies".
Second paragraph, i.e. after the example, delete sentence "The Expires
field has no default value" (cf #25).
End of second paragraph, after "dynamism", we should add: "The Expires
date should not be earlier than the Date value, but this is not mandatory."
This is to cope with bogus implementations as explained in the note of
section 8.7.
Third paragraph, "The Expires field" should become "The Expires header
field", for consistency.

## 60 ##
<Edit>
Page 32, section 8.8:
Line 1, add "request" after "From".
Line 3, "as updated" becomes "and updated".
End of note, add "(Section 8.14)".

## 61 ##
<Edit>
Page 32, section 8.9:
Line 1, add "request" after "If-Modified-Since".
In a) line 1, replace "200" with "2xx".
End of section, add:
"Note: Server implementors are encouraged to return responses with a
status code of 304 quicker (i.e. with higher priority) than responses to
a normal GET or to an If-Modified-Since request with another status
code." I'm not sure many servers already prioritize their responses,
but it sounds like a good idea, as it encourages caching.
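
The server-side decision, with the "2xx" reading proposed above for
case a), might be sketched like this in Python. Assumptions: only the
RFC 1123 date format is parsed (a real server should accept all three,
cf #16), datetimes are naive GMT, a date later than the server's clock
is treated as invalid, and conditional_get_status is a hypothetical name.

```python
from datetime import datetime

RFC1123 = "%a, %d %b %Y %H:%M:%S GMT"

def conditional_get_status(normal_status, last_modified,
                           if_modified_since, now):
    """Status for a GET carrying If-Modified-Since (naive GMT datetimes)."""
    if not 200 <= normal_status < 300:
        return normal_status            # case a), read as "2xx": plain GET
    try:
        since = datetime.strptime(if_modified_since, RFC1123)
    except ValueError:
        return normal_status            # unparseable date: plain GET
    if since > now:
        return normal_status            # assumption: future date is invalid
    if last_modified <= since:
        return 304                      # not modified since that date
    return normal_status                # modified: plain GET

lm = datetime(1994, 11, 6, 8, 49, 37)
print(conditional_get_status(200, lm, "Sun, 06 Nov 1994 08:49:37 GMT",
                             datetime(1995, 9, 20)))
# 304
```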

## 62 ##
<Edit>
Page 33, section 8.10:
Line 1, add "response" after "Last-Modified".
Line 1, delete "sender believes the", that's dead wood. Nothing is
"guaranteed" per se, it's always as the client or the server believes
it.
Line 3, replace "receiver" with "recipient" twice (cf terminology at
the beginning of section 8).
Replace last line of section 8.10 with:
"In such cases, where the resource's last modification time would
indicate some time in the future (e.g. due to time skew between the
origin server and a database accessed via a gateway), the server must
replace that date with the message origination date".
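
The proposed replacement rule amounts to clamping the modification time.
A Python sketch; effective_last_modified is a hypothetical name, and
both values are assumed to be naive GMT datetimes.

```python
from datetime import datetime

def effective_last_modified(resource_mtime, message_date):
    """Never send a Last-Modified later than the message's own Date:
    a future modification time (e.g. from clock skew behind a gateway)
    is replaced by the message origination date."""
    return min(resource_mtime, message_date)

date = datetime(1995, 9, 20, 4, 7, 18)
print(effective_last_modified(datetime(1995, 9, 21), date))
# 1995-09-20 04:07:18
```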

## 63 ##
<Pedantic>
Page 33, section 8.12:
Line 1: replace "MIME-conformant" with "MIME-compliant", to use the same
expression throughout the spec.
Line 1, add "(both requests and responses)" after "HTTP/1.0 messages".
Line 9, replace "intended to be MIME-conformant" with "fully
MIME-compliant".

## 64 ##
<Edit>
Page 34, section 8.13:
Line 1, replace "Pragma message" with "Pragma response".
Line 3, delete "intermediate" (cf terminology section 1.3).
First line after the definition of extension-pragma, replace
"intermediary" with "proxy" (cf terminology section 1.3).

## 65 ##
<Why>
Page 34, section 8.13, lines 4-5:
"All pragma directives specify optional behavior from the viewpoint of
the protocol": why optional rather than mandatory?

## 66 ##
<Edit>
Page 34, section 8.14:
Last line, add "(cf Section 8.8)".

## 67 ##
<Comment>
Page 35, section 8.15:
After the example:
"If the response is being forwarded through a proxy, the proxy
application should not add its data to the product list".
In fact, the proxy mustn't overwrite any header field, except the
HTTP-Version in the Status-Line (cf page 9, last paragraph). So this
sentence should probably be removed.
Last line of the note: this is security through obscurity. It gives you
the illusion of being more secure, that's all. If you have a server open
to the world, it's open to hackers. If they know a loophole to break
into, say, NCSA httpd 1.3, they'll try it on your server, whatever you
return in Server. I don't think there's any point in encouraging server
implementors to make Server a configurable option.

## 68 ##
<Edit>
Page 35, section 8.16:
Line 1: add "request header" after "User-Agent".
Line 4, replace "should always" with "should": again, let's stick to
the RFC definitions of must, should and may.

## 69 ##
<Edit>
Page 35, section 8.17:
Line 1, add "response" after "WWW-Authenticate".

## 70 ##
<Edit>
Page 36, last but one paragraph:
The second sentence is ambiguous: you also want to allow authentication
and encryption mechanisms that are not at the transport level, cf IPv6.
I suggest rephrasing it like this:
"Additional authentication and encryption mechanisms may be used, e.g.
at transport level via message encapsulation, and/or with additional
headers...".

## 71 ##
<Edit>
Page 36, last but one line:
Replace "and must not cache a request containing Authorization" with "and
must not cache a message containing an Authorization or WWW-Authenticate
header field".
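
Under this wording, a proxy's cache check could be as small as the
following Python sketch; proxy_may_cache is a hypothetical name, header
names are assumed lower-cased, and every other caching rule is
deliberately ignored.

```python
def proxy_may_cache(request_headers, response_headers):
    """A message exchange involving Authorization (request side) or
    WWW-Authenticate (response side) must not be cached.
    Assumption: header names are lower-cased by the parser."""
    if "authorization" in request_headers:
        return False
    if "www-authenticate" in response_headers:
        return False
    return True
```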


## 72 ##
<Comment>
Page 39, lines 3-4:
"the Referer field may indicate a private document's URI whose
publication would be inappropriate".
If it's private, it's either inaccessible to anyone except the owner,
or inaccessible to external users, so there's little risk here, IMHO.
The real security problem is the weakness of the authentication and
authorization schemes available in HTTP/1.0, as stated earlier in
section 10. Referer, From and Server are negligible in comparison; I see
little point in insisting so heavily on them.

## 73 ##
<Pedantic>
Page 40, line 10:
Replace "Jean Francois-Groff" with "Jean-Francois Groff".

## 74 ##
<Pedantic>
Page 43, section B, lines 5-6:
To be consistent with the rest of the document:
Replace "StatusLine" with "Status-Line".
Replace "RequestLine" with "Request-Line".

## 75 ##
<Pedantic>
Page 44, section C, line 4:
To be consistent with the rest of the document:
Replace "MIME-conforming" with "MIME-compliant".
Ditto page 45, line 2.

## 76 ##
<Comment>
Page 45:
The difference between Content-Encoding and Content-Transfer-Encoding is
not crystal clear (at least to me!). A few words explaining it would be
welcome in section C.4.

## 77 ##
<Comment>
Page 45:
A section C.5 about multipart could be added, cf #23 and #27.


Et voilà! Thanks for reading this far; it was rather long...

Jean-Philippe

Received on Wednesday, 20 September 1995 04:07:18 UTC