Re: Comments on HTTP/1.0 Draft 3 from Jean-Philippe Martin-Flatin on 1995-09-21 (ietf-http-wg@w3.org from July to September 1995)

From: Jean-Philippe Martin-Flatin <syj@ecmwf.int>
Date: Thu, 21 Sep 1995 22:30:43 +0100
To: Roy Fielding <fielding@beach.w3.org>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9509212230.ZM23273@helena>

Hi Roy,

Thanks for your detailed comments. I agree with about half of them which I
didn't include here. You're left with just the groaning, I'm afraid...

On Sep 20, 23:53, Roy Fielding wrote:
>
> >## 1 ##
> ><Comment>
> >As a general rule, we should have less "should" and more "must" in this
> >spec. ....
>
> *bzzzt* did I mention "general rules" are out of order for HTTP/1.0?
> However, this will be done for HTTP/1.1 (in fact, I did it last weekend).

I don't understand why HTTP/1.1 and not HTTP/1.0. Currently, you use
"recommended", "encouraged" and "should" all over the place. Either HTTP/1.1
will have to toughen up dozens of areas and make backward compatibility real
fun, or HTTP will forever remain just a set of recommendations.

> >## 4 ##
> ><Edit>
> >Page 5, definition of server:
> >"A program that accepts connections in order to service requests by
> >sending back responses". We should add after this sentence: "A server
> >can be an origin server, a proxy or both".
>
> That would be an insufficient definition.  The word "server" is defined
> correctly.

Huh ? What can a server be if it's neither an origin server nor a proxy ?
My suggested addition makes it especially clearer for the newbie, who wants
to get the terminology right before reading any further.

> >## 6 ##
> ><Pedantic>
> >Page 6, definition of #rule, line 3:
> >"and optional linear whitespace (LWS)". To be consistent with the rest of
> >section 2.1, this should read "and optional linear whitespace characters
> >(LWS)".  Same correction page 7, section implied*LWS, line 2.
>
> No, LWS == linear whitespace;   LWS != linear whitespace characters.
> Linear whitespace characters are what make up the content of LWS.

You missed my point, I wasn't clear enough: page 7, just after the definition
of CRLF, you use "linear whitespace characters"; page 7, in the definition of
implied*LWS, you use "zero or more linear whitespace"; and here you use
"linear whitespace". We need a consistent terminology, because the number of
characters you're talking about isn't clear. The word "whitespace", like
"space", is not countable in plain English; but many people use it in jargon
as the character that you get when you hit the space bar, and in that context
it becomes countable. As it currently stands, the spec confuses people who
think "linear whitespace" means "1 linear whitespace character".

The modification I initially suggested isn't optimal, I'd prefer this:

- Page 6, definition of #rule:

    The full form is "<n>#<m>element" indicating at least <n> and at
    most <m> elements, each separated by one or more commas (",") and
    some optional linear whitespace (LWS).

- Page 7, definition of implied*LWS:

    Except where noted otherwise, some linear whitespace (LWS) may
    optionally be included between any two adjacent words

- Page 7, after the definition of CRLF:

    HTTP/1.0 headers may be folded onto multiple lines if the
    continuation lines begin with some linear whitespace. All linear
    whitespace, including folding, has the same semantics as SP.
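The folding and list wording above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names unfold and split_list are hypothetical, lines are assumed to end in CRLF, and a continuation line is assumed to begin with SP or HT. It is not taken from the draft or from any existing implementation.

```python
import re

def unfold(header_block: str) -> list[str]:
    """Join continuation lines onto the previous header line.

    A CRLF followed by one or more SP/HT is replaced by a single SP,
    which matches "all linear whitespace has the same semantics as SP".
    """
    unfolded = re.sub(r"\r\n[ \t]+", " ", header_block)
    return [line for line in unfolded.split("\r\n") if line]

def split_list(value: str) -> list[str]:
    """Split an already unfolded '#rule' list.

    Elements are separated by one or more commas, each optionally
    surrounded by SP/HT; empty elements are dropped.
    """
    parts = re.split(r"[ \t]*(?:,[ \t]*)+", value.strip(" \t"))
    return [p for p in parts if p]

raw = "Accept: text/html,\r\n  text/plain ,, image/gif\r\nUser-Agent: demo/0.1\r\n"
lines = unfold(raw)
name, _, value = lines[0].partition(":")
elements = split_list(value)
```

Here the folded Accept field comes back as one line, and the doubled comma yields no empty element.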

> >## 7 ##
> ><Edit>
> >Page 7, section 2.2, line 2:
> >We should add at the end of the first paragraph something like:
> >"Throughout this document, BNF definitions are not given in the usual top
> >to bottom manner, i.e. definitions are not only based on reserved words
> >previously defined, but also use reserved words defined further on in the
> >text. This was deemed to improve the clarity of the whole document".
>
> BNF definitions are never given "in the usual top to bottom manner".
> BNF is a mathematical formalism, not a programming language, and is
> not order-dependent.

Forward references are forbidden in mathematics, full stop: you define and
prove based on the corpus of what you already postulated, defined or proved.
BNF definitions could be made "top to bottom" in this spec, but wouldn't help
its reading. Hence my point.

> >## 11 ##
> ><Pedantic>
> >Page 8, line 5:
> >"any text" is confusing: "text" is a reserved word here, and refers to the
> >BNF definition further on. It's not the common word "text". This could be
> >made more explicit by using a different typography for reserved words,
> >e.g. italics.
>
> It is already in the BNF, so it must be a BNF word.

No, the BNF definition of "text" is several lines further. Also, the word
"text" is used throughout the spec as a plain English word, so this argument
has little weight.

> This is exactly
> how the words are used in RFC 822 definitions.  Besides, you cannot put
> italics in a text/plain document.

Good point. May I suggest using TEXT instead of text for the BNF definition
?

> >## 12 ##
> ><Edit>
> >Page 8, last but one line:
> >"Recipients... may assume that they represent ISO-8859-1 characters".
> >It's not "may", it's either "should" or "must". I'd vote for "must".
>
> Nope, the correct word is "may" (i.e., it is optional).

Granted, HTTP/1.0 is a BCP, so "must" is a bad choice here. But why not at
least recommend it ? You already changed MIME's US-ASCII to ISO-8859-1 for
the entity body page 14 section 3.6.1, why not here for the HTTP header ?

> >## 14 ##
> ><Edit>
> >Page 10, last but one line:
> >Typo: in "and HTTP proxies may receive requests for URLs", "proxies"
> >should clearly read "servers".
>
> No, I meant proxies.  You cannot send a URN to just any server.

Can you elaborate on this ? You seem to imply more than is actually written
in the spec.

> >## 16 ##
> ><Edit>
> >Page 11, section 3.3, second paragraph:
> >We should also add that the second format (RFC 850/1036) uses 2 digits for
> >the year, and with the year 2000 getting close, this format is likely to
> >be obsoleted by the next release of HTTP.
>
> It already says that too.  However, HTTP/1.0 cannot prescribe HTTP/1.1.

It doesn't. It just says that the RFC 822/1123 format is "preferred", which
is about the weakest way you could have put it. It makes a significant
difference for an implementor if you add that the RFC 850/1036 format is
likely to be obsoleted by next release, IMHO. When I read this section, I see
little stress on using the 1st format.
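To show why the two-digit year matters to an implementor, here is a minimal Python sketch that accepts both formats. The function name parse_http_date is hypothetical, the sample dates are illustrative, and the code assumes an English (C) locale for day and month names. Note that the century chosen for a two-digit year is a convention of the parsing library, not something the date itself carries.

```python
from datetime import datetime

RFC1123 = "%a, %d %b %Y %H:%M:%S GMT"   # e.g. Sun, 06 Nov 1994 08:49:37 GMT
RFC850 = "%A, %d-%b-%y %H:%M:%S GMT"    # e.g. Sunday, 06-Nov-94 08:49:37 GMT

def parse_http_date(value: str) -> datetime:
    """Try the four-digit-year format first, then the two-digit-year one."""
    for fmt in (RFC1123, RFC850):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised HTTP date: {value!r}")

preferred = parse_http_date("Sun, 06 Nov 1994 08:49:37 GMT")
legacy = parse_http_date("Sunday, 06-Nov-94 08:49:37 GMT")

# With %y the parser has to guess the century: Python follows the POSIX
# pivot (69-99 -> 1969-1999, 00-68 -> 2000-2068).
guessed = parse_http_date("Saturday, 01-Jan-00 00:00:00 GMT")
```

Both 1994 dates parse to the same instant, but the year in the last one is the library's guess.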

> >## 19 ##
> ><Edit>
> >Page 12, definition of charset:
> >"token" should be replaced with "<IANA character set>". It is a bad idea
> >not to use the IANA character sets, I can't see the rationale for it. The
> >2 sentences following the definition of charset should be replaced with:
> >"Applications are encouraged to use the preferred character set names
> >listed above, and required to use a character set defined by the IANA
> >registry".
>
> This is a protocol definition, not a religious specification.

Why do you want to support character sets not registered by IANA at all ?
IANA already registered so many (17 pages in RFC 1700), don't tell me that
80% of existing applications support at least one character set not
registered by IANA ! You worked hard to adhere to all existing standards
throughout this spec, I don't understand why you're lax about character sets.

> >## 20 ##
> ><Edit>
> >Page 13, section 3.5, first line:
> >Typo: "that has been or can be applied to a resource" should be replaced
> >with "that has been applied to a resource". A content coding that "can be"
> >applied to a resource is meaningless.
>
> It is a leftover from Accept-Encoding, and I'd rather not remove it.

I don't see the reason why you want to keep it.

> >## 25 ##
> ><Edit>
> >Page 16, section 4.2:
> >Several points are not covered here. We should add the following to this
> >section:
> >"No header field has a default value, except Date: (Section 8.6). If a
> >field-value is specified without a field-content, it should be ignored.
> >The field-name is case-insensitive. If a field-name appears in more
> >than one header field, then the whole message should be discarded and
> >a 4xx or 5xx error returned".
>
> Rubbish. Those additions are not true of all HTTP/1.0 header fields.

The area of default values for header fields needs to be reviewed. You
scattered a few "there's no default value for this header field" in chapter
8, but forgot it for most header fields. That's why I thought it was
appropriate to put it here rather than in all but one header fields
definitions. Apart from Date, what header field has a default value ?

I made a typo in the 2nd case, what I meant is: "if a field-name is specified
without a field-value, then it should be ignored". Have I overlooked any case
when this is not true ? In fact, the BNF should be fixed to be:
       HTTP-header    = field-name ":" field-value CRLF

If a field-name appears more than once and this is not a multipart message,
then the message is malformed. Hence my suggested error codes. What's rubbish
in that ?
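The three checks discussed here (case-insensitive field-name, empty field-value, repeated field-name) can be sketched as follows in Python. All names are hypothetical and the input is assumed to be a list of already unfolded header lines. Whether a repeated name is really an error is exactly what is disputed above, so the sketch only reports it and rejects nothing.

```python
from collections import defaultdict

def parse_headers(lines: list[str]) -> dict[str, list[str]]:
    """Group field-values by lower-cased field-name (names are case-insensitive)."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header line: {line!r}")
        fields[name.strip().lower()].append(value.strip(" \t"))
    return dict(fields)

def repeated_names(fields: dict[str, list[str]]) -> list[str]:
    """Field-names that appear in more than one header field."""
    return sorted(n for n, values in fields.items() if len(values) > 1)

def empty_names(fields: dict[str, list[str]]) -> list[str]:
    """Field-names given at least once without any field-value."""
    return sorted(n for n, values in fields.items() if "" in values)

fields = parse_headers([
    "Content-Type: text/html",
    "content-type: text/plain",
    "Pragma:",
    "Date: Thu, 21 Sep 1995 22:30:43 GMT",
])
```

With this input, repeated_names(fields) flags content-type and empty_names(fields) flags pragma; what a recipient does with that information is a policy decision left out of the sketch.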

> >## 26 ##
> ><Edit>
> >Page 17, first line:
> >Typo: "The order in which header fields are received is not significant":
> >"received" should read "sent", cf next sentence.
>
> Nope, significance is based on receiving, whereas we can be prescriptive
> only on sending.

In an application-layer protocol definition, you don't specify what is
received but what is sent by each party. What is received depends on
sub-layers: TCP re-ordering fragments, etc...

You may have a point in your comment about the word "significant", so I
suggest rephrasing this sentence as follows:

   "The order in which header fields are sent is not constrained".

If you don't like "constrained", you can also use "imposed".

> >## 31 ##
> ><Edit>
> >Page 21, section 6.1, definition of Status-Line:
> >The Reason-Phrase should really be optional, i.e. the BNF should read:
> >Status-Line    = HTTP-Version SP Status-Code [SP Reason-Phrase] CRLF
> >This is even implied next page: "since that entity is *likely* to include
> >human-readable information".
>
> The SP is required, and Reason-Phrase is already = *<...>,
> which is the same as being optional.

Come on, this is silly ! Why make the second SP required when Reason-Phrase
can be empty ? And why not make Reason-Phrase clearly optional in the BNF
definition of Status-Line, if it can be empty ? I know by now that you try
hard not to change a comma in the spec, which I understand, but my argument
is just common sense !
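For illustration, a parser that follows the suggested BNF, with [SP Reason-Phrase] optional, could look like this in Python. It is a sketch only: the regular expression and the name parse_status_line are assumptions, not taken from the draft, and it happens to accept both the form with a trailing SP and the form without it.

```python
import re

# HTTP-Version SP Status-Code [SP Reason-Phrase]
STATUS_LINE = re.compile(r"HTTP/(\d+)\.(\d+) (\d{3})(?: ([^\r\n]*))?")

def parse_status_line(line: str) -> tuple[int, int, int, str]:
    """Return (major, minor, status code, reason phrase); the phrase may be empty."""
    match = STATUS_LINE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed Status-Line: {line!r}")
    major, minor, code, reason = match.groups()
    # A missing phrase and an empty phrase are treated the same way.
    return int(major), int(minor), int(code), reason or ""

with_phrase = parse_status_line("HTTP/1.0 200 OK\r\n")
without_phrase = parse_status_line("HTTP/1.0 204\r\n")
trailing_space = parse_status_line("HTTP/1.0 204 \r\n")
```

A recipient written this way does not care whether the second SP is present when the phrase is empty.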

> >## 32 ##
> ><Pedantic>
> >Pages 22-26:
> >Sections 6.2.1 through 6.2.5 should be moved to chapter 8: chapters
> >4-5-6-7 are not in-depth, chapter 8 is.
>
> I don't see any advantage to doing that, particularly not now.

Currently, all details about header fields are in chapter 8, except status
codes which are in 6.2. The way you structured the RFC looks good to me:
4-5-6-7 general, 8 in-depth. Section 6.2 is misplaced in this structure.

> >## 33 ##
> ><Edit>
> >Page 23, definition of 201, line 4:
> >"or within a clearly defined timeframe": how can the client learn from
> >the server what this "clearly defined timeframe" is ? This is wishful
> >thinking: even if it looks like a good idea initially, it's not feasible
> >in practice with HTTP/1.0. This should be removed from the spec. This
> >may be put back in a later version of HTTP if we add a header field
> >for the server to tell the client what this "clearly defined timeframe"
> >is at the same time it returns 201.
>
> The "clearly defined timeframe" can be included in the response body.

OK, then let's replace the 3rd and 4th sentences of the 201 definition with
this:

   "The origin server is encouraged, but not obliged, to actually create
   the resource before using this Status-Code. If the action cannot be
   carried out immediately, then it is encouraged, but not obliged, to
   specify in the response body when this resource will be available
   (the format of this timeframe is not defined). Otherwise, the server
   should respond with 202 (accepted) instead."

> >## 34 ##
> ><Edit>
> >Page 24, definition of 300, last line:
> >"user agents may use the Location value for automatic redirection".
> >Read this sentence twice, it's ambiguous and has 2 meanings: what you
> >mean is that it can get from the server a Location field. What you can
> >also understand is that the client may use the Location field in its
> >request, which is wrong. Let you native English speakers devise a new
> >unambiguous sentence to replace this one !
>
> No -- there is absolutely nothing ambiguous about it, as would be obvious
> if you hadn't quoted only half the sentence.

You wrote it, so you read what you mean, not what is written. I suggest
replacing the last sentence with this:

    If the server has a preferred choice, it should include the URL in
    a Location field; user agents may use this field value for
    automatic redirection.

> >## 39 ##
> ><Comment>
> >Page 25, definition of 403:
> >We lack a status code whereby the server refuses to honor a request,
> >but is willing to say why, e.g. "this page is available to internal
> >users only". Should we create a new status code for this in HTTP/1.1 ?
>
> You don't need a new status code for that -- just include the reason
> in the response body.  The current wording is too restrictive.

OK, let's replace the whole 403 definition with this:

   The server understood the request, but is refusing to perform the
   request. Authorization will not help and the request should not be
   repeated. If the server wants to reveal to the client why it refused
   the request, then the reason should be specified in the entity body
   (the format of this text is undefined). Otherwise, the server
   should leave the entity body empty, and the client should assume
   that the server does not want to reveal why the request has not
   been fulfilled.

> >## 42 ##
> ><Edit>
> >Page 27, section 7.1, last line:
> >Add "unmodified" as follows:
> >"Unknown header fields should be ignored by the recipient and forwarded
> >unmodified by proxies".
>
> No, that is not possible in some circumstances.  Unknown header fields
> can always be folded and unfolded.

Could you elaborate on this ?

> >## 43 ##
> ><Pedantic>
> >Page 27, section 7.2, second paragraph:
> >Line 4 again, "HTTP/1.0 requests containing content" doesn't sound
> >great: "containing an entity-body" sounds better to me.
>
> Well, it isn't supposed to be a novella.

Sure, but it wouldn't hurt !

> >## 45 ##
> ><Pedantic>
> >Page 28, line 4:
> >Delete "(i.e. the identity function)", it's dead wood and adds nothing.
>
> In this case, it defines the encoding in mathematical terms.

You're pushing a bit far, aren't you ?  ;-)
If it really looks like maths to you, then leave it in...

> >## 46 ##
> ><Edit>
> >Page 28, lines 5-6:
> >Cf #25 for the default value. Replace lines 5-6 with:
> >"All HTTP/1.0 messages with an entity-body must have a Content-Type
> >header field. If and only if this header field is not specified, as is
> >always the case for HTTP/0.9 messages, the recipient may attempt to
> >guess the media type...".
>
> No, that does not match current practice, and would introduce
> unnecessary broken behavior.

Still the must vs encouraged_to debate... Don't you agree that the use of
Content-Type should be at least "recommended" or "encouraged", whatever you
consider the least committing ? I suggest this instead:

    Implementors are encouraged to provide a Content-Type header field
    in all HTTP/1.0 messages with an entity body. If and only if this
    header field is not specified, as is always the case for HTTP/0.9
    messages, the recipient may attempt to guess the media type...

> >## 49 ##
> ><Edit>
> >Page 28, last line:
> >Add a second note:
> >"Note: The Content-Length header field must not be specified if there is
> >no Entity-Body in the message; in other words, 'Content-Length: 0' is
> >invalid."
>
> Not true for methods like HEAD.  Not true in any case, since 0 is valid.

OK, let's make the second note look like this then:

    "Note: For the GET and POST methods, the Content-Length header field
     should not be specified if there is no Entity-Body in the message."

"should", not "must"  ;-)

> >## 50 ##
> ><Comment>
> >Pages 29-35:
> >It would be a good idea to add a line at the beginning of all 8.x sections
> >specifying whether this header field may be found in a request or a
> >response, e.g.:
> >Request: YES     Response: NO
> >For the moment, we need to edit the first sentence of most sections 8.x
> >to give this information (a few already have it).
>
> Those that apply only to request have "request"; those that apply only
> to responses have "response"; all the rest have neither.

This is certainly not true in Draft 3 ! My suggestion would help the reader
more than this convention, but your idea is not bad.

> >## 52 ##
> ><Edit>
> >Page 29, section 8.2:
> >Beginning of line 1, add: "This is a request header field".
>
> Unnecessary -- it says that already in the first line.

True, but it's implied. Cf #50: I try to standardize this information for all
header fields, so that the reader knows at once whether this header field may
be found in a request, a response or both (which is the single most important
thing for a header field). Your answer is that it can be inferred by reading
the whole header field definition. Is that what the reader wants ?

> >## 55 ##
> ><Edit>
> >Page 30, section 8.4:
> >The paragraph after the example is wrong, cf page 20 line 1 (mandatory
> >for POST) and page 28 line 17. This paragraph should read instead:
> >"Applications must use this field to indicate the size of the Entity-Body
> >to be transferred, regardless of the media type of the entity".
> >Next line, "greater than or equal to zero" should become "greater than
> >zero", cf #49.
>
> No.  It is not mandatory, as is painfully evident on
> any server known to exist.  It is only required for requests containing
> an entity body, and that is adequately covered elsewhere.

No, you can't leave a contradiction in a protocol spec. This section must be
amended. Here are the full references, to make the point clearer:

Page 20 line 1:

    A valid Content-Length is required on all HTTP/1.0 POST requests.

Page 28 line 17:

    Therefore, HTTP/1.0 requests containing content must include a
    valid Content-Length header field.

Page 30, section 8.4, line 7:

    Although it is not required, applications are strongly encouraged
    to use this field to indicate the size of the Entity-Body to be
    transferred, regardless of the media type of the entity.
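One consistent reading of the first two passages can be written down as a single check. The Python sketch below is only that, a sketch: the function name content_length_problem is hypothetical, header names are assumed to be lower-cased already, and the rule encoded is "POST requests and requests carrying a body need a valid Content-Length", which is this reading of pages 20 and 28 rather than agreed text.

```python
from typing import Optional

def content_length_problem(method: str, headers: dict[str, str],
                           body: bytes) -> Optional[str]:
    """Return a description of the problem, or None if the request is acceptable."""
    raw = headers.get("content-length")
    needs_length = method.upper() == "POST" or len(body) > 0
    if raw is None:
        return "missing Content-Length" if needs_length else None
    if not (raw.isascii() and raw.isdigit()):
        return "Content-Length is not a non-negative integer"
    if int(raw) != len(body):
        return "Content-Length does not match the body"
    return None

missing = content_length_problem("POST", {}, b"a=1")
plain_get = content_length_problem("GET", {}, b"")
valid_post = content_length_problem("POST", {"content-length": "3"}, b"a=1")
```

Under this reading the "not required" wording of section 8.4 only applies to messages that need no length in the first place.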

> >## 57 ##
> ><Edit>
> >Page 30, section 8.6:
> >After the example, it states:
> >"If a message is received via direct connection with the user agent (in
> >the case of requests) or the origin server (in the case of responses),
> >then the default date can be assumed to be the current date at the
> >receiving end".
> >The presence of proxies is irrelevant here. This sentence should be
> >replaced with:
> >"If a message has no Date header field, then the recipient may assume
> >that the default date is the current date at the time the message is
> >received".
>
> On the contrary, it is entirely relevant. The date cannot be assumed if
> the message was not sent directly from the originator.

Who invented that rule ? It postulates that the network latency is negligible
compared to the proxy forwarding latency, i.e. there's at least one order of
magnitude between the 2. Network latency is typically about 250 ms between
the US and UK, and about 500 ms between Australia and UK. Luckily enough,
proxies generally do not have a forwarding latency of 2.5 to 5 seconds !

This rule is nonsense, it should be abandoned asap. The date needs to be
guessed if and only if the sender is not smart. If there's no date, let's at
least always allow the recipient to guess it !

> >## 59 ##
> ><Edit>
> >Page 31, section 8.7:
> >Lines 2-3, replace "Caching clients, including proxies" with "Caching
> >clients and proxies".
>
> No, proxies are clients.

Please stick to the terminology defined in section 1.3. Yes, a proxy
receiving a message with an Expires header is a client compared to the origin
server, but let's stick to the simple HTTP paradigm:

    client----proxy----origin_server

You defined "caching proxy" in section 1.3, so let's use it here:

    "Caching clients and caching proxies must not cache this copy of the
     resource beyond the date given..."

> >End of second paragraph, after "dynamism", we should add: "The Expires
> >date should not be earlier than the Date date, but this is not mandatory."
> >This is to cope with bogus implementations as explained in the note of
> >section 8.7.
>
> That would be incorrect.

Could you elaborate on this ? Just mention the remote possibility of using
'Expires: 01 Jan 1990 00:00:00 GMT' on www-talk and http-wg and you'll start
a religious war.

> >## 61 ##
> ><Edit>
> >Page 32, section 8.9:
> >In a) line 1, replace "200" with "2xx".
>
> No, 200 is correct.

Are you sure ? What about the other 2xx codes ?

> >End of section, add:
> >"Note: Server implementors are encouraged to return responses with a
> >status code of 304 quicker (i.e. higher priority) than responses to a
> >normal GET or an If-Modified-Since with another status code." Not sure
> >many servers already prioritize their responses, but sounds like a good
> >idea, as it encourages caching.
>
> "Sounds like a good idea"?  No.

Could you elaborate on this ?

> >## 62 ##
> ><Edit>
> >Page 33, section 8.10:
> >Line 1, delete "sender believes the", that's dead wood. Nothing is
> >"guaranteed" per se, it's always as the client or the server believes
> >it.
>
> No, the distinction is important.

Could you elaborate on this ?

> >## 63 ##
> ><Pedantic>
> >Page 33, section 8.12:
> >Line 1: replace "MIME-conformant" with "MIME-compliant", to use the same
> >expression throughout the spec.
>
> RFC 1521 uses both, though the former more often.  Which is better?

I've read *-compliant 1000s of times and *-conformant 10s only, hence my
comment. Moreover, on this side of the Atlantic, "to conform" is often
pejorative while "to comply" isn't, so people tend to avoid using the former.
I don't know about American usage.

> >## 64 ##
> ><Edit>
> >Page 34, section 8.13:
> >Line 3, delete "intermediate" (cf terminology section 1.3).
> >First line after the definition of extension-pragma, replace
> >"intermediary" with "proxy" (cf terminology section 1.3).
>
> Some intermediaries are not proxies.  I have already defined
> this better for HTTP/1.1.

Yes, but you don't define it in HTTP/1.0, so please don't use it if you don't
need it. I don't think there's a consensus either as to what you mean by
intermediary, if I recall from recent discussions.

> >## 67 ##
> ><Comment>
> >Page 35, section 8.15:
> >After the example:
> >"If the response is being forwarded through a proxy, the proxy
> >application should not add its data to the product list".
> >In fact, the proxy mustn't overwrite any header field, except the
> >HTTP-Version in the Status-Line (cf page 9, last paragraph). So this
> >sentence should probably be removed.
>
> It's there for historical reasons.

Historians will see it in the drafts next century... Do you need it in the
spec ?

> >Last line of the note: this is security through obscurity. It gives you
> >the illusion of being more secure, that's all. If you have a server
> >open to the world, it's open for hackers. If they know a loophole to
> >break in say NCSA httpd 1.3, they'll try it on your server, whatever
> >you return with Server. I don't think there's any point in encouraging
> >server implementors to make Server a configurable option.
>
> This opinion has been proven false on any number of occasions.
> Knowing the exact version makes it easier (and thus faster)
> for a cracker, and thus makes it easier for them to take advantage
> of your server without (or before) being detected.

Your opinion that you buy more security with obscurity has also been proven
wrong on many occasions. Read firewalls, bugtraq, comp.security.* or Cheswick
& Bellovin's bible and you'll soon be convinced. The sheer fact that you can
change the value of Server means a potential intruder won't trust it. Rather
than getting your Server field, he'll probe your server straight away for the
backdoor he knows.

> >## 70 ##
> ><Edit>
> >Page 36, last but one paragraph:
> >The second sentence is ambiguous: you want to allow authentication and
> >encryption mechanisms which are not at the transport level as well, cf
> >IPv6. I suggest to rephrase it like this:
> >"Additional authentication and encryption mechanisms may be used, e.g.
> >at transport level via message encapsulation, and/or with additional
> >headers...".
>
> No, you are misreading the paragraph.  All three mechanism are not defined
> by this spec.

No, you are misreading me: you want to give examples of additions, but your
wording actually restricts the additions to just "transport level" and
"additional header fields". IPv6 e.g. introduces extra security at the
routing level. With my correction, this restriction is lifted.

> >## 72 ##
> ><Comment>
> >Page 39, lines 3-4:
> >The real security problem is the weakness of the authentication and
> >authorization schemes available in HTTP/1.0, as stated earlier in section
> >10. Referer, From and Server are negligible in comparison, I see little
> >point in insisting so heavily on them.
>
> Others differ with your opinion.

Could you elaborate on this ?


Again, many thanks for all your comments.

Jean-Philippe
Received on Thursday, 21 September 1995 14:34:47 UTC