Comments on HTTP draft [of 23 Nov 1994] from Mike Cowlishaw on 1994-11-29 (ietf-http-wg@w3.org from October to December 1994)

From: Mike Cowlishaw <mfc@vnet.ibm.com>
Date: Tue, 29 Nov 94 16:14:04 GMT
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9411291613.AA02557@hplb.hpl.hp.com>

I was very pleased to see the new HTTP draft; it's a major improvement
on previous versions!  Here are some comments on the new draft which I
hope will be useful.

I am writing these from the perspective of an implementer of software
for a reasonably general-purpose HTTP server, so I am especially looking
for a definition of the HTTP which

 a) allows a server to be implemented using the definition (and
    referenced documents) and, in particular, without reference to
    specific clients or other implementations

 b) makes it absolutely clear what is required for a server to be stated
    as conforming to the definition.

Most of these comments therefore seek clarification in these two areas.

I'm sorry I won't be able to attend the IETF meeting next week--a
long-standing commitment has me on the wrong coast of the USA.

Mike Cowlishaw
IBM Fellow, IBM UK Laboratories, Winchester, UK

- - - - - - - - -

2.1 The '#' rule implies that no whitespace is allowed after (or before)
    the commas in a list.  Is this correct?  For example, in the example
    in 7.1 there is a space after each comma (which certainly aids
    readability).
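
    To make the two readings concrete, here is a minimal Python sketch
    (not taken from the draft).  The name split_list and its
    allow_whitespace flag are hypothetical, the header values are
    invented, and quoted strings containing commas are not handled.

```python
def split_list(value, allow_whitespace=True):
    """Split a comma-separated header value into its elements.

    allow_whitespace=True is the lenient reading (spaces or tabs around
    the commas are ignored).  allow_whitespace=False is the strict
    reading, where whitespace next to a comma is rejected.
    """
    parts = value.split(",")
    if allow_whitespace:
        parts = [p.strip(" \t") for p in parts]
    elif any(p != p.strip(" \t") for p in parts):
        raise ValueError("whitespace next to a comma in list: %r" % value)
    # Assumption: empty elements are dropped, so "a,,b" yields two items.
    return [p for p in parts if p]
```

    Under the lenient reading "GET, HEAD" and "GET,HEAD" give the same
    two elements; under the strict reading the first form is an error.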

2.2 linear-white-space rule: I didn't understand the comment "CRLF =>
    Folding".  I think this rule allows whitespace (but non-null) lines
    in headers etc.?
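
    A minimal sketch of unfolding, assuming the rule carries its RFC 822
    meaning (a CRLF followed by a space or tab continues the previous
    line).  The name unfold is hypothetical.

```python
import re

def unfold(header_block):
    """Join continuation lines: CRLF plus SP/HT run becomes one space."""
    return re.sub(r"\r\n[ \t]+", " ", header_block)

# A header folded across two lines.
folded = "Title: a long\r\n    title\r\nDate: x\r\n"
# A line holding only a space, sitting between two headers.
blankish = "A: 1\r\n \r\nB: 2\r\n"
```

    Under that assumption a line holding only whitespace is absorbed
    into the preceding header rather than ending the header block, which
    is the case queried above.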

3.1 Header fields:

    (a) "However ... use of comments is discouraged".  This seems rather
    outside the scope of a definition such as this; at most it should be
    an informational note, and explain why the note is there (historical
    or client incompatibility, performance, reduced net traffic?).

    (b) [nit] the second open quote in the comment rule should be a
    close quote.

    (c) The ctext rule seems to be missing some characters (there's an
    open quote, followed by an open single quote, but neither is
    closed).  Also, shouldn't LF be excluded too?

3.2 Object body:

    (a) This is my 'biggest' question -- I don't understand from the
    second paragraph how to determine when to stop reading the data on a
    request.  If the headers are only 'similar' to those defined by
    MIME, then the MIME definition may or may not be relevant.
    Moreover, a "heuristic function of the Content-Type and
    Content-Encoding" would appear to be unimplementable, as new Types
    (especially application/xxx) seem to spring up daily.

    It would seem to be appropriate that the HTTP protocol specify that
    Content-Length, in bytes, be Required--at least for Requests.

    (b) Does the server have to read the headers (and data, if any) on a
    request?  For example, if the customizing filter/script doesn't need
    the information in order to determine the response, is the server
    permitted to leave the data unread, or could this embarrass some
    client(s) or TCP/IP stack(s)?
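
    On 3.2(a): if Content-Length were Required, the stopping rule would
    reduce to a byte count.  A minimal sketch, assuming the headers have
    already been parsed into a dict with lower-case names;
    read_request_body is a hypothetical helper, not something the draft
    defines.

```python
import io

def read_request_body(stream, headers):
    """Read exactly Content-Length bytes from a binary stream."""
    raw = headers.get("content-length")
    if raw is None:
        raise ValueError("Content-Length is needed to delimit the body")
    remaining = int(raw)
    if remaining < 0:
        raise ValueError("negative Content-Length")
    chunks = []
    while remaining > 0:
        # read() may return fewer bytes than asked for, so loop.
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("connection closed before the body was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

    No knowledge of the Content-Type or Content-Encoding is needed, and
    anything after the counted bytes is left unread on the stream.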

4.1 Date/Time stamps:

    (a) I'm a little disturbed that time is only permitted to be
    specified to the second, given that most server hardware will be
    able to handle many more than one request per second, and network
    transit time is often sub-second.  If this is a limitation of RFC
    1123, perhaps an additional HTTP header for sub-second time
    information should be specified.

    (b) Since this is a new standard/document, surely it should specify
    a single Date/Time format, and only mention the others for
    compatibility/historical information?

    (c) [aside] I wish, oh wish, that Longitude/Latitude information
    were a recommended header.
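
    On 4.1(a) and (b): a minimal sketch using Python's standard
    email.utils module, assuming the single format would be the RFC 1123
    one.  The name format_http_date is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

def format_http_date(timestamp):
    """Format a POSIX timestamp as an RFC 1123 date in GMT (whole seconds)."""
    return formatdate(timestamp, usegmt=True)

# Two instants 0.8 s apart, both inside 16:14:04 GMT on 29 Nov 1994.
base = datetime(1994, 11, 29, 16, 14, 4, tzinfo=timezone.utc).timestamp()
first = format_http_date(base + 0.1)
second = format_http_date(base + 0.9)
# first == second == "Tue, 29 Nov 1994 16:14:04 GMT"

# Parsing the stamp back recovers the time only to the second.
parsed = parsedate_to_datetime(first)
```

    The two stamps are identical, so the order of the two events cannot
    be recovered from the Date value alone.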

4.2 Multipart types: From the text, I infer that a server/script is
    not Required to respond with a multipart type when a client has
    indicated that it can accept them.  It might be worth an explicit
    statement to that effect.

4.3.1 Date Header Field:

    (a) [nit] should refer to RFC 1123 rather than 822, or both?

    (b) It's not clear what time the header should refer to.  For a
    response, is it the time when the request was accepted, or when the
    response line was generated, or when the first line of the header
    was transmitted, or when the 'Date:' header line was generated?

    (c) [nit] 'of' is missing between 'creation date' and 'the
    enclosed'.

4.3.3 Message-ID: [suggestion] Although the example shows a unique ID,
    it might be nice to encourage via the example a form of ID that
    includes the port number (if not 80), and even follows URL format.
    Perhaps:

      Message-ID: <http://info.cern.ch:8080/9411251630.4256>
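
    A minimal sketch of building an ID of this shape.  The name
    make_message_id and its parameters are hypothetical, and the unique
    part is assumed to be generated elsewhere.

```python
def make_message_id(host, port, unique):
    """Build a URL-style Message-ID value, omitting the default port 80."""
    netloc = host if port == 80 else "%s:%d" % (host, port)
    return "<http://%s/%s>" % (netloc, unique)
```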

4.3.4 MIME version: It is not at all clear why this is useful and
    strongly recommended if it is not an indication of full compliance
    with MIME.  One might argue that it should *not* be included unless
    MIME-compliance of the remainder of the header and data is
    guaranteed?

5.  Request: The 0.9 requirement here (Simple Request must have Simple
    Response) is somewhat onerous on a server.  Is it possible to relax
    or remove this requirement yet, or are there still 0.9-only clients
    in use?  I've noticed that at least some Simple Responses will not
    go through some proxies transparently.
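
    In server terms the requirement amounts to a branch like the
    following on every request.  A minimal sketch, assuming a Simple
    Request is a GET line carrying no HTTP version token;
    is_simple_request is a hypothetical name and the paths are invented.

```python
def is_simple_request(request_line):
    """Return True for a 0.9-style request: 'GET' and a URI, no version."""
    parts = request_line.split()
    return len(parts) == 2 and parts[0] == "GET"
```

    When this returns True the server would have to send the object with
    no status line and no headers at all.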

5.2 Method and 5.2.2 Head: I'm surprised that the HEAD method *must* be
    supported, as it is ill-defined.  5.2.2 simply says that there must
    be no Object-Body; it seems that the header may or may not be
    related to the header that would be sent if the method were GET, and
    in particular, HEAD may as well just return an empty (null) header.

    Further, the cost of determining, building and sending the header
    will often be the major part of a transaction, so should clients or
    proxies be encouraged to use this Method?

5.2.1 Get: [nit] The first paragraph should have the suffix: "(unless
    that is the produced data)."

5.2.3 Post:

    (a) Some clarification seems to be needed here; there's an
    assumption that Form data is used by some gateway program rather
    than the server/script directly, but in the latter case the
    specification (the paragraph starting "If the URI does not refer to
    a gateway...") implies that the Form data must be retrievable at
    some later date.

    (b) Can the URI returned via a URI-header be a partial URI as
    described in 5.4?  Or does it have to be a full URI?  [I infer the
    latter.]

5.2.3.1 [nit] Change 'references' to 'refers to'?

5.3 HTTP Version: Given that this definition is more rigorous than
    earlier documents, and hence must be more constraining, it would
    seem to be necessary to change the version number (perhaps to 1.1)
    to reflect the stricter conditions for compliance.  If the version
    number is not changed, then the date of the relevant HTTP 1.0
    document would have to be specified at every reference.

5.4 URI Note 1: What's 'default escaping'?  What characters may be
    'considered unsafe'?  The "should" (probably meant to be "shall"?)
    implies that a server must comply with these conditions, but they do
    not seem to be well defined.

5.5.2 If-Modified-Since: This section implies that servers *must*
    implement this feature.  However, the last-modified-date might be
    unavailable, unreliable, or not applicable for some URIs.  In these
    cases (or indeed in any case), is the server permitted to return the
    object, despite the presence of the I-M-S header?
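
    A minimal sketch of the permissive behaviour asked about, assuming
    the server may fall back to sending the object whenever it cannot
    make the comparison.  The name should_send_object is hypothetical;
    last_modified is a timezone-aware datetime, or None when the server
    has no usable modification date.

```python
from email.utils import parsedate_to_datetime

def should_send_object(if_modified_since, last_modified):
    """Return True to send the object, False for a not-modified reply."""
    if if_modified_since is None or last_modified is None:
        return True  # no header, or no reliable date for this URI
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return True  # unparseable date: treat the header as absent
    if since.tzinfo is None:
        return True  # no usable time zone: do not guess
    return last_modified > since
```

    Every doubtful case resolves to sending the object, so a client is
    never left with a stale copy because of a bad or missing date.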

5.5.4 Authorization: UU-encoding: if defined by RFC 1421, then this
    should appear in Section 13 (References), and Appendix 15 should go
    away (as it does not appear to apply, in any case)?

6.3 Status Codes and Reason Phrases: The rule for Reason-Phrase does not
    allow spaces, but several of the phrases specified later do include
    a space.
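
    A minimal sketch of a parse that tolerates spaces in the phrase,
    assuming the status line is version, code and phrase separated by
    single spaces.  The name parse_status_line is hypothetical.

```python
def parse_status_line(line):
    """Split a status line into (version, code, reason phrase)."""
    # Split on the first two spaces only, so the phrase keeps its spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason
```

    A line with no phrase at all raises ValueError here; whether that
    should be accepted is a separate question.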

6.3.1 201 Created: Also possible following PUT, presumably.

6.3.1 202 Accepted: What is the "delay header line"?

6.3.1 204 No Response: Allowed for POST, too?  (For a Form.)

6.3.2 301 & 302 Moved: Allowed for POST, and others, too?

6.3.3 401 Unauthorized: [nit] change first 'a' to 'an'.

6.3.3 404 Method Not Allowed: is this only for the defined methods, or
    should this also be used for a misspelled or unrecognized method
    name?

6.3.4 500 Server Error: 502 (twice) and 504 call this "500 Internal
    Error".  All should say "Server Error"?

6.3.4 503 Service Unavailable: Does this imply that a server is not
    permitted to refuse to accept a connection?  [Presumably not, though
    it could be read that way.]

6.4 paragraph 1: [nit] change 'a Object-Body' to 'an ...'

6.4.2 Version: since this refers to an object and not the server,
    shouldn't it be in section 7?

7.2 Content-Length Note 2: [nit] 'wherever' has only three 'e's.

7.4 Content Encoding:

    (a) [nit] this heading (and 7.5 too) needs a hyphen.

    (b) [nit] change 'method' to 'mechanism' in the first paragraph to
    avoid confusion with use of the term elsewhere?

7.5 C-T-E: The rule omits the token and colon before the type.

7.7 Expires: Are there any constraints on the Date and Time specified?
    Specifically, may they refer to a time earlier than or the same as
    that in the Date: header?

7.9 URI First example: [nit] Change close quote to open quote.

7.9 URI Second example: [nit] Semicolon missing.

7.12 Title: "isomorphic" here implies that the Title follows SGML
    syntax, and hence depends on the HTML DTD (including the
    Declaration), and is allowed any valid entities and shortrefs
    within, etc.  This probably isn't intended (I hope!).

7.13 Link: [nit] both examples have unmatched quotes.  '//' missing
    after 'mailto:'?

8 Neg. algorithm para 4: [nit] change 'between 0 and 1' to 'in the range
    0 through 1'?  (0 and 1 are allowed values.)

8 Neg. algorithm 'bs' definition: [nit] change 'send' to 'sent'

9 Authentication, paragraph 1: Here is perhaps the strongest statement
    in the document about conformance.  Yet, surely, if the server would
    never return "401 Unauthorized" (because all its data are public)
    there is no need for it to implement the Basic Access Authentication
    Scheme?

9 Authentication, fourth bullet: [nit] change second 'a' to 'an'.

11.3 Abuse, para 1: While I strongly support the intent behind the last
    sentence here, this document is a definition of the HTTP protocol,
    not people using it.  It cannot impose requirements on, or define,
    people.  (Does my server become non-conforming because someone using
    my server abused his or her collected data?)  Also, am I (as the
    writer, and hence provider, of a server) responsible for the actions
    of other people *using* my server to provide data?  Thin ice...

    Perhaps it should read something like: "People using the HTTP
    protocol to provide data are responsible for...".

11.3 Abuse, final para: This reads as though the user must be prompted
    with the From field to be sent before sending every request.
    Probably not the intent.

16  Server Tolerance, para 2: Time to make this a Requirement?

17  Bad servers: The first paragraph here sounds like a Compliance
    statement.  As such, it should be in the body of the document, not an
    appendix?  The document certainly needs a Compliance section.

17.1 Back compatibility, para 2: This doesn't seem to reflect current
    practice (inline <img src="xxx"> requests for .GIF files do seem to
    appear as images, not as HTML documents).

mfc/29 Nov 1994
Received on Tuesday, 29 November 1994 08:16:57 UTC