Re: Comments on HTTP draft [of 23 Nov 1994] from Roy T. Fielding on 1994-11-30 (ietf-http-wg@w3.org from October to December 1994)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Wed, 30 Nov 1994 04:43:07 -0800
To: Mike Cowlishaw <mfc@vnet.ibm.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9411300443.aa27136@paris.ics.uci.edu>

> I was very pleased to see the new HTTP draft; it's a major improvement
> on previous versions!  Here are some comments on the new draft which I
> hope will be useful.

Thanks, these comments are definitely useful.  Unfortunately, they refer
to the pre-Internet-Draft version, so I'll try to point out any differences
in the section numbering to avoid confusion.

> I am writing these from the perspective of an implementer of software
> for a reasonably general-purpose HTTP server, so I am especially looking
> for a definition of the HTTP which
> 
>  a) allows a server to be implemented using the definition (and
>     referenced documents) and specifically without reference to specific
>     clients or other implementations

That's the ideal case, yes, but there is still no substitute for
experimentation.  In fact, I was tempted to put in a section on "how to
experiment with the protocol", but had no time for that.  Maybe I should
write a book. ;-)

>  b) makes it absolutely clear what is required for a server to be stated
>     as conforming to the definition.

Yes.  Unfortunately (or fortunately, depending on how you look at it),
almost all of the protocol is optional, and thus all the strict conformance
areas must be surrounded by lots of IFs, HOWEVERs, and UNLESSes.
But, getting it right is still the goal.

> Most of these comments therefore seek clarification in these two areas.
> 
> I'm sorry I won't be able to attend the IETF meeting next week--a
> long-standing commitment has me on the wrong coast of the USA.
> 
> Mike Cowlishaw
> IBM Fellow, IBM UK Laboratories, Winchester, UK
> 
> - - - - - - - - -
> 
> 2.1 The '#' rule implies that no whitespace is allowed after (or before)
>     the commas in a list.  Is this correct?  For example, in the example
>     in 7.1 there is a space after each comma (which certainly aids
>     readability).

One major problem with the augmented BNF used by RFC822 (and MIME)
is that it allows any amount of linear white space between tokens,
but never specifies that in the rule definitions (it is treated as
a general assumption for all rules).  I can understand why -- it is
much easier to read the rules without the extra clutter -- but it runs
against my desire for formality.

What do people think?  Define a new rule, e.g.

    ESP = 1*( [CRLF] LWSP-char )

and insert it everywhere that zero or more linear-white-space is allowed?
Or just stick with the current method but add an explanation in 2.1?
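
To make the whitespace question concrete, here is a minimal Python sketch of
a list parser that tolerates linear white space around the commas. It
illustrates the second option (an implied-LWS reading of the '#' rule) and is
not text from the draft. The names LWS and split_list are hypothetical, and
quoted strings containing commas are not handled.

```python
import re

# Linear white space in the RFC 822 sense: an optional CRLF fold
# followed by a space or tab, repeated zero or more times.
LWS = r"(?:(?:\r\n)?[ \t])*"
_SEPARATOR = re.compile(LWS + "," + LWS)


def split_list(value):
    """Split a comma-separated field value, ignoring LWS around commas.

    Assumption: empty elements are dropped and quoted strings are
    not treated specially.
    """
    items = _SEPARATOR.split(value.strip(" \t"))
    return [item for item in items if item]


if __name__ == "__main__":
    print(split_list("text/plain, text/html,\r\n image/gif"))
```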

> 2.2 linear-white-space rule: I didn't understand the comment "CRLF =>
>     Folding".  I think this rule allows whitespace (but non-null) lines
>     in headers etc.?

The comments will be clarified.

> 3.1 Header fields:
> 
>     (a) "However ... use of comments is discouraged".  This seems rather
>     outside the scope of a definition such as this; at most it should be
>     a informational note, and explain why the note is there (historical
>     or client incompatibility, performance, reduced net traffic?).

Yep, will do.

>     (b) [nit] the second open quote in the comment rule should be a
>     close quote.

That's a bug in FrameMaker's smart-quote feature.  I really should replace
all the smart-quotes in the BNF with normal double-quotes.

>     (c) The ctext rule seems to be missing some characters (there's an
>     open quote, followed by an open single quote, but neither is
>     closed).  Also, shouldn't LF be excluded too?

Hmmm... should comments be allowed to fold (i.e. extend over multiple lines)?
If so, then CR should not be excluded.  If not, then LF needs to be excluded.

> 3.2 Object body:
> 
>     (a) This is my 'biggest' question -- I don't understand from the
>     second paragraph how to determine when to stop reading the data on a
>     request.  If the headers are only 'similar' to those defined by
>     MIME, then the MIME definition may or may not be relevant.
>     Moreover, a "heuristic function of the Content-Type and
>     Content-Encoding" would appear to be unimplementable, as new Types
>     (especially application/xxx) seem to spring up daily.

Henrik asked me to explain that as well, but I ran out of time.
It goes something like this:

   a) If message includes Content-Length, use it.

   b) If message uses an as-yet-undefined packetized Content-Transfer-Encoding,
      then that encoding may define an EOF marker.

   c) If message uses an as-yet-undefined packetized Content-Encoding,
      then that encoding may define an EOF marker.

   d) If message is of type multipart/*, the effective object body ends
      when the boundary close-delimiter is reached.

   e) If the connection closes, the object body has ended.

Part (b) is along the lines of Dan Connolly's www-talk proposal of 27 Sep 1994
(Message-Id: <9409271503.AA27488@austin2.hal.com>).
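
The ordering above can be sketched roughly as follows in Python. This is only
an illustration under stated assumptions: the function names are hypothetical,
the headers are assumed to arrive as a dict with lower-case keys, and cases
(b) and (c) are left out because no such packetized encoding is defined yet.

```python
import io


def boundary_of(content_type):
    """Return the boundary parameter of a multipart Content-Type, or None."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "boundary":
            return value.strip('"')
    return None


def read_object_body(stream, headers):
    """Read an object body from a binary stream positioned after the headers."""
    # (a) Content-Length, if present, decides how much to read.
    length = headers.get("content-length")
    if length is not None:
        return stream.read(int(length))

    # (d) multipart/*: stop once the close-delimiter line has been read.
    ctype = headers.get("content-type", "")
    boundary = boundary_of(ctype) if ctype.lower().startswith("multipart/") else None
    if boundary:
        closer = b"--" + boundary.encode("ascii") + b"--"
        body = b""
        for line in stream:
            body += line
            if line.rstrip(b"\r\n") == closer:
                break
        return body

    # (e) Otherwise the body runs until the connection closes (end of stream).
    return stream.read()


if __name__ == "__main__":
    demo = io.BytesIO(b"hello world")
    print(read_object_body(demo, {"content-length": "5"}))
```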

Question for HTTP/1.1: Should we attempt to extend MIME C-T-E to be something
useful, or just change Content-Encoding to be an ordered list of encodings
rather than the current single-token?

>     It would seem to be appropriate that the HTTP protocol specify that
>     Content-Length, in bytes, be Required--at least for Requests.

I'd rather not go that far, but we may have to for HTTP/1.0.

>     (b) Does the server have to read the headers (and data, if any) on a
>     request?  For example, if the customizing filter/script doesn't need
>     the information in order to determine the response, is the server
>     permitted to leave the data unread, or could this embarrass some
>     client(s) or TCP/IP stack(s)?

The Request-Headers must be read, though implementations may want to optimize
and assume that if the Request-Headers have not completed within the first
read buffer, then the rest are not important.  Since that is only a problem
for SOME BROKEN CLIENTS, most servers could successfully pretend to be
conformant while implementing the optimization.  Obviously, if there is
any data in the request, the server had better read it (unless the request
line has already errored-out).

Leaving request data unread will not be a problem for the client, though
it may cause a problem for the server.  Simon could explain that better
than I.

> 4.1 Date/Time stamps:
> 
>     (a) I'm a little disturbed that time is only permitted to be
>     specified to the second, given that most server hardware will be
>     able to handle many more than one request per second, and network
>     transit time is often sub-second.  If this is a limitation of RFC
>     1123, perhaps an additional HTTP header for sub-second time
>     information should be specified.

Not a limitation of 1123, but a limitation of many operating system
date routines.  Besides, there is no useful purpose for that level
of granularity in an application-level protocol.

>     (b) Since this is a new standard/document, surely it should specify
>     a single Date/Time format, and only mention the others for
>     compatibility/historical information?

The document has a bit of history associated with it, but perhaps a stronger
statement should be made.  I didn't want to be too forceful, however, since
I am biased about the date format and wanted to test the water first.

>     (c) [aside] I wish, oh wish, that Longitude/Latitude information
>     were a recommended header.

[aside] it's too inefficient to include that in every response.  A standard
URL for site info is more appropriate.

> 4.2 Multipart types: From the text, I infer that a server/script is
>     not Required to respond with a multipart type when a client has
>     indicated that it can accept them.  It might be worth an explicit
>     statement to that effect.

Hmmm, I guess that sentence (4.2.1) could be misinterpreted.  I'll clarify it.

> 4.3.1 Date Header Field:
> 
>     (a) [nit] should refer to RFC 1123 rather than 822, or both?

No, the semantics are defined in RFC 822 -- RFC 1123 just updates the format
(and other things unrelated to origin-date).

>     (b) It's not clear what time the header should refer to.  For a
>     response, is it the time when the request was accepted, or when the
>     response line was generated, or when the first line of the header
>     was transmitted, or when the 'Date:' header line was generated?

That is implementation-specific and, in any case, is assumed to be
all within the same second (+/- 1).  In theory, it is the moment just before
the status/request-line is generated (i.e. it is the time at which the 
origin made the determination of what the request/response should be).
In practice, it makes no difference.
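
For illustration, a minimal Python sketch of generating the header in the
RFC 1123 format, with the timestamp taken just before the status line is
built. The helper name date_header is hypothetical. formatdate is the
standard library's email.utils function, and like the format itself it has
one-second resolution.

```python
import time
from email.utils import formatdate


def date_header(timestamp=None):
    """Return a Date header line in RFC 1123 format (GMT).

    Assumption: the caller captures the timestamp just before the
    status/request line is generated; None means "now".
    """
    return "Date: " + formatdate(timestamp, usegmt=True)


if __name__ == "__main__":
    now = time.time()  # captured before the response is built
    print(date_header(now))
```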

>     (c) [nit] 'of' is missing between 'creation date' and 'the
>     enclosed'.

AFID (Already fixed in the I-D)

> 4.3.3 Message-ID: [suggestion] Although the example shows a unique ID,
>     it might be nice to encourage via the example a form of ID that
>     includes the port number (if not 80), and even follows URL format.
>     Perhaps:
> 
>       Message-ID: <http://info.cern.ch:8080/9411251630.4256>

The above is not a valid syntax for Message-IDs (no "@").  Even with it,
the primary purpose of the Message-ID is to enable messages (particularly
POSTs) via gateways.  Using the same format as many (most?) E-mail and
USENET postal services can be very handy.

> 4.3.4 MIME version: It is not at all clear why this is useful and
>     strongly recommended if it is not an indication of full compliance
>     with MIME.  One might argue that it should *not* be included unless
>     MIME-compliance of the remainder of the header and data is
>     guaranteed?

I share that opinion, but then we run into legacy issues.

> 5.  Request: The 0.9 requirement here (Simple Request must have Simple
>     Response) is somewhat onerous on a server.  Is it possible to relax
>     or remove this requirement yet, or are there still 0.9-only clients
>     in use?  I've noticed that at least some Simple Responses will not
>     go through some proxies transparently.

I believe proxies require Full-Requests.  Ari?
I do not believe it is possible to remove that requirement, though perhaps
we should require that HTTP/1.0 clients never generate a Simple-Request.

> 5.2 Method and 5.2.2 Head: I'm surprised that the HEAD method *must* be
>     supported, as it is ill-defined.  5.2.2 simply says that there must
>     be no Object-Body; it seems that the header may or may not be
>     related to the header that would be sent if the method were GET, and
>     in particular, HEAD may as well just return an empty (null) header.

The definitions of all the request methods need fleshing out, but support
for HEAD will still be required.

>     Further, in many cases the cost of determining, building and sending
>     the header is going to be the major part of many transactions, so
>     should clients or proxies be encouraged to use this Method?

That only reflects the cost for the server.  The cost to the rest of the
network is significantly less, and that is what is important here.

> 5.2.1 Get: [nit] The first paragraph should have the suffix: "(unless
>     that is the produced data)."

Hmmmmmmmmmmm.......I'll think of some other way to clarify it.

> 5.2.3 Post:
> 
>     (a) Some clarification seems to be needed here; there's an
>     assumption that Form data is used by some gateway program rather
>     than the server/script directly, but in the latter case the
>     specification (the paragraph starting "If the URI does not refer to
>     a gateway...") implies that the Form data must be retrievable at
>     some later date.

I think the confusion lies in the term "gateway", which in this case is
referring to a mail or USENET gateway, rather than a CGI script. 

>     (b) Can the URI returned via a URI-header be a partial URI as
>     described in 5.4?  Or does it have to be a full URI?  [I infer the
>     latter.]

Many people have requested the former, but I have yet to see an
implemented example of it.  Now that I have separated the URI-header
definition from Location, we could define it as being allowed for
URI but not for Location.

> 5.2.3.1 [nit] Change 'references' to 'refers to'?

okay

> 5.3 HTTP Version: Given that this definition is more rigorous than
>     earlier documents, and hence must be more constraining, it would
>     seem to be necessary to change the version number (perhaps to 1.1)
>     to reflect the stricter conditions for compliance.  If the version
>     number is not changed, then the date of the relevant HTTP 1.0
>     document would have to be specified at every reference.

That is a philosophical issue that will be discussed in San Jose.
If necessary, a subset of this spec will be assigned "HTTP/1.0" and
work will continue on HTTP/1.1.  That decision will have to be made
eventually, but not right away.

> 5.4 URI Note 1: What's 'default escaping'?  What characters may be
>     'considered unsafe'?  The "should" (probably meant to be "shall"?)
>     implies that a server must comply with these conditions, but they do
>     not seem to be well defined.

This section needs work, and may end up as separate sections for
describing HTTP URLs, relative URLs, and URIs.

> 5.5.2 If-Modified-Since: This section implies that servers *must*
>     implement this feature.  However, the last-modified-date might be
>     unavailable, unreliable, or not applicable for some URIs.  In these
>     cases (or indeed in any case), is the server permitted to return the
>     object, despite the presence of the I-M-S header?

Yes, but proper implementation (where appropriate) of the conditional
GET protocol will be strongly recommended and required for all new servers.
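
As a rough sketch of that behaviour in Python: when the last-modified date is
unavailable, or the header cannot be parsed, the server simply returns the
object. The function name conditional_status is hypothetical, last_modified
is assumed to be a timezone-aware datetime (or None), and 304 is assumed to
be the "not modified" status code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the object is unchanged since the given date, else 200."""
    # No header, or no usable last-modified date: send the object.
    if if_modified_since is None or last_modified is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    modified = datetime(1994, 11, 23, 12, 0, 0, tzinfo=timezone.utc)
    print(conditional_status("Wed, 30 Nov 1994 04:43:07 GMT", modified))
```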

> 5.5.4 Authorization: UU-encoding: if defined by RFC 1421, then this
>     should appear in Section 13 (References), and Appendix 15 should go
>     away (as it does not appear to apply, in any case)?

The reference to 1421 was removed from the I-D version, as uuencoding
does not appear to have anything to do with that RFC.  However, I agree
that what is now appendix A should be rewritten to specify the uuencoding
of a string, not a file.

> 6.3 Status Codes and Reason Phrases: The rule for Reason-Phrase does not
>     allow spaces, but several of the phrases specified later do include
>     a space.

Hmmmm, I could have sworn that I had changed "1*token" to "phrase".
In any case, the augmented BNF allows spaces between any tokens.

> 6.3.1 201 Created: Also possible following PUT, presumably.

AFID

> 6.3.1 202 Accepted: "delay header line" is what?

AFID

> 6.3.1 204 No Response: Allowed for POST, too?  (For a Form.)

AFID

> 6.3.2 301 & 302 Moved: Allowed for POST, and others, too?

AFID

> 6.3.3 401 Unauthorized: [nit] change first 'a' to 'an'.

AFID

> 6.3.3 404 Method Not Allowed: is this only for the defined methods, or
>     should this also be used for a misspelled or unrecognized method
>     name?

Only for methods implemented by the server, but not allowed for the object
requested.

> 6.3.4 500 Server Error: 502 (twice) and 504 call this "500 Internal
>     Error".  All should say "Server Error"?

doh!  Actually, I think I'll change it again to "Server Internal Error".
BTW, these are only recommended phrases, and may even be translated to
regional (human) languages, if desired.

> 6.3.4 503 Service Unavailable: Does this imply that a server is not
>     permitted to refuse to accept a connection?  [Presumably not, though
>     it could be read that way.]

No such implication was intended.

> 6.4 paragraph 1: [nit] change 'a Object-Body' to 'an ...'

AFID

> 6.4.2 Version: since this refers to an object and not the server,
>     shouldn't it be in section 7?

AFID -- wow, your version must be at least two days old.  ;-)

> 7.2 Content-Length Note 2: [nit] 'wherever' has only three 'e's.

doh!

> 7.4 Content Encoding:
> 
>     (a) [nit] this heading (and 7.5 too) needs a hyphen.

AFID

>     (b) [nit] change 'method' to 'mechanism' in the first paragraph to
>     avoid confusion with use of the term elsewhere?

AFID

> 7.5 C-T-E: The rule omits the token and colon before the type.

AFID

> 7.7 Expires: Are there any constraints on the Date and Time specified?
>     Specifically, may they refer to a time earlier than or the same as
>     that in the Date: header?

No constraints.

> 7.9 URI First example: [nit] Change close quote to open quote.

AFID

> 7.9 URI Second example: [nit] Semicolon missing.

AFID

> 7.12 Title: "isomorphic" here implies that the Title follows SGML
>     syntax, and hence depends on the HTML DTD (including the
>     Declaration), and is allowed any valid entities and shortrefs
>     within, etc.  This probably isn't intended (I hope!).

Yes it is, though no translation is performed by the sender -- it is
assumed to have already been performed (or just ignored) by whatever
process is getting the metainfo.

> 7.13 Link: [nit] both examples have unmatched quotes.  '//' missing
>     after 'mailto:'?

AFID, and there is no "//" in mailto URIs

> 8 Neg. algorithm para 4: [nit] change 'between 0 and 1' to 'in the range
>     0 through 1'?  (0 and 1 are allowed values.)

okay

> 8 Neg. algorithm 'bs' definition: [nit] change 'send' to 'sent'

okay

> 9 Authentication, paragraph 1: Here is perhaps the strongest statement
>     in the document about conformance.  Yet, surely, if the server would
>     never return "401 Unauthorized" (because all its data are public)
>     there is no need for it to implement the Basic Access Authentication
>     Scheme?

Right, the requirement should only apply to user agents.

> 9 Authentication, fourth bullet: [nit] change second 'a' to 'an'.

AFID

> 11.3 Abuse, para 1: While I strongly support the intent behind the last
>     sentence here, this document is a definition of the HTTP protocol,
>     not people using it.  It cannot impose requirements on, or define,
>     people.  (Does my server become non-conforming because someone using
>     my server abused his or her collected data?)  Also, am I (as the
>     writer, and hence provider, of a server) responsible for the actions
>     of other people *using* my server to provide data?  Thin ice...
> 
>     Perhaps it should read something like: "People using the HTTP
>     protocol to provide data are responsible for...".

okay

> 11.3 Abuse, final para: This reads as though the user must be prompted
>     with the From field to be sent before sending every request.
>     Probably not the intent.

will be clarified.

> 16  Server Tolerance, para 2: Time to make this a Requirement?

Nope.

> 17  Bad servers: The first paragraph here sounds like a Compliance
>     statement.  As such, it should be in the body of the document, not an
>     appendix?   The document certainly needs a Compliance section.

Hmmmm, seems a bit overboard to me.  Henrik?

> 17.1 Back compatibility, para 2: This doesn't seem to reflect current
>     practice (inline <img href="xxx"> requests for .GIF files do seem to
>     appear as images, not as HTML documents).

This applies only to unexpected Simple-Responses, but perhaps should
not require any default and just allow the client to use a heuristic.
In any case, it doesn't belong in the appendix.

Thanks again for the comments,


......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
Received on Wednesday, 30 November 1994 04:49:35 UTC