Comments on draft-ietf-http-v10-spec-00.txt from Rich Salz on 1995-05-26 (ietf-http-wg@w3.org from April to June 1995)

From: Rich Salz <rsalz@osf.org>
Date: Fri, 26 May 1995 01:09:19 -0400
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9505260509.AA10435@sulphur.osf.org>
I recently finished reading the HTTP V10 spec and have a number of comments
and questions.  (Interesting trivia:  this message is one-sixth the size
of the Draft10 document.)

Overall:  I found the document hard to read.  I think this is primarily
because of the way it is structured but I'm not sure.  It seems to me
that such a simple request/response protocol (with only one transaction
per session!) should be easier to understand in a straight read-forward
mode.  I wish I had more constructive comments to make here.

>1.2  Overall Operation
>   In any case, the closing of the connection by
>   either or both parties always terminates the current request,
>   regardless of its status.
Has any thought been given to handling more than one exchange per session?
Are there common extensions for doing this?  (I know that Content-Length
becomes much more important :-).

>2.1  Augmented BNF
>   N rule
Shouldn't this be "<N> rule" or "<> rule"?

>   implied *LWS
>       However, applications should attempt to follow "common form"
Definition of "common form"? Also, do you REALLY want to treat
		Date   :  xxx
    and		Date:xxx
    and		Date :xxx
as legal headers?  I think that "no space before the colon and at least
one space after the colon" is a better rule and matches current practice.
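
A minimal sketch of the stricter rule proposed above, for illustration only. The function name parse_strict and the simplified field-name character set (letters, digits, hyphen) are assumptions, not taken from the draft.

```python
import re

# Proposed rule: field name, no space before the colon, at least one
# space after it.  The name character set is a simplification.
STRICT_HEADER = re.compile(r"^([A-Za-z0-9-]+): +(.*)$")

def parse_strict(line):
    """Return (name, value), or None if the line breaks the strict rule."""
    m = STRICT_HEADER.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2).rstrip()

if __name__ == "__main__":
    for line in ["Date: xxx", "Date   :  xxx", "Date:xxx", "Date :xxx"]:
        print(repr(line), "->", parse_strict(line))
```

Under this rule only the first form is accepted; the three variants listed above are rejected.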

>3.1  HTTP Version
>   This document defines both the 0.9 and 1.0 versions of the HTTP
>   protocol. Applications sending Full-Request or Full-Response
>   messages, as defined by this specification, must include an
>   HTTP-Version of "HTTP/1.0".
Can I claim to be a conformant server if I don't support 0.9? Do the
examples below, and the text in this section, follow the IETF standard
use of the terms "may" "should" and "must"?

>   HTTP servers are required to be able to recognize the format of the
                 ^^^^^^^^^^^^ must
There are probably other places where the formal/official words should
be used.

>4.2  Message Headers
>   HTTP header fields, which include General-Header (Section 4.3),
>   Request-Header (Section 5.4), Response-Header (Section 6.3), and
>   Entity-Header (Section 7.1) fields, follow the same generic format
I think the General-Header Request-Header and Response-Header differentiations
are too confusing.  I would leave it as Protocol-Header and Entity-Header.
This should simplify the structure of the document, making it more readable.

>   as that given in Section 3.1 of RFC 822 [8]. Each header field
>   consists of a name followed by a colon (":") and the field value.
See comment above about "Date :" vs. "Date:".  As I read this text,
it violates the EBNF.

>   The field value may be preceded by any amount of LWS, though a
>   single SP is preferred. Header fields can be extended over multiple
>   lines by preceding each extra line with at least one LWS.
             ^^^^^^^^^ starting

>       HTTP-header    = field-name ":" [ field-value ] CRLF
>       field-value    = *( field-content | comment | LWS )
Oh hell.  Is it really absolutely necessary to support nested comments?
The only argument for them that I can see is email gateways.  However,
the specification here doesn't support the full comment syntax provided
by 822, such as
	From: "Richard E. Salz" rsalz@osf.org (INN author)
What would break if nested comments -- or perhaps comments altogether --
were dropped?

>4.3  General Message Header Fields
>       General-Header = Date
>                      | Forwarded
>                      | Message-ID
>                      | MIME-Version
>                      | extension-header	; forward section reference
Since extension-header isn't defined in this section, it needs a forward
reference.

>4.3.2 Forwarded
>       Forwarded      = "Forwarded" ":" "by" URI [ "(" product ")" ]
>                        [ "for" FQDN ]
Need forward reference for product, for same reason.

>5.1  Request-Line
>       Request-Line   = Method SP Request-URI SP HTTP-Version CRLF
Need to clarify that this is an exception to the implied LWS rule, and
add a forward reference to the appendix.

>5.2  Method
>   The Method token indicates the method to be performed on the
>   resource identified by the Request-URI. The method is case-
>   sensitive and extensible.
Delete "and extensible".  case-sensitive!?? CASE-SENSITIVE?  WHY? This is
stupid and should be changed.  Are there any servers that distinguish
between get Get and GET?

>   In order to maintain compatibility, the semantic
>   definition for extension methods should be registered with the
>   IANA [17].
We should guarantee that no standard method will ever start with X or
perhaps X-.

>5.2.1 GET
>   The semantics of the GET method changes to a "conditional GET" if
>   the request message includes an If-Modified-Since header field. A
                                                     ^^^^^^^^^^^^^
						     request header.
(See what I mean about not making the distinction? :-)

>      c)   If the resource has not been modified since the If-Modified-
>           Since date, the server shall return a "304 Not Modified"
                                                   ^^^^^^^^^^^^^^^^
The text after the response code is not formally specified, so this
should say "return a '304' response."  I didn't mark all the places
where this type of change should be made, but it should be done
throughout the document.

>5.2.3 POST
>      o Providing a block of data (usually a form) to a data-handling
                                    ^^^^^^^such as
>        process;

>   including a URI-header field in the request. However, the server
>   should treat that URI as advisory only and may store the entity
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    may treat the URI as advisory and
>   under a different URI or without any URI.

>5.2.4 PUT
>   The fundamental difference between the POST and PUT requests is
>   ...
>   .... The requestor of a
>   PUT knows what URI is intended and the receiver must not attempt to
>   apply the request to some other resource. If the receiver desires
>   that the request be applied to a different URI, it must send a
>   "301 Moved Permanently" response; the requestor may then make its
>   own decision regarding whether or not to redirect the request.
Replace requester/receiver with client/server.  There are probably other
places in the document that need to have the same (or similar) change
made.  Use one terminology.

>   The actual method for determining how the resource is placed, and
>   what happens to its predecessor, is defined entirely by the origin
>   server. If version control is implemented by the origin server
Sidebar:  Has anyone ever integrated RCS into their server?

>5.4.1 Accept
>   The field may be folded onto several lines and more than one
>   occurrence of the field is allowed (with the semantics being the
>   same as if all the entries had been in one field value).
Oh, ick.  Is this the only header that has this concat rule?
(Does the spec say anywhere what to do if presented with, e.g., two
If-Modified-Since headers?)
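
A minimal sketch of the concat rule quoted above, assuming field names compare case-insensitively and that repeated values are joined with a comma. The function name merge_repeated is hypothetical.

```python
def merge_repeated(headers):
    """Fold repeated fields into one comma-separated value, in order.

    `headers` is a list of (name, value) pairs as read off the wire.
    """
    merged = {}
    for name, value in headers:
        key = name.lower()  # assumption: names are case-insensitive
        if key in merged:
            merged[key] = merged[key] + ", " + value
        else:
            merged[key] = value
    return merged

if __name__ == "__main__":
    print(merge_repeated([
        ("Accept", "text/html"),
        ("Accept", "text/plain; q=0.5"),
    ]))
```

The same folding applied to two If-Modified-Since fields would produce a value that is not a valid date, which is why the question of which headers may repeat matters.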

Also a document meta-comment:  use parentheses only to mark off additional
information that can safely be ignored.  They are not interchangeable
with commas.  The "(with the semantics..." part above, for example, should
most definitely NOT be in parens.  Some text, of course, that isn't in
parens could be (but that's probably too much work).

>5.4.2 Accept-Charset
>       However, they must be able to interpret their meaning to
>       whatever extent is required to properly handle messages in
>       that character set..
                          ^^ .

>5.4.3 Accept-Encoding
>   The Accept-Encoding header field is similar to Accept, but lists
>   the encoding-mechanisms and transfer-encoding values which are
>   acceptable in the response.
Is there a definition or registry anywhere?

>   The field value should never include the identity transfer-encoding
>   values ("7bit", "8bit", and "binary") since they actually represent
           ^                            ^ remove these

>   If no Accept-Encoding field is present in a request,
>   it must be assumed that the client does not accept any encoding-
>   mechanism and only the identity transfer-encodings.
I don't understand this sentence.

>5.4.6 From
>    The address should, if
>   possible, be a valid Internet e-mail address, whether or not it is
>   in fact an Internet e-mail address or the Internet e-mail
>   representation of an address on some other mail system.
I don't understand this sentence.

>5.4.7 If-Modified-Since
Last sentence of last paragraph in this section can be put in parens.
And should. :-)

>5.4.8 Pragma
>   Although multiple pragma directives can be listed as part of the
>   request, HTTP/1.0 only defines semantics for the
             ^^^^^^^^ this document
Can additional pragmas be defined with IANA and still call it HTTP/1.0?

>       pragma-directive = "no-cache" | extension-pragma
Has anyone given thought to token@host or token/host as a way of transiting
multiple proxies?

>   insist upon receiving an authoritative response to its request. It
>   also allows a client to refresh a cached copy which has become
                                                        ^^^^^^^^^^
>   corrupted or is known to be stale.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	"is known to be corrupted or stale."

>5.4.9 Referer
>       Note: Because the source of a link may be considered private
>       information or may reveal an otherwise secure information
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nope.  If the source is secure then revealing its name can't be a
compromise.  Change secure to unpublicized or private.

>5.4.10 User-Agent
>   Product tokens should be short and to the point -- use of this field
>   for advertizing or other non-essential information is explicitly
>   deprecated and will be considered as non-conformance to the
You cannot both deprecate and declare it non-conforming.  Last sentence
of this paragraph can be put in parens.

>6.1  Status-Line
>   (e.g. "HTTP/1.0"), the presence of that expression is considered
                                                          ^^^^^^^^^^
Delete that word

>6.2  Status Codes and Reason Phrases
Remove "303" since it is not defined by this standard.  Perhaps a "Note:"
that it is obsolete.  Same thing, I think, for 402.

>       Reason-Phrase  = token *( SP token )
Why can't this be "text"?

>   202 Accepted
>
>   The request has been accepted for processing, but the processing
>   has not been completed. The request may or may not eventually be
>   acted upon, as it may be disallowed when processing actually takes
>   place. There is no facility for re-sending a status code from an
                                    ^^^ remove the "re-" part.

>   The "202 Accepted" response is intentionally non-committal. Its
>   ...
>   estimate of when the user can expect the request to be enacted.
                                                           ^^^^^^^
							   fulfilled
>   204 No Content
I would like to see an example of when this would be used -- I'm confused
how this compares with 304.

>6.2.2 Redirection 3xx
>   only takes place if the method used in the request is idempotent
>   (GET or HEAD).
Is it a requirement that GET be idempotent?  Obviously not, else there would
be no need for if-modified-since.

>   301 Moved Permanently
>      o Following:        any method
>      o Required headers: URI-header, Location
Would like a forward reference for Location header.  I got really
confused (my first notes said to delete that word :)

>6.2.3 Client Error 4xx
>   The 4xx class of status codes is intended for cases in which the
>   client seems to have erred. If the client has not completed the
>   request when a 4xx code is received
Hunh?  Asynchronous?  I thought this was a lock-step request/response
protocol.

>   402 Payment Required
Like 303, remove from the spec.

>   403 Forbidden
>   The request is forbidden because of some reason that remains
>   unknown to the client.
	"...because of an unspecified reason."

>   408 Request Timeout
Any guidelines for how long timeouts should be?

>6.2.4 Server Errors 5xx
>    If the client has not completed the request
>   when a 5xx code is received
Same asynchronous comment/query as above.

>   502 Bad Gateway
>   The server received an invalid response from the gateway or
>   upstream server it accessed in attempting to complete the request.
                                                 ^^^^^^^^fulfill
>6.3.3 Server
>       Server         = "Server" ":" 1*( product )
Again, why can't this just be a text field? Also shouldn't it be "Server:"
?  (See my questions/comments on header/whitespace syntax above.)

>7.1  Entity Header Fields
>       examples of this. Base will be used in future versions of
                               ^^^^may
>       HTTP.
Can't predict the future. :)

>   If a response passes through a proxy which does not understand one
>   or more of the methods indicated in the Allow header, the proxy
>   must not modify the Allow header; the user agent may have other
>   means of communicating with the origin server.
"A proxy must not modify the Allow header even if it does not understand
all the methods specified, as the client may have other..."

>7.1.3 Content-Language
>   language. Thus, if the body content is intended only for a Danish
>   audience, the appropriate field is
A Danish-speaking audience.

>7.1.4 Content-Length
>   The Content-Length header field indicates the size of the Entity-
>   Body (in decimal number of octets) sent to the recipient or, in the
>   case of the HEAD method, the size of the Entity-Body that would
>   have been sent had the request been a GET.
It's the length after any C-T-E's have been applied, right?

>7.1.5 Content-Transfer-Encoding
>    Gateways are the only HTTP applications that would
>   generate a CTE.
I'm not sure this is true.  Imagine two 64bit machines transferring
floating point numbers?

>7.1.7 Derived-From
>       Derived-From: 2.1.1
Turn that into 2.1.  Ignore the idea of branches and stick with
the simple, albeit complex-enough-already direct chain.

>   A longer example of version control is included in Appendix C.
Appendix E.

>7.1.8 Expires
>       Expires        = "Expires" ":" HTTP-date
Being able to put a date-delta would be useful.  Any reason why not?
(I'm thinking in particular of servers for export-controlled software.)
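
A minimal sketch of how a client might resolve either form, for illustration only. The "+N" delta-seconds syntax is purely hypothetical (the draft defines only HTTP-date), and the function name expiry_time is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def expiry_time(value, received_at):
    """Resolve an Expires value to an absolute time.

    "+N" is a hypothetical delta in seconds relative to `received_at`;
    anything else is parsed as an RFC 822 style date.
    """
    value = value.strip()
    if value.startswith("+") and value[1:].isdigit():
        return received_at + timedelta(seconds=int(value[1:]))
    return parsedate_to_datetime(value)

if __name__ == "__main__":
    now = datetime(1995, 5, 26, 5, 9, 19, tzinfo=timezone.utc)
    print(expiry_time("+3600", now))
    print(expiry_time("Fri, 26 May 1995 05:09:19 GMT", now))
```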

>       should be considered expired. Likewise, a value of zero (0)
>       or an invalid date format may be considered equivalent to an
>       "expires immediately."
May, should, or must?  Probably should.

>7.1.9 Last-Modified
>    In any case, the recipient should only know (and
>   care) about the result -- whatever gets stuck in the Last-Modified
>   field -- and not worry about how that result was obtained.
Delete this whole sentence.

>7.1.10 Link
>       Link: <mailto:timbl@w3.org>; rev="Made"; title="Tim Berners-Lee"
>   The first example indicates that the entity is previous to chapter2
>   in a logical navigation path. The second indicates that the
>   publisher of the resource is identified by the given e-mail address.

I don't understand the second example. Does "rev=Made" have some
semantics I don't know?  (If so, where defined?)  Or are the rev and
title superfluous here?

>7.2.1 Type
>       entity-body <-
>          Content-Transfer-Encoding( Content-Encoding( Content-Type ) )
Need to describe this syntax.   Is it algebraic f(x) notation?
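
If the notation is indeed function composition (an assumption; the draft does not say), it could be read as in the sketch below. gzip and base64 are stand-ins chosen for illustration, and all function names are hypothetical.

```python
import base64
import gzip

def content_type(data):
    # The typed data itself; the media type adds no transformation.
    return data

def content_encoding(data):
    # Stand-in encoding mechanism: gzip compression.
    return gzip.compress(data)

def content_transfer_encoding(data):
    # Stand-in transfer encoding: base64.
    return base64.b64encode(data)

def entity_body(data):
    """Innermost function is applied first, as in f(g(h(x)))."""
    return content_transfer_encoding(content_encoding(content_type(data)))

def decode(body):
    """The receiver undoes the layers in the reverse order."""
    return gzip.decompress(base64.b64decode(body))

if __name__ == "__main__":
    print(decode(entity_body(b"<html>hello</html>")))
```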

>   The Content-Type header field has no default value. If and only if
>   the media type is not given by a Content-Type header (as is always
>   the case for Simple-Response messages), the receiver may attempt to
>   guess the media type via inspection of its content and/or the name
>   extension(s) of the URL used to access the resource. If the media
                                    ^^^^^^specify (access makes no
sense if it's a PUT or POST).  What's a name extension?  Need a
definition.  (I assume you mean the ".gz" suffix)

>7.2.2 Length
>   When an Entity-Body is included with a message, the length of that
>   body may be determined in one of several ways. If a Content-Length
>   header field is present, its value in bytes (number of octets)
                                                ^^^^^^^^^^^^^^^^^^delete this
>   represents the length of the Entity-Body. Otherwise, the body
>   length is determined by the Content-Type (for types with an
>   explicit end-of-body delimiter), the Content-Transfer-Encoding (for
>   packetized encodings), or the closing of the connection by the
>   server. Note that the latter cannot be used to indicate the end of
>   a request body, since it leaves no possibility for the server to
>   send back a response.
This isn't true since in some network implementations I can get two
half-duplex descriptors for each direction of the client/server channel.
>       Note: Some older servers supply an invalid Content-Length
>       when sending a document that contains server-side includes
>       dynamically inserted into the data stream.
When sending a dynamic document.  (I.e., CGI has this same problem, right?)

>8.1  Media Types
>   HTTP uses Internet Media Types [15], formerly referred to as MIME
>   Content-Types [6], in order to provide open and extensible data
Put the "formerly referred to" phrase in paren's or delete it.

>8.4  Encoding Mechanisms
>   Encoding mechanism values are used to indicate an encoding
>   transformation that has been or can be applied to a resource.
>   Encoding mechanisms are primarily used to allow a document to be
>   compressed or encrypted without losing the identity of its
>   underlying media type. Typically, the resource is stored with this
                                                             ^^^^in
>   encoding and is only decoded before rendering or analogous usage.
                                ^^decoded by the client before display or
>8.5  Transfer Encodings
>   The "quoted-printable" and "base64" values indicate that the
>   associated encoding (as defined in MIME [6]) has been applied to
>   the body. These encodings consist entirely of 7-bit US-ASCII
                              ^^^^^^^^^^^^^^^^^^^result in
>   characters.

>9.  Content Negotiation
I'm a little leery of calling this negotiation since there is only one
step, no real "bargaining" going on.  Perhaps preferences or selection
is a better term.

>   Content negotiation is an optional feature of the HTTP protocol. It
>   is designed to allow for preemptive selection of a preferred
                             ^^^^^^^^^^
I don't understand the use of the word preemptive in this section.
Delete it?

>      mxb  The maximum number of bytes in the Entity-Body accepted by
>           the client. The default value is mxb=undefined
>           (i.e. infinity).
The maximum number of bytes in the Entity-Body that the client will
accept.  The default value is infinite.

>   by following an exact link) or of some type that would allow the
>   user agent to perform the selection automatically (no such type is
>   available at the time of this writing).
"(No such type is currently defined.)"
A separate sentence, in paren's.  (Since reading the whole spec would
give you that info.)

>10.  Access Authentication
This section needs to be reviewed by security people to get
the terminology right, and consistent with other IETF documents (cf.,
GSSAPI):
    Instead of "scheme" use "mechanism"
    Use "method" or "style" instead of current use of "mechanism"
    Instead of "authorization space" use "protection domain"
Are auth-scheme and realm values case-sensitive?  auth-scheme should NOT be.

>       auth-scheme    = "Basic" | token
Shouldn't this be as follows?
       auth-scheme    = "Basic" | extension-scheme
       extension-scheme = token

>       challenge      = auth-scheme 1*LWS realm [ "," 1#auth-param ]
                                     ^^^^^delete this

>   The realm attribute is required for all access authentication
>   schemes which issue a challenge. The realm value, in combination
>   with the root URL of the server being accessed, defines the
>   authorization space. These realms allow the protected resources on
>   a server to be partitioned into a set of authorization spaces, each
>   with its own authentication scheme and/or database.
I cannot understand this.  Even after I try to substitute the right
terminology I just don't see what's being said.

>       credentials    = auth-scheme [ 1*LWS encoded-cookie ] #auth-param
Shouldn't this be written like this?
       credentials    = basic-credentials | extension-credentials
       extension-credentials = auth-scheme [ encoded-cookie ] #auth-param

>   Proxies must be completely transparent regarding user agent access
Delete the word access at the end of this line.

>   Authorization headers untouched. HTTP/1.0 does not provide a means
>   for a client to be authenticated with a proxy -- this feature will
                                                                  ^^^^may
>       Note: The names Proxy-Authenticate and Proxy-Authorization
>       have been suggested as headers, analogous to WWW-Authenticate
>       and Authorization, but applying only to the immediate
>       connection with a proxy.
This is a bad way of doing it since it doesn't allow for multiple proxies.

>10.1  Basic Authentication Scheme
There should be an appendix that shows the protocol transactions for a
client attempting to GET, being told to authenticate, and then doing an
authenticated GET.

>       basic-cookie      = <base64 encoding of userid-password>
>       userid-password   = [ token ] ":" *text
I don't believe it.  Absolutely amazing.  The only thing the base64 does
is make it very difficult to test your server via telnet.  I think there
should be a "simple" (or is that Simple?) scheme that is exactly like
basic without the base64 gunk.
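
A minimal sketch of what the base64 step involves, next to the plain form suggested above. The function names are hypothetical, and the "simple" scheme is only the proposal in this comment, not part of the draft.

```python
import base64

def basic_cookie(userid, password):
    """base64 encoding of userid-password, per the quoted grammar."""
    raw = f"{userid}:{password}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")

def simple_cookie(userid, password):
    """Hypothetical "simple" scheme: the same string, not encoded."""
    return f"{userid}:{password}"

if __name__ == "__main__":
    print(basic_cookie("Aladdin", "open sesame"))
    print(simple_cookie("Aladdin", "open sesame"))
```

The first form is what has to be computed by hand before a server can be tested over telnet; the second can be typed directly. Neither hides the password from anyone who can read the request.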

>   The basic authentication scheme is a non-secure method of filtering
                                         ^^^^^^^^^^weak
>   unauthorized access to resources on an HTTP server.

>11.2  Idempotent Methods
I think idempotent is being used incorrectly.

>   The writers of client software should be aware that the software
>   represents the user in their interactions over the net, and should
>   be careful to allow the user to be aware of any actions they may
>   take which may have an unexpected significance to themselves or
>   others.
This paragraph seems unrelated to the section heading.  Or is it just
the last text line that is significant?

>   In particular, the convention has been established that the GET and
>   HEAD methods should never have the significance of taking an
>   action. The link "click here to subscribe"--causing the reading of a
>   special "magic" document--is open to abuse by others making a link
>   "click here to see a pretty picture." These methods should be
>   considered "safe" and should not have side effects.
This last sentence is a repeat of the first one; delete it.

>11.3  Abuse of Server Log Information
>   information contained in the Referer. Even when the personal
>   information has been removed, the Referer field may have indicated
>   a secure document's URI, whose revelation itself would be a breach
      ^^^^^^private                                           ^^^^^^^^
>   of security.
    ^^^^^^^^^^^inappropriate.
That is, "a private document's URI whose publication would be inappropriate."
Naming an article CANNOT make it insecure.  Stronger mechanisms are
needed.

>   disable, enable, and modify the contents of the field. The user
>   must be able to set the active contents of this field within a user
                            ^^^^^^delete this word
>   preference or application defaults configuration.

I look forward to discussion based on these comments.
	/r$
Received on Thursday, 25 May 1995 22:15:56 UTC