WGLC review of p2-semantics (editorial stuff) from Dan Winship on 2012-10-30 (ietf-http-wg@w3.org from October to December 2012)

From: Dan Winship <dan.winship@gmail.com>
Date: Tue, 30 Oct 2012 07:19:46 -0400
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <508FB7D2.6070807@gmail.com>
> 3.1. Representation Metadata

>    | Expires           | Section 7.3 of [Part6] |

If "Expires" is considered "representation metadata", then it seems
like "ETag" and "Last-Modified" should be as well. But I think it
would make more sense to just remove "Expires" from the list; it's
clearly the odd man out here.



> 3.1.1.2. Character Encodings (charset)

>    Implementers need to be aware of IETF character set requirements
>    [RFC3629] [RFC2277].

It's not clear what requirements this is referring to; RFC 2277 places
requirements on protocol authors, not on implementors, and RFC 3629 is
just the definition of UTF-8. If the requirement is "implementations
MUST support UTF-8" then we should say that.



> 3.1.1.4. Multipart Types

>    In general, HTTP treats a multipart message body no differently than
>    any other media type: strictly as payload.  HTTP does not use the
>    multipart boundary as an indicator of message body length.  In all
>    other respects, an HTTP user agent SHOULD follow the same or similar
>    behavior as a MIME user agent would upon receipt of a multipart type.

That last part seems completely wrong; a web browser is not expected
to handle multipart/alternative or multipart/related in the way a mail
reader would. (This requirement came from RFC 2616, but... it was
wrong then too.)

>    The MIME header fields within each body-part of a multipart message
>    body do not have any significance to HTTP beyond that defined by
>    their MIME semantics.

This is not true of multipart/byteranges; in RFC 2616 that was
explained separately, but that explanation got lost in httpbis
rewrites at some point.

Suggested rewrite for the second and third paragraphs:

   In general, HTTP treats a multipart message body no differently
   than any other media type: strictly as payload.  The one exception
   is the "multipart/byteranges" type (Appendix A of [Part5]) when it
   appears in a 206 (Partial Content) response.  In all other cases,
   the MIME header fields within each body-part of a multipart message
   body do not have any significance at the HTTP level; they are
   just part of the representation data.

(This drops the newly-added "HTTP does not use the multipart boundary
as an indicator of message body length", but that is already implied
by the removal of 2616's prohibition on epilogue data; if the
multipart is allowed to have an epilogue, then the final boundary
doesn't indicate the end of the body anyway. It also drops the
"unrecognized multipart subtype" text, which was already irrelevant
given the "strictly as payload" rule.)



> 3.1.3.1. Language Tags

>    In summary, a language tag is composed of one or more parts: A
>    primary language subtag followed by a possibly empty series of
>    subtags:
>
>      language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

Kinda weird... the text sets you up to expect an actual grammar for
language-tag, but then you just get a cross-reference. I'd rearrange
stuff to:

   ... HTTP uses language tags within the Accept-Language and
   Content-Language fields.

     language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

   A language tag is composed of one or more parts: A primary language
   subtag followed by a possibly empty series of subtags.  White space
   is not allowed within the tag and all tags are case-insensitive.
   Example tags include:

     en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN

   See [RFC5646] for further information.

(also dropping the language-subtag-registry ref, since that's covered
by the "See [RFC5646]")
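To illustrate the structure the suggested text describes (a primary subtag, then zero or more subtags, no white space, case-insensitive), here is a minimal Python sketch. The function names are hypothetical. It only splits on "-" and does not validate against RFC 5646 or the subtag registry. It also simplifies by treating the first part as the "primary" subtag: for a private-use tag like x-pig-latin, that first part is really the "x" singleton.

```python
from typing import List, Tuple


def split_language_tag(tag: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: split a tag into (first subtag, remaining subtags).

    Case is folded to lower case because language tags are case-insensitive.
    This is only a structural split on "-", not RFC 5646 validation.
    """
    if not tag or any(c.isspace() for c in tag):
        raise ValueError("language tag must be non-empty and contain no white space")
    primary, *rest = tag.lower().split("-")
    return primary, rest


def same_tag(a: str, b: str) -> bool:
    # Hypothetical helper: tags compare equal regardless of case.
    return a.lower() == b.lower()
```

For example, `split_language_tag("man-Nkoo-GN")` gives `("man", ["nkoo", "gn"])`, and `same_tag("en-US", "EN-us")` is true.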



> 3.4. Content Negotiation

>    (such as when many different formats are supported by a user-agent),

no hyphen



> 3.4.1. Proactive Negotiation
>
>
>    If the selection of the best representation for a response is made by
>    an algorithm located at the server, it is called proactive
>    negotiation.

That text doesn't motivate the new name. How about:

   If the selection of the best representation for a response is made
   by the server based on preferences indicated by the user agent in its
   initial request for the resource, it is called proactive negotiation.

>    4.  It might limit a public cache's ability to use the same response
>        for multiple user's requests.

users' not user's

>    For example, the origin server might not implement proactive
>    negotiation, or it might decide that sending a response that doesn't
>    conform to them is better than sending a 406 (Not Acceptable)
>    response.

Not clear what "them" refers to. Suggest: "...that doesn't conform to the
user agent's preferences..."



> 3.4.2. Reactive Negotiation

>    This specification defines the 300 (Multiple Choices) and 406 (Not
>    Acceptable) status codes for enabling reactive negotiation when the
>    server is unwilling or unable to provide a varying response using
>    proactive negotiation.

406 doesn't really "enable reactive negotiation". It just fails to do
proactive negotiation.

Also, should we mention how reactive negotiation is *actually* done?

   This specification defines the 300 (Multiple Choices) status code
   for enabling reactive negotiation. However, in practice, Web sites
   wanting to do reactive negotiation will just return a successful
   response containing a "default" (or proactively negotiated)
   representation of the resource, which includes within it links that
   the user can follow to reach other representations.



> 4. Product Tokens

>    By convention, the products are listed in order of their
>    significance for identifying the application.

"...in *decreasing* order of...", or something like that. (likewise in
the description of User-Agent in 6.5.3 and Server in 8.4.2)



> 5.2.2. Idempotent Methods

Section 6.2.2.1 of Part1 implies that the concept of "idempotent
sequences of request methods" (as opposed to merely "idempotent
methods") will be discussed here, but it's not. I'm not sure if it
should be added here or there.



> 5.3.1. GET

>    The semantics of the GET method change to a "partial GET" if the
>    request message includes a Range header field ([Part5]).

"a Range or If-Range header field"



> 5.3.6. CONNECT

Though obvious, it seems like for consistency's sake, this should end
with:

   Responses to the CONNECT method are not cacheable.



> 5.3.7. OPTIONS

>    If no payload body is included, the response MUST include a
>    Content-Length field with a field-value of "0".

Does this actually mean to prohibit servers from using chunked
encoding (or "Connection: close" with no Content-Length) in that case?
Or is it just supposed to be a reminder that "empty message body" is
different from "no message body"?

(Section 9.1.2 has basically the same text.)

>    If no Max-Forwards field is present in the request, then the
>    forwarded request MUST NOT include a Max-Forwards field.

"If no Max-Forwards field is present in the upstream request, then the
downstream request MUST NOT include a Max-Forwards field."
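To make the upstream/downstream distinction concrete, here is a minimal Python sketch of a proxy handling Max-Forwards on an OPTIONS request. The function name and the dict-of-headers representation are hypothetical, and it assumes header names are already in canonical case and the field value is a well-formed integer. It also covers the zero case from the Max-Forwards definition, where the recipient answers itself instead of forwarding.

```python
from typing import Dict, Optional


def forward_max_forwards(upstream: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Hypothetical proxy step for an OPTIONS request.

    Returns the headers for the downstream request, or None if the proxy
    should respond itself because Max-Forwards has reached zero.
    """
    downstream = dict(upstream)  # never mutate the upstream request
    value = upstream.get("Max-Forwards")
    if value is None:
        # No Max-Forwards upstream: the downstream request must not gain one.
        return downstream
    remaining = int(value)
    if remaining == 0:
        # Zero: this recipient answers instead of forwarding.
        return None
    # Otherwise forward with the value decremented by one.
    downstream["Max-Forwards"] = str(remaining - 1)
    return downstream
```

The absent case is the one the reworded sentence is about: the proxy passes the request through without inventing a Max-Forwards field of its own.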



> 6.2. Conditionals

>    The HTTP/1.1 conditional request mechanisms are defined in
>    [Part4].

"and [Part5]" (If-Range)



> 6.3. Content Negotiation

6.1 and 6.2 had some introductory text before the table, and it seems
weird to not have that here.

(6.4 and 6.5 have the same problem)



> 6.3.1. Quality Values

Should this section be called "Weight" now?



> 6.3.5. Accept-Language

>    would mean: "I prefer Danish, but will accept British English and
>    other types of English". (see also Section 2.3 of [RFC4647])

Capitalize "See"



> 7. Response Status Codes

>    The status-code element is a 3-digit integer result code of the
>    attempt to understand and satisfy the request.

"...a 3-digit integer code giving the result of the attempt..."

>    o  2xx (Successful): The action was successfully received,
>       understood, and accepted

"The *request* was successfully..."



> 7.1. Overview of Status Codes

>    The reason phrases listed here are only recommendations -- they can
>    be replaced by local equivalents without affecting the protocol.

That suggests you can/should translate them into other languages,
which isn't really what they're for and kind of contradicts p1 3.1.2's
"A client SHOULD ignore the reason-phrase content."

>    | 415         | Unsupported Media Type       | Section 7.5.13       |
>    | 416         | Requested range not          | Section 3.2 of       |
>    |             | satisfiable                  | [Part5]              |
>    | 417         | Expectation Failed           | Section 7.5.14       |

The capitalization of "Requested range not satisfiable" is
inconsistent with the rest of the table.



> 7.2. Informational 1xx

>    A client MUST be prepared to accept one or more 1xx status responses
>    prior to a regular response, even if the client does not expect a 100
>    (Continue) status message.

No reason to call out 100 Continue specifically here... "A client MUST
be prepared to accept one or more 1xx status responses prior to a
regular response, even if the client does not expect one."



> 7.3.2. 201 Created

>    If the newly created resource's URI is the same as the Effective
>    Request URI, this information can be omitted

"effective request URI" is not capitalized like that anywhere else.
(Well, except for once more later on in this section which should also
be fixed.)

>    If the action cannot be carried out immediately, the server
>    SHOULD respond with 202 (Accepted) response instead.

"with *a* 202 (Accepted) response"



> 8.1.1.2. Date

>    1.  If the response status code is 100 (Continue) or 101 (Switching
>        Protocols), the response MAY include a Date header field, at the
>        server's option.

Is that really supposed to be limited to 100 and 101, and not other
1xx codes?



> 8.1.3. Retry-After

>    This field MAY also be used with any 3xx (Redirection) response
>    to indicate the minimum time the user-agent is asked to wait

No hyphen in "user agent"



> 8.4.1. Allow

>      Allow = #method

Should that be 1#method? If not, it should explain what an empty
"Allow" header means.
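The difference matters to parsers: "#method" permits zero list elements, so an "Allow" field with an empty value is syntactically valid, while "1#method" would reject it. Here is a minimal Python sketch of the two readings. The function and its `require_one` flag are hypothetical. It skips empty list elements (as in "GET, , HEAD") and does not check method token syntax. Method names are left as-is because they are case-sensitive.

```python
from typing import List


def parse_allow(field_value: str, require_one: bool = False) -> List[str]:
    """Hypothetical helper: split an Allow field-value into method names.

    require_one=False models "Allow = #method" (zero or more elements);
    require_one=True models "Allow = 1#method" (at least one element).
    """
    methods = [item.strip() for item in field_value.split(",")]
    methods = [m for m in methods if m]  # drop empty list elements
    if require_one and not methods:
        raise ValueError("1#method requires at least one method")
    return methods
```

Under the current "#method" rule, `parse_allow("")` returns an empty list, and that is exactly the case whose meaning the spec leaves unexplained.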



> 9.1.1. Procedure

>    HTTP method registrations MUST include the following fields:

Should "cacheability" be an explicit field (rather than just a
required part of the specification text)?



> 9.3. Header Field Registry

It seems weird to have this in p2 since p1 defines headers too...



> 9.3.1. Considerations for New Header Fields

>    o  Whether it is appropriate to list the field-name in the Connection
>       header field (i.e., if the header field is to be hop-by-hop, see
>       Section 6.1 of [Part1]).

should have a semicolon rather than a comma after "hop-by-hop". (So that
it doesn't read like it's telling you to only follow the xref if the
header field is hop-by-hop.)



> 10.1. Transfer of Sensitive Information

>    Four header fields are worth special mention in this context:
>    Server, Via, Referer and From.

"Via" is in p1 though, so the Via bits should be moved to p1's
Security Considerations? (Or maybe if we end up with a p0, all of the
security considerations should be consolidated there.)

>    The information sent in the From field might conflict with the user's
>    privacy interests or their site's security policy, and hence it
>    SHOULD NOT be transmitted without the user being able to disable,
>    enable, and modify the contents of the field.  The user MUST be able
>    to set the contents of this field within a user preference or
>    application defaults configuration.

Do any browsers actually ever send the "From" header? If not, should
we just say "From is for robots, not browsers"?



> Appendix C. Changes from RFC 2616

>    Remove base URI setting semantics for "Content-Location" due to poor
>    implementation support, which was caused by too many broken servers
>    emitting bogus Content-Location header fields, and also the
>    potentially undesirable effect of potentially breaking relative links
>    in content-negotiated resources.  (Section 3.1.4.2)

That would parse better if the "which was..." clause was parenthesized
rather than just set off by commas.

>    Failed to consider that there are many other request methods that are
>    safe to automatically redirect, and further that the user agent is
>    able to make that determination based on the request method
>    semantics.

This is written in the opposite style from the rest of the list (it
describes the problem with 2616 rather than the solution in httpbis).
Should be something like:

   Allow automatic redirection of all "safe" methods, not just GET and
   HEAD, and give the user agent more latitude in redirecting unsafe
   methods. (Section 7.4)

Received on Tuesday, 30 October 2012 11:21:28 UTC