Re: WGLC review of p1-messaging (editorial stuff) from Julian Reschke on 2012-11-03 (ietf-http-wg@w3.org from October to December 2012)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 03 Nov 2012 15:39:10 +0100
To: Dan Winship <dan.winship@gmail.com>
CC: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <50952C8E.3050303@gmx.de>
Hi Dan,

first of all: Thanks a lot for your review; this is very, very helpful.

I'll try to fix the obvious stuff right away, have a few questions, and 
will open tickets for everything else. When commenting about something 
that got a ticket number *please* start a new thread and put the ticket 
number into the subject line.

See more comments in-line.

On 2012-10-30 12:15, Dan Winship wrote:
>> 1. Introduction
>
>>     The Hypertext Transfer Protocol (HTTP) is an application-level
>>     request/response protocol that uses extensible semantics and
>>     MIME-like message payloads for flexible interaction with
>>     network-based hypertext information systems.
>
> Removing the word "hypertext" from the last line would make this a
> more accurate description of present-day HTTP. Likewise a bit later
> on:
>
>>     HTTP proxies and gateways can provide access to alternative
>>     information services by translating their diverse protocols into
>>     a hypertext format that can be viewed and manipulated by clients
>>     in the same way as HTTP services.

The "H" in HTTP stands for "Hypertext". As such I'd prefer to keep this, 
but defer this to Roy.

>> 2.2. Implementation Diversity
>
>>     Likewise, requirements that an automated action be confirmed by
>>     the user before proceeding can me met via advance configuration
>
> typo: "can *be* met"

Indeed; fixed earlier on.

Note that we have the latest edits on line (see 
<http://greenbytes.de/tech/webdav/#wg-httpbis> and 
<http://trac.tools.ietf.org/wg/httpbis/trac/browser/draft-ietf-httpbis/latest>; 
diffs of the text versions are can be found at 
<http://greenbytes.de/tech/webdav/#wg-httpbis> as well, for instance 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-latest-from-previous.diff.html> 
for this part.

>> 2.3. Intermediaries
>
>>     Many implementations depend on HTTP's stateless design in order
>>     to reuse proxied connections or dynamically load balance requests
>>     across multiple servers.
>
> "load-balance" should have a hyphen (otherwise it's easy to misparse
> as being about loading "balance requests")

Done.

>> 2.4. Caches
>
>>     There are a wide variety of architectures and configurations of
>>     caches and proxies deployed across the World Wide Web and inside
>>     large organizations. These systems include national hierarchies
>>     of proxy caches to save transoceanic bandwidth, systems that
>>     broadcast or multicast cache entries, organizations that
>>     distribute subsets of cached data via optical media, and so on.
>
> Since this section is entitled "Caches" and all of the examples are
> about caches, it might make sense to remove the words "and proxies" in
> the first sentence.

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/389>

>> 2.6. Protocol Versioning
>
>>     However, the minor version was not incremented for the changes
>>     introduced between [RFC2068] and [RFC2616], and this revision is
>>     specifically avoiding any such changes to the protocol.
>
> Change tense? ("has specifically avoided")

Done.

>> 2.7.1. http URI scheme
>
>>     If the host identifier is provided as an IP literal or IPv4
>>     address,
>
> We don't import the "IP-literal" and "IPv4address" rules from 3986, so
> this would make more sense as just "provided as an IP address".

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/390>

>>     received with an empty host, then it MUST be rejected as invalid.
>>     If the port subcomponent is empty or not given, then TCP port 80 is
>>     assumed (the default reserved port for WWW services).
>
> should "WWW" be "HTTP"?

Dunno. Roy?

>>     Senders MUST NOT include a userinfo subcomponent (and its "@"
>>     delimiter) when transmitting an "http" URI in a message.
>
> would read better if the "and" was "or" IMHO

"and" sounds right to me.

> Also, taken literally, this rule prohibits even transmitting an HTML
> page containing a link with a userinfo subcomponent. Maybe:
>
>     Senders MUST NOT include a userinfo subcomponent (or its "@"
>     delimiter) when transmitting an "http" URI in the request-line or
>     header section of a message.
>
> [Mark just commented on this in his review too but suggested "in a
> request-target". Not sure if the restriction was meant to include
> headers as well...]

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/391>

>>     the deprecated subcomponent is being used to obscure the authority
>>     for the sake of phishing attacks.
>
> Should "phishing" be defined/cited?

Do we have something we could cite?

>> 2.7.2. https URI scheme
>
>>     distinct origin servers. However, an extension to HTTP that is
>>     defined to apply to entire host domains, such as the Cookie
>>     protocol [RFC6265], can allow information set by one service to
>>     impact communication with other services within a matching group
>>     of host domains.
>
> "can allow" sounds like permission. I'd say "might allow" (and maybe
> even explicitly state that it really SHOULD NOT?).

"might" works for me. What do others think?

>> 2.7.3. http and https URI Normalization and Comparison
>
>>     Likewise, an empty path component is equivalent to an absolute
>>     path of "/", so the normal form is to provide a path of "/"
>>     instead.
>
> Except that an empty path is not equivalent to a path of "/" in an
> OPTIONS request sent to a proxy...

That's the "authority-form", right? That's not an HTTP(s) URI anyway.

>> 3.1. Start Line
>
>>     In theory, a client could receive requests and a server could
>>     receive responses, distinguishing them by their different
>>     start-line formats, but in practice servers are implemented to
>>     only expect a request (a response is interpreted as an unknown or
>>     invalid request method) and clients are implemented to only
>>     expect a response.
>
> This is new text, and given that there will not be an HTTP/1.2, is
> there really any reason to add it?

I think it's useful information.

>> 3.2. Header Fields
>
>>        Note: The "Set-Cookie" header field as implemented in practice
>>        can occur multiple times, but does not use the list syntax,
>>        and thus cannot be combined into a single line ([RFC6265]).
>>        (See Appendix A.2.3 of [Kri2001] for details.) Also note that
>>        the Set-Cookie2 header field specified in [RFC2965] does not
>>        share this problem.
>
> The Kri2001 citation doesn't add anything now that we have a canonical
> Set-Cookie spec in RFC 6265. Likewise, there's no point in citing RFC
> 2965 since it's now obsolete.

[Kri2001] provides useful historical background that unfortunately is 
not in RFC 6265; as such I'd like to keep the reference.

I have removed the statement about Set-Cookie2, though.

>> 3.2.1. Whitespace
>
>>       OWS            = *( SP / HTAB )
>>                      ; "optional" whitespace
>>       RWS            = 1*( SP / HTAB )
>>                      ; "required" whitespace
>>       BWS            = OWS
>>                      ; "bad" whitespace
>
> The quotes around "optional" and "required" don't serve any purpose.
> (The ones around "bad" can either go or stay, depending on whether you
> think BWS is bad or just "bad".)

Done.

>> 3.2.2. Field Parsing
>
>>     A field value MAY be preceded by optional whitespace (OWS); a
>>     single SP is preferred.
>
> "MAY" and "optional" are redundant. "A field value is preceded by
> optional whitespace..."

Indeed. OWS can be empty anyway, so all we need is a statement of fact. 
Done.

>> 3.2.4. Field value components
>
>>       qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
>>       ctext          = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
>
> It seems weird to use "OWS" here rather than "HTAB / SP".

Done. (I wonder what made me put it there)


>> 3.3.3. Message Body Length
>
>>         If this is a response message received by a user-agent, it
>>         MUST be treated as an error by discarding the message and
>>         closing the connection.
>
> "user agent" does not have a hyphen.

We now use "user agent" throughout.

>>     5.  If a valid Content-Length header field is present without
>>         Transfer-Encoding, its decimal value defines the message body
>>         length in octets. If the actual number of octets sent in the
>>         message is less than the indicated Content-Length, the
>>         recipient MUST consider the message to be incomplete and
>>         treat the connection as no longer usable.
>
> The only way the sender could have sent fewer than the indicated
> number of octets would be if it closed the connection early, in which
> case telling the receiver to "treat the connection as no longer
> usuable" is hardly necessary. Maybe:
>
>       If the sender closes the connection or the recipient times out
>       before the indicated number of octets are received, then the
>       recipient MUST consider the message to be incomplete and treat
>       the connection as no longer usable.
>
>>         If the actual number of octets sent in the message is more
>>         than the indicated Content-Length
>
> This is impossible by definition; any octets after the indicated
> Content-Length are not part of the message.
>
>>         the recipient MUST only process the message body up to the
>>         field value's number of octets; the remainder of the message
>>         MUST either be discarded or treated as the next message in a
>>         pipeline. For the sake of robustness, a user-agent MAY
>>         attempt to detect and correct such an error in message
>>         framing if it is parsing the response to the last request on
>>         a connection and the connection has been closed by the
>>         server.
>
> As above, "user agent" shouldn't have a hyphen, but it should be
> "client" here anyway, shouldn't it?
>
> Maybe replace the whole paragraph with:
>
>      For the sake of robustness, a client MAY assume that any
>      additional octets after the indicated Content-Length in the
>      response to the last request on a connection were intended to be
>      part of the message body.

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/392>

>> 5.3. Request Target
>
> We explicitly say not to include userinfo when using origin-form:
>
>>     A Host header field is also sent, as defined in
>>     Section 5.4, containing the target URI's authority component
>>     (excluding any userinfo).
>
> or authority-form:
>
>>     When making a CONNECT request to establish a tunnel through one
>>     or more proxies, a client MUST send only the target URI's
>>     authority component (excluding any userinfo) as the
>>     request-target.
>
> but we don't say anything about userinfo when using absolute-form.
> I guess 2.7.1 already forbids sending a userinfo there but it still
> seems inconsistent to not say it here. (Alternatively, if 2.7.1's
> restriction is only supposed to apply to request-target, then we could
> remove it there and specify it in each case here.)

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/393>

>> 5.4. Host
>
>>     If the authority component is missing or undefined for the target
>>     URI, then the Host header field MUST be sent with an empty
>>     field-value.
>
> 5.1 defines "target URI" as being an absolute URI, and http URIs
> always have an authority component, so this requirement seems
> meaningless.

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/394>

>> 5.7. Via
>
>>       received-protocol = [ protocol-name "/" ] protocol-version
>
> protocol-name and protocol-version aren't defined yet. Forward ref?

I added a forward as ABNF comment for now.

>> 6.1. Connection
>
>>     When a header field is used to supply control information for or
>>     about the current connection, the sender SHOULD list the
>>     corresponding field-name within the "Connection" header field.
>
> Isn't that a MUST?
>
>>       Connection: close
>>
>>     in either the request or the response header fields indicates that
>>     the connection SHOULD be closed after the current request/response
>>     is complete (Section 6.2.5).
>
> Likewise. (6.2.5 seems to agree)

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/395>

>> 6.2. Persistent Connections
>
> I'd mentioned this before on the list, but all the intro text here
> except for the final "HTTP implementations SHOULD implement persistent
> connections" is really only of historical interest and could just go
> away. The "SHOULD implement" requirement could be moved into "6.2.2
> Reuse", and then each of the subsections of 6.2 could be promoted up
> to become a direct child of section 6. So you'd get:
>
>    6.  Connection Management
>    6.1.  Connection
>    6.2.  Establishment
>    6.3.  Reuse
>    6.3.1.  Pipelining
>    6.3.2.  Retrying Requests
>    6.4.  Concurrency
>    6.5.  Failures and Time-outs
>    6.6.  Tear-down
>    6.7.  Upgrade
>
>
>
>> 6.2.2. Reuse
>
>>     A server MAY assume that an HTTP/1.1 client intends to maintain a
>>     persistent connection until a close connection option is received
>>     in a request.
>
> SHOULD?

<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/396>

>> 6.2.2.2. Retrying Requests
>
>>     Confirmation by user-agent software with semantic understanding of
>>     the application MAY substitute for user confirmation.  The
>>     automatic
>
> "user agent" shouldn't have a hyphen. (This one is inherited from
> 2616; 2068 was consistent, but the hyphenated form started creeping in
> in 2616.)

Done.

>> 6.2.5. Tear-down
>
>>     A server that receives a close connection option MUST initiate a
>>     lingering close of the connection after it sends the final
>>     response
>
> This could use a "(see below)" since "lingering close" hasn't been
> defined yet.

Done.

>> 7.6. Upgrade Token Registry
>
>>     The HTTP Upgrade Token Registry defines the name space for
>>     protocol-name tokens used to identify protocols in the Upgrade
>>     header field.
>
> They're used in the Via header field too... (Should the registry be
> renamed to "Protocol-name Token Registry"?)

Is it worth the trouble? (The registry is already there, after all).


>> 8.4. DNS-related Attacks
>
>>     Clients need to be cautious in assuming the validity of an IP
>>     number/DNS name association unless the response is protected by
>>     DNSSec
>
> "DNSSEC", not "DNSSec"

Done.

>> A.1.3. Introduction of Transfer-Encoding
>
>>     Proxies/gateways MUST remove any transfer-coding
>>     prior to forwarding a message via a MIME-compliant protocol.
>
> This seems like it would fit better in p2's "Appendix A. Differences
> between HTTP and MIME".

We already have 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-21.html#rfc.section.A.5>

>> A.2. Changes from RFC 2616
>
> We probably want to double-check these sections against the issue
> tracker before final publication, but one particular thing that stuck
> out to me as missing is the addition of the CONNECT rule and removal

You mean the inclusion of CONNECT? We already mention that in P2. Should 
we mention here as well?

> of the multipart/byteranges rule from Section 3.3.3.

Added.

Best regards, Julian
Received on Saturday, 3 November 2012 14:39:50 UTC