Comments on the HTTPbis draft, v20

Hello everyone,


This mail contains some initial editorial comments, questions, and 
recommendations for the httpbis draft 
(http://tools.ietf.org/wg/httpbis/), version 20.

This work arose from a desire to develop a set of conformance tests for 
our web services to establish them as HTTP/1.1 conformant 'origin 
servers.' The analysis deliberately ignored the bulk of the standard and 
focused only on the formal injunctions, the sentences with the words 
'MUST' or 'SHOULD,' and in particular on those affecting origin servers.

While I had hoped to go through the whole document, I have stopped at 
section 3.3 of part 1. The review of the remaining injunctions will 
proceed in much the same way as for those presented here, asking similar 
questions of each one: is the target clearly identified, is the target 
one in the list given in section 2.6, is the injunction clear, if the 
injunction uses "SHOULD" is the conditionality of when an implementation 
can avoid the injunction clear, do we understand the consequences of 
violating the injunction, and so on. So the authors and editors, if they 
find the review worthwhile, should be able to make their own progress on 
the rest of the document.




In reading your draft, two central questions arose that might 
fundamentally alter this analysis depending on your response.

First, are 'targets' intended to be the only entities enjoined by the 
requirements of your standard?

These are introduced, in section 2.6 of part 1 and in section 1.1 of 
part 2, apparently to define the entities for which rules will be made. 
However, the rules have not all been written to explicitly constrain 
instances of those targets. The formal adoption of a constrained set of 
elements would be very useful for readers of your standard. First, 
however, the list will probably need to be updated after formal review 
of all the injunctions and, second, the rules will need to be rewritten 
to focus explicitly on these targets only.


Second, are the requirements (aka 'injunctions' or 'rules'), that is the 
sentences with the capitalized 'MUST' and 'SHOULD,' intended to be the 
only source of normative text or are they merely intended to complement 
and stress certain aspects of the text?

The role of your requirements should be clear to you as authors and to 
us as readers. For instance, I have yet to find any requirement which 
demands that 'recipients of inbound messages' respond to each message; 
the protocol is described as a request/response protocol (see sec. 2.1, 
part 1) but, as far as I can tell, this pattern is never mandated in a 
formal requirement. From a systematic point of view, this seems to be 
the most fundamental requirement, so its absence as a formal injunction 
suggests your standard treats injunctions merely as stressing some parts 
of the protocol. (This is proposed as an example; any actual requirement 
would surely have to be qualified at least for pipelined requests since 
an error early in the pipelined stream, say a GET request declaring a 
body of a certain length without actually having any body, might prevent 
a recipient from being able to distinguish each incoming request as a 
separate request.)



The rest of this analysis proceeds assuming that you intend 'targets' to 
be the only entities constrained and that you intend the sentences with 
'MUST' and 'SHOULD' to be the only normative text. The Open Geospatial 
Consortium (OGC) (http://www.opengeospatial.org) recently adopted 
similar rules for its own work on specification documents along with two 
other principles: requirements should be visually isolated from the rest 
of the text and requirements should be testable and therefore 
accompanied by a formally written set of tests. The former could be 
adopted by HTTPbis with little effort. The latter principle, however, 
despite making conformance testing much simpler, would have a major 
impact on your work and is probably better left for the future.






EDITORIAL ISSUES:
----------------

* The requirements should adopt the boring but clear structure:
   $target MUST action
or
   $target SHOULD, $condition, action
in the active form to avoid making injunctions of things that are not 
targets and to avoid using the passive voice and thereby failing to 
define any target at all.


* Could the lower case 'must' in the 'Copyright section' be changed to 
some other phrasing so as to keep that word for formal injunctions?


* Is section 2, part 1 merely a description of the architecture of the 
HTTP protocol or the beginning of the formal requirements?

Section 2 is primarily descriptive, introducing the terminology of the 
standard and the targets of the requirements. However, section 2 also 
introduces a few requirements which are not the overarching, fundamental 
requirements of the standard (each request gets a response, all messages 
must follow the syntax, inbound messages must be requests and outbound 
messages must be responses) but are highly specific issues. It would be 
best to move the specific issues elsewhere in the document. It would 
also be best to ensure the 'targets' are defined before any requirement 
is made of those targets, i.e. adopt for part 1 an introduction section 
like that of part 2.


* Should section 3, part 1 define only the syntax rules?

Currently section 3 mixes the definition of the syntax rules for the 
<HTTP-message> element and its sub-elements with requirements for 
handling communication messages that match or do not match the syntax. 
It would seem cleaner to separate these two discussions.


* Use of angle brackets around syntax elements

The current text does not use angle brackets to isolate and label 
protocol elements as against natural language ideas within the text 
writeup but it would be useful to distinguish the two. For example, 
'HTTP message' should refer to the byte stream exchanged over a 
connection and '<HTTP-message>' to the syntax definition to which the 
byte stream must conform. Making the distinction explicit would clarify 
section 3 of part 1.





REVIEW OF EXISTING REQUIREMENTS:
-------------------------------



Part I

2.4

    However, an HTTP-to-HTTP gateway that wishes to interoperate with
    third-party HTTP servers MUST conform to HTTP user agent requirements
    on the gateway's inbound connection and MUST implement the Connection
    (Section 6.1) and Via (Section 6.2) header fields for both
    connections.

This requirement does not apply to 'origin servers' but is discussed 
because it seems out of place. Does this need to be a requirement? If 
so, could it at least be moved to some other section?

If an HTTP-to-HTTP gateway is *defined* as acting as a user-agent 
inbound and acting as an origin-server outbound then the first part of 
this requirement seems superfluous. Furthermore, I do not see the 
benefit of the dependent clause "that wishes to ..."; does it mean that 
some HTTP-to-HTTP gateways do not need to be user-agents on the inbound 
connection?

The second part raises lots of questions. What does 'implement the ... 
header fields' imply? As I read sections 6.1 and 6.2, those rules apply 
to all "proxy or gateway" instances making this second part superfluous, 
so that it should be changed to a descriptive statement; if this second 
part is actually needed, then we need some explanation for when a 
gateway must implement those rules and when it does not.

I suspect both parts of this statement should be changed from 
requirements to descriptions, something like:
"HTTP-to-HTTP gateways, beyond conforming to all the requirements of 
gateways, also necessarily conform to all the requirements applicable to 
HTTP user agents on the inbound connections and to all requirements 
applicable to HTTP origin servers on the outbound connection and 
HTTP-to-HTTP gateways are obliged, by sections 6.1 and 6.2, to ..."





    Hence,
    servers MUST NOT assume that two requests on the same connection are
    from the same user agent unless the connection is secured and
    specific to that agent.

This requirement is also tackled because it seems out of place. Does 
this need to be a requirement? If it does, could it at least be moved 
elsewhere?

While making this assumption would indeed be wrong, I am not sure this 
makes sense as a requirement; it seems more a global statement of fact: 
since HTTP is stateless, a server cannot really ever assume that 
subsequent requests are coming from the same 'user-agent.' Even with 
HTTPS connections, while the proximate 'user-agent' might be the same, 
that might merely be a gateway aggregating more distant, ultimate 
'user-agents' so even a pipelined set of requests over an HTTPS 
connection might be requests from totally different users. (I return to 
this below discussing caching of 'private' https responses.) If this is 
to remain a requirement, perhaps it should be turned around and 
constructed as a statement of when a server *can* assume that two 
requests on a given connection are from the same user agent.

From the context of the surrounding discussion, it appears this 
requirement is actually a recommendation for writers of other standards 
not to make incorrect assumptions; in that case, the language of the 
statement should be changed to reflect that and it should not be a 
formal requirement of HTTP/1.1.





2.6

    In addition to the prose requirements placed upon them, senders MUST
    NOT generate protocol elements that do not match the grammar defined
    by the ABNF rules for those protocol elements that are applicable to
    the sender's role.

This is great as one of the initial, core requirements and, therefore, 
should not be buried here nor repeated in other parts (e.g. sec 1.1, 
part 2) but be a highly visible requirement of the standard. However, it 
could also be written more clearly.

The first clause is superfluous since this is merely one more 'prose 
requirement,' is it not? Also, 'prose requirement' seems to be trying to 
distinguish from other kinds of requirements but my understanding is 
that the only requirements are sentences with "MUST" or "SHOULD".

"...MUST only generate ...that match..." would avoid the double negative.

What is a 'protocol element,' is it a 'message' sent over the 
communication channel, or an 'element of the ABNF syntax,' or something 
else entirely? It seems worth formally defining somewhere. It is first 
used in the third paragraph of this section.

Would it not be clearer to spell this out and say something like:
"All senders of messages in the HTTP protocol MUST only send messages 
which conform with the syntax of <HTTP-message>, with the <start-line> 
either <request-line> for client generated messages or <status-line> for 
server generated messages"
using the angle brackets for formal syntax elements? (Instead of 'client 
generated' the text might read 'inbound' and similarly for outbound 
messages.)
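To make the proposed wording concrete, a sender-side check might look 
like the Python sketch below. It is only an illustration under my own 
assumptions: the regular expressions are simplified approximations of 
<request-line> and <status-line> (the method as a token, the 
<request-target> as any run of visible US-ASCII characters), and the 
function name is hypothetical.

```python
import re

# Simplified approximations of the draft's ABNF (assumption: not the full grammar).
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
HTTP_VERSION = r"HTTP/[0-9]\.[0-9]"
REQUEST_LINE = re.compile(rf"{TOKEN} [\x21-\x7e]+ {HTTP_VERSION}")
STATUS_LINE = re.compile(rf"{HTTP_VERSION} [0-9]{{3}} [\t \x21-\x7e\x80-\xff]*")


def start_line_ok(start_line: str, sender_is_client: bool) -> bool:
    """True if the start-line (without CRLF) has the form required for the sender's role."""
    pattern = REQUEST_LINE if sender_is_client else STATUS_LINE
    return pattern.fullmatch(start_line) is not None
```

For example, "GET /index.html HTTP/1.1" passes for a client, while the 
same check rejects "HTTP/1.1 200 OK" as a client-generated start-line.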




    If a received protocol element is processed, the
    recipient MUST be able to parse any value that would match the ABNF
    rules for that protocol element, excluding only those rules not
    applicable to the recipient's role.

The implications of this requirement are unclear. What does 'is 
processed' imply? Are all messages sent to an origin server 'processed' 
or only some? What does 'able to parse' really imply?

This requirement might be rephrased to state that recipients are 
expected to be able to break down the message into all the elements 
defined in the ABNF syntax so that other requirements can be made 
relative to elements of the syntax. Perhaps this could be:
"All recipients of messages in the HTTP Protocol MUST be able to parse 
an HTTP message which matches the syntax of the <HTTP-message> element 
into the constituent sub-elements defined by the HTTP syntax."
However, perhaps this is also discussing parsing the header values into 
their parts, not just the header field into its header name and header 
value; if so, that should be explained and 'protocol element' should not 
be reused both in the sense of the message exchanged and in the sense of 
a header value.
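The two levels of parsing can be told apart in code. In this Python 
sketch (function names hypothetical), the first function splits a 
header field line into field name and field value, the level covered by 
the wording proposed above; the second splits a comma-separated list 
value into its members, the further level I suspect the draft may also 
intend. For brevity it ignores commas inside quoted strings.

```python
def split_field_line(line: str) -> tuple[str, str]:
    """Level 1: split a header field line into (field-name, field-value)."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"no colon in header field line: {line!r}")
    return name, value.strip(" \t")  # drop leading/trailing OWS


def split_list_value(value: str) -> list[str]:
    """Level 2: split a #(values) field value into its members.

    Simplification: commas inside quoted strings are not handled,
    and empty list elements are skipped.
    """
    return [item.strip(" \t") for item in value.split(",") if item.strip(" \t")]
```

For example, split_field_line("Accept-Encoding: gzip, deflate") gives 
("Accept-Encoding", "gzip, deflate"), and split_list_value applied to 
that value gives ["gzip", "deflate"].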


2.7

    When an implementation receives an
    unrecognized header field, the recipient MUST ignore that header
    field for local processing regardless of the message's HTTP version.

Are any header fields required to be 'recognized'? If so, where is this 
requirement stated? Are all implementations expected to 'not ignore' all 
of the headers defined in the HTTPbis spec?

Note that 'implementation' is not in the list of targets of section 2.6: 
either it should be added to the list or the text modified.




    An HTTP server SHOULD send a response version equal to the highest
    version to which the server is conformant and whose major version is
    less than or equal to the one received in the request.

This would be better phrased as sending a 'response message whose 
<http-version> is ...'

Under what circumstances is this not expected of HTTP/1.1 servers? The 
SHOULD allows for exceptions so it is useful to state the conditions 
where violating this rule is allowed.
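Read literally, the rule is a small selection function, which also 
makes it easy to see where an exception would have to be stated. The 
sketch assumes versions are represented as (major, minor) integer 
tuples and that the server knows the set of versions to which it claims 
conformance; both are my assumptions, as is the function name.

```python
from typing import Optional

Version = tuple[int, int]  # (major, minor), e.g. (1, 1) for HTTP/1.1


def response_version(request_version: Version, conformant: set[Version]) -> Optional[Version]:
    """Highest version the server conforms to whose major version is <= the request's."""
    candidates = [v for v in conformant if v[0] <= request_version[0]]
    return max(candidates) if candidates else None
```

A server conformant to {(1, 0), (1, 1)} would answer an HTTP/2.0 request 
with (1, 1) under this reading.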




    An HTTP
    server MUST NOT send a version to which it is not conformant.

Again the phrasing could be clearer since servers send *messages* not 
"versions."

The 'to which it is not conformant' might be 'to which it does not claim 
conformance' given that conformance is never actually determined by 
anyone; this requirement really is doing something else, namely allowing 
clients that get a message with a particular version to assume that the 
server is conformant with that version.




    An HTTP server MAY send an HTTP/1.0 response to an HTTP/1.0 request
    if it is known or suspected that the client incorrectly implements
    the HTTP specification and is incapable of correctly processing later
    version responses, such as when a client fails to parse the version
    number correctly or when an intermediary is known to blindly forward
    the HTTP-version even when it doesn't conform to the given minor
    version of the protocol.  Such protocol downgrades

This is probably supposed to say:
    "An HTTP server MAY send an HTTP/1.0 response to an *HTTP/1.1*
    request ..."
right? The server would be expected, not merely permitted, to answer a 
1.0 request with a 1.0 response.



2.8

    The host MUST NOT be empty; if an "http" URI is
    received with an empty host, then it MUST be rejected as invalid.

This requirement does not target any of the 'participant{s} in HTTP 
communication' of section 2.6 so this should not be a requirement 
phrased this way. Instead, it should be clearly stated as an additional 
syntax rule.

A more formal approach would define an <http-authority> syntax element, 
redefined from the <authority> element in RFC 3986 but including the 
element <non-empty-reg-name> instead of <reg-name>, since the <reg-name> 
is the only element of the current syntax allowed to be empty.

What does "receiving" an http URI mean? The URI could be in a 
<request-target> or in a header or in the body. Also, if 'an "http" URI' 
is the thing that matches the syntax of the <http-URI> element, then it 
will never have an empty <host> element.

How does the 'reject as invalid' happen? Is this merely repeating that 
there is a 'non-empty' syntax rule or are we actually talking about HTTP 
requests and 4xx level responses? It is unclear if we are talking about 
determining a violation of the syntax or about responding with an HTTP 
response.
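Under the syntax reading, the check itself is simple. This sketch uses 
Python's urllib.parse.urlsplit, whose hostname attribute is None when 
the host is empty; the function name is hypothetical, and the sketch 
deliberately says nothing about what response, if any, should follow a 
failed check, which is exactly the open question above.

```python
from urllib.parse import urlsplit


def http_uri_has_host(uri: str) -> bool:
    """True if uri is an absolute "http" URI with a non-empty host."""
    parts = urlsplit(uri)
    return parts.scheme.lower() == "http" and bool(parts.hostname)
```

Both "http:///a" and "http://:8080/" fail this check, while 
"http://example.com/a" passes.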




    Senders MUST NOT
    include a userinfo subcomponent (and its "@" delimiter) when
    transmitting an "http" URI in a message.

Again we seem to be tripping up simply because we are trying to reuse 
the syntax elements of RFC 3986. Why not simply define the <http-URI> a 
little deeper in syntax and avoid the <authority> of RFC 3986 altogether?




    Recipients of HTTP messages
    that contain a URI reference SHOULD parse for the existence of
    userinfo and treat its presence as an error, likely indicating that
    the deprecated subcomponent is being used to obscure the authority
    for the sake of phishing attacks.

What does 'treat its presence as an error' imply? Are 'origin-servers' 
expected to respond with a 4xx level message or can we 'recover' from 
the erroneous URI and handle the message anyhow? And 'contain a URI 
reference' seems a bit broad since it could apply to a message body as 
well: are we not concerned only with the <request-target> and the 
message headers?
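To make the scope question concrete, the sketch below looks for 
userinfo only in the <request-target> and in header fields assumed to 
carry URI references. The set of such fields (Referer and 
Content-Location here) and the function name are my own illustrative 
assumptions; whether a hit leads to a 4xx response or to recovery is 
left to the caller, since the draft does not say.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative subset of request header fields that carry URI references.
URI_HEADER_FIELDS = {"referer", "content-location"}


def userinfo_locations(request_target: str, fields: list[tuple[str, str]]) -> list[str]:
    """Name the places where a URI carrying userinfo ("...@" in the authority) was found."""
    found = []
    # An origin-form target such as "/a@b" has no authority, so "@" in the path is not flagged.
    if "@" in urlsplit(request_target).netloc:
        found.append("request-target")
    for name, value in fields:
        if name.lower() in URI_HEADER_FIELDS and "@" in urlsplit(value).netloc:
            found.append(name)
    return found
```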




    the TCP connection MUST be secured for privacy through the use of
    strong encryption prior to sending the first HTTP request.

This is out of place and badly phrased.

The section is apparently defining a particular form of Uniform Resource 
Identifier (the title of section 8.2) relative to an https scheme but 
the requirement is talking about message exchange rules. It should 
really be in a separate paragraph clearly discussing *usage* not *syntax*.

The passive voice hides the actual target of the injunction; the rule is 
really an injunction for the sender of an HTTP request to ensure the 
connection is in a particular state prior to transmission. The 
requirement should be rewritten to state exactly which target must make 
what happen or not happen (and ideally state what response comes when 
the requirement is not followed).
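Once the target is named, the requirement becomes something a client 
implementation can enforce. A minimal Python sketch using the standard 
ssl module, assuming (my assumption, not the draft's) that the default 
context's protocol and cipher selection satisfies 'strong encryption'; 
the function names are hypothetical:

```python
import socket
import ssl


def open_secured_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Client side: connect and complete the TLS handshake before any HTTP is sent."""
    context = ssl.create_default_context()  # certificate and hostname verification enabled
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)  # handshake happens here


def send_first_request(sock: socket.socket, request: bytes) -> None:
    """Refuse to send an "https" request over a connection that is not TLS-protected."""
    if not isinstance(sock, ssl.SSLSocket):
        raise ConnectionError("connection not secured; request not sent")
    sock.sendall(request)
```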




    Unlike the "http" scheme, responses to "https" identified requests
    are never "public" and thus MUST NOT be reused for shared caching.
    They can, however, be reused in a private cache...

This requirement reveals my own lack of understanding so I am not sure 
how to address it.

I presume the 'public' and 'private' refer to the particular session. 
However, does this really work? Does this mean that HTTP is stateless 
except for the duration of a TCP connection? I am not sure this is 
possible, but it seems that, with the proper certificates, two clients 
could be connected via HTTP/TLS to a proxy while that proxy was 
connected by HTTP/TLS to the origin-server. If the server is really 
allowed to cache, does this not mean that the wrong client might get the 
cached response from the server unless the proxy makes new connections 
to the server for each client? There seem to be major security 
implications of this requirement that ought to be spelled out for 
everyone. Also, this requirement seems linked to the rule from section 
2.4 above,
    servers MUST NOT assume that two requests on the same connection are
    from the same user agent unless ...
but the dependent clause, that the connection is secured, does not 
determine the ultimate user agent of the communication when an 
intermediary is involved.

I am surely misunderstanding the security design here, but that 
suggests that this 'public' and 'private' caching system needs more 
explanation.

The passive voice once again hides the 'target' of the injunction.
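With the target as subject, the rule would read roughly 'a shared cache 
MUST NOT reuse responses to https requests.' The sketch below encodes 
only that reading; the class and method names are hypothetical, other 
caching rules are ignored, and it does not resolve the multi-client 
proxy question raised above.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    shared: bool  # True for a proxy/shared cache, False for a user agent's private cache

    def may_reuse(self, request_scheme: str) -> bool:
        """Whether this cache may reuse a response to a request with this scheme,
        considering only the https rule quoted above (not Cache-Control etc.)."""
        if request_scheme.lower() == "https":
            return not self.shared
        return True
```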




3
    Recipients MUST parse an HTTP message as a sequence of octets in an
    encoding that is a superset of US-ASCII [USASCII].

This requirement seems misplaced and is insufficiently specified.

Section 3 claims to be defining the formal syntax of a valid 'HTTP 
message' therefore these paragraphs on the parsing of binary messages 
seem out of place. Received messages may well be invalid so that 
processing is a bigger picture than just processing valid HTTP messages. 
(Note that we have an issue distinguishing what is meant by an "HTTP 
message," is it a stream of integers which conforms to <HTTP-message> 
syntax element or the binary stream communicated over the connection, 
which might be total garbage?) These processing requirements should be 
split out into a section separate from the syntax rules and expanded.


A discussion about processing needs to describe the transition between 
the binary stream sent over inter-machine connections and the 
sub-elements from the <HTTP-message> syntax, passing through the 
sequence of integers which are the terminal values of the ABNF system 
defined by RFC 5234.

This discussion should probably start by requiring that all generated 
communication messages be encoded correctly since this is relatively 
simple and sets up the processing rules. This requirement would 
constrain all senders to only generate streams created by taking a valid 
<HTTP-message> and encoding that with an encoding in which the octet 
%x00 is never used and all octets in the range from %x01 to %x7F always 
represent the integers from 1 to 127. This spells out explicitly the 
'superset of US-ASCII' of the requirement above which is vague: UTF-16 
is often thought of as a superset of US-ASCII since it includes all the 
same characters at all the same code points plus some more.
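The proposed property is easy to test for a given character encoding, 
and the test shows why 'superset of US-ASCII' alone is ambiguous. In 
this Python sketch (function name hypothetical), UTF-8 and ISO-8859-1 
pass while UTF-16 fails, because each ASCII character gains a %x00 
octet. Note that it checks only the mapping of code points 1 to 127, 
not the absence of %x00 for other characters.

```python
def ascii_compatible(codec: str) -> bool:
    """True if the codec encodes every code point from 1 to 127
    as the single octet of the same value."""
    for i in range(1, 128):
        if chr(i).encode(codec) != bytes([i]):
            return False
    return True
```

For instance, "A".encode("utf-16-le") is b"A\x00", which is why UTF-16 
fails the check.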

The discussion could then turn to the processing expectations. The 
phrasing 'Recipients MUST parse' is unfortunate since User-Agents can do 
whatever they want with the server response; they are not required to 
parse it at all. However, the rules of the standard will be based on 
the ability of recipients to process the messages, so this would be 
better written as
"Recipients are expected to be able to process all messages received 
over the communication connection according to the rules given here." 
(However, this should be compared with the requirement discussed above 
in section 2.6.)

The rest might be phrased as
"Recipients MUST assume, when parsing a message received over the 
communication connection into the elements of an <HTTP-message>, that in 
the initial part of the message up to the first occurrence of either the 
octet doublet %x0A %x0A, the octet doublet %x0D %x0D or the octet 
triplet %x0A %x0D %x0A, the occurrence of any octet from %x00 to %x7F 
stands for the equivalent integer value (from 0 to 127) and encodes the 
character at that code point in the US-ASCII character set."
     (The latter part is redundant with RFC 5234 but harmless.)

Probably, the handling of the higher octet values should be described 
here as well. I think there is a requirement elsewhere that these be 
treated as opaque binary values.

Finally, the discussion of processing can get to parsing and should 
expand on the 'normal procedure' in the current draft. Parsing should 
start by looking for the end-of-line sequences up to the blank line. 
Having found the end of the headers, we can translate all the octets in 
the ASCII range to their ASCII equivalents and then parse the result 
into the elements of the ABNF syntax.
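The procedure just described might be sketched as follows. This is a 
simplification under my own assumptions: the end of the header section 
is found by an empty line with CRLF line endings (bare LF tolerated) 
rather than the exact octet sequences proposed above, obs-fold is not 
handled, and errors raise exceptions instead of producing responses. 
Decoding with ISO-8859-1 maps each octet to the code point of the same 
value, so %x00-%x7F become their US-ASCII characters while higher 
octets pass through as opaque values. The function name is 
hypothetical.

```python
import re

# Assumption: CRLF line endings, with bare LF tolerated for robustness.
END_OF_HEADER_SECTION = re.compile(rb"\r?\n\r?\n")
LINE_BREAK = re.compile(r"\r?\n")


def split_message(data: bytes) -> tuple[str, list[tuple[str, str]], bytes]:
    """Split raw octets into (start-line, [(field-name, field-value)], body)."""
    match = END_OF_HEADER_SECTION.search(data)
    if match is None:
        raise ValueError("header section not terminated by an empty line")
    head, body = data[:match.start()], data[match.end():]
    # latin-1 maps each octet to the code point with the same value:
    # %x00-7F become US-ASCII characters, %x80-FF are carried along opaquely.
    lines = LINE_BREAK.split(head.decode("latin-1"))
    fields = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header field line: {line!r}")
        fields.append((name, value.strip(" \t")))
    return lines[0], fields, body
```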

Probably, the discussion should also explain what is expected of a 
recipient getting a message which does not match the syntax, namely that 
origin servers send back a message in the 4xx range, gateways and 
proxies I am not sure, and user-agents can decide for themselves.




3.1

    Implementations MUST NOT send whitespace between the start-line and
    the first header field.

To the extent that senders are required to follow the ABNF syntax, this 
requirement is redundant.

"Implementations" should be "Senders" since "Implementations" are not in 
the list of targets in section 2.6.

Based on the security discussion after the requirement, this requirement 
may be trying to eliminate messages of this kind entirely. If so, 
perhaps you need to require recipients to reject such messages outright, 
rather than attempt to recover.




    A server MUST be able to parse any received message that begins with
    a request-line and matches the ABNF rule for HTTP-message.

For the reasons explained above, I would separate out this processing 
requirement away from the syntax rules. Also 'be able to parse' is 
really 'be able to identify the sub-elements of the <HTTP-message> 
syntax element,' that is 'able to parse into $something.'




    Recipients of an invalid request-line SHOULD respond
    with either a 400 (Bad Request) error or a 301 (Moved Permanently)
    redirect with the request-target properly encoded.

    Recipients SHOULD
    NOT attempt to autocorrect and then process the request without a
    redirect, since the invalid request-line might be deliberately
    crafted to bypass security filters along the request chain.


These rules should be subsumed into a general processing system.

"Recipients" is too broad since user-agents are 'recipients' but clearly 
should not 'respond'. This could be clarified with 'Recipients of 
inbound messages' except that we probably want to exclude tunnels as well.

It would be better to state that
"Recipients of inbound messages which do not match the syntax of the 
<HTTP-message> element SHOULD respond, unless they can correct the 
syntax of the <request-target> element, with a 400 (Bad Request)."
This might be extended with
"Recipients of inbound messages with invalid <request-line> elements 
which can be corrected by properly encoding the <request-target> MAY 
respond with a 301 (Moved Permanently) response with a "Location:" 
header containing the properly encoded target reconstructed into a URI 
{or a <path>?} but MUST NOT process the request without a redirect since 
..."
although this might only apply to origin servers and might apply only if 
they can ascertain that the corrected resource really exists.
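The decision proposed above could be sketched as follows for an origin 
server. Everything here is an illustrative assumption: 'correctable' is 
reduced to 'percent-encoding the <request-target> yields a valid one', 
validity is a simplified character check, and the check that the 
corrected resource exists is a caller-supplied predicate. The function 
name is hypothetical.

```python
import re
from typing import Callable, Optional
from urllib.parse import quote

HTTP_VERSION = re.compile(r"HTTP/[0-9]\.[0-9]")
VALID_TARGET = re.compile(r"[\x21-\x7e]+")  # simplified: visible US-ASCII, no spaces
SAFE = "/?#[]@!$&'()*+,;=:%~"              # leave reserved characters and existing %XX alone


def respond_to_request_line(line: str, resource_exists: Callable[[str], bool]
                            ) -> tuple[Optional[int], Optional[str]]:
    """Return (status, location); a status of None means: process the request normally."""
    method, _, rest = line.partition(" ")
    target, _, version = rest.rpartition(" ")  # target may contain spaces if malformed
    if not method or not target or not HTTP_VERSION.fullmatch(version):
        return 400, None
    if VALID_TARGET.fullmatch(target):
        return None, None
    fixed = quote(target, safe=SAFE)  # percent-encodes spaces, non-ASCII (as UTF-8), etc.
    if VALID_TARGET.fullmatch(fixed) and resource_exists(fixed):
        return 301, fixed  # whether Location needs an absolute URI is the open question above
    return 400, None
```

With this sketch, "GET /a b HTTP/1.1" yields a 301 pointing at "/a%20b" 
if that resource exists, and a 400 otherwise.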



    A server that receives a method longer than any that it
    implements SHOULD respond with either a 405 (Method Not Allowed), if
    it is an origin server, or a 501 (Not Implemented) status code.


The clause "if it is an origin server" seems strange here. Why does it 
apply only to the former response? It would seem 'not implemented' would 
be a characteristic of the origin server as well.



    A
    server MUST be prepared to receive URIs of unbounded length and
    respond with the 414 (URI Too Long) status code if the received
    request-target would be longer than the server wishes to handle (see
    Section 4.6.12 of [Part2]).

This requirement talks of URI but I presume it is really discussing the 
<request-target> element. Also, why can the server not simply be ready 
to receive a <request-target> up to its maximum allowable length, rather 
than speaking of 'unbounded length'? It seems this is really a note to 
implementors: "watch out, <request-target> elements may be really long; 
deal with them if they are too big by ..." which is not really a 
requirement.
"Since <request-target> elements can be quite large, servers MAY respond 
to a request with a <request-target> element which is too large for the 
implementation with an HTTP response message using the 414 (URI Too 
Long) status code."
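In practice, 'prepared to receive URIs of unbounded length' seems to 
mean that the server never buffers an unbounded amount of data while 
reading. A Python sketch, assuming a buffered binary stream and a 
hypothetical limit applied to the whole request-line (including its 
line terminator) rather than to the <request-target> alone; the 
function name is also hypothetical:

```python
import io
from typing import Optional

MAX_REQUEST_LINE = 8192  # hypothetical implementation limit, in octets


def read_request_line(stream: io.BufferedIOBase,
                      limit: int = MAX_REQUEST_LINE) -> tuple[Optional[int], Optional[bytes]]:
    """Return (error_status, line); error_status is None when the line was read."""
    line = stream.readline(limit + 1)  # never reads more than limit + 1 octets
    if len(line) > limit:
        return 414, None  # longer than this server is willing to handle; rest left unread
    if not line.endswith(b"\n"):
        return 400, None  # stream ended before the line did
    return None, line.rstrip(b"\r\n")
```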



    A client MUST be able to parse any received message that begins with
    a status-line and matches the ABNF rule for HTTP-message.

This repeats the earlier requirement from section 2.6 unless I 
misunderstood that earlier requirement. Since 'parse' just means 'cut 
up,' this requirement needs to specify into what clients must be able 
to parse, i.e. into the HTTP syntax elements defined earlier.



3.2

    New HTTP header fields SHOULD be registered with IANA according to
    the procedures in Section 3.1 of [Part2].

This is not a requirement of any of the targets laid out in section 2.6 
but a different level of 'should' applicable to the community. This 
requirement should be rewritten as a separate paragraph discussing 
general organization around HTTP. This also does not apply to 
experimental header names.




    Unrecognized
    header fields SHOULD be ignored by other recipients.

If this is a conditional requirement, the condition needs to be spelled 
out: when may recipients freak out due to the presence of unrecognized 
headers?




    A server MUST
    wait until the entire header section is received before interpreting
    a request message, since later header fields might include
    conditionals, authentication credentials, or deliberately misleading
    duplicate header fields that would impact request processing.

The justification should be split into a separate sentence. Also, the 
language makes it seem like a good idea to wait until misleading 
information is received so the text needs to be cleared up.



    Multiple header fields with the same field name MUST NOT be sent in a
    message unless the entire field value for that header field is
    defined as a comma-separated list [i.e., #(values)].

The language should be flipped around to have
"Senders MUST NOT send HTTP messages with multiple header fields that 
have the same name unless ..."
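A sender-side check for the flipped wording might look like this. 
Which field names are defined as comma-separated lists must come from 
the specifications; the set below is a small illustrative subset I 
chose, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: illustrative subset of fields whose value is defined as #(values).
LIST_VALUED = {"accept", "accept-encoding", "cache-control", "connection", "via"}


def duplicated_non_list_fields(fields: list[tuple[str, str]]) -> list[str]:
    """Field names (lower-cased) that occur more than once but are not list-valued."""
    counts = Counter(name.lower() for name, _ in fields)
    return sorted(n for n, c in counts.items() if c > 1 and n not in LIST_VALUED)
```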




    OWS SHOULD either not be produced or be produced as a
    single SP. Multiple OWS octets that occur within field-content
    SHOULD either be replaced with a single SP or transformed to all SP
    octets (each octet other than SP replaced with SP) before
    interpreting the field value or forwarding the message downstream.

These rules are trying to constrain a more general syntax without 
actually altering the syntax grammar itself which is a silly way to 
proceed. This is probably necessary due to backwards compatibility 
issues; if so, that should be made clear.

Since this is a SHOULD, when would it make sense for senders to generate 
messages with multiple space OWS?

The 'produced' is not the same language as in section 2.6 which talked 
of 'generate' versus 'send'.

The 'multiple OWS octets' probably needs a notion of contiguity. Again, 
when would it make sense not to perform this transformation, that is 
does this injunction not really mean that the interpretation of the 
message must be identical to the interpretation if the substitution were 
performed?

Note that these issues also apply to the next paragraphs, on RWS and BWS.
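The contiguity point is easiest to see in code. This sketch implements 
the two alternatives the draft allows for whitespace within field 
content: collapse each contiguous run of SP/HTAB to one SP, or replace 
each HTAB with SP so the length is unchanged. The function names are 
hypothetical and quoted strings are not treated specially. Both 
alternatives lead to the same collapsed form, which is the kind of 
interpretation equivalence suggested above.

```python
import re


def collapse_ows(value: str) -> str:
    """Replace each contiguous run of SP/HTAB with a single SP."""
    return re.sub(r"[ \t]+", " ", value)


def blank_ows(value: str) -> str:
    """Replace each HTAB with SP, keeping the value's length (no buffer copy needed)."""
    return value.replace("\t", " ")
```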




    Any received request message that contains whitespace between a
    header field-name and colon MUST be rejected with a response code of
    400 (Bad Request).

Again the language should be flipped around to identify the target (and 
exclude a client getting a request by mistake.)
"Inbound Recipients MUST reject an HTTP request message that contains 
... colon with a response code of ..."
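With the target explicit, the check itself is trivial. A sketch, with 
a hypothetical function name, of what an inbound recipient would test 
on each header field line of a request (lines without a colon are left 
to other rules):

```python
from typing import Optional


def field_line_status(line: str) -> Optional[int]:
    """Return 400 if whitespace precedes the colon in a request header field line, else None."""
    name, sep, _ = line.partition(":")
    if sep and name != name.rstrip(" \t"):
        return 400
    return None
```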



    A field value MAY be preceded by optional whitespace (OWS); a single
    SP is preferred.

This is redundant with the syntax rules and therefore confusing: if the 
syntax allows it, an implementation MAY produce it. Therefore it seems 
this injunction is merely stating a preference for this OWS to be 
present and be a single space; that does not need an injunction.



    The field value does not include any leading or
    trailing white space: OWS occurring before the first non-whitespace
    octet of the field value or after the last non-whitespace octet of
    the field value is ignored and SHOULD be removed before further
    processing (as this does not change the meaning of the header field).


This mixes a rule for syntax with a processing rule: that would be less 
likely to happen with a formal approach to injunctions in which the 
target were the subject of the sentences. The requirement should be split.

Note that this definition of the field value is much narrower than what is
allowed by the definition of the <field-value> element in the syntax, so 
that needs to be clarified. (See my proposal in a separate email.)





    HTTP senders MUST NOT produce messages that include
    line folding (i.e., that contain any field-value that matches the
    obs-fold rule) unless the message is intended for packaging within
    the message/http media type.

The "intended for packaging" is a little indirect. Is this really saying 
that the folding is fine if the message will be the body of another message?





    HTTP recipients SHOULD accept line
    folding and replace any embedded obs-fold whitespace with either a
    single SP or a matching number of SP octets (to avoid buffer copying)
    prior to interpreting the field value or forwarding the message
    downstream.

The first part of this, 'SHOULD accept line folding', is redundant with 
the requirement that all recipients MUST be able to parse messages with 
valid syntax.

If the SHOULD applies to the "...replace any ..." part as well, then the 
text needs to specify under what conditions a recipient would be allowed 
not to replace the folding. Would it be better to phrase this part to 
state that the interpretation must be equivalent to the one obtained if 
the obs-fold whitespace were replaced?
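The equivalence framing can be checked mechanically. In this Python sketch the names are hypothetical, and `interpret` is an assumed whitespace-insensitive stand-in for a real field grammar. The two permitted replacements produce different octets, yet both yield the same interpretation.

```python
import re

# obs-fold = CRLF 1*( SP / HTAB )
OBS_FOLD = re.compile(rb"\r\n[ \t]+")


def unfold_single_sp(value: bytes) -> bytes:
    """Replace each obs-fold with a single SP, producing a new buffer."""
    return OBS_FOLD.sub(b" ", value)


def unfold_in_place(buf: bytearray) -> None:
    """Overwrite every octet of each obs-fold with SP; the length is unchanged."""
    spans = [m.span() for m in OBS_FOLD.finditer(buf)]
    for start, end in spans:
        buf[start:end] = b" " * (end - start)


def interpret(value: bytes) -> list:
    """Stand-in interpretation that ignores the amount of whitespace."""
    return value.split()
```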



    Newly defined header fields SHOULD limit their
    field values to US-ASCII octets.

Header fields are not 'targets' of this specification. Indeed, this is a 
requirement on the procedure for defining new header fields rather than 
on actors of the HTTP protocol. It should either become a recommendation 
to those defining new header fields or a constraint on the generators of 
messages containing new header fields.
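Recast as a constraint on generators, the rule is easy to test. Here is a Python sketch; the helper and the field name "X-Example" are hypothetical.

```python
def emit_new_field(name: str, value: bytes) -> bytes:
    """Hypothetical generator-side rule: a newly defined field is only
    emitted with US-ASCII octets (0x00-0x7F) in its value."""
    if not value.isascii():
        raise ValueError(f"{name}: non-US-ASCII octet in field value")
    return name.encode("ascii") + b": " + value
```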



    Recipients SHOULD treat other (obs-
    text) octets in field content as opaque data.

This seems strange: header fields are sent to recipients to convey 
meaning, so some recipients will have to use the octets in some way, 
taking action based on their content rather than treating them as 
opaque. This injunction is probably intended to apply only to the 
initial parsing of octets by recipients that do not recognize the header 
field.

The use of 'other' means that the injunction does not stand alone as a 
statement but requires its context. It would be better if each 
injunction were understandable on its own outside of its context.
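Under the reading suggested above, the rule concerns only recipients that do not recognize the field. The behaviour might then look like this Python sketch. The registry and function names are hypothetical, and Content-Length is used only as an example of a recognized field.

```python
from typing import Callable, Dict


def parse_content_length(raw: bytes) -> int:
    """A recognized field: its octets are interpreted, not opaque."""
    return int(raw.decode("ascii"))


# Hypothetical registry of the fields this recipient understands.
KNOWN_FIELDS: Dict[bytes, Callable[[bytes], object]] = {
    b"content-length": parse_content_length,
}


def interpret_field(name: bytes, raw: bytes) -> object:
    parser = KNOWN_FIELDS.get(name.lower())
    if parser is None:
        # Unrecognized field: keep every octet, including obs-text, as-is.
        return raw
    return parser(raw)
```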




    A server MUST be prepared
    to receive request header fields of unbounded length and respond with
    a 4xx (Client Error) status code if the received header field(s)
    would be longer than the server wishes to handle.

Again, this turn of phrase seems unfortunate: the phrase "prepared to 
receive .... of unbounded length" makes it seem that servers need to 
handle what cannot be handled. This could be rephrased as
"A server MUST be prepared to receive a header field longer than the 
server can handle, in which case the server MUST respond with ..."
or
"A server MUST respond, when receiving a header field longer than the 
server wishes to handle, with an HTTP response using a 4xx level (Client 
Error) status code."

Note that since this is not the only injunction mandating a particular 
response status code, this injunction probably needs a clause "unless 
another error response is also required for the message", along with an 
overarching rule for servers that are required by multiple rules to 
respond with an error status code.
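A Python sketch of a server meeting the rephrased injunction without ever buffering an unbounded line. The limit is a hypothetical value. The status 431 (Request Header Fields Too Large, defined in RFC 6585) is one possible 4xx choice. `choose_error` illustrates just one possible form of the overarching rule; it is not taken from the draft.

```python
import io
from typing import List, Optional

MAX_FIELD_LINE = 8192  # hypothetical: the longest line this server wishes to handle


def read_header_section(stream: io.BufferedIOBase) -> Optional[int]:
    """Consume field lines up to the empty line, never reading more than
    MAX_FIELD_LINE + 1 octets per line; return a 4xx code if one is too long."""
    while True:
        line = stream.readline(MAX_FIELD_LINE + 1)
        if len(line) > MAX_FIELD_LINE:
            return 431  # Request Header Fields Too Large (RFC 6585)
        if line in (b"\r\n", b"\n", b""):
            return None  # end of the header section (or end of input)


def choose_error(candidates: List[int]) -> Optional[int]:
    """Hypothetical overarching rule: when several injunctions each demand an
    error response, send exactly one, here the first one detected."""
    return candidates[0] if candidates else None
```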


    These special
    characters MUST be in a quoted string to be used within a parameter
    value (as defined in Section 4).

This is another case of a requirement not being made of a 'target'. It 
is also another case of a requirement being redundant with the syntax: 
this is merely a description of the consequences of the syntax, not a 
new requirement in its own right.
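That the requirement follows from the syntax shows up as soon as one writes the serializer. This Python sketch has a hypothetical function name. It assumes parameter values are token / quoted-string and uses the tchar set as I read it in the draft. A value containing a special character simply fails to be a token, so quoting is the only way to produce it.

```python
# tchar, as I read the draft: "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+"
# / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TCHAR = set(
    b"!#$%&'*+-.^_`|~0123456789"
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def serialize_param_value(value: bytes) -> bytes:
    """value = token / quoted-string: anything that is not a token must be
    sent as a quoted-string, escaping only DQUOTE and backslash."""
    if value and all(octet in TCHAR for octet in value):
        return value
    escaped = value.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'
```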



    Recipients that process the value of the quoted-string MUST handle a
    quoted-pair as if it were replaced by the octet following the
    backslash.

ok
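A minimal Python sketch of a recipient meeting this rule. It assumes the input has already matched the quoted-string syntax, and the name `unquote` is hypothetical.

```python
def unquote(quoted: bytes) -> bytes:
    """Strip the surrounding DQUOTEs and replace each quoted-pair with the
    octet that follows the backslash. Assumes quoted-string syntax was
    already checked, apart from the minimal checks below."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("not a quoted-string")
    out = bytearray()
    i, end = 1, len(quoted) - 1
    while i < end:
        if quoted[i] == 0x5C:  # backslash: keep only the next octet
            i += 1
            if i == end:
                raise ValueError("backslash escapes the closing DQUOTE")
        out.append(quoted[i])
        i += 1
    return bytes(out)
```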



    Senders SHOULD NOT escape octets in quoted-strings that do not
    require escaping (i.e., other than DQUOTE and the backslash octet).

ok
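For a conformance test of this SHOULD NOT, a check along these lines could flag senders that escape more than they need to. This Python sketch has a hypothetical function name and operates on the octets between the DQUOTEs. The quoted-pair rule is as I read it in the draft.

```python
import re
from typing import List

# quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
QUOTED_PAIR = re.compile(rb"\\([\t\x20-\x7e\x80-\xff])")


def unnecessary_escapes(content: bytes) -> List[bytes]:
    """Return the octets escaped without need, i.e. anything other than
    DQUOTE and backslash, found in the content of a quoted-string."""
    return [
        m.group(1)
        for m in QUOTED_PAIR.finditer(content)
        if m.group(1) not in (b'"', b"\\")
    ]
```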






I am stopping this review here, although this only gets us to Section 
3.3 'Message Body' of part 1.

From this review, it seems that all of the injunctions merit a careful 
re-reading. Also, if these injunctions are to be the canonical source of 
normative text for the standard, then a formal review of the content of 
these sentences would be useful. In particular, I imagine there is a 
need for some extra, core injunctions to the standard such as:
* all inbound recipients MUST respond to all received HTTP requests with 
an HTTP response,
* HTTP responses SHOULD, unless some new use pattern has been 
discovered, use status codes in the 1xx range to indicate ..., in the 
2xx range for ...,
and others.
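A conformance harness could encode the second of these as a classification by first digit. This Python sketch uses hypothetical names and the five classes defined in part 2.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"{code} is not a valid status code")
    return STATUS_CLASSES[code // 100]
```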

I hope this is not overwhelming. I have been undertaking similar work 
for the standards at the OGC and, while it takes a large effort to 
complete, it seems worthwhile, since it makes for much better documents.


sincerely,

   ~Adrian Custer

Received on Wednesday, 5 September 2012 17:13:56 UTC