Re: Header field-name token and leading spaces

The production rules forbid a leading space because you can't
disambiguate it from (obsolete) line folding. Nothing is said about
parsing it because, as long as the leading space isn't on the first
header field sent, the parser assumes it's part of the previous header
field's value.
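
To make that concrete, here is a rough Python sketch of how a lenient
parser might treat such lines. It is only an illustration, not the
behaviour of any particular library: parse_headers and TOKEN are
hypothetical names, the input is assumed to be already split on CRLF,
and joining a folded line with a single space is one common choice
rather than something the draft mandates.

```python
import re

# tchar set copied from the draft's ABNF: field-name = token = 1*tchar
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def parse_headers(lines):
    """Parse header lines (already split on CRLF) into (name, value) pairs.

    Hypothetical helper: a line starting with SP or HTAB is treated as
    obs-fold, i.e. a continuation of the previous field-value.
    """
    headers = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            # Leading whitespace cannot be told apart from line folding.
            if not headers:
                # Nothing to fold into: the first line cannot be a continuation.
                raise ValueError("leading whitespace on first header line")
            name, value = headers[-1]
            folded = (value + " " + line.strip(" \t")).strip()
            headers[-1] = (name, folded)
            continue
        name, sep, value = line.partition(":")
        # Rejects a missing colon and names such as "foo " (space before colon).
        if not sep or not TOKEN.fullmatch(name):
            raise ValueError("invalid field-name: %r" % name)
        headers.append((name, value.strip(" \t")))
    return headers
```

With this sketch, the lines "Host: example.org" followed by " foo:bar"
yield a single Host field whose value is "example.org foo:bar", while
" foo:bar" as the very first line is rejected.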

On Sun, Mar 3, 2013 at 11:03 AM, Karl Dubost <karl@la-grange.net> wrote:
> Hi,
>
> This is a bit long, but it came up because we were trying to fix a bug in a Python library for HTTP headers and production rules.
>
> request.add_header('foo', 'bar')
> → "foo:bar"
> request.add_header(' foo', 'bar')
> → " foo:bar"
> request.add_header('foo ', 'bar')
> → "foo :bar"
>
> What I gathered from the spec:
>
> In 3.2.  Header Fields,
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2
>
> the header production rules are defined as:
>
>      header-field   = field-name ":" OWS field-value BWS
>      field-name     = token
>      field-value    = *( field-content / obs-fold )
>      field-content  = *( HTAB / SP / VCHAR / obs-text )
>      obs-fold       = CRLF ( SP / HTAB )
>                     ; obsolete line folding
>                     ; see Section 3.2.4
>
>    The field-name token labels the corresponding field-value as having
>    the semantics defined by that header field.
>
> So far so good, but we do not yet know what the production rules are for "field-name     = token". It might come later. Let's read a bit more.
>
> In 3.2.3, Whitespace, there are production rules for OWS, BWS, and RWS:
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.3
>
> This defines at least the rules for
>
>      header-field   = field-name ":" OWS field-value BWS
>
> which basically says:
>
> --------------------------------------------------
> OK    "Foo: bar"       (1 space or more)
> OK    "Foo:     bar"   (1 tab or more)
> OK    "Foo:bar"        (no space)
> --------------------------------------------------
> AVOID "Foo: bar "      (1 trailing space or more)
> AVOID "Foo: bar "      (1 trailing tab or more)
> --------------------------------------------------
>
> AVOID means:
>
> * senders SHOULD NOT generate it in messages.
> * recipients MUST accept such bad optional whitespace and remove it
>   before interpreting the field value or forwarding the message
>   downstream.
>
> ok cool. Let's go on.
>
> In 3.2.4 Field Parsing
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4
>
>    No whitespace is allowed between the header field-name and colon.
>
> So in the production rules, we can't do:
>
> -------------------------------------------------------
> BAD  "Foo :bar"    (1 or more space/tab before the ":")
> -------------------------------------------------------
>
>    In
>    the past, differences in the handling of such whitespace have led to
>    security vulnerabilities in request routing and response handling.  A
>    server MUST reject any received request message that contains
>    whitespace between a header field-name and colon with a response code
>    of 400 (Bad Request).
>
>
> OK. This is clear too. Tested on W3C Server,
>
> → curl -I -H "foo :bar" --trace-ascii - http://www.w3.org/
>
> W3C server sent back
>
>     HTTP/1.0 400 Bad request
>
> Though not all servers do that:
>
> → curl -I -H "foo :bar" --trace-ascii - http://www.ietf.org/
> HTTP/1.1 200 OK
>
>
> There is a rule also for proxies:
>
>   A proxy MUST remove any such whitespace from a
>    response message before forwarding the message downstream.
>
> -------------------------------------------------------
> "foo :bar" → "foo:bar"
> -------------------------------------------------------
>
> MY QUESTION (finally) :)
>
> Nothing is said about
> -------------------------------------------------------
> " foo:bar"   (1 or more space/tab before the field-name)
> -------------------------------------------------------
>
> In Appendix C, the ABNF defines token:
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#appendix-C
>
> The section of the spec saying
>
>      field-name     = token
>
> with
>
>    token = 1*tchar
>
> and tchar as
>
>    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
>     "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
>
>
> So the production rules forbid a leading space, but nothing is said about parsing this leading space.
>
> * Should it say something?
> * If yes, what?
> * If not, why?
>
>
> --
> Karl Dubost
> http://www.la-grange.net/karl/
>
>

Received on Sunday, 3 March 2013 19:17:33 UTC