Header field-name token and leading spaces

Hi,

This is a bit long, but it came up because we were trying to fix a bug in the Python library related to HTTP headers and their production rules.

request.add_header('foo', 'bar')
→ "foo:bar"
request.add_header(' foo', 'bar')
→ " foo:bar"
request.add_header('foo ', 'bar')
→ "foo :bar"
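The post does not name the library. Assuming it is Python 3's urllib.request, a quick check shows that Request.add_header() stores these names without validating them. The only change it makes is str.capitalize(). The example.org URL is a placeholder, and no request is sent.

```python
from urllib.request import Request

# Assumption: "the python library" is Python 3's urllib.request.
req = Request("http://example.org/")  # placeholder URL, nothing is sent
req.add_header("foo", "bar")
req.add_header(" foo", "bar")  # leading space in the name
req.add_header("foo ", "bar")  # trailing space in the name

# add_header() only applies str.capitalize() to the name; nothing checks
# it against the field-name grammar, so both malformed names are stored.
for name, value in req.header_items():
    print(repr(name), repr(value))
```

It lists three distinct names: 'Foo', ' foo' and 'Foo '.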

What I gathered from the spec:

In 3.2.  Header Fields, 
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2 

the header production rules are defined as:

     header-field   = field-name ":" OWS field-value BWS
     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = *( HTAB / SP / VCHAR / obs-text )
     obs-fold       = CRLF ( SP / HTAB )
                    ; obsolete line folding
                    ; see Section 3.2.4
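To make the rule concrete, here is a minimal Python sketch of a single-line matcher built from the ABNF above. It rests on a few assumptions:

* The names TCHAR, VALUE_CHAR, HEADER_FIELD and parse_header_field are hypothetical.
* obs-fold is deliberately left out.
* obs-text is approximated by the code points U+0080 to U+00FF, which assumes the line was decoded as Latin-1.

```python
import re

# Hypothetical helpers built from the draft's ABNF (obs-fold omitted).
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"     # token = 1*tchar
VALUE_CHAR = r"[\t \x21-\x7e\x80-\xff]"    # HTAB / SP / VCHAR / obs-text

# header-field = field-name ":" OWS field-value BWS
# The lazy value group lets the final [ \t]* absorb the trailing BWS.
HEADER_FIELD = re.compile(
    rf"(?P<name>{TCHAR}+):[ \t]*(?P<value>{VALUE_CHAR}*?)[ \t]*"
)

def parse_header_field(line):
    """Return (name, value) if the line matches header-field, else None."""
    m = HEADER_FIELD.fullmatch(line)
    if m is None:
        return None
    return m.group("name"), m.group("value")

for line in ["Foo: bar", "Foo:bar", "Foo: bar ", "Foo :bar", " Foo:bar"]:
    print(repr(line), parse_header_field(line))
```

Whitespace on either side of the field-name makes the whole line fail to match, because token has no room for SP or HTAB.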

   The field-name token labels the corresponding field-value as having
   the semantics defined by that header field.

So far so good, but we do not yet know the production rules for "field-name     = token". They might come later. Let's read a bit more. 

In 3.2.3, Whitespace, there are production rules for OWS, BWS, and RWS:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.3

This defines at least the whitespace rules used in

     header-field   = field-name ":" OWS field-value BWS

which basically says:

--------------------------------------------------
OK    "Foo: bar"       (1 space or more)
OK    "Foo:<TAB>bar"   (1 tab or more)
OK    "Foo:bar"        (no space)
--------------------------------------------------
AVOID "Foo: bar "      (1 trailing space or more)
AVOID "Foo: bar<TAB>"  (1 trailing tab or more)
--------------------------------------------------

AVOID means:

* senders SHOULD NOT generate it in messages.
* recipients MUST accept such bad optional whitespace and remove it
  before interpreting the field value or forwarding the message
  downstream.
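On the recipient side, this amounts to removing SP/HTAB on both sides of the value. The sketch below runs that over the rows of the table, with tabs written as \t. strip_ows is a hypothetical name, and splitting on the first ":" assumes a single, already well-formed header line.

```python
def strip_ows(raw_value):
    # OWS after the colon and BWS at the end are both *( SP / HTAB );
    # a recipient removes them before interpreting or forwarding.
    return raw_value.strip(" \t")

# The rows of the table, with tabs written as \t.
rows = ["Foo: bar", "Foo:\tbar", "Foo:bar", "Foo: bar ", "Foo: bar\t"]
for line in rows:
    name, _, raw_value = line.partition(":")
    print(f"{line!r:14} -> {name}:{strip_ows(raw_value)}")
```

Every row comes out as Foo:bar, which is the point: OK and AVOID differ only in what a sender should generate, not in the value a recipient ends up with.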

OK, cool. Let's go on.

In 3.2.4, Field Parsing:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4

   No whitespace is allowed between the header field-name and colon.  

So, in terms of production rules, we can't have:

-------------------------------------------------------
BAD  "Foo :bar"    (1 or more space/tab before the ":")
-------------------------------------------------------

   In the past, differences in the handling of such whitespace have led to
   security vulnerabilities in request routing and response handling.  A
   server MUST reject any received request message that contains
   whitespace between a header field-name and colon with a response code
   of 400 (Bad Request).
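A server-side reading of this MUST fits in a few lines of Python. In this sketch, status_for_header_line and BAD_REQUEST are hypothetical names, and returning None stands for "no objection, keep processing".

```python
BAD_REQUEST = 400

def status_for_header_line(line):
    """Hypothetical server check for one request header line: return 400
    if SP/HTAB sits between the field-name and the colon, else None."""
    name, sep, _ = line.partition(":")
    if sep and name != name.rstrip(" \t"):
        return BAD_REQUEST
    return None

for line in ["foo:bar", "foo :bar", "foo\t:bar", " foo:bar"]:
    print(repr(line), status_for_header_line(line))
```

The check only looks between the name and the colon. A line such as " foo:bar" passes it untouched, and that case is exactly the one the rule does not cover.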


OK. This is clear too. I tested it on the W3C server:

→ curl -I -H "foo :bar" --trace-ascii - http://www.w3.org/

The W3C server sent back

    HTTP/1.0 400 Bad request

Though not all servers do that:

→ curl -I -H "foo :bar" --trace-ascii - http://www.ietf.org/
HTTP/1.1 200 OK


There is also a rule for proxies:

   A proxy MUST remove any such whitespace from a
   response message before forwarding the message downstream.

-------------------------------------------------------
"foo :bar" → "foo:bar"
-------------------------------------------------------
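The proxy rule is a rewrite rather than a rejection. The Python sketch below illustrates it; proxy_fix_response_header is a hypothetical name, and it handles one response header line at a time.

```python
def proxy_fix_response_header(line):
    """Hypothetical proxy step: drop SP/HTAB between field-name and ':'
    in a response header line before forwarding it downstream."""
    name, sep, rest = line.partition(":")
    if not sep:
        return line  # not a header field line; leave it alone
    return name.rstrip(" \t") + ":" + rest

print(proxy_fix_response_header("foo :bar"))  # prints foo:bar
```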

MY QUESTION (finally) :) 

Nothing is said about 
-------------------------------------------------------
" foo:bar"   (1 or more space/tab before the field-name)
-------------------------------------------------------

In Appendix C, the ABNF gives the definition of token:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#appendix-C

The rule from the spec

     field-name     = token

is completed by

   token = 1*tchar

with tchar defined as

   tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
    "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
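Transcribed directly into a Python regular expression, these two rules give a field-name check. TOKEN and is_field_name are hypothetical names. DIGIT and ALPHA are taken as the ASCII-only ABNF core rules.

```python
import re

# token = 1*tchar, with tchar as listed above (DIGIT / ALPHA are ASCII only)
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def is_field_name(name):
    """True if the whole string is a valid field-name (token)."""
    return TOKEN.fullmatch(name) is not None

for name in ["foo", " foo", "foo "]:
    print(repr(name), is_field_name(name))
```

Only "foo" passes. Both whitespace variants fail, since SP and HTAB are not tchar.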


So the production rules forbid a leading space in the field-name, but nothing is said about how a recipient should parse such a leading space. 

* Should it say something? 
* If yes, what? 
* If not, why?


-- 
Karl Dubost
http://www.la-grange.net/karl/

Received on Sunday, 3 March 2013 19:03:58 UTC