Re: Multiple Host Ambiguity

On Fri, May 6, 2016 at 2:29 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

>
> > On 5 May 2016, at 21:32, Jian Jiang <ottojiang@gmail.com> wrote:
> >
> > Dear all,
> >
> > We recently found that HTTP implementations vary widely in how they
> > determine the host of a crafted request with multiple Host headers,
> > whitespace-preceded or whitespace-succeeded Host headers, an absolute
> > request-URL, or some combination of these. We have found some
> > vulnerabilities caused by inconsistencies between different
> > implementations in an HTTP processing chain.
> >
> > We would like to discuss this problem here. I have some initial
> > thoughts and questions:
> >
> > 1). Whitespace is a major source of multiple Host ambiguity. My
> > understanding is that whitespace around the field-name is explicitly
> > forbidden only as of RFC 7230, but the text is somewhat confusing. For
> > whitespace between the field-name and the colon, the rule in RFC 7230 is
> > clear: reject with a 400. But for whitespace before the field-name, the
> > main body of RFC 7230 (Section 3) only says that if whitespace appears
> > before the first header field, either the request should be rejected or
> > the header should be ignored. The clear rule is in Appendix A.2:
> > "invalid whitespace around field-names is required to be rejected ...".
> > A careless reader could easily miss it. In general, the implementations
> > we have seen are more tolerant of a whitespace-preceded Host header than
> > of a whitespace-succeeded one.
>
> Does RFC 7230 say that whitespace before the first header field should be
> rejected or ignored? All I can find in Section 3 is text about obs-fold,
> which cannot apply to the first header field because obs-fold is
> grammatically defined to trail the field-value rather than to lead the
> field-name.
>
> I’d say, then, that a space preceding the Host header should lead to one
> of two behaviours: rejecting the request (because it is ill-formed if that’s
> the first header field, or because the implementation rejects obs-fold),
> or folding it into the preceding header field (leading the implementation
> to conclude that no Host header is present).
>

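RFC 7230 Section 3 does cover this case: a recipient that receives
whitespace between the start-line and the first header field MUST either
reject the message as invalid or consume each whitespace-preceded line
without further processing. So ignoring a whitespace-preceded first header
field is also allowed.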

Apache ignores a whitespace-preceded first header line and folds later ones
into the preceding field. Nginx simply ignores whitespace-preceded header
lines.
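The two RFC 7230 options for whitespace-preceded lines can be made concrete
with a small parser. This is a minimal Python sketch under stated
assumptions, not Apache's or Nginx's actual code: header lines arrive
already split on CRLF with the start-line removed, `BadRequest` is a
hypothetical stand-in for a 400 response, and the hypothetical `reject_ws`
flag chooses between rejecting such lines and the lenient behaviour
described above for Apache (ignore a whitespace-preceded first line, fold
later ones).

```python
from typing import List, Optional, Tuple


class BadRequest(Exception):
    """Hypothetical stand-in for responding 400 (Bad Request)."""


def parse_fields(lines: List[str], reject_ws: bool) -> List[Tuple[str, str]]:
    """Parse header lines (start-line removed, CRLFs already stripped).

    reject_ws=True:  reject any line that starts with SP or HTAB.
    reject_ws=False: ignore such lines before the first field (RFC 7230,
                     Section 3) and unfold later ones into the preceding
                     field-value, replacing the fold with a single SP
                     (obs-fold handling, RFC 7230, Section 3.2.4).
    """
    fields: List[Tuple[str, str]] = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            if reject_ws:
                raise BadRequest("whitespace-preceded header line")
            if not fields:
                continue  # before the first field: consume the line, ignore it
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip(" \t"))
            continue
        name, sep, value = line.partition(":")
        # RFC 7230 requires a 400 for whitespace between field-name and colon.
        if not sep or not name or name != name.rstrip(" \t"):
            raise BadRequest("malformed header field")
        fields.append((name, value.strip(" \t")))
    return fields


def host_of(fields: List[Tuple[str, str]]) -> Optional[str]:
    """Return the Host value only if exactly one Host field is present."""
    hosts = [value for name, value in fields if name.lower() == "host"]
    return hosts[0] if len(hosts) == 1 else None


request = ["Accept: */*", " Host: attacker.example", "Host: victim.example"]
print(host_of(parse_fields(request, reject_ws=False)))  # victim.example
```

With `reject_ws=False` the stray line is absorbed into the Accept value, so
this parser sees only victim.example; with `reject_ws=True` the same lines
are rejected. Both outcomes are permitted by RFC 7230; the trouble starts
with a component that does neither.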


>
> Note that Appendix A.2 is non-normative: it simply notes the changes that
> were made. In this case, that appendix note applies to the requirement to
> respond with a 400 to invalid whitespace between a field name and the colon.
>
> Did you confirm that implementations treated the whitespace-preceded Host
> header as a Host header, rather than folding it into the prior header? If
> they did that, I’d say those implementations were being over-generous with
> their parsing.
>

We found that some implementations treat a whitespace-preceded Host header
as a Host header: IIS, Squid, and some CDNs such as Akamai.
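To see why treating such a line as its own field is dangerous, the Python
sketch below parses the same header lines two ways: once with obs-fold
unfolding, and once with an over-generous parser that strips leading
whitespace and treats every line as a new field. The function names are
hypothetical and the second parser only models the behaviour described
here; it is not code from IIS, Squid, or any CDN.

```python
from typing import List, Tuple

Fields = List[Tuple[str, str]]


def parse_folding(lines: List[str]) -> Fields:
    """Unfold whitespace-preceded lines into the preceding field (obs-fold)."""
    fields: Fields = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            if fields:
                name, value = fields[-1]
                fields[-1] = (name, value + " " + line.strip(" \t"))
            continue  # a leading line before any field is simply ignored
        name, _, value = line.partition(":")
        fields.append((name, value.strip(" \t")))
    return fields


def parse_overgenerous(lines: List[str]) -> Fields:
    """Strip leading whitespace and treat every line as a new field."""
    fields: Fields = []
    for line in lines:
        name, _, value = line.strip(" \t").partition(":")
        fields.append((name, value.strip(" \t")))
    return fields


def host_values(fields: Fields) -> List[str]:
    """All Host values, in order of appearance."""
    return [value for name, value in fields if name.lower() == "host"]


request = ["X-Pad: 1", " Host: attacker.example", "Host: victim.example"]
print(host_values(parse_folding(request)))       # ['victim.example']
print(host_values(parse_overgenerous(request)))  # ['attacker.example', 'victim.example']
```

If the second parser then picks the first Host value (a hypothetical
first-wins policy), a front end using the first parser and a back end
using the second would route the same bytes to different hosts.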


> > 2). The host in an absolute request-URL is another major source of
> > ambiguity. Both RFC 2616 and RFC 7230 state that the host in an absolute
> > request-URL should "override" the Host header. Some implementations
> > follow this, but some don't. RFC 7230 (Section 5.4) additionally states
> > that a client must send a Host header identical to the host in the
> > request-URL, which (indirectly) requires a server to reject a request
> > whose request-URL and Host header field carry inconsistent hosts. But
> > only a few implementations enforce this rule. Neither RFC 2616 nor
> > RFC 7230 says anything explicit about the scheme in the request-URL, and
> > some implementations accept any scheme, such as "unknown://".
>
> This seems unambiguous to me: if the Host header and authority portion of
> the request URL conflict, that’s a client error that needs a 4XX response.
>

RFC 7230 should be sufficient to prevent inconsistency between the
request-URL and the Host header, but it is not widely followed, and we found
cases where inconsistency can arise. For example, consider the request

    GET unknown://a.com/ HTTP/1.1
    Host: b.com

Varnish does not recognize the "unknown://" scheme, so it treats this as a
request for b.com and forwards it unchanged to the upstream server. An
Nginx server upstream treats it as a request for a.com, because Nginx
accepts any scheme in the request-line.
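The divergence can be reproduced with two host-selection functions, plus a
strict variant along the lines Cory describes (a conflict between the
request-URL authority and Host gets a 4XX). This is a Python sketch: the
function names are hypothetical and the logic only models the behaviour
described above, not Varnish's or Nginx's source. It assumes only
origin-form and absolute-form request-targets, and compares authority and
Host as plain case-insensitive strings.

```python
from urllib.parse import urlsplit


def authority(target: str) -> str:
    """Authority of an absolute-form target, without any userinfo."""
    return urlsplit(target).netloc.rpartition("@")[2]


def frontend_host(target: str, host_header: str) -> str:
    """Hypothetical proxy: honours absolute-form only for http and https."""
    parts = urlsplit(target)
    if parts.scheme in ("http", "https") and parts.netloc:
        return authority(target)
    return host_header  # unrecognised scheme: fall back to the Host header


def backend_host(target: str, host_header: str) -> str:
    """Hypothetical origin server: uses the authority for any scheme."""
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        return authority(target)
    return host_header


def strict_host(target: str, host_header: str) -> str:
    """Reject unknown schemes and Host/authority mismatches with a 400."""
    if target.startswith("/"):  # origin-form: only the Host header applies
        return host_header
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("400 Bad Request: unsupported request-target")
    if authority(target).lower() != host_header.lower():
        raise ValueError("400 Bad Request: Host does not match request-target")
    return authority(target)


print(frontend_host("unknown://a.com/", "b.com"))  # b.com
print(backend_host("unknown://a.com/", "b.com"))   # a.com
```

The strict variant closes the gap from both sides: it refuses the unknown
scheme outright, and it refuses an http or https target whose authority
disagrees with Host, so no downstream component has to choose between them.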


>
> > 3). Multiple Host header fields are explicitly forbidden in RFC 7230
> > (but not in RFC 2616). Again, only a few implementations follow this
> > requirement. I looked through the list archives to understand why this
> > was added in RFC 7230, but I couldn't find any discussion. Does anyone
> > know the context behind this rule? (I did find some discussions about
> > whitespace in header fields, which were very helpful.)
>
> Multiple Host header fields were implicitly forbidden in RFC 2616 by the
> definition of the Host header field (‘host [ “:” port]’) combined with RFC
> 2616 Section 4.2’s text that says that "Multiple message-header fields with
> the same field-name MAY be present in a message if and only if the entire
> field-value for that header field is defined as a comma-separated list.”
> That iff criterion doesn’t apply to Host, so multiple Host headers were
> forbidden implicitly by that requirement.
>
> The change in RFC 7230 is therefore editorial only and no discussion would
> have been required: it made explicit a requirement that was already
> implicit in the rest of the text.
>
> It is not uncommon for implementations to be *somewhat* lenient on this
> rule in some cases: most commonly, with Host, when the duplicate header
> fields are identical. That said, RFC 7230 is now much clearer about the
> requirement to respond with a 4XX.
>

Thanks for clarifying that. An explicit rule is much clearer. I hadn't
picked up on that implication when reading RFC 2616.

We found that only Varnish and IIS implement this rule.
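Enforcing the rule is simple once the fields are parsed; the risk comes
from lenient policies that disagree with each other. The Python sketch
below uses a hypothetical `select_host` function and hypothetical policy
names (not taken from Varnish, IIS, or any other implementation) to
contrast the strict RFC 7230 Section 5.4 behaviour with the
identical-duplicates leniency Cory mentions and with first-wins and
last-wins choices.

```python
from typing import List, Tuple


def select_host(fields: List[Tuple[str, str]], policy: str = "strict") -> str:
    """Pick the request's host from parsed (name, value) header fields."""
    hosts = [value for name, value in fields if name.lower() == "host"]
    if not hosts:
        raise ValueError("400 Bad Request: missing Host")
    if len(hosts) == 1:
        return hosts[0]
    # More than one Host field from here on.
    if policy == "strict":  # RFC 7230, Section 5.4
        raise ValueError("400 Bad Request: multiple Host fields")
    if policy == "identical-ok":  # tolerate exact duplicates only
        if len(set(hosts)) == 1:
            return hosts[0]
        raise ValueError("400 Bad Request: conflicting Host fields")
    if policy == "first":
        return hosts[0]
    if policy == "last":
        return hosts[-1]
    raise ValueError(f"unknown policy: {policy}")


fields = [("Host", "a.example"), ("Host", "b.example")]
print(select_host(fields, "first"))  # a.example
print(select_host(fields, "last"))   # b.example
```

A first-wins proxy in front of a last-wins origin would send this request
to a.example at the proxy and b.example at the origin; the strict policy
removes that gap by refusing the request.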


>
> > 4). My general feeling is that RFC 7230 is clear about how the host
> > should be parsed from a request, but the relevant rules are spread across
> > different places and are easy to miss when implementing them.
>
> I don’t know that I agree with this concern.
>
> There are no special rules for parsing the Host header field from a
> request header block: it is parsed exactly like all other headers. Anyone
> writing an HTTP/1.1 implementation has to know the rules for parsing
> header fields, and they do not need to special-case Host.
>

I am not saying the Host header should be treated specially. We found some
problems with the Host header; other headers probably have parsing
ambiguities too, but we are not familiar enough with their semantics to
reason about the consequences.

My impression is that implementations are often mixes of RFC 2616,
RFC 7230, and some in-house rules. I don't know how (or whether) this
situation can be improved in practice.


>
> The only extra wrinkle is around the handling of hosts in absolute-URIs in
> the request line, and the rule there is exactly where I’d expect to see it:
> in the text about absolute-URIs in the request line. There doesn’t seem to
> be much ambiguity here.


RFC 7230 is not ambiguous about handling an absolute-URL in the request
line, but implementations do not follow it, and we found quite a few
parsing inconsistencies here.

I saw a previous discussion about the Host header, and I agree with one
point made there (I had similar thoughts before): from a protocol design
point of view, the Host header can and should be replaced by an
absolute-URL in the request line.


>
> Cory
>
>


-- 
Jian

Received on Saturday, 7 May 2016 11:39:42 UTC