Re: HPACK analysis from 山本和彦 on 2014-02-07 (ietf-http-wg@w3.org from January to March 2014)

From: 山本和彦 <kazu@iij.ad.jp>
Date: Fri, 07 Feb 2014 16:02:04 +0900 (JST)
To: w3c@adambarth.com, Rob.Trace@microsoft.com, ietf-http-wg@w3.org, mnystrom@microsoft.com, ekr@rtfm.com, Michael.Bishop@microsoft.com
Message-Id: <20140207.160204.1686272593015001017.kazu@iij.ad.jp>
Hi,

Since the space of possible HPACK encodings is large, I think we can
consider an encoding which is free from this kind of brute force attack.
For instance, "NaiveH", described on the following page, is resistant
because it does not use the header table (nor the reference set).

 http://d.hatena.ne.jp/kazu-yamamoto/20140129/1391057824


Unfortunately, NaiveH compresses poorly (its compression ratio is 0.86).
So, I considered another encoding, say "StaticH".

NaiveH does not make use of the static table at all. StaticH takes
advantage of it. That is:

- A header name is indicated by an index into the static table if possible.
  If not, the literal name is Huffman-encoded.
- A header value is Huffman-encoded.
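
The two rules above can be sketched roughly as follows. This is an
illustration only, not the implementation measured here: STATIC_NAMES is
a made-up excerpt with placeholder indices, and huffman_stub merely tags
the literal instead of producing real Huffman bits.

```python
# Hypothetical excerpt of a static table: header name -> index.
# The indices are placeholders, not those of any HPACK draft.
STATIC_NAMES = {":method": 1, ":path": 2, "content-type": 3, "user-agent": 4}


def huffman_stub(text):
    """Stand-in for a real Huffman encoder; it only tags the literal."""
    return ("huffman", text)


def statich_encode(name, value, huffman=huffman_stub):
    """Encode one header without consulting any dynamic state."""
    key = name.lower()
    if key in STATIC_NAMES:
        # Name found in the static table: send its index.
        name_part = ("index", STATIC_NAMES[key])
    else:
        # Unknown name: send it as a Huffman-coded literal.
        name_part = huffman(key)
    # The value is a Huffman-coded literal, never a table reference.
    return (name_part, huffman(value))
```

Because the function keeps no state between calls, encoding the same
header twice yields the same output, so the output size does not reveal
whether a value was sent before.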

I implemented StaticH and calculated its compression ratio with the
same data set. The result is 0.63.

If we prepare better encoding schemes for each header value, the ratio
could be improved. For instance, we can prepare another static table
for the values of Content-Type:.

Regards,

--Kazu

> I’ll echo your preface by saying I’m not a security expert, but how would the Basic-vs.-OAuth attack work? You have to have an exact match to the header for HPACK to compress it; that means that “Authorization: Bearer <blahblah>” and “Authorization: Basic <blahblah>” will never compress against each other even with the same value. The attacker would be able to tell whether the client had ever sent a header by that name for headers not in the static header table, but Authorization is in it. Or is the attack that the CORS response enables the Basic header to get into the compressor, at which point it’s attackable?
> 
> I see your larger point, that an attacker can guess-and-check for arbitrary header values in the history, which was originally considered an advantage of HPACK -- that an attacker could only guess-and-check entire header values (rather than pieces of a header value as with GZip).
> 
> An attacker with access to a warm compressor can learn:
> 
>   * If any header name not in the static table has ever been sent with any value
>   * If a specific header name/value combination has been previously sent
> 
> Sent from Windows Mail
> 
> From: Adam Barth <w3c@adambarth.com>
> Sent: Friday, January 31, 2014 10:12 PM
> To: Rob Trace <Rob.Trace@microsoft.com>
> Cc: HTTP Group <ietf-http-wg@w3.org>, Magnus Nystrom <mnystrom@microsoft.com>, Eric Rescorla <ekr@rtfm.com>
> 
> I'd like to preface my message by saying that I haven't read the original HPACK proposal.  I'm relying upon Eric's description below.
> 
> Eric's message is a bit abstract, but I believe there are some security issues to be concerned about here.  My main concern is that HPACK weakens security because it requires downstream technologies to maintain more invariants in order to avoid leaking sensitive information.
> 
> As an example, consider a web server that implements OAuth, specifically RFC6750.  The web server might be expecting requests similar to the example given in http://tools.ietf.org/html/rfc6750#section-2.1:
> 
>      GET /resource HTTP/1.1
>      Host: server.example.com
>      Authorization: Bearer mF_9.B5f-4.1JqM
> 
> To let other web sites access this resource from the user's browser, the server might use the Access-Control-Allow-Headers header [1] from CORS to allow other web sites to send Authorization headers.  Up until this point, nothing untoward has occurred.
> 
> However, once you add HPACK, an active network attacker has gained a powerful ability.  The attacker can now validate whether the client has ever sent an Authorization header with a given value by issuing an HTTP request via XMLHttpRequest because the server has whitelisted the Authorization header via CORS.
> 
> If another service running on the same server uses Basic authentication (i.e., RFC2617), the attacker can brute force the user's password because Basic authentication shares the Authorization header with OAuth.  Notice that this brute force attack is unlikely to be detected because the attacker is querying the compression table in the client and can block the requests from ever reaching the server.  If, instead, the attacker tried to brute force the user's password by querying the server directly, the server would be able to detect the attack due to a large number of failed authentication requests.
> 
> The situation gets worse if we consider non-standard web technology, such as Flash.  For example, Flash's URLRequest API lets the attacker set a wide variety of headers because it uses a header blacklist rather than a whitelist [2].  Worse, Flash permits the attacker to issue such requests across origins via the navigateToURL API [3].  It just so happens that the Authorization header is on Flash's header blacklist, but we need to consider the possibility that web sites will store sensitive information in headers that aren't on Flash's blacklist.
> 
> One reaction I can imagine to this issue is to blame Flash and decry its use of a blacklist rather than a whitelist for security, but that misses the larger point that HPACK weakens security because it requires all downstream technologies to maintain more invariants in order to avoid leaking sensitive information out of an otherwise secure channel.
> 
> Adam
> 
> [1] http://www.w3.org/TR/cors/#access-control-allow-headers-response-header
> [2] http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/URLRequestHeader.html
> [3] http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/package.html#navigateToURL()
> 
> 
> On Mon, Jan 27, 2014 at 8:20 AM, Rob Trace <Rob.Trace@microsoft.com> wrote:
> This is the start of a threat analysis on HPACK from the TLS WG chair. Please take a look at this and let me or Brian know if we need to have a threat model meeting on this to provide a response when this is posted to the mailing list.
> 
> Thanks!!
> 
> -Robt
> 
> -----Original Message-----
> From: Eric Rescorla [mailto:ekr@rtfm.com]
> Sent: Sunday, January 26, 2014 5:02 PM
> To: Roberto Peon; Rob Trace; mt; Patrick McManus; jpinner@twitter.com; Adam Barth; William Chan; Mark Nottingham; Sean Turner
> Subject: HPACK analysis
> 
> Martin and I took a first look at the security of HPACK and produced the writeup below. We wanted to give you guys a first look before posting to the mailing list, especially since abarth pointed out that maybe there is a problem from Flash...
> 
> Any comments? Objections to us posting?
> 
> -Ekr
> 
> HPACK THREAT BACKGROUND
> 
> HPACK [0] is a compression scheme for HTTP headers that is intended to resist being used as an oracle by attackers.
> 
> The general idea behind HPACK is that each side maintains a table of known header name-value pairs. In order to transmit a set of headers, the sender encodes each member of the set in one of three ways:
> 
> - As an integer index to an existing header pair already in the
>   table.
> - As an integer reference to an existing header name already in
>   the table plus a literal value.
> - As a literal name/value pair.
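
As a rough illustration of the three representations listed above, here
is a toy model of the sender's choice. It is only a sketch: the class
name ToyHeaderTable, the tuple output, and the rule that every
non-indexed header is appended to the table are assumptions made for
this example, and nothing here is wire-accurate.

```python
class ToyHeaderTable:
    """Toy model of the shared table; output is symbolic, not wire bytes."""

    def __init__(self):
        self.entries = []  # list of (name, value) pairs, oldest first

    def encode(self, name, value):
        pair = (name, value)
        if pair in self.entries:
            # Case 1: the whole pair is already in the table.
            return ("indexed", self.entries.index(pair))
        names = [n for n, _ in self.entries]
        if name in names:
            # Case 2: only the name is known; the value is sent literally.
            rep = ("name-ref", names.index(name), value)
        else:
            # Case 3: neither is known; both are sent literally.
            rep = ("literal", name, value)
        self.entries.append(pair)  # remember the pair for later messages
        return rep
```

Sending cookie "a=1", then cookie "b=2", then cookie "a=1" again
produces a literal, then a name reference, then a bare index.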
> 
> An arbitrary number of headers with the same name can exist. E.g., there can be two "cookie" headers, one for each cookie.
> 
> Literal values are sent directly or encoded using Huffman coding with a fixed code table. The values are then padded out to the next byte boundary. Each header is individually encoded.
> 
> Note that the table may be (and probably is) larger than the set of headers to be encoded at any given time. I.e., the table is a (size-limited) list of every header that has been encoded but a given message may only contain some smaller set of headers.
> 
> We should analyze HPACK under two threat models, a generalized threat model and a Web-specific threat model.
> 
> 
> GENERALIZED THREAT MODEL
> We can start by analyzing the most general form of the threat model.
> We assume that the attacker has an oracle O that he can query. The oracle is primed with a known set of headers where at least one of the header values V is unknown (though the attacker may know the unencoded size). The attacker's job is to extract V.
> 
> The attacker can access the oracle as follows:
> 
> - Ask for the length of the encoding of the given set of headers.
> - Add a new header with an arbitrary name and value (so that if
>   that header exists already, there are now multiples).
> - Replace any header with a new header name and value.
> 
> The table size is infinite, so that every header name/value pair ever used is remembered.
> 
> 
> WEB THREAT MODEL
> The Web Threat model is somewhat more limited (and more complicated).
> 
> In particular:
> 
> - It is only possible to add certain headers (See the XHR and CORS
>   specifications at [2] and [3] and the browser security handbook at [4]).
> - The attacker does not have an unlimited number of queries.
> - The table has a bounded size so the attacker needs to worry about
>   pushing entries out of the table.
> 
> To get the full picture, you will need to read the HPACK, HTTP/2.0, XHR, and CORS specifications.
> 
> 
> KNOWN ATTACKS
> We have already done some preliminary analysis and know about the following attacks:
> 
> - It is possible to verify the exact value of a header if the
>   attacker can inject that header. The idea here is that you
>   add the header and look to see whether the size of the encoded
>   set increases by the size of a literal or by the size of an index.
>   This is an advantage over simple guessing attacks against
>   the server because the attacker can divert the requests so
>   that the server never sees a failed guess.
> 
> - Because the Huffman symbols for characters are not all the
>   same length, it is possible to learn something about a given
>   value by observing its encoded length. However, since each
>   field is separately padded, you only get to learn the
>   sum of the symbol lengths rounded up to the next byte,
>   which doesn't tell you much.
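
The first attack above (verifying a whole value by watching the encoded
size) can be sketched as follows. This is a toy model under stated
assumptions: the table is a plain Python set, an indexed pair costs 1
byte, a literal costs 2 bytes plus the value length, and every probed
guess is remembered by the compressor. Real sizes depend on the Huffman
and integer encodings; the names probe and guess_value are hypothetical.

```python
def probe(table, name, value):
    """Encode one header; return its toy size and remember the pair."""
    pair = (name, value)
    if pair in table:
        return 1  # toy cost of an index into the table
    table.add(pair)  # a failed guess is itself added to the table
    return 2 + len(value)  # toy cost of a literal value


def guess_value(table, name, candidates):
    """Return the candidate whose encoding collapses to an index, if any."""
    # Each guess is tried once: a repeated guess would hit the entry
    # created by its own first probe and look like a match.
    for guess in dict.fromkeys(candidates):
        if probe(table, name, guess) == 1:
            return guess
    return None
```

With a table primed with ("authorization", "Basic c2VjcmV0"), feeding a
candidate list that contains that value returns it, and none of the
probes needs to reach the server.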
> 
> We do not currently know how to guess anything other than a full value, which obviously limits the utility of attacks to low-entropy values. Our best attack currently is against cookies, which can be set by the attacker. We are still looking at how to attack Basic authentication passwords. We haven't yet figured out how to get the attacker to inject them using standardized Web technologies. (The obvious avenues don't seem to work, but we are still checking.) It appears that it may, however, be possible to do so with Flash (thanks to Adam Barth for this suggestion).
> 
> 
> 
> [0] http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-05
> [1] http://tools.ietf.org/html/draft-ietf-httpbis-http2-09
> [2] http://www.w3.org/TR/XMLHttpRequest/
> [3] http://www.w3.org/TR/cors/
> [4] https://code.google.com/p/browsersec/wiki/Main
> 
> 
> 
> ACKNOWLEDGEMENTS
> Thanks to Martin Thomson, Patrick McManus, and Adam Barth for discussions about this.
> 
>
Received on Friday, 7 February 2014 07:02:46 UTC