RE: QPACK and the Static Table

While I've tried to limit the bias introduced by their single browser and selection of sites, I share those concerns about using the HTTP Archive as a single source of data.  However, what I've heard from one HTTP API deployment and suspect is true across CDNs is that sharing request header values is problematic.

Maybe if we applied the filtering locally, sharing a value only when that single value represents more than 5% of header occurrences, that would be sufficiently anonymized for folks to be willing to share.
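
To make the 5% idea concrete, here is a rough sketch of what such a local filter could look like. The function name, the input shape (a list of name/value pairs), and the reading of "header occurrences" as "occurrences of that same header name" are all assumptions for illustration, not anything agreed on the list.

```python
from collections import Counter, defaultdict


def filter_rare_values(observations, threshold=0.05):
    """Return per-header totals plus only the values common enough to share.

    observations: iterable of (header_name, header_value) pairs (hypothetical
    input shape). A value is kept only if it accounts for strictly more than
    `threshold` of that header name's occurrences (assumed interpretation).
    """
    per_name = defaultdict(Counter)
    for name, value in observations:
        # Header names are case-insensitive, so normalize before counting.
        per_name[name.lower()][value] += 1

    report = {}
    for name, values in per_name.items():
        total = sum(values.values())
        kept = {v: c for v, c in values.items() if c / total > threshold}
        # The name and its total are always reported; rare values are dropped.
        report[name] = {"total": total, "values": kept}
    return report


if __name__ == "__main__":
    sample = [("Accept-Encoding", "gzip, deflate, br")] * 97
    sample += [("Accept-Encoding", "x-private-%d" % i) for i in range(3)]
    print(filter_rare_values(sample))
```

With a filter like this, a contributor would still report how often each header name appears, but any value seen in 5% or fewer of that header's occurrences would never leave their systems.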

-----Original Message-----
From: Mark Nottingham <mnot@mnot.net> 
Sent: Wednesday, May 23, 2018 4:48 PM
To: Mike Bishop <mbishop@evequefou.be>
Cc: HTTP Working Group <ietf-http-wg@w3.org>; quic@ietf.org
Subject: Re: QPACK and the Static Table

Hey Mike,

My .02 (I also left some notes in the PR) -

The static table is most useful on request headers, so that's what we should be focusing on. If it were me, I'd drop all of the response header fields except the most common ones (say 10 or so), and focus on request headers.

In fact, I'd look at paring down the number of entries in total for *just* the initial requests on a connection -- putting too many things in the static table might influence how implementations emit other headers, and that's not the intent here.

The HTTP Archive is a bit problematic; not only is it focused on "big" sites (albeit a lot of them), but it's also AIUI pretty homogeneous on the client side, so it's not going to be very representative for request headers.

I think it would be better to get a sample of request headers seen by a couple of sites, a CDN or two, and at least one "HTTP API" type deployment, and see where that leads -- if we can find someone willing to do the work.

Cheers,


> On 24 May 2018, at 9:16 am, Mike Bishop <mbishop@evequefou.be> wrote:
> 
> Wanted to get a sense of the affected working groups on two issues in QPACK (header compression for HTTP/QUIC).
>  
> Rather than indexing the tables together and having the static table at 1-61, QPACK uses a bit to indicate static vs. dynamic.  Since the field is seven bits long, the performance is comparable for the dynamic table (you can access 63 entries in one byte, 190 in two), but you can increase the size of the static table without hurting the dynamic table.  As a result, we’re building a fresh static table based on queries against HTTP Archive data.
>  
> The key question that has come up in a couple venues:  What real-world headers do we want to artificially remove from what the data shows, and what headers not seen by HTTP Archive do we want to force in anyway?
>  
> So far, we’ve:
>  • forced in pseudo-headers because the Archive doesn’t capture them and they would otherwise be absent
>   • :path, :authority, :method
>  • deleted values presumed biased by the test configuration:
>   • Server: (various vendors)
>   • User-Agent
>   • Accept-Language: en-us, en;q=0.9
>   • Content-Length: 531
>    • I still wonder exactly why that’s so common….
>   • P3p: policyref="https://www.googleadservices.com/..."….
>   • Origin: https://www.facebook.com
>   • Alt-Svc for various versions of gQUIC
>   • …the list goes on
>  • deleted headers prohibited by HTTP/QUIC and HTTP/2
>   • Transfer-Encoding: chunked
>  • reordered to put headers you’re likely to name-reference at the front, especially if you’re unlikely to add them to the dynamic table
>  
> The question is whether we should also backfill headers which HTTP Archive wouldn’t see, delete headers we wish people wouldn’t use, and/or insert the ones we hope they eventually will.  Some candidates:
>  • Add Alt-Svc entry for HTTP/QUIC with QUIC v1
>  • Add X-Forwarded-For
>  • Don’t add X-Forwarded-For, but do add Forwarded
>  • Remove Expires to incent the use of Cache-Control
>  • Collapse the “Content-Type: <thingey>” and “Content-Type: <thingey>; charset=utf-8” entries together
>   • …but which one to keep?
>  • Add Content-Encoding and/or Accept-Encoding entries for zstd
>  
> There’s an endless parade of bikesheds here.  As Martin has pointed out, this will never be perfect, so the goal is “good enough and keep going.”  Any strong feelings about any of these before we merge it?
>  
> Also, there’s been some discussion of a mechanism for selecting one of several static tables at the start of a connection.  In that case, the spec would probably define three tables (client headers, server headers [for servers that don’t push], combined [for servers that push]) and enable future RFCs to define others for targeted scenarios (proxies, video playback, IoT, etc.).  How much does that interest folks?
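
For anyone wanting to check the one-byte and two-byte index figures in the quoted message, the sketch below uses the HPACK prefix-integer encoding (RFC 7541, Section 5.1). It assumes that six bits remain for the index once one of the seven bits is spent on the static/dynamic flag, and that indices are zero-based; both are assumptions about the draft rather than statements of it, and the function names are made up for this example.

```python
def encode_prefix_int(value, prefix_bits):
    """Encode an integer with an N-bit prefix per RFC 7541, Section 5.1.

    Returns the encoded bytes with all flag bits above the prefix left as zero.
    """
    limit = (1 << prefix_bits) - 1
    if value < limit:
        # Small values fit entirely in the prefix.
        return bytes([value])
    out = [limit]  # prefix of all ones signals that continuation bytes follow
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)  # 7 payload bits plus continuation bit
        value //= 128
    out.append(value)
    return bytes(out)


def largest_index_in(n_bytes, prefix_bits):
    """Largest zero-based index whose encoding fits in n_bytes."""
    index = 0
    while len(encode_prefix_int(index + 1, prefix_bits)) <= n_bytes:
        index += 1
    return index


if __name__ == "__main__":
    for prefix_bits in (7, 6):  # 7 = HPACK-style field, 6 = assumed QPACK field
        print(prefix_bits,
              largest_index_in(1, prefix_bits),
              largest_index_in(2, prefix_bits))
```

Under those assumptions, a 6-bit prefix covers indices 0 through 62 in one byte and reaches index 190 in two, which appears to be in line with the numbers quoted above.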

--
Mark Nottingham   https://www.mnot.net/

Received on Wednesday, 23 May 2018 23:51:25 UTC