Re: Proposal - Reduce HTTP2 frame length from 16 to 12 bits

Just to be clear, I don't feel too strongly here. I do want to address one
thing, though, as I feel my previous point got lost.


On Tue, May 28, 2013 at 1:12 PM, Roberto Peon <grmocg@gmail.com> wrote:

> responses inline
>
>
> On Tue, May 28, 2013 at 12:16 PM, William Chan (陈智昌) <
> willchan@chromium.org> wrote:
>
>> On Tue, May 28, 2013 at 11:50 AM, James M Snell <jasnell@gmail.com> wrote:
>>
>>> On Tue, May 28, 2013 at 11:41 AM, Roberto Peon <grmocg@gmail.com> wrote:
>>> > As a reverse proxy, I've seen properties for which 4k writes/reads
>>> were too
>>> > small and induced latency increases.
>>> >
>>>
>>> I haven't played with this part too much yet but this is my general
>>> suspicion also.
>>>
>>
>> Can you guys clarify this in more detail? Specifically, where the latency
>> comes from. I have ideas, but I'd rather have an authoritative explanation.
>>
>
> It always comes down to the cost of the context switches (i.e. syscalls)
> and the locking that must be done in the lower layers of the IO stack.
>

Thanks for the clarification; I suspected it was the write()/read() cost,
which I assume is what you mean by syscalls.
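
To make that concrete, here's a rough sketch (mine, purely illustrative, not
a real benchmark; the temp file and sizes are arbitrary) that pushes the same
64KB through sixteen 4KB write()s and then through a single 64KB write(), just
to show where the per-call cost comes from:

    package main

    import (
        "fmt"
        "os"
        "time"
    )

    func main() {
        payload := make([]byte, 64*1024)

        f, err := os.CreateTemp("", "frames")
        if err != nil {
            panic(err)
        }
        defer os.Remove(f.Name())
        defer f.Close()

        // Sixteen 4KB write()s: one syscall per "frame".
        start := time.Now()
        for off := 0; off < len(payload); off += 4096 {
            if _, err := f.Write(payload[off : off+4096]); err != nil {
                panic(err)
            }
        }
        small := time.Since(start)

        // One 64KB write(): a single syscall for the same bytes.
        start = time.Now()
        if _, err := f.Write(payload); err != nil {
            panic(err)
        }
        large := time.Since(start)

        fmt.Printf("16 x 4KB writes: %v, 1 x 64KB write: %v\n", small, large)
    }

Writing to a file isn't the same as writing to a socket, of course, but the
difference in syscall count is the point.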


>
>
>>
>>>
>>> > Admittedly, frame size doesn't have to be the same as read/write size,
>>> but
>>> > it certainly does encourage that implementation (which is, I think, the
>>> > point of smaller max frame size that you proposed).
>>>
>>
>> You're right that it does encourage that implementation. Just like a
>> larger length encourages naively breaking frames up into that max
>> frame size, which hurts responsiveness. Which one is likelier to cause
>> worse overall "performance" (I know this is vague, since people care about
>> different aspects of perf)? What we want to do is have the most reasonable
>> default behavior, with the ability for performant implementations to tune
>> without unreasonable difficulty. I believe we're mostly focusing here on
>> optimizing the naive implementations, not the highly tuned implementations.
>>
>
> Remember that I'm the one who proposed the smaller max frame size in the
> first place (now a fair while ago)? :)
>

I don't believe I've said anything that would imply I forgot that :)


> My sweet-spot number was 16k, as I knew that I could saturate a 10G nic
> with 16k frames/writes and have enough CPU left over to do some actual
> work. The amount of overhead goes up more than linearly with the decrease
> in frame size thanks to contention, etc.
>

I think you're missing my point. Please correct me if I'm wrong, but I think
you're saying that for your server, 16k was the right choice for write()s.
write() sizes don't need to be tied to actual frame size, but of course
that's what a naive implementation would do. And again, I think we should
pick a max frame size that results in reasonable behavior for naive
implementations/deployments. And I think the highly performant
implementations will want to write their code in a way that decouples frame
size from write() size, and will pick the optimal write() size given the
tradeoffs.
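
Something like this hypothetical sketch is what I have in mind (the names, the
toy 8-byte header, and the 4KB/16KB numbers are mine, not from the draft or
any real implementation): frames stay capped at 4KB, but the kernel only sees
writes of roughly the tuned buffer size:

    package main

    import (
        "bufio"
        "encoding/binary"
        "io"
    )

    const (
        maxFrameSize = 4096      // 12-bit max payload per frame
        writeSize    = 16 * 1024 // tuned write() size, independent of frame size
    )

    // frameWriter serializes small frames into a buffer and only hits the
    // kernel once roughly writeSize bytes have accumulated (or on flush).
    type frameWriter struct {
        bw *bufio.Writer
    }

    func newFrameWriter(w io.Writer) *frameWriter {
        return &frameWriter{bw: bufio.NewWriterSize(w, writeSize)}
    }

    // writeData chops payload into frames of at most maxFrameSize, each with
    // a toy 8-byte header (length + stream ID); many frames share one write().
    func (fw *frameWriter) writeData(streamID uint32, payload []byte) error {
        for len(payload) > 0 {
            n := len(payload)
            if n > maxFrameSize {
                n = maxFrameSize
            }
            var hdr [8]byte
            binary.BigEndian.PutUint32(hdr[0:4], uint32(n))
            binary.BigEndian.PutUint32(hdr[4:8], streamID)
            if _, err := fw.bw.Write(hdr[:]); err != nil {
                return err
            }
            if _, err := fw.bw.Write(payload[:n]); err != nil {
                return err
            }
            payload = payload[n:]
        }
        return nil
    }

    func (fw *frameWriter) flush() error { return fw.bw.Flush() }

    func main() {
        fw := newFrameWriter(io.Discard)
        // 64KB becomes sixteen 4KB frames, but only a handful of write()s.
        _ = fw.writeData(1, make([]byte, 64*1024))
        _ = fw.flush()
    }

The buffer size is the knob a tuned implementation would pick based on its own
measurements; the frame size stays whatever the spec says.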


>
>
>>
>>
>>> >
>>> > I propose we keep the 16 bit frame size and instead allow the (now
>>> > negotiated setting of) max frame size to default to 12 bits' worth,
>>> > with that going upwards or downwards when a settings frame arrives
>>> > from the other side indicating its max receive size.
>>> >
>>>
>>> Honestly, I'd prefer to do away with frame size negotiation altogether
>>> because of the potential for path mtu style issues. Keeping the 16-bit
>>> size for now with strong encouragement (SHOULD, perhaps?) for keeping
>>> sizes around 12-bit lengths for the most common cases seems like the
>>> right approach.
>>>
>>> -- James
>>>
>>
> Unlike TCP/IP, max frame size is a point-to-point thing, as the primitive
> we mux is streams, not frames. Frames are the way we accomplish the muxing.
> Why would there be any path MTU like thing?
>
> -=R
>
>
>>
>>> > This would give the best chance that the code would be written in such
>>> a way
>>> > as to adapt with the times as they change.
>>> > -=R
>>> >
>>> > On May 28, 2013 10:01 AM, "William Chan (陈智昌)" <willchan@chromium.org>
>>> > wrote:
>>> >>
>>> >> Can you clarify what you mean by a documented performance metric for
>>> >> non-browser use cases? I don't think Patrick said anything browser
>>> specific.
>>> >> He provided some serialization latency numbers and noted that they
>>> are high
>>> >> enough to impact responsiveness. And then he provided numbers on
>>> overhead.
>>> >>
>>> >> I, for one, find the responsiveness argument compelling for browsers.
>>> I'm
>>> >> not completely sure 0.2% is low enough overhead for everyone, but I
>>> wouldn't
>>> >> complain about it. And in absence of complaints, I guess I'd support
>>> moving
>>> >> forward with only 12 bits for length.
>>> >>
>>> >>
>>> >> On Tue, May 28, 2013 at 9:22 AM, James M Snell <jasnell@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Currently, my only challenge with this is that, so far, we have not
>>> >>> seen any documented performance metrics for non-browser based uses.
>>> >>> That said, I don't really have the time currently to put together a
>>> >>> comprehensive set of such metrics so it wouldn't be polite of me to
>>> >>> insist on them ;-) ... perhaps for now we ought to keep the 16-bit
>>> >>> size but include a recommendation about not exceeding 12-bits, then
>>> >>> see what more implementation experience does for us.
>>> >>>
>>> >>> On Tue, May 28, 2013 at 7:20 AM, Patrick McManus <
>>> mcmanus@ducksong.com>
>>> >>> wrote:
>>> >>> > Hi All,
>>> >>> >
>>> >>> > I've been looking at a lot of spdy frames lately, and I've noticed
>>> what
>>> >>> > I
>>> >>> > consider a common implementation problem that I think a good http/2
>>> >>> > spec
>>> >>> > could help with. I'm commonly seeing frames large enough to
>>> interfere
>>> >>> > with
>>> >>> > effective prioritization. I've seen this from at least 3 different
>>> >>> > servers.
>>> >>> >
>>> >>> > The HTTP/2 draft has a max frame size of 16 bits, which is a huge
>>> >>> > improvement from spdy's 24. I propose we reduce it further to 12.
>>> (i.e.
>>> >>> > 4096
>>> >>> > bytes).
>>> >>> >
>>> >>> > The muxed approach of multiple streams onto one connection done in
>>> >>> > HTTP/2
>>> >>> > has great advantages, but the one downside of it is that it creates
>>> >>> > head of
>>> >>> > line blocking problems between those streams dictated by frame
>>> >>> > granularity.
>>> >>> > With small frames this is pretty manageable, with extremely large
>>> ones
>>> >>> > we've
>>> >>> > recreated the same head of line problems that HTTP/1 pipelines
>>> have.
>>> >>> > The
>>> >>> > server needs to be able to respond quickly to higher priority
>>> events
>>> >>> > (including cancellations) and once it has written a frame header
>>> to the
>>> >>> > wire
>>> >>> > it is committed to the entire frame for how ever long it takes to
>>> >>> > serialize
>>> >>> > it. IMO the shorter that time, the better.
>>> >>> >
>>> >>> > Our spec can help implementations do the right thing here by
>>> limiting
>>> >>> > the
>>> >>> > max frame size to 12 bits.
>>> >>> >
>>> >>> > It takes 500msec to serialize 64KB at 1Mbit/sec... 125msec at
>>> >>> > 4Mbit/sec.
>>> >>> > Those are some pretty notable task-switch times. Dropping the
>>> frame to
>>> >>> > 4096
>>> >>> > cuts them to 32msec and 8 msec... that's much more responsive, at
>>> the
>>> >>> > cost of
>>> >>> > 120 extra bytes of transfer (< 1msec at 1Mbit/sec).
>>> >>> >
>>> >>> > In general - the smaller the better as long as the overhead
>>> doesn't get
>>> >>> > to
>>> >>> > be too large. At 8 in 4096 (~.2%) I think that's acceptable. It's
>>> >>> > roughly the
>>> >>> > same overhead as a VLAN tag.
>>> >>> >
>>> >>> > Obviously this makes a continuation bit for control frames
>>> absolutely
>>> >>> > mandatory, but I think we're already in that spot with 16 bit frame
>>> >>> > lengths.
>>> >>> >
>>> >>> > -Patrick
>>> >>> >
>>> >>> >
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Received on Tuesday, 28 May 2013 23:26:10 UTC