Re: END_SEGMENT and END_STREAM redundant from Roberto Peon on 2014-04-20 (ietf-http-wg@w3.org from April to June 2014)

From: Roberto Peon <grmocg@gmail.com>
Date: Sun, 20 Apr 2014 04:27:05 -0700
To: David Krauss <potswa@gmail.com>
Cc: Adrian Cole <adrian.f.cole@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNd5-VYbbeFb_N0mh+mXqBOzSHHB4mZHCqY45RT0GLO5YQ@mail.gmail.com>
On Sun, Apr 20, 2014 at 3:58 AM, David Krauss <potswa@gmail.com> wrote:

>
> On 2014–04–20, at 11:59 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
> On Sat, Apr 19, 2014 at 7:23 PM, David Krauss <potswa@gmail.com> wrote:
>
>> There’s some circular reasoning here. Interoperability refers to what
>> intermediaries may change, or to a lesser extent what synonymous
>> bit-codings portable APIs (e.g. Javascript XHR) may merge. If an underlying
>> representation may be changed according to the semantics it expresses, then
>> relying on bits is not interoperable.
>>
>
> Interoperable implementations of the protocol may not understand each
> other at the application layer. That is not a problem the protocol can
> solve.
>
>
> I’m talking about whether a proxy is allowed to internally use a generic
> END_SEGMENT symbol, or if it must distinguish the case of a bit in an empty
> DATA or HEADERS frame following a frame of the other type. In this way the
> application layer is creeping into transport.
>
> Additional questions I still have are whether data may be coalesced across
> a headers block, and whether header blocks in the same segment may be
> coalesced. I don’t think there are many use-cases for either, and they
> could be surprising to applications that forget END_SEGMENT. However, the
> first case does seem to be allowed according to the current spec. (It says
> nothing about coalescing headers, but then again I can’t find where it
> explicitly describes coalescing data either. If headers are not guaranteed
> to arrive at any particular point, they could all get pushed to the
> beginning or end of the segment, anyway.)
>

You're ascribing a semantic that I don't think is the semantic of the
protocol.
The protocol ensures that any END_SEGMENT occurs at the same byte offset
from the first byte of a stream as when it was created.
Similarly, the protocol ensures that HEADERS are at the same byte offset.


>
> These are protocol and transport issues.
>
>> A different programmer sensibility is often applied to binary coding than
>> to text, but it’s best to use the same approach either way. A format
>> defines the expression of a variety of messages, and those messages
>> comprise the only defined meaning.
>>
>>
> I suspect we're arguing semantics at such a level at this point that it
> doesn't matter, but the protocol cannot define a meaning: It defines a
> grammar.
>
>
> Also restrictions and allowances on transporting messages in that grammar,
> including some degree of rearrangement. What the sender sees is not always
> what the receiver gets.
>

That depends on how it is defined. As I state above, END_SEGMENT or HEADERS
are always at the same byte offset in the datastream.
Frames otherwise have zero semantic meaning, as they can be broken
up/coalesced at will.
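
To make the offset claim concrete, here is a minimal sketch in Python. The
Frame model, marker_offsets() and split_data() are hypothetical names made up
for illustration; they are not taken from the spec or from any implementation.
It only shows that re-framing DATA leaves every HEADERS and END_SEGMENT at the
same byte offset from the start of the stream.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str              # "HEADERS" or "DATA" (hypothetical in-memory model)
    payload: bytes = b""   # DATA octets; HEADERS do not advance the data offset
    end_segment: bool = False


def marker_offsets(frames):
    """Return (marker, data-byte offset) pairs in stream order."""
    offset, markers = 0, []
    for f in frames:
        if f.kind == "HEADERS":
            markers.append(("HEADERS", offset))
        else:
            offset += len(f.payload)
        if f.end_segment:
            markers.append(("END_SEGMENT", offset))
    return markers


def split_data(frames, max_len):
    """Re-frame DATA into pieces of at most max_len bytes, as a proxy might."""
    out = []
    for f in frames:
        if f.kind != "DATA" or len(f.payload) <= max_len:
            out.append(f)
            continue
        chunks = [f.payload[i:i + max_len]
                  for i in range(0, len(f.payload), max_len)]
        for i, chunk in enumerate(chunks):
            is_last = i == len(chunks) - 1
            # Only the final piece keeps the END_SEGMENT flag.
            out.append(Frame("DATA", chunk, f.end_segment and is_last))
    return out


original = [
    Frame("HEADERS"),
    Frame("DATA", b"hello world", end_segment=True),
    Frame("DATA", b"bye", end_segment=True),
]
reframed = split_data(original, 4)

# The framing differs, the marker offsets do not.
assert marker_offsets(original) == marker_offsets(reframed)
```

Coalescing is the same argument in reverse: adjacent DATA payloads can be
joined as long as the flag stays on the byte boundary where it started.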


>
> At this point I suspect we're mostly violently agreeing.
>
>
> Mostly. It comes down to the grammar:
>
>             (HEADERS_WITH_END_SEGMENT | DATA_WITH_END_SEGMENT)
>
>
> Having these two symbols allows for saving 8 bytes, but introduces
> possible application design confusion. Application designers need to decide
> whether the two symbols should (or should not) mean the same thing, and API
> designers whether to support the distinction. Such support actually
> requires *three* symbols, with an additional AUTOSELECT_END_SEGMENT to be
> used by the sender.
>

They always mean the same thing. Frame-level details should not be surfaced
at the application layer. HTTP2 is not a frame-oriented protocol. It is
either message (END_SEGMENT) or bytestream. Frames are subordinate to
either of these and form the building blocks for creating streams of
messages.


>
> Given my suspicions about coalescing data across headers, right now I’m
> thinking that each HEADERS frame should start a new “message” and all of
> segmentation is redundant. Applications that want a sequence of data-only
> messages with no metadata can spend 8 bytes on an empty HEADERS frame.
> Header-only protocols see no overhead.
>

No.


>
> In any case, 8 bytes is only equal to the overhead of the extra DATA frame
> that any segmentation implicitly requires, and nothing compared to the
> total overhead of flushing which is also likely to happen. We shouldn’t
> sacrifice anything for the sake of 8 bytes.
>

The flag bits are there to be used; there is no sacrifice here :)


>
> —
>
> The earlier BNF was imprecise, and it might help the big picture to
> definitively record the application-level view.
>
> The current spec:
>
> stream:
> header-block segment* unterminated-segment? (end-stream|rst-stream)
>
> segment:
> unterminated-segment (headers-end-segment | data-end-segment)
>
> unterminated-segment:
> header-block* data-octet*
> (Transport may move the headers relative to the data, such that their
> order within a segment is insignificant.)
>
>
Applications should never see frames. They should probably get things like:
 Got headers on the stream. Got bytes on the stream. Got end of message.
Got end of stream.
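
As a rough sketch of what such an API could look like (Python; the
StreamEvents class, its method names, and the (kind, payload, flags) tuple
model are all assumptions made up for this example, not anything the spec
defines):

```python
class StreamEvents:
    """Hypothetical callback interface: the application never sees a frame."""

    def __init__(self):
        self.log = []

    def on_headers(self, headers):
        self.log.append(("headers", headers))

    def on_bytes(self, data):
        self.log.append(("bytes", data))

    def on_end_of_message(self):
        self.log.append(("end_of_message",))

    def on_end_of_stream(self):
        self.log.append(("end_of_stream",))


def deliver(frames, handler):
    """Translate (kind, payload, flags) frames into application events."""
    for kind, payload, flags in frames:
        if kind == "HEADERS":
            handler.on_headers(payload)
        elif kind == "DATA" and payload:
            # Empty DATA frames carry only flags, so no bytes event fires.
            handler.on_bytes(payload)
        if "END_SEGMENT" in flags:
            handler.on_end_of_message()
        if "END_STREAM" in flags:
            handler.on_end_of_stream()


handler = StreamEvents()
deliver(
    [
        ("HEADERS", {"content-type": "text/plain"}, set()),
        ("DATA", b"hi", {"END_SEGMENT"}),
        ("DATA", b"", {"END_STREAM"}),
    ],
    handler,
)
```

Whether END_SEGMENT arrived on a HEADERS frame or a DATA frame is not visible
to the handler; it just gets "end of message".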


> What we get by fixing the location of all header blocks in the data
> stream, sacrificing multiple header blocks within a segment (replaceable by
> a user-defined x-begin-message header), and adding 8 bytes per segment
> that doesn’t start with headers:
>
> stream:
> segment+ (end-stream|rst-stream)
>
> segment:
> header-block data-octet*
>
> I think this better matches what application designers expect. I didn’t
> include use of END_SEGMENT as an abnormal termination indicator in the list
> of sacrifices, because RST_STREAM already does that.
>

Application designers should never see the frame-level stuff. If they're
ascribing semantic value to the frames, they're doing it wrong and their
application *will* break as it goes through a proxy. The application-layer
grammar's atoms are: metadata, bytes, end-of-message, end-of-stream.
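
One way to picture those four atoms, as a hedged sketch in Python: the Atom
enum, to_atoms(), and the (kind, payload, flags) frame tuples are hypothetical
and exist only for this example. Two different framings of the same stream
should reduce to the same atom sequence, including the case where the flag
rides on an empty DATA frame.

```python
from enum import Enum


class Atom(Enum):
    METADATA = "metadata"
    BYTES = "bytes"
    END_OF_MESSAGE = "end-of-message"
    END_OF_STREAM = "end-of-stream"


def to_atoms(frames):
    """Map (kind, payload, flags) frames to atoms, merging adjacent byte runs."""
    atoms = []
    for kind, payload, flags in frames:
        if kind == "HEADERS":
            atoms.append((Atom.METADATA, payload))
        elif payload:
            if atoms and atoms[-1][0] is Atom.BYTES:
                # Frame boundaries inside a byte run carry no meaning.
                atoms[-1] = (Atom.BYTES, atoms[-1][1] + payload)
            else:
                atoms.append((Atom.BYTES, payload))
        if "END_SEGMENT" in flags:
            atoms.append((Atom.END_OF_MESSAGE, None))
        if "END_STREAM" in flags:
            atoms.append((Atom.END_OF_STREAM, None))
    return atoms


headers = (("x-id", "1"),)
flag_on_data = [
    ("HEADERS", headers, set()),
    ("DATA", b"abcdef", {"END_SEGMENT", "END_STREAM"}),
]
flag_on_empty_frame = [
    ("HEADERS", headers, set()),
    ("DATA", b"abc", set()),
    ("DATA", b"def", set()),
    ("DATA", b"", {"END_SEGMENT", "END_STREAM"}),
]

# Same application-level stream, regardless of how a proxy framed it.
assert to_atoms(flag_on_data) == to_atoms(flag_on_empty_frame)
```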

-=R


>
> It’s also much simpler to correctly specify, and describe usage.
>
>
Received on Sunday, 20 April 2014 11:27:34 UTC