Re: END_SEGMENT and END_STREAM redundant from Adrian Cole on 2014-04-20 (ietf-http-wg@w3.org from April to June 2014)

From: Adrian Cole <adrian.f.cole@gmail.com>
Date: Sun, 20 Apr 2014 09:18:54 -0700
To: Roberto Peon <grmocg@gmail.com>
Cc: David Krauss <potswa@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAHzwyDteiV0Nx=UBiTXWVZXXodHg5EpkvL8CHzZ_pgHaMOfMZw@mail.gmail.com>
Two really helpful sentences from this thread could make it into the
draft, or somewhere else, to make this clearer.
"The message may not have been completed when the stream was
terminated." <- explains END_STREAM w/o END_SEGMENT
"The application-layer grammar's atoms are: metadata, bytes,
end-of-message, end-of-stream." <- "message" is clearer to me than
"segment"; the spec could align these concepts in text and examples.

I would like to reference things like the above, or even better,
examples from a spec draft, when implementing what seems to be best
named "end-of-message" semantics.  I think I'm getting close to
understanding how to expose this, so thanks!
https://github.com/square/okhttp/issues/725
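
A minimal sketch of how a library might surface those four atoms,
written in Python for brevity. Everything here is an assumption made
for illustration: the Frame class, the events() function and the tuple
shapes are hypothetical and come from neither the draft nor OkHttp.
Only the flag names END_SEGMENT and END_STREAM are taken from the
thread. The sketch shows that two different framings of the same
stream produce the same application-level events, and that a stream
can end without a closing end-of-message.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame model with only the fields this sketch needs."""
    kind: str                  # "HEADERS" or "DATA"
    payload: object = b""      # dict for HEADERS, bytes for DATA
    end_segment: bool = False  # END_SEGMENT flag
    end_stream: bool = False   # END_STREAM flag


def events(frames: Iterable[Frame]) -> Iterator[tuple]:
    """Translate frames into metadata / bytes / end-of-message /
    end-of-stream events, hiding DATA frame boundaries."""
    pending = bytearray()
    for f in frames:
        if f.kind == "HEADERS":
            # Flush first so headers keep their byte offset in the stream.
            if pending:
                yield ("bytes", bytes(pending))
                pending.clear()
            yield ("metadata", f.payload)
        elif f.kind == "DATA":
            pending += f.payload
        else:
            raise ValueError(f"unexpected frame kind: {f.kind}")
        if (f.end_segment or f.end_stream) and pending:
            yield ("bytes", bytes(pending))
            pending.clear()
        # The same event is emitted whichever frame type carried the flag.
        if f.end_segment:
            yield ("end-of-message",)
        if f.end_stream:
            yield ("end-of-stream",)
    if pending:
        yield ("bytes", bytes(pending))


# Same stream, framed two ways (e.g. before and after a proxy).
sent = [
    Frame("HEADERS", {"x": "1"}),
    Frame("DATA", b"hello"),
    Frame("DATA", b"", end_segment=True),
    Frame("DATA", b"bye", end_stream=True),
]
received = [
    Frame("HEADERS", {"x": "1"}),
    Frame("DATA", b"he"),
    Frame("DATA", b"llo", end_segment=True),
    Frame("DATA", b"bye", end_stream=True),
]
same = list(events(sent)) == list(events(received))
```

Here the trailing "bye" is delivered as bytes followed directly by
end-of-stream, which is the END_STREAM w/o END_SEGMENT case: the
message may not have been completed when the stream was terminated.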

-A

On Sun, Apr 20, 2014 at 4:27 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
>
>
> On Sun, Apr 20, 2014 at 3:58 AM, David Krauss <potswa@gmail.com> wrote:
>>
>>
>> On 2014–04–20, at 11:59 AM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>> On Sat, Apr 19, 2014 at 7:23 PM, David Krauss <potswa@gmail.com> wrote:
>>>
>>> There’s some circular reasoning here. Interoperability refers to what
>>> intermediaries may change, or to a lesser extent what synonymous bit-codings
>>> portable APIs (e.g. Javascript XHR) may merge. If an underlying
>>> representation may be changed according to the semantics it expresses, then
>>> relying on bits is not interoperable.
>>
>>
>> Interoperable implementations of the protocol may not understand each
>> other at the application layer. That is not a problem the protocol can
>> solve.
>>
>>
>> I’m talking about whether a proxy is allowed to internally use a generic
>> END_SEGMENT symbol, or if it must distinguish the case of a bit in an empty
>> DATA or HEADERS frame following a frame of the other type. In this way the
>> application layer is creeping into transport.
>>
>> Additional questions I still have are whether data may be coalesced across
>> a headers block, and whether header blocks in the same segment may be
>> coalesced. I don’t think there are many use-cases for either, and they could
>> be surprising to applications that forget END_SEGMENT. However, the first
>> case does seem to be allowed according to the current spec. (It says nothing
>> about coalescing headers, but then again I can’t find where it explicitly
>> describes coalescing data either. If headers are not guaranteed to arrive at
>> any particular point, they could all get pushed to the beginning or end of
>> the segment, anyway.)
>
>
> You're ascribing a semantic that I don't think is the semantic of the
> protocol.
> The protocol ensures that any END_SEGMENT occurs at the same byte offset
> from the first byte of a stream as when it was created.
> Similarly, the protocol ensures that HEADERS are at the same byte offset.
>
>>
>>
>> These are protocol and transport issues.
>>
>>> A different programmer sensibility is often applied to binary coding than
>>> to text, but it’s best to use the same approach either way. A format defines
>>> the expression of a variety of messages, and those messages comprise the
>>> only defined meaning.
>>>
>>
>> I suspect we're arguing semantics at such a level at this point that it
>> doesn't matter, but the protocol cannot define a meaning: It defines a
>> grammar.
>>
>>
>> Also restrictions and allowances on transporting messages in that grammar,
>> including some degree of rearrangement. What the sender sees is not always
>> what the receiver gets.
>
>
> That depends on how it is defined. As I state above, END_SEGMENT or HEADERS
> are always at the same byte offset in the datastream.
> Frames otherwise have zero semantic meaning, as they can be broken
> up/coalesced at will.
>
>>
>>
>> At this point I suspect we're mostly violently agreeing.
>>
>>
>> Mostly. It comes down to the grammar:
>>
>>             (HEADERS_WITH_END_SEGMENT | DATA_WITH_END_SEGMENT)
>>
>>
>> Having these two symbols allows for saving 8 bytes, but introduces
>> possible application design confusion. Application designers need to decide
>> whether the two symbols should (or should not) mean the same thing, and API
>> designers whether to support the distinction. Such support actually requires
>> three symbols, with an additional AUTOSELECT_END_SEGMENT to be used by the
>> sender.
>
>
> They always mean the same thing. Frame-level details should not be surfaced
> at the application layer. HTTP2 is not a frame-oriented protocol. It is
> either message (END_SEGMENT) or bytestream. Frames are subordinate to either
> of these and form the building blocks for creating streams of messages.
>
>>
>>
>> Given my suspicions about coalescing data across headers, right now I’m
>> thinking that each HEADERS frame should start a new “message” and all of
>> segmentation is redundant. Applications that want a sequence of data-only
>> messages with no metadata can spend 8 bytes on an empty HEADERS frame.
>> Header-only protocols see no overhead.
>
>
> No.
>
>>
>>
>> In any case, 8 bytes is only equal to the overhead of the extra DATA frame
>> that any segmentation implicitly requires, and nothing compared to the total
>> overhead of flushing which is also likely to happen. We shouldn’t sacrifice
>> anything for the sake of 8 bytes.
>
>
> The flag bits are there to be used, there is no sacrifice here :)
>
>>
>>
>> —
>>
>> The earlier BNF was imprecise, and it might help the big picture to
>> definitively record the application-level view.
>>
>> The current spec:
>>
>> stream:
>> header-block segment* unterminated-segment? (end-stream|rst-stream)
>>
>> segment:
>> unterminated-segment (headers-end-segment | data-end-segment)
>>
>> unterminated-segment:
>> header-block* data-octet*
>> (Transport may move the headers relative to the data, such that their
>> order within a segment is insignificant.)
>>
>
> Applications should never see frames. They should probably get things like:
> Got headers on the stream. Got bytes on the stream. Got end of message. Got
> end of stream.
>
>>
>> What we get by fixing the location of all header blocks in the data
>> stream, sacrificing multiple header blocks within a segment (replaceable by
>> a user-defined x-begin-message header), and adding 8 bytes per segment that
>> doesn’t start with headers:
>>
>> stream:
>> segment+ (end-stream|rst-stream)
>>
>> segment:
>> header-block data-octet*
>>
>> I think this better matches what application designers expect. I didn’t
>> include use of END_SEGMENT as an abnormal termination indicator in the list
>> of sacrifices, because RST_STREAM already does that.
>
>
> Application designers should never see the frame-level stuff. If they're
> ascribing semantic value to the frames, they're doing it wrong and their
> application *will* break as it goes through a proxy. The application-layer
> grammar's atoms are: metadata, bytes, end-of-message, end-of-stream.
>
> -=R
>
>>
>>
>> It’s also much simpler to correctly specify, and describe usage.
>>
>
Received on Sunday, 20 April 2014 16:19:22 UTC