Re: END_SEGMENT and END_STREAM redundant

On 2014-04-20, at 11:59 AM, Roberto Peon <grmocg@gmail.com> wrote:

> On Sat, Apr 19, 2014 at 7:23 PM, David Krauss <potswa@gmail.com> wrote:
> There’s some circular reasoning here. Interoperability refers to what intermediaries may change, or to a lesser extent what synonymous bit-codings portable APIs (e.g. JavaScript XHR) may merge. If an underlying representation may be changed according to the semantics it expresses, then relying on bits is not interoperable.
> 
> Interoperable implementations of the protocol may not understand each other at the application layer. That is not a problem the protocol can solve.

I’m talking about whether a proxy is allowed to internally use a generic END_SEGMENT symbol, or whether it must distinguish the case where the flag is set on an empty DATA or HEADERS frame that follows a frame of the other type. In this way the application layer is creeping into transport.

Additional questions I still have are whether data may be coalesced across a headers block, and whether header blocks in the same segment may be coalesced. I don’t think there are many use-cases for either, and they could be surprising to applications that forget END_SEGMENT. However, the first case does seem to be allowed according to the current spec. (It says nothing about coalescing headers, but then again I can’t find where it explicitly describes coalescing data either. If headers are not guaranteed to arrive at any particular point, they could all get pushed to the beginning or end of the segment, anyway.)

These are protocol and transport issues.
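To make the coalescing question concrete, here is a minimal Python sketch of the application-level view a receiver would get if data may be coalesced across header blocks within a segment. The `Frame` type and `segment_view` function are hypothetical, and each HEADERS frame is assumed to carry one complete header block (CONTINUATION is ignored). Under those assumptions, two different frame orders become indistinguishable:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    # Hypothetical simplified frame: kind is "HEADERS" or "DATA".
    # Assumes one complete header block per HEADERS frame (no CONTINUATION).
    kind: str
    payload: bytes = b""
    end_segment: bool = False


def segment_view(frames: List[Frame]) -> Tuple[List[bytes], bytes]:
    """Collapse one segment into (header blocks, concatenated data).

    If data may be coalesced across header blocks, the position of each
    header block relative to the data is lost; only the order of the header
    blocks among themselves and the order of the data octets survive.
    """
    headers = [f.payload for f in frames if f.kind == "HEADERS"]
    data = b"".join(f.payload for f in frames if f.kind == "DATA")
    return headers, data


# Headers in the middle of the segment...
sent = [Frame("DATA", b"ab"), Frame("HEADERS", b"h1"), Frame("DATA", b"cd", True)]
# ...and the same headers pushed to the beginning of the segment.
moved = [Frame("HEADERS", b"h1"), Frame("DATA", b"abcd", True)]

print(segment_view(sent) == segment_view(moved))  # True
```

An application that cares where its headers landed cannot recover that from this view, which is why the question belongs to the protocol rather than the application.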

> A different programmer sensibility is often applied to binary coding than to text, but it’s best to use the same approach either way. A format defines the expression of a variety of messages, and those messages comprise the only defined meaning.
> 
> 
> I suspect we're arguing semantics at such a level at this point that it doesn't matter, but the protocol cannot define a meaning: It defines a grammar.

Also restrictions and allowances on transporting messages in that grammar, including some degree of rearrangement. What the sender sees is not always what the receiver gets.

> At this point I suspect we're mostly violently agreeing.

Mostly. It comes down to the grammar:

>             (HEADERS_WITH_END_SEGMENT | DATA_WITH_END_SEGMENT)

Having these two symbols saves 8 bytes, but it invites confusion in application design. Application designers need to decide whether the two symbols should (or should not) mean the same thing, and API designers need to decide whether to support the distinction. Such support actually requires three symbols, with an additional AUTOSELECT_END_SEGMENT to be used by the sender.
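A rough Python sketch of what API support for the distinction might look like. The enum mirrors the three symbols named above; `end_segment_cost` is a hypothetical helper, and the 8-byte figure assumes the frame-header size under discussion. It reports which frame type carries the flag and how many extra bytes that costs, given the type of the segment's final, not-yet-sent frame:

```python
from enum import Enum, auto
from typing import Tuple

FRAME_HEADER_LEN = 8  # frame header size assumed from the discussion above


class EndSegment(Enum):
    HEADERS_WITH_END_SEGMENT = auto()
    DATA_WITH_END_SEGMENT = auto()
    AUTOSELECT_END_SEGMENT = auto()  # sender does not care which frame type


def end_segment_cost(choice: EndSegment, last_kind: str) -> Tuple[str, int]:
    """Return (frame type carrying END_SEGMENT, extra bytes spent).

    last_kind ("HEADERS" or "DATA") is the type of the segment's final frame,
    assumed not yet sent, so the flag can still be set on it for free. Any
    other choice needs an extra empty frame of the requested type.
    """
    if choice is EndSegment.AUTOSELECT_END_SEGMENT:
        return last_kind, 0
    wanted = "HEADERS" if choice is EndSegment.HEADERS_WITH_END_SEGMENT else "DATA"
    extra = 0 if wanted == last_kind else FRAME_HEADER_LEN
    return wanted, extra


print(end_segment_cost(EndSegment.AUTOSELECT_END_SEGMENT, "DATA"))    # ('DATA', 0)
print(end_segment_cost(EndSegment.HEADERS_WITH_END_SEGMENT, "DATA"))  # ('HEADERS', 8)
```

If the application assigns no meaning to the distinction, every caller should pass AUTOSELECT_END_SEGMENT and the other two symbols are dead weight; if it does assign meaning, intermediaries must preserve it.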

Given my suspicions about coalescing data across headers, right now I’m thinking that each HEADERS frame should start a new “message” and all of segmentation is redundant. Applications that want a sequence of data-only messages with no metadata can spend 8 bytes on an empty HEADERS frame. Header-only protocols see no overhead.
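Under that proposal the receiver’s framing rule becomes trivial. A minimal Python sketch, assuming frames arrive as hypothetical `(kind, payload)` pairs with one complete header block per HEADERS frame; an empty HEADERS payload stands for the empty frame that starts a data-only message:

```python
from typing import List, Tuple


def split_messages(frames: List[Tuple[str, bytes]]) -> List[Tuple[bytes, bytes]]:
    """Group frames into (header block, data) messages.

    Every HEADERS frame starts a new message; DATA frames append to the
    current one. No END_SEGMENT flag is consulted.
    """
    messages: List[Tuple[bytes, bytearray]] = []
    for kind, payload in frames:
        if kind == "HEADERS":
            messages.append((payload, bytearray()))
        elif kind == "DATA":
            if not messages:
                raise ValueError("DATA frame before the first HEADERS frame")
            messages[-1][1].extend(payload)
        else:
            raise ValueError(f"unexpected frame type {kind!r}")
    return [(headers, bytes(data)) for headers, data in messages]


frames = [
    ("HEADERS", b"h1"), ("DATA", b"ab"), ("DATA", b"c"),
    ("HEADERS", b""), ("DATA", b"x"),  # empty HEADERS: data-only message
]
print(split_messages(frames))  # [(b'h1', b'abc'), (b'', b'x')]
```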

In any case, 8 bytes is only equal to the overhead of the extra DATA frame that any segmentation implicitly requires, and it is negligible compared to the total overhead of the flush that is also likely to happen. We shouldn’t sacrifice anything for the sake of 8 bytes.

—

The earlier BNF was imprecise, and it might help the big picture to definitively record the application-level view.

The current spec:

stream:
	header-block segment* unterminated-segment? (end-stream|rst-stream)

segment:
	unterminated-segment (headers-end-segment | data-end-segment)

unterminated-segment:
	header-block* data-octet*
(Transport may move the headers relative to the data, such that their order within a segment is insignificant.)
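Because this grammar has no recursion, one way to check a frame sequence against it is to map each terminal to a letter and use a regular expression. This Python sketch assumes a hypothetical encoding: `H` header-block, `d` data-octet (one letter per DATA frame suffices for validation), `E` headers-end-segment, `e` data-end-segment, `S` end-stream, `R` rst-stream. The `headers_may_move` option relaxes the within-segment order, per the parenthetical note:

```python
import re

# Hypothetical one-letter encoding of the grammar's terminals.
UNTERMINATED = "H*d*"         # unterminated-segment, as written
UNTERMINATED_MOVED = "[Hd]*"  # same, if headers may move relative to data


def matches_current(tokens: str, headers_may_move: bool = False) -> bool:
    """Check a token string against the current stream grammar."""
    unterminated = UNTERMINATED_MOVED if headers_may_move else UNTERMINATED
    segment = f"(?:{unterminated}[Ee])"
    stream = f"H{segment}*(?:{unterminated})?[SR]"
    return re.fullmatch(stream, tokens) is not None


print(matches_current("HddS"))                         # True
print(matches_current("HdEdS"))                        # True: one segment, then trailing data
print(matches_current("HdHS"))                         # False as written
print(matches_current("HdHS", headers_may_move=True))  # True once headers may move
print(matches_current("dS"))                           # False: no leading header block
```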

What we get by fixing the location of all header blocks in the data stream, sacrificing multiple header blocks within a segment (replaceable by a user-defined x-begin-message header), and adding 8 bytes per segment that doesn’t start with headers:

stream:
	segment+ (end-stream|rst-stream)

segment:
	header-block data-octet*
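The proposed grammar is small enough to check with a single regular expression. A Python sketch, assuming a hypothetical one-letter encoding of the terminals: `H` header-block, `d` data-octet, `S` end-stream, `R` rst-stream:

```python
import re

# segment+ (end-stream|rst-stream), where segment is header-block data-octet*
PROPOSED_STREAM = re.compile(r"(?:Hd*)+[SR]")


def matches_proposed(tokens: str) -> bool:
    """Check a token string against the proposed stream grammar."""
    return PROPOSED_STREAM.fullmatch(tokens) is not None


print(matches_proposed("HddHdS"))  # True: two segments, each led by headers
print(matches_proposed("HR"))      # True: headers only, then reset
print(matches_proposed("HdEdS"))   # False: there is no END_SEGMENT terminal
print(matches_proposed("dS"))      # False: data must follow a header block
```

Segment boundaries fall exactly where the `H`s are, so no separate END_SEGMENT terminal is needed.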

I think this better matches what application designers expect. I didn’t include use of END_SEGMENT as an abnormal termination indicator in the list of sacrifices, because RST_STREAM already does that.

It’s also much simpler to specify correctly, and its usage is simpler to describe.

Received on Sunday, 20 April 2014 10:59:34 UTC