- From: Roberto Peon <grmocg@gmail.com>
- Date: Sun, 20 Apr 2014 21:34:11 -0700
- To: David Krauss <potswa@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNcNc+H_MccOr1OGLJNAaSLxCNSXBARHAyL7P1sksKRhSQ@mail.gmail.com>
An empty HEADERS frame may not actually convey zero metadata, if HPACK is being used.

If one wishes to overload HEADERS to signal END_SEGMENT, then one must:
1) ensure that applications do not use the same keyspace as the reserved key.
   - can be done, but "smells bad"
2) ensure that the compressor state is modified properly to convey *only* the key
   - might hurt a bit, as this essentially renders the compressor useless...
3) ensure that an empty metadata frame does not convey some meaning.
   - Not guaranteed today: an empty METADATA frame might convey some meaning at the application layer.

END_SEGMENT as a bit avoids these problems.

I agree that we might want more text in there to clarify what should be visible to the application (i.e. *not* frames, but rather metadata, data, end-segment, end-stream), subject to editor approval.
-=R

On Sun, Apr 20, 2014 at 8:30 PM, David Krauss <potswa@gmail.com> wrote:

>
> On 2014–04–21, at 4:34 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
> FYI, you didn't reply to the list, and you probably should :)
>
>
> Argh, I’m always forgetting to “reply all”. I’ll just not delete anything
> from the quoted text here. My only response is at the very end.
>
>
> On Sun, Apr 20, 2014 at 8:39 AM, David Krauss <potswa@gmail.com> wrote:
>
>>
>> On 2014–04–20, at 7:27 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>> You're ascribing a semantic that I'm not thinking is the semantic of the
>> protocol.
>> The protocol ensures that any END_SEGMENT occurs at the same byte offset
>> from the first byte of a stream as when it was created.
>> Similarly, the protocol ensures that HEADERS are at the same byte offset.
>>
>>
>> Okay, good to know. It should be clarified in the spec.
>>
>> That depends on how it is defined. As I state above, END_SEGMENT or
>> HEADERS are always at the same byte offset in the datastream.
>> Frames otherwise have zero semantic meaning, as they can be broken
>> up/coalesced at will.
>>
>>
>> Well, adjacent HEADERS also cannot be coalesced, regardless of
>> END_SEGMENT, right? You mentioned a headers-only protocol.
>>
>
> true
>
>
>>
>> They always mean the same thing. Frame-level details should not be
>> surfaced at the application layer. HTTP2 is not a frame-oriented protocol.
>> It is either message (END_SEGMENT) or bytestream. Frames are subordinate to
>> either of these and form the building blocks for creating streams of
>> messages.
>>
>>
>> That’s the gist of my question: what the application layer may access. If
>> (and only if) these details are available to application layer protocols,
>> they may be changed by an intermediary. It is an identical relationship
>> because, as you said, the protocol is the grammar. HTTP/2 isn’t litigating
>> application-level meaning.
>>
>
> Gotcha.
>
>
>>
>> Given my suspicions about coalescing data across headers, right now I’m
>>> thinking that each HEADERS frame should start a new “message” and all of
>>> segmentation is redundant. Applications that want a sequence of data-only
>>> messages with no metadata can spend 8 bytes on an empty HEADERS frame.
>>> Header-only protocols see no overhead.
>>>
>>
>> No.
>>
>>
>> Now it’s my turn to say we’re almost entirely “violently” agreeing :) .
>> Headers cannot be coalesced, and they always have a fixed position in the
>> data stream. The only thing left between your current intention and
>> elimination of segmentation is representation of the “end message” metadata
>> as just another header, or an optimized flag bit.
>>
>> In any case, headers do implement segmentation semantics even without
>> using the END_SEGMENT bit.
>>
>
> Of data in a data-stream, yes, but otherwise they currently do not (unless
> one declared some key-value which did) denote end of message
> (end_of_segment).
>
>
>>
>> In any case, 8 bytes is only equal to the overhead of the extra DATA
>>> frame that any segmentation implicitly requires, and nothing compared to
>>> the total overhead of flushing which is also likely to happen. We shouldn’t
>>> sacrifice anything for the sake of 8 bytes.
>>>
>>
>> The flag bits are there to be used, there is no sacrifice here :)
>>
>>
>> I’m concerned with interface complexity, not wire overhead or room for
>> future flag bits.
>>
>
> Gotcha.
>
>
>>
>> The earlier BNF was imprecise, and it might help the big picture to
>>> definitively record the application-level view.
>>>
>>> The current spec:
>>>
>>> stream:
>>>     header-block segment* unterminated-segment? (end-stream|rst-stream)
>>>
>>> segment:
>>>     unterminated-segment (headers-end-segment | data-end-segment)
>>>
>>> unterminated-segment:
>>>     header-block* data-octet*
>>> (Transport may move the headers relative to the data, such that their
>>> order within a segment is insignificant.)
>>>
>>
>> Applications should never see frames. They should probably get things
>> like: Got headers on the stream. Got bytes on the stream. Got end of
>> message. Got end of stream.
>>
>>
>> Agreed, although the above grammar does not have any frames (aside from
>> rst-stream). It condenses what the stream protocol is guaranteed to
>> transport from one end to the other. (Except for ordering of headers and
>> data-octets, which I was unaware of.)
>>
>> What we get by fixing the location of all header blocks in the data
>>> stream, sacrificing multiple header blocks within a segment (replaceable by
>>> a user-defined x-begin-message header), and adding 8 bytes per segment
>>> that doesn’t start with headers:
>>>
>>> stream:
>>>     segment+ (end-stream|rst-stream)
>>>
>>> segment:
>>>     header-block data-octet*
>>>
>>> I think this better matches what application designers expect.
>>> I didn’t
>>> include use of END_SEGMENT as an abnormal termination indicator in the list
>>> of sacrifices, because RST_STREAM already does that.
>>>
>>
>> Application designers should never see the frame-level stuff. If they're
>> ascribing semantic value to the frames, they're doing it wrong
>>
>>
>> OK. That needs to be specified. Otherwise, they have no reason not to do
>> so.
>>
>
> Good point. If that isn't clear it *really* needs to be clear, else we
> will have interop problems (the spec explicitly allows for
> coalescing/breaking up frames).
>
>
>>
>> and their application *will* break as it goes through a proxy.
>>
>>
>> The reason I’ve been asking these crazy questions about coalescing is
>> because I’m looking for a reason something might break when it goes through
>> a proxy. But, I don’t see one, so headers-end-segment and
>> data-end-segment are de-facto different application-level symbols.
>>
>> The application-layer grammar's atoms are: metadata, bytes,
>> end-of-message, end-of-stream.
>>
>>
>> Isn’t it all much simpler if end-of-message is just another piece of
>> metadata?
>>
>
>> To prevent abuse by applications, the protocol needs to define some
>> canonicalization which allows proxies to mangle “wrong” usage, and shuffle
>> the headers-end-segment and data-end-segment symbols so they become
>> indistinguishable.
>>
>> One approach would be to define an end-segment header, and define the
>> END_SEGMENT bit to be an optimized representation for it. Then an empty
>> DATA frame or an empty HEADERS frame with END_SEGMENT set are both encoding
>> the same thing, and proxies may forward either as the other. END_SEGMENT on
>> a DATA frame encodes a header set with only the end-segment header, and
>> END_SEGMENT on a non-empty HEADERS frame adds it to the set.
>>
>
> The issue there is that then we're not presenting an arbitrary-metadata
> interface with HEADERS.
> It would be arbitrary metadata *except* the
> end-segment header, which seems annoying from an application-use point of
> view.
>
>
> It’s only a framing-level difference. This one header may be encoded
> without a HEADERS frame. Vice-versa, an end-segment symbol may be encoded
> (validly, but inefficiently) without the END_SEGMENT bit as a normal
> header, unless there’s a special rule against doing so. Frames aren’t
> visible to the application, though, so this is just the sort of separation
> we want.
>
> Application-level language bindings are strictly simplified by folding any
> special segmentation APIs into the general header handling. The
> complication is moved into header encoding/decoding, where it’s an
> isolated special case. For the encoder, special handling is optional
> (although very desirable) unless the spec requires it.
>
> Is there some property I’m missing that sets end-segment apart from
> (other) headers? Both require subsequent data to start a new DATA frame.
> Although that’s not directly visible to the application, it guarantees
> delivery at a particular byte offset. Both may also encourage a flush that
> might otherwise be deferred by too little buffered data. What else is there?
>
>
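
For illustration only, here is a rough Python sketch of the application-layer view discussed in this thread: the application sees metadata, bytes, end-of-message and end-of-stream, never frames. The `Frame` class, its field names, and `app_events` are hypothetical and come from no spec or library. Headers are assumed to be already HPACK-decoded, and END_SEGMENT is modelled as a plain flag bit on either frame type.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Frame:
    """Hypothetical, simplified frame; not a real HTTP/2 wire structure."""
    kind: str                                  # "HEADERS" or "DATA"
    headers: Tuple[Tuple[str, str], ...] = ()  # assumed already HPACK-decoded
    data: bytes = b""
    end_segment: bool = False                  # the END_SEGMENT flag bit
    end_stream: bool = False                   # the END_STREAM flag bit


def app_events(frames: Iterable[Frame]) -> List[Tuple[str, object]]:
    """Reduce frames to the four application-level atoms named in the thread."""
    events: List[Tuple[str, object]] = []
    buf = bytearray()

    def flush() -> None:
        # Emit buffered octets as one "bytes" event, hiding DATA frame boundaries.
        if buf:
            events.append(("bytes", bytes(buf)))
            buf.clear()

    for f in frames:
        if f.kind == "HEADERS":
            # Headers pin a position in the byte stream, so flush first.
            flush()
            events.append(("metadata", f.headers))
        elif f.kind == "DATA":
            buf.extend(f.data)
        else:
            raise ValueError(f"unknown frame kind: {f.kind!r}")
        if f.end_segment:
            flush()
            events.append(("end-of-message", None))
        if f.end_stream:
            flush()
            events.append(("end-of-stream", None))
    flush()
    return events


hdrs = (("x-example", "1"),)
original = [
    Frame("HEADERS", headers=hdrs),
    Frame("DATA", data=b"hello", end_segment=True),
    Frame("DATA", data=b"bye", end_stream=True),
]
# The same stream after a (hypothetical) intermediary split the DATA frames.
reframed = [
    Frame("HEADERS", headers=hdrs),
    Frame("DATA", data=b"he"),
    Frame("DATA", data=b"llo"),
    Frame("DATA", end_segment=True),
    Frame("DATA", data=b"b"),
    Frame("DATA", data=b"ye", end_stream=True),
]
assert app_events(original) == app_events(reframed)
```

Under these assumptions both framings reduce to the same event sequence, which is the property that lets an intermediary split or coalesce DATA frames freely. An empty HEADERS frame still produces a metadata event in this sketch, so it is not interchangeable with a bare END_SEGMENT bit.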
Received on Monday, 21 April 2014 04:34:39 UTC