Re: SPDY Review from 陈智昌 on 2012-06-07 (ietf-http-wg@w3.org from April to June 2012)

From: 陈智昌 <willchan@chromium.org>
Date: Thu, 7 Jun 2012 12:10:27 -0700
To: Martin Nilsson <nilsson@opera.com>
Cc: ietf-http-wg@w3.org
Message-ID: <CAA4WUYjCqCxRvxhHMrNxgrVPwPGD9F5jqLT+7Td21CEadQtMQA@mail.gmail.com>
I'm glad to see that Opera has thought deeply about SPDY. I think this is
by and large a very reasonable analysis of SPDY, thanks for sharing this. I
think that issues like flow control and push are very interesting to
discuss and merit separate discussion threads, as is already happening with
push. For the rest, I've commented inline.

On Thu, Jun 7, 2012 at 7:18 AM, Martin Nilsson <nilsson@opera.com> wrote:

>
> I've taken an internal assessment of SPDY and cleaned it up a bit
> (removing internal or 3rd party NDA information). Apologies in advance for
> any misunderstandings, mistakes or inaccuracies I've managed to produce.
>
>
>
>                             SPDY review
>
> 1. Introduction
>
>  Opera Software has long experience with HTTP optimization protocols,
>  both through our own products, Opera Mini and Opera Turbo, and
>  through integrating with 3rd-party optimization solutions,
>  predominantly in the mobile space. In this document we compare the
>  features of SPDY with our implementation and deployment experience of
>  equivalent features in other protocols.
>
>  We struggled a bit with the format of this document, as there are a
>  lot of interdependencies between the protocol format and its
>  different features. We've decided to present the review in three
>  parts. First we look at the individual conceptual features of SPDY,
>  directly followed by a quick evaluation for each. After that comes a
>  short discussion of the binary serialization of the protocol.
>  Finally we offer our ideas for how to improve the SPDY protocol.
>
>  This document is a compilation of analysis and comments from
>  multiple teams within Opera, and I would especially like to thank Per
>  Hedbor, Markus Johansson and Jonny Rein Eriksen for their
>  contributions.
>
>  This is a review of the SPDY protocol, based on the Internet-Draft
>  draft-mbelshe-httpbis-spdy-00. Last updated 2012-04-19.
>
> 2. Feature overview
>
>  SPDY is a transport layer protocol that runs on top of TCP and
>  replaces HTTP while allowing all HTTP semantics to be mapped onto
>  SPDY. In addition, SPDY provides:
>
>  - Enforced pipelining
>  - Out of order response
>  - Multiplexed requests and responses
>  - Flow control
>  - Header compression
>  - Asynchronous headers
>  - Push
>
> 2.1. Pipelining & response order
>
>  Pipelining is very important to avoid the overhead of setting up new
>  TCP connections. Limiting the number of connections is also
>  beneficial for congestion control. HTTP pipelining is defined in
>  HTTP 1.1, and Opera has spent considerable effort making it work as
>  widely as possible. Unfortunately, quite a lot of equipment doesn't
>  implement HTTP 1.1 correctly, so lots of heuristics have to be
>  applied in real-world scenarios.
>
>  Because different requests to the same server take different amounts
>  of backend work to answer, response times are non-uniform, and
>  pipelining is prone to stalling when a slow request blocks a faster
>  one. Out-of-order responses are a good solution to this problem. They
>  can easily be added to HTTP by including a request ID header in each
>  request and expecting the corresponding ID in each response.
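A minimal Python sketch of the request-ID idea. The header name `Request-ID` and the `PipelinedClient` class are hypothetical illustrations: neither HTTP/1.1 nor SPDY defines them, and a real client would also have to cope with servers that ignore such a header.

```python
from dataclasses import dataclass, field

# Hypothetical header name: neither HTTP/1.1 nor SPDY defines it.
REQUEST_ID_HEADER = "Request-ID"


@dataclass
class PipelinedClient:
    """Matches pipelined responses to requests by ID, in any arrival order."""
    next_id: int = 1
    pending: dict = field(default_factory=dict)  # request ID -> request path

    def send(self, path: str) -> dict:
        """Build a request carrying a fresh ID and remember it as outstanding."""
        request_id = str(self.next_id)
        self.next_id += 1
        self.pending[request_id] = path
        return {"path": path, REQUEST_ID_HEADER: request_id}

    def receive(self, response: dict) -> str:
        """Return the path of the request this response answers."""
        request_id = response[REQUEST_ID_HEADER]
        if request_id not in self.pending:
            raise ValueError(f"unexpected response ID {request_id}")
        return self.pending.pop(request_id)


client = PipelinedClient()
slow = client.send("/report")    # needs expensive backend work
fast = client.send("/logo.png")  # cheap static file
# The server answers the fast request first; nothing stalls behind /report.
order = [client.receive({REQUEST_ID_HEADER: fast[REQUEST_ID_HEADER]}),
         client.receive({REQUEST_ID_HEADER: slow[REQUEST_ID_HEADER]})]
```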
>
>  By adding priority information to the requests, the server can
>  actively decide which resource to send first.
>
> 2.1.1. Evaluation
>
>  Pipelining and out-of-order responses are two important concepts
>  that significantly improve HTTP performance. Client-indicated
>  priority preferences further enhance the protocol. It is worth noting
>  that for implementation reasons, e.g. issues in underlying software
>  layers, multiple connections may be preferable to a single one.
>
> 2.2. Multiplexing
>
>  The SPDY multiplexing feature allows multiple resources to load in
>  parallel over a single TCP connection. The benefit is that it
>  protects prioritized resources against stalling. Stalling in this
>  case doesn't mean that the connection is idle; it could be busy
>  downloading a large, low-priority resource when the client issues a
>  request for a higher-priority resource.
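The following Python sketch shows how a sender might interleave frames by priority on a single connection. It assumes SPDY/3-style priorities (0 highest, 7 lowest) and a fixed maximum frame size; `FrameScheduler` is a hypothetical name, not part of any SPDY implementation.

```python
import heapq


class FrameScheduler:
    """Interleave DATA frames from several streams on one connection.

    Assumes SPDY/3-style priorities (0 = highest, 7 = lowest). Each call to
    next_frame() picks the highest-priority stream that still has data, so a
    newly arrived urgent stream preempts a bulk transfer at the next frame
    boundary. Ties are broken by stream ID.
    """

    def __init__(self, max_frame_size: int):
        self.max_frame_size = max_frame_size
        self.queue = []      # heap of (priority, stream_id)
        self.remaining = {}  # stream_id -> bytes still to send

    def add_stream(self, stream_id: int, priority: int, size: int) -> None:
        self.remaining[stream_id] = size
        heapq.heappush(self.queue, (priority, stream_id))

    def next_frame(self):
        """Return (stream_id, payload_length) of the next frame, or None."""
        if not self.queue:
            return None
        _, stream_id = self.queue[0]
        chunk = min(self.remaining[stream_id], self.max_frame_size)
        self.remaining[stream_id] -= chunk
        if self.remaining[stream_id] == 0:
            heapq.heappop(self.queue)
            del self.remaining[stream_id]
        return stream_id, chunk


sched = FrameScheduler(max_frame_size=4096)
sched.add_stream(1, priority=7, size=100_000)   # large, low-priority download
first = sched.next_frame()                       # (1, 4096)
sched.add_stream(3, priority=0, size=6000)      # high-priority request arrives
after = [sched.next_frame() for _ in range(3)]  # [(3, 4096), (3, 1904), (1, 4096)]
```

The high-priority stream waits at most one low-priority frame, which is why the frame size choice matters.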
>
>  An alternative to multiplexing would be to open a new connection and
>  let the current one, with only low-priority requests on it, starve
>  until the high-priority items have been handled. This is, however,
>  bad from a congestion control point of view, and it resets any state,
>  such as compression contexts, associated with the first connection
>  (although that state could theoretically be copied to new
>  connections). Also, since SPDY supports push, it may not be
>  technically possible for the server to initiate a new connection to
>  push a high-priority resource to the client.
>
>  Finally, we note that multiplexing in SPDY always incurs a data
>  overhead. The lower the latency we want for higher-priority requests
>  to reach the other side, the higher the overhead will be.
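A back-of-the-envelope calculation of that trade-off, assuming the 8-byte SPDY/3 DATA frame header and ignoring TCP send buffers and TLS record overhead (both add latency in practice). A high-priority frame can start no sooner than the frame already being written finishes, so smaller frames shorten that wait but raise the share of header bytes.

```python
# SPDY/3 DATA frame header: 32-bit word (control bit + 31-bit stream ID)
# followed by 8-bit flags and a 24-bit length, i.e. 8 bytes per frame.
DATA_FRAME_HEADER = 8


def framing_overhead(payload_size: int) -> float:
    """Fraction of bytes on the wire spent on DATA frame headers."""
    return DATA_FRAME_HEADER / (DATA_FRAME_HEADER + payload_size)


def worst_case_wait_ms(payload_size: int, bandwidth_bps: float) -> float:
    """Time for one in-progress frame to drain at the given link rate, i.e. the
    longest a newly queued high-priority frame waits (TCP buffers ignored)."""
    return (DATA_FRAME_HEADER + payload_size) * 8 / bandwidth_bps * 1000


for size in (256, 1024, 16384):
    print(f"{size:>6} B frames: {framing_overhead(size):.2%} overhead, "
          f"{worst_case_wait_ms(size, 1_000_000):.1f} ms max wait at 1 Mbit/s")
```

At 1 Mbit/s, 256-byte frames spend about 3% on headers but bound the wait near 2 ms, while 16 KB frames spend well under 0.1% but can hold back a high-priority frame for over 100 ms.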
>

I think I agree with you, but this paragraph is a bit short and I wanted to
ask for a clarification. When you say data overhead, you're referring to
the framing overhead, right? And when you say lower latency for high
priority requests requires higher overhead, you in particular mean that by
choosing smaller frame sizes, you have more opportunity to interleave, at
the cost of more framing overhead, right? If so, I agree.


>
> 2.2.1. Evaluation
>
>  Multiplexing allows higher-priority requests/responses to interrupt
>  lower-priority uploads/downloads without OOB signaling that
>  interferes with TCP.
>
>  The SPDY specification offers no insight into how frame sizes should
>  be chosen.
>

Yes, this is up to the implementation, which should be guided by the
concerns you raised in the previous section. Would you prefer that this
discussion were included in the spec?


>
> 2.3. Flow control
>
>  SPDY provides a per-stream flow control mechanism by defining a
>  recipient buffer size that the sender needs to keep track of. The
>  recipient has to acknowledge received data by responding with the
>  currently available buffer space. This feature can be used for
>  several purposes:
>
>  - Throttling of the entire connection due to one endpoint having
>    capacity issues.
>
>  - Dynamic reprioritization of streams.
>
>  - Throttling of unwanted pushed content.
>
>  - Throttling of a specific stream due to backend issues for that
>    specific stream.
>
>  - Throttling of a specific stream due to rate of consumption.
>
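A rough sketch of the sender's bookkeeping, assuming SPDY/3 semantics: each stream starts with a 64 KB window, every DATA byte sent shrinks it, and the receiver's WINDOW_UPDATE frames grow it again. In SPDY/3 the WINDOW_UPDATE frame carries a delta (bytes freed) rather than the absolute free space. The class name is illustrative only.

```python
INITIAL_WINDOW_SIZE = 64 * 1024  # SPDY/3 default per-stream window


class StreamSendWindow:
    """Sender-side bookkeeping for one stream's flow-control window."""

    def __init__(self, initial: int = INITIAL_WINDOW_SIZE):
        self.window = initial

    def sendable(self, wanted: int) -> int:
        """How many of `wanted` bytes may be sent right now."""
        return max(0, min(wanted, self.window))

    def on_data_sent(self, n: int) -> None:
        if n > self.window:
            raise RuntimeError("flow-control violation: window exceeded")
        self.window -= n

    def on_window_update(self, delta: int) -> None:
        """Receiver freed `delta` bytes of buffer (WINDOW_UPDATE)."""
        self.window += delta


w = StreamSendWindow()
w.on_data_sent(w.sendable(100_000))  # only 65536 bytes may go out
blocked = w.sendable(1)              # 0: the sender must stall this stream
w.on_window_update(16_384)           # receiver consumed 16 KB
resumed = w.sendable(100_000)        # 16384
```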

Just for everyone's reference, there's a lot of interesting SPDY flow
control discussion happening at
https://groups.google.com/forum/#!topic/spdy-dev/JB_aQPNI7rw/discussion.
There's a lot more to say here than just in these bullets :) Since this is
already being heavily discussed on spdy-dev@ and we're trying to figure out
our thoughts there, I'm not particularly inclined to reopen the can of
worms here until that discussion concludes.


>
> 2.3.1. Evaluation
>
>  Although the SPDY specification does not state what problem
>  SPDY-level flow control aims to solve, Mike Belshe gives an example
>  in his blog: a stalled recipient of data, where SPDY flow control
>  helps avoid buffering a potentially unbounded amount of data. While
>  the concern is valid, flow control looks like overkill for a problem
>  that a per-channel pause control frame could solve with less
>  implementation and protocol overhead.
>
>  TCP already implements data rate management for throttling the
>  entire connection. It is possible to throttle specific streams, but
>  that should already be taken care of by the priority settings.
>  Dynamic reprioritization is possible, but would be better made into
>  an explicit stream property. Throttling or avoiding pushed content is
>  another reason, but again an explicit mechanism would be preferable.
>  For real-time communication each sender already has the ability to
>  let one stream starve other streams, though UDP would be preferable
>  to TCP. Also note that TCP provides the URG channel for exception
>  messaging.
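For comparison, the per-stream pause alternative could be as small as the sketch below. `PAUSE` and `RESUME` are hypothetical control frames that SPDY does not define. One consequence worth noting: data already in flight when a PAUSE is sent keeps arriving, so the receiver still has to buffer roughly one round trip's worth of data per stream.

```python
from enum import Enum


class Control(Enum):
    # Hypothetical control frames; SPDY does not define these.
    PAUSE = 1
    RESUME = 2


class PausableSender:
    """Sender that stops writing DATA for a stream between PAUSE and RESUME."""

    def __init__(self):
        self.paused = set()

    def on_control(self, frame: Control, stream_id: int) -> None:
        if frame is Control.PAUSE:
            self.paused.add(stream_id)
        else:
            self.paused.discard(stream_id)

    def may_send(self, stream_id: int) -> bool:
        return stream_id not in self.paused


sender = PausableSender()
sender.on_control(Control.PAUSE, 5)  # receiver's buffer for stream 5 is full
paused_state = sender.may_send(5)    # False
other_stream = sender.may_send(7)    # True: other streams are unaffected
sender.on_control(Control.RESUME, 5)
```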
>
> 2.4. Header compression
>
>  For many applications the HTTP headers are a significant
>  overhead. This is most visible in the mobile space, where the content
>  itself is often heavily minimized for the lowest common denominator,
>  while at the same time mobile devices can send huge amounts of
>  UAProf headers in every request. Even on the desktop, the amount of
>  cookie data sent with every request is significant. Header
>  compression is a good solution to this issue.
>
>  SPDY uses a binary representation of headers, encoded as a set of
>  Hollerith-encoded (length-prefixed) key-value pairs. The first line
>  of the HTTP request is also broken into its components and sent as
>  key-value pairs in the stream headers. To compensate for this
>  somewhat less compact representation, SPDY compresses the headers
>  with zlib, using a persistent LZ window for the connection and a
>  static dictionary.
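The encoding described above can be sketched in Python as follows, assuming the SPDY/3 layout of 32-bit pair counts and lengths. `EXAMPLE_DICTIONARY` is a short placeholder, not the real SPDY/3 dictionary, and the helper names are illustrative.

```python
import struct
import zlib

# Placeholder only: the real SPDY/3 dictionary is a much longer byte string.
EXAMPLE_DICTIONARY = b":method:path:version:host:schemeGETHTTP/1.1user-agentaccept"


def encode_header_block(headers: dict) -> bytes:
    """SPDY/3-style name/value block: a 32-bit pair count, then each name and
    value as a 32-bit length followed by the bytes (Hollerith encoding)."""
    out = [struct.pack("!I", len(headers))]
    for name, value in headers.items():
        for item in (name.lower().encode(), value.encode()):
            out.append(struct.pack("!I", len(item)) + item)
    return b"".join(out)


def request_headers(method, scheme, host, path, extra):
    """Split the HTTP request line into separate key-value pairs."""
    headers = {":method": method, ":path": path, ":version": "HTTP/1.1",
               ":host": host, ":scheme": scheme}
    headers.update(extra)
    return headers


# One compressor and one decompressor per connection and direction, so the
# LZ window persists across requests; Z_SYNC_FLUSH ends each block on a byte
# boundary without resetting that state.
compressor = zlib.compressobj(zdict=EXAMPLE_DICTIONARY)
decompressor = zlib.decompressobj(zdict=EXAMPLE_DICTIONARY)

block = encode_header_block(request_headers(
    "GET", "https", "example.com", "/", {"User-Agent": "demo"}))
wire = compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)
assert decompressor.decompress(wire) == block
```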
>
> 2.4.1. Evaluation
>
>  Header compression is a good feature with real-world applications,
>  and deflate with a persistent context is a good approach to achieve
>  it. A fixed dictionary is probably not very effective, as it
>  complicates the implementation while only improving compression of
>  the first request on a connection. As an example, this is how the
>  average request sizes compare between HTTP and SPDY in different
>  modes, using the set of captured headers that was used to train the
>  current SPDY 3 dictionary.
>
>    Encoding                         Avg. size (bytes)
>    HTTP                             821.1
>    HTTP zlib compressed             543.5
>    HTTP compressed with dictionary  497.0
>    SPDY                             913.7
>    SPDY zlib compressed             606.5
>    SPDY compressed with dictionary  517.0
>
>  That is, just putting the HTTP request in a SPDY stream (after
>  removing disallowed headers) and compressing it with the dictionary
>  differs by only 20 bytes from the SPDY binary header format with
>  dictionary-based zlib compression. The benefit of using a dictionary
>  essentially disappears on subsequent requests.
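The persistent-context effect can be reproduced roughly with Python's `zlib` module. The request text and the short dictionary below are made-up placeholders, so the absolute sizes will not match the table; the sketch only shows that once the first request is in the shared window, later requests compress to a small fraction of the first one's size, with or without a dictionary.

```python
import zlib

# Placeholder dictionary for illustration; the real SPDY/3 dictionary is a
# different, much longer byte string.
DICTIONARY = (b"GET HTTP/1.1\r\nHost: User-Agent: Accept: text/html"
              b"Accept-Encoding: gzip,deflate\r\nCookie: ")


def request(path: bytes) -> bytes:
    """A made-up HTTP/1.1 request header block for the given path."""
    return (b"GET " + path + b" HTTP/1.1\r\nHost: example.com\r\n"
            b"User-Agent: ExampleBrowser/1.0\r\nAccept: text/html\r\n"
            b"Accept-Encoding: gzip,deflate\r\nCookie: session=abc123\r\n\r\n")


def compressed_sizes(blocks, zdict=None):
    """Compress successive header blocks on one persistent deflate context
    and return the number of bytes each block adds to the wire."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return [len(comp.compress(b) + comp.flush(zlib.Z_SYNC_FLUSH)) for b in blocks]


blocks = [request(b"/index.html"), request(b"/style.css"), request(b"/app.js")]
plain = compressed_sizes(blocks)
with_dict = compressed_sizes(blocks, DICTIONARY)
print("without dictionary:", plain)
print("with dictionary:   ", with_dict)
```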
>
>  For very constrained devices, the major issue with using deflate is
>  the 32K LZ window. The actual decompression is very lightweight and
>  doesn't require much code. When sending data, using the
>  no-compression (stored) mode of deflate effectively disables
>  compression with almost no code. Using fixed Huffman codes is almost
>  as easy.
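A sketch of both sender-side shortcuts using Python's `zlib`: level 0 produces stored (uncompressed) deflate blocks, and the `Z_FIXED` strategy restricts the encoder to the fixed Huffman codes. In both cases an ordinary inflater on the receiving side decodes the result unchanged. The header bytes are a hand-built SPDY/3-style block with a single `:method` pair.

```python
import zlib

# SPDY/3-style block: 1 pair, name ":method" (7 bytes), value "GET" (3 bytes).
header_block = b"\x00\x00\x00\x01\x00\x00\x00\x07:method\x00\x00\x00\x03GET"

# Stored mode: level 0 emits raw stored blocks, so a minimal encoder would
# only need to frame the bytes, not search for matches.
stored = zlib.compressobj(level=0)
stored_wire = stored.compress(header_block) + stored.flush(zlib.Z_SYNC_FLUSH)

# Fixed Huffman codes: no per-block code tables to build or transmit.
fixed = zlib.compressobj(strategy=zlib.Z_FIXED)
fixed_wire = fixed.compress(header_block) + fixed.flush(zlib.Z_SYNC_FLUSH)

# A standard inflater accepts both streams.
for wire in (stored_wire, fixed_wire):
    assert zlib.decompressobj().decompress(wire) == header_block
```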


I'm happy that you agree with the use of header compression. As for the
details of header compression, I'm pretty open to tweaks and whatnot,
provided they're guided by data and implementation experience.


>
> 2.5. Asynchronous headers
>
>  The HEADERS frame allows either side to set additional headers for
>  the request or response at any point while the stream is open. This
>  differs from normal HTTP semantics, where headers are sent strictly
>  before any data and in half-duplex fashion, and allows headers and
>  content to be generated in parallel. While this is already possible
>  with chunked encoding trailers, that feature is not in popular use.
>
> 2.5.1. Evaluation
>
>  The problem with this frame is that it creates a situation where a
>  header potentially critical to interpreting the content is sent
>  last, forcing the receiver to put data processing on hold. As
>  additional headers can be sent up until the stream is closed (though
>  the specification doesn't say what is allowed on half-closed
>  streams), all streams are potentially subject to last-minute
>  reinterpretation. The specification doesn't define the behavior when
>  a header that was already sent is sent again. This feature also makes
>  SPDY more vulnerable to protocol injection attacks.
>

Some discussion can be found here:
https://groups.google.com/d/topic/spdy-dev/bRkEsta9ovk/discussion


>
> 2.6. Push
>
>  SPDY allows the server to open a stream towards the client and
>  effectively push content to it. The idea is cache seeding: resources
>  that are likely or known to be requested soon are pushed to the
>  client. A server push has to happen as a result of an earlier client
>  request, to which it must be explicitly linked.
>
> 2.6.1. Evaluation
>
>  As defined, the feature is not powerful enough to push content that
>  is unrelated to a request (such as new RSS items), and it lacks
>  mechanisms to limit the size of pushed content or to disable pushing
>  entirely. The client has the option to read and discard pushed
>  content, but that may be a costly waste of bandwidth.
>
>
> 3. Binary representation
>
> 3.1. Control frame version
>
>  The version field of the control frame is odd for several reasons.
>
>  - The field is huge, 15 bits, which appears unwarranted given the
>  rate of incompatible changes in released protocols (IPv6, HTTP 1.1
>  etc). Also, adding more room for the version field is easy if
>  different versions are considered incompatible. The version field is
>  meaningless if different versions are compatible.
>
>  - The version is sent in every control frame, which appears
>  redundant.
>
>  - There are no clear semantics for how the version should be
>  handled. While RST_STREAM status code 4, UNSUPPORTED_VERSION, hints
>  that messages with unknown versions should be rejected, there is
>  only a mechanism to do so for streams.
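For reference, parsing the 8-byte SPDY/3 control frame header looks roughly like this; it shows the 15-bit version field repeated in every control frame next to the 24-bit length. The function name is illustrative.

```python
import struct


def parse_control_header(data: bytes):
    """Parse an 8-byte SPDY/3 control frame header.

    Layout: 1-bit control flag | 15-bit version | 16-bit type |
            8-bit flags | 24-bit length.
    Returns (version, frame_type, flags, length).
    """
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    first, frame_type, flags_and_length = struct.unpack("!HHI", data[:8])
    if not first & 0x8000:
        raise ValueError("control bit not set: this is a DATA frame")
    version = first & 0x7FFF              # repeated in every control frame
    flags = flags_and_length >> 24
    length = flags_and_length & 0xFFFFFF  # 24-bit payload length
    return version, frame_type, flags, length


# RST_STREAM (type 3) header for SPDY version 3 with an 8-byte payload.
rst_header = bytes([0x80, 0x03, 0x00, 0x03, 0x00, 0x00, 0x00, 0x08])
print(parse_control_header(rst_header))  # (3, 3, 0, 8)
```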
>
> 3.2. Field sizes
>
>  The specification consistently uses 24-bit frame sizes but 32-bit
>  sizes for almost all counters and size fields of frame-internal
>  data. In design note 4.4, "Fixed vs Variable Length Fields", the
>  authors note that this is done for speed and simplicity. There is no
>  reason why an 8-, 16- or 24-bit field wouldn't be simple or fast
>  enough, especially given that 31-bit and 3-bit fields already exist
>  in the specification.
>
> 3.3. Directionality
>
>  While the SPDY draft attempts to implement two independent layers, a
>  general-purpose framing layer with an HTTP-like layer on top of it,
>  this separation creates some unneeded complexity. The two main use
>  cases for SPDY are HTTP request-response and HTTP push. Both of these
>  are unidirectional once established, and both have some mandatory
>  header fields that would be more efficiently encoded in a fixed
>  structure.
>
> 3.4. Stream header coding
>
>  Multiple headers with the same name are encoded as a list of
>  NUL-separated values. This rules out binary values in headers, which
>  prevents both unencoded values and more natural representations of
>  e.g. size, date, age, hash etc.
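A small illustration of the limitation, assuming SPDY/3's rule that repeated header values are joined with a single NUL byte: a value that itself contains 0x00, such as a 4-byte big-endian integer, cannot survive the round trip. The helper names are hypothetical.

```python
def join_values(values):
    """Join repeated header values with single NUL bytes, as SPDY/3 does."""
    for value in values:
        if b"\x00" in value:
            raise ValueError("value contains NUL and cannot be represented")
    return b"\x00".join(values)


def split_values(encoded):
    return encoded.split(b"\x00")


cookies = [b"a=1", b"b=2"]
round_trip = split_values(join_values(cookies))  # [b"a=1", b"b=2"]

# A natural binary value, e.g. a 4-byte big-endian integer, contains 0x00 bytes:
raw_length = (256).to_bytes(4, "big")            # b"\x00\x00\x01\x00"
try:
    join_values([raw_length])
    rejected = False
except ValueError:
    rejected = True                              # the sender has to text-encode it
```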
>
> 4. Proposed changes
>
>  This section details some ideas for improvements of the SPDY draft.
>
> 4.1. Handshake
>
>  We believe the versioning semantics need to be better formalized,
>  and that the version information can be moved into a connection
>  handshake. In its simplest form, a SETTINGS frame is always sent
>  first on the connection, establishing the version and connection
>  parameters. A connection should be rejected if the version is out of
>  range for the server, in which case the server should respond with
>  its highest supported protocol version before closing.
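One possible shape for the proposed negotiation, as a hedged sketch: the version is exchanged once in the first frame, an out-of-range version is rejected with the server's highest supported version, and the client retries once. The supported range, function names and retry policy are all assumptions for illustration.

```python
# Hypothetical server that speaks versions 2 and 3.
SERVER_VERSIONS = range(2, 4)


def server_handshake(client_version: int):
    """Handle the version carried in the first frame of a connection.

    Returns ("accept", version), or ("reject", highest_supported) so the
    client learns what to retry with before the connection is closed.
    """
    if client_version in SERVER_VERSIONS:
        return "accept", client_version
    return "reject", max(SERVER_VERSIONS)


def client_connect(client_versions: range, handshake=server_handshake):
    """Offer the highest local version; retry once with the server's hint."""
    status, version = handshake(max(client_versions))
    if status == "accept":
        return version
    if version in client_versions and handshake(version)[0] == "accept":
        return version
    return None  # no common version
```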
>

I will freely admit the versioning issue may need some work, and this is
worth discussion. For Chromium, this has not been an issue since we only
deploy over TLS using the NPN extension which handles this negotiation for
us in a fairly clean manner. We haven't invested much effort in the
non-TLS-NPN deployment case.


>
> 4.2. Field sizes
>
>  Define a reasonable range for each field, based on statistics where
>  possible, and resize them where appropriate.
>

"Define a reasonable range" :) I'm open to resizing fields. There's been
some amount of debate on certain fields on the spdy-dev@ mailing list.


>
> 4.3. Specialized HTTP frames
>
>  For the common case of using SPDY as an HTTP substitute, create
>  special frames to open HTTP request-response and HTTP push
>  streams. This removes the need for an explicit unidirectional flag
>  and for mandatory header values.
>

What do you mean by mandatory header values? Those are specific to HTTP
over SPDY, right? I'm not convinced about the necessity of baking in HTTP
assumptions into the core SPDY layer, rather than defining the mandatory
headers at the HTTP layering over SPDY level.


> 4.4. Typed key-value pairs
>
>  Create a better structure for the key-value pair lists, where it is
>  possible to have binary values and typed values. The types can
>  include binary string, integer and list. Standardize how all
>  standard HTTP headers should be normalized and typed, including
>  which ones are not allowed in lists of multiple values.
>
>  Introduce a mechanism that allows headers to be properties of the
>  stream, to be used internally in SPDY, and move optional parameters
>

Did you mean properties of the session? Headers are already metadata for
the stream.


>  and parameters that seldom deviate from their default values into
>  internal headers (e.g. priority, slot, associate-to-stream-id). This
>  also provides a mechanism for easy extension.
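A possible typed encoding, sketched in Python: each value carries a 1-byte type tag and a 32-bit length, integers are 8-byte unsigned big-endian, and lists nest. The tags and layout are hypothetical and only meant to show that binary and typed values fit naturally into a length-prefixed scheme.

```python
import struct

# Hypothetical type tags; nothing like this exists in the SPDY draft.
T_BYTES, T_INT, T_LIST = 0, 1, 2


def encode_value(value) -> bytes:
    """Encode a typed header value as tag (1 byte) + length (4 bytes) + payload."""
    if isinstance(value, bytes):
        tag, payload = T_BYTES, value
    elif isinstance(value, int):
        tag, payload = T_INT, value.to_bytes(8, "big")  # unsigned only
    elif isinstance(value, list):
        tag, payload = T_LIST, b"".join(encode_value(v) for v in value)
    else:
        raise TypeError(f"unsupported header value type: {type(value).__name__}")
    return struct.pack("!BI", tag, len(payload)) + payload


def decode_value(data: bytes, offset: int = 0):
    """Decode one typed value starting at `offset`; return (value, next_offset)."""
    tag, length = struct.unpack_from("!BI", data, offset)
    start, end = offset + 5, offset + 5 + length
    if tag == T_BYTES:
        return data[start:end], end
    if tag == T_INT:
        return int.from_bytes(data[start:end], "big"), end
    if tag == T_LIST:
        items, pos = [], start
        while pos < end:
            item, pos = decode_value(data, pos)
            items.append(item)
        return items, end
    raise ValueError(f"unknown type tag {tag}")


value = [b"\x00\xffdigest", 1_048_576, [b"gzip", b"deflate"]]
encoded = encode_value(value)
decoded, consumed = decode_value(encoded)
```

Binary digests and integers travel without any text encoding, which the NUL-separated scheme cannot offer.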


> 4.5. Push
>
>  The client needs to be in control of which pushed data to accept. A
>  simple accept flag may be too simplistic, as it prevents tricks like
>  redirect collapsing, but would be an improvement nevertheless.
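A hedged sketch of what client control could look like, using a hypothetical advertised `PushPolicy` (no such setting exists in the SPDY draft). The server checks the policy before pushing, and the client keeps RST_STREAM with status CANCEL (5 in SPDY/3) as a fallback. For simplicity every rejection here uses CANCEL; the draft may call for other status codes in some of these cases.

```python
from dataclasses import dataclass

CANCEL = 5  # SPDY/3 RST_STREAM status code


@dataclass
class PushPolicy:
    """Hypothetical client push preferences, advertised before any push."""
    accept_push: bool = True
    max_push_bytes: int = 1 << 20


def server_should_push(policy: PushPolicy, resource_size: int) -> bool:
    """Server side: consult the client's advertised policy before pushing."""
    return policy.accept_push and resource_size <= policy.max_push_bytes


def client_on_push(policy: PushPolicy, associated_stream_id: int,
                   open_streams: set, declared_size=None):
    """Client-side fallback: None to accept, else an RST_STREAM status code."""
    if not policy.accept_push or associated_stream_id not in open_streams:
        return CANCEL
    if declared_size is not None and declared_size > policy.max_push_bytes:
        return CANCEL
    return None
```

Checking the policy on the server avoids the wasted bandwidth of pushing data the client will only cancel.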
>

I'm open to trying to better define when a client accepts pushed data. A
primary concern here is the possible deployment of intermediaries that
outright disable server push, so that server push is no longer
realistically deployable on the internet. Of course, clients can already
RST_STREAM(CANCEL) server pushes, and it'd be unfortunate if intermediaries
always did this. There's definitely room for discussion here, as has
already started happening on Gabriel's thread.


>
> 4.6. Flow control
>
>  Remove the SPDY flow control and instead introduce a control
>  mechanism to actively put a stream on hold. If priority is moved to
>  stream properties, as suggested in 4.4, dynamic reprioritization is
>  possible.
>

Do you have implementation experience that justifies this suggestion? I'm
skeptical about this solution, which has been discussed on
https://groups.google.com/forum/#!topic/spdy-dev/JB_aQPNI7rw/discussion.
That said, it's a very worthwhile discussion to have.


>
> 4.7. Header compression
>
>  Remove the header compression dictionary. It provides little benefit
>  for the complexity it adds.
>
> 4.8. Asynchronous headers
>
>  Specify the asynchronous header feature more strictly, so that
>  overwriting HTTP headers isn't allowed and the receiver knows in
>  advance which headers will be sent later.
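One way to make that rule enforceable, sketched under assumptions: the initial headers announce the names of any late headers, borrowing the idea of HTTP/1.1's `Trailer` field, and a later HEADERS frame may add only announced names and never overwrite. The class name and the use of a `trailer` header for the announcement are illustrative.

```python
class StrictHeaderState:
    """Receiver-side check for stricter HEADERS frame semantics.

    Initial header names are assumed lowercase, as SPDY/3 requires. The
    hypothetical "trailer" header lists the names that may arrive later.
    """

    def __init__(self, initial: dict):
        self.headers = dict(initial)
        announced = initial.get("trailer", "")
        self.pending = {n.strip().lower() for n in announced.split(",") if n.strip()}

    def on_headers_frame(self, extra: dict) -> None:
        for name, value in extra.items():
            name = name.lower()
            if name in self.headers:
                raise ValueError(f"HEADERS frame may not overwrite {name!r}")
            if name not in self.pending:
                raise ValueError(f"{name!r} was not announced in advance")
            self.pending.discard(name)
            self.headers[name] = value

    def ready_for_final_processing(self) -> bool:
        """True once every announced late header has arrived."""
        return not self.pending


s = StrictHeaderState({":status": "200", "content-type": "text/html",
                       "trailer": "content-md5"})
early = s.ready_for_final_processing()  # False: content-md5 still pending
s.on_headers_frame({"Content-MD5": "Q2hlY2sgSW50ZWdyaXR5IQ=="})
late = s.ready_for_final_processing()   # True
try:
    s.on_headers_frame({"content-type": "text/plain"})
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```

Because the receiver knows up front which headers are still coming, it can process content that does not depend on them immediately.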
>
> /Martin Nilsson
>
>
>
Received on Thursday, 7 June 2012 19:11:00 UTC