- From: Zhong Yu <zhong.j.yu@gmail.com>
- Date: Wed, 15 Jan 2014 14:46:39 -0600
- To: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>
- Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <OSAMAM@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
Can you give an example where an intermediary benefits from decoding
URI octets into unicodes?

On Wed, Jan 15, 2014 at 1:55 PM, Gabriel Montenegro
<Gabriel.Montenegro@microsoft.com> wrote:
> Hi folks,
>
> Some of us (cc line) have been discussing the unfortunate lack of
> determinism with respect to URI encoding in HTTP/1.1 and would like HTTP/2.0
> to improve upon the situation.
>
> I just opened this issue: https://github.com/http2/http2-spec/issues/342
> (enable determinism for URI encoding in HTTP/2.0).
>
> The “http” and “https” URI schemes don’t have a fixed encoding. The URI RFC
> (http://tools.ietf.org/html/rfc3986#section-2.5) talks about the generic
> syntax for URI components:
>
> - Legacy URI components (before 2005) tend to use UTF-8 “or some other
>   superset of the US-ASCII character encoding”
> - New schemes (after 2005) have to use UTF-8 with percent encoding for
>   reserved characters.
>
> The first bullet explains why we currently have non-determinism for “http”
> and “https” URIs. This is particularly problematic when parsing URIs at the
> server side or at intermediate proxies (e.g., when looking for a cache hit).
>
> Proposed goal: enable determinism for URI encoding in HTTP/2.0.
>
> Either (1) a SETTING “SETTINGS_URI_ENCODING” or (2) an ":encoding" header.
> We favor option (2). In either case, the value to denote the charset would
> be a 32-bit integer equivalent to the “MIBenum” value in the IANA registry
> (http://www.iana.org/assignments/character-sets/character-sets.xhtml).
> Hence, the value would be 106 for UTF-8. The legacy behavior of
> non-determinism is indicated via the value 0. Notice that this is a reserved
> value for MIBenum.
>
> Note: We could use the charset name, but there are actually two "name"
> columns in the IANA table (with the "name" value possibly having multiple
> values, and the "preferred MIME name" not always being present). We would
> also have to define a name to denote the legacy behavior.
>
> Some use cases:
>
> 1. A legacy client behind an HTTP/2.0 capable proxy, talking to an
>    HTTP/2.0 capable server.
>
> The client will use HTTP/1.1 to talk to the proxy. Without special
> out-of-band knowledge, the proxy will not know the encoding for sure so
> would have to turn off the assumption when talking to the server by setting
> the value to 0 to denote legacy behavior.
>
> 2. A HTTP/2.0 capable client behind an HTTP/2.0 capable proxy, talking
>    to a legacy server.
>
> The client will use HTTP/2.0 to talk to the proxy. The never-standardized
> (and now expired) 3987bis added text about using the encoding of the
> containing HTML, e.g. iso-8859-1. If the server had that assumption, then
> the proxy has no way to know what the encoding was.
>
> Thus to get correct behavior, either the client has to turn the SETTING off
> (indicating legacy behavior) when going via a proxy and get no benefit, or
> else have a way for the client to pass via HTTP/2.0 what the encoding was of
> the containing document or protocol. The SETTING allows the latter, in this
> case by using the value 4 for iso-8859-1. This information allows the
> proxy's interpretation, e.g., for purposes of looking for a cache hit,
> whereas today it would get a cache miss.
>
> Some pros and cons of these two mechanisms:
>
> 1. SETTINGS would naturally allow to specify a general encoding like
>    UTF-8 across all requests
>
>    o Pro: This uses HTTP/2.0 as defined.
>
>    o Cons: per-request changes are a bit hackish, requiring constantly
>      sending this SETTING
>
> 2. :encoding header would allow to specify the encoding on a per-request
>    basis, but there would be no way to specify it in general.
>
>    o Pro: clean per-request scope for such use-cases
>
>    o Pro: benefits from header compression
>
>    o Cons: This implies sending it on every request per current rules in
>      HTTP/2.0, or another exception to those rules (similar to :authority).
>
> Comments?
>
> Thanks,
>
> Gabriel
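
A minimal sketch of the cache-hit use case from the quoted proposal, assuming a hypothetical proxy helper named `cache_key` and a hypothetical lookup table `MIBENUM_TO_CODEC`. Only the MIBenum values 106 (UTF-8), 4 (iso-8859-1) and 0 (legacy) come from the proposal; everything else is illustrative. It shows how two differently encoded request paths might map to the same key once the encoding is declared, and why the legacy value 0 leaves the proxy comparing raw octets.

```python
from urllib.parse import unquote

# MIBenum values named in the proposal: 106 = UTF-8, 4 = iso-8859-1.
# 0 is the proposed "legacy / encoding unknown" value and is deliberately
# absent here. The table and function names are hypothetical.
MIBENUM_TO_CODEC = {106: "utf-8", 4: "iso-8859-1"}


def cache_key(path: str, mibenum: int) -> str:
    """Return a cache key for a percent-encoded request path.

    With a known encoding, the percent-encoded octets are decoded to
    Unicode so that equivalent paths compare equal. With the legacy
    value (or any value not in the table), the path is returned as-is.
    Simplification: this also decodes reserved characters such as %2F,
    which a real intermediary would need to handle more carefully.
    """
    codec = MIBENUM_TO_CODEC.get(mibenum)
    if codec is None:
        return path  # encoding unknown: can only compare raw octets
    return unquote(path, encoding=codec, errors="strict")


# The same resource requested by an iso-8859-1 client and a UTF-8 client.
latin1_key = cache_key("/caf%E9", 4)
utf8_key = cache_key("/caf%C3%A9", 106)
legacy_key = cache_key("/caf%E9", 0)
```

Under these assumptions `latin1_key` and `utf8_key` are the same string, so a cache keyed on the decoded form could serve both clients, whereas `legacy_key` keeps the percent-encoded octets and would only match a byte-identical request.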
Received on Wednesday, 15 January 2014 20:47:06 UTC