Re: Dealing with Invalid UTF-8 from Ryan Hamilton on 2013-09-03 (ietf-http-wg@w3.org from July to September 2013)

From: Ryan Hamilton <rch@google.com>
Date: Tue, 3 Sep 2013 07:38:00 -0700
To: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Cc: Roberto Peon <grmocg@gmail.com>, Julian Reschke <julian.reschke@gmx.de>, James Snell <jasnell@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAJ_4DfQjsDj09gCpRNET78EO_0YuDmH9zoSKnUWM63LUt+Lw+w@mail.gmail.com>

I also completely agree.  We may want to specify that some headers have
some particular encoding (UTF-8 or typed encoding, or something else) at
the HTTP layer, but the *compressor* should be defined to simply operate on
byte sequences.
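
To make that split concrete, here is a rough sketch, not anything taken
from the draft. The names find_match, is_strict_utf8 and the sample table
are all hypothetical. It assumes a compressor that looks up (name, value)
pairs in a table by exact octet comparison, and a separate check that the
HTTP layer could apply only to fields it has declared to be UTF-8.

```python
# Hypothetical sketch: every name here is illustrative, not from any draft.


def find_match(table, name, value):
    """Compressor layer: return the index of an exact match, or None.

    Entries are compared octet for octet. Nothing is decoded,
    normalized, or validated, so non-UTF-8 values pass through intact.
    """
    for index, (entry_name, entry_value) in enumerate(table):
        if entry_name == name and entry_value == value:
            return index
    return None


def is_strict_utf8(value):
    """HTTP layer: check a field that has been declared to be UTF-8.

    Python's strict decoder rejects overlong forms, truncated or stray
    octets, and encoded surrogate code points.
    """
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


table = [
    (b"x-title", b"caf\xc3\xa9"),  # valid UTF-8
    (b"x-legacy", b"caf\xe9"),     # ISO-8859-1 octets, not valid UTF-8
]

# The compressor matches the legacy value without caring about its encoding.
legacy_index = find_match(table, b"x-legacy", b"caf\xe9")

# Only the HTTP layer would ever ask whether a value is well-formed UTF-8.
legacy_is_utf8 = is_strict_utf8(b"caf\xe9")
```

With this shape, forwarding an existing header field that is not UTF-8 is
not a problem for the compressor. Any rejection would be a decision made
at the HTTP layer, per field.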


On Tue, Sep 3, 2013 at 6:11 AM, Tatsuhiro Tsujikawa
<tatsuhiro.t@gmail.com> wrote:

>
>
>
> On Tue, Sep 3, 2013 at 2:39 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
>> We have not agreed that the compressor should have to interpret
>> UTF-8, but we have not agreed that it shouldn't.
>>
>> I think it should be doing byte-based comparisons for values right now,
>> with the rest (e.g. other encodings) to be decided after the bare-bones is
>> working and interoperating.
>>
> +1 for removing UTF-8 encoded values from the draft. Just handling them
> as byte arrays is the way to go. UTF-8 may bring some value once the
> HTTP/1 compatibility issue is resolved, but not now.
>
> Best regards,
> Tatsuhiro Tsujikawa
>
>
>
>> -=R
>>
>> On Sep 2, 2013 7:40 AM, "Julian Reschke" <julian.reschke@gmx.de> wrote:
>> >
>> > Hi,
>> >
>> > I note that the compression spec still pretends that field values
>> > are encoded in UTF-8.
>> >
>> > Have we agreed on this? If so, how do you forward existing header
>> > fields that may not be UTF-8???
>> >
>> > Best regards, Julian
>> >
>> >
>> >
>> > On 2013-08-16 18:31, Roberto Peon wrote:
>> >>
>> >> Yup.
>> >>
>> >>
>> >> On Fri, Aug 16, 2013 at 2:30 AM, Julian Reschke
>> >> <julian.reschke@gmx.de> wrote:
>> >>
>> >>     On 2013-08-14 00:12, James M Snell wrote:
>> >>
>> >>         https://github.com/http2/http2-spec/issues/232
>> >>
>> >>         The current header compression draft states that header field
>> >>         values are UTF-8. However, the spec says nothing about how to
>> >>         deal with overlong encodings, invalid UTF-8 octet sequences or
>> >>         valid UTF-8 sequences that encode invalid Unicode codepoints.
>> >>
>> >>         I recommend stating that any of these conditions ought to
>> >>         result in an error. An encoder MUST NOT output any of these;
>> >>         and a decoder ought to signal a connection error if
>> >>         encountered.
>> >>
>> >>
>> >>     ...whether this is an issue or not depends on what we decide with
>> >>     respect to whether field values are octet sequences or strings...
>> >>
>> >>     Best regards, Julian
>> >>
>> >>
>> >>
>> >
>>
>
>

Received on Tuesday, 3 September 2013 14:38:28 UTC