Re: Dealing with Invalid UTF-8 from Tatsuhiro Tsujikawa on 2013-09-03 (ietf-http-wg@w3.org from July to September 2013)

From: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Tue, 3 Sep 2013 22:11:03 +0900
To: Roberto Peon <grmocg@gmail.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, James Snell <jasnell@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAPyZ6=KgmPTEdb=vBGdib52t3LKj0XzZ5d4AVJdYXahJ1duaFw@mail.gmail.com>

On Tue, Sep 3, 2013 at 2:39 AM, Roberto Peon <grmocg@gmail.com> wrote:

> We have not agreed that the compressor should have to interpret
> UTF-8, but we have not agreed that it shouldn't.
>
> I think it should be doing byte-based comparisons for values right now,
> with the rest (e.g. other encodings) to be decided after the bare-bones is
> working and interoperating.
>

+1 for removing UTF-8 encoded values from the draft. Just treating values as
byte arrays is the way to go. UTF-8 may bring some value if the HTTP/1
compatibility issue were resolved, but not now.
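Roberto's suggestion of byte-based comparison for values can be sketched in Python as follows. The table layout and `find_entry` are hypothetical and not taken from the compression draft. The point is only that values are matched as raw octets, so a legacy ISO-8859-1 value passes through untouched and is never confused with its UTF-8 spelling.

```python
# Hypothetical sketch: a header table that treats field values as opaque
# octet sequences and matches entries by exact byte equality (no decoding).
from typing import List, Optional, Tuple

Entry = Tuple[bytes, bytes]  # (name, value) as raw octets


def find_entry(table: List[Entry], name: bytes, value: bytes) -> Optional[int]:
    """Return the index of an exact (name, value) match, or None.

    Values are compared byte for byte. No charset is assumed, so a value
    that is not valid UTF-8 is handled like any other value.
    """
    for index, (entry_name, entry_value) in enumerate(table):
        if entry_name == name and entry_value == value:
            return index
    return None


table: List[Entry] = [
    (b"content-type", b"text/html"),
    # ISO-8859-1 encoded "café": a lone 0xE9 byte is not valid UTF-8.
    (b"x-legacy", b"caf\xe9"),
]

print(find_entry(table, b"x-legacy", b"caf\xe9"))                # -> 1
print(find_entry(table, b"x-legacy", "café".encode("utf-8")))    # -> None
```

Because nothing is decoded, the compressor never has to decide what a non-UTF-8 value "means"; the trade-off is that two spellings of the same text are treated as different values.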

Best regards,
Tatsuhiro Tsujikawa



> -=R
>
> On Sep 2, 2013 7:40 AM, "Julian Reschke" <julian.reschke@gmx.de> wrote:
> >
> > Hi,
> >
> > I note that the compression spec still pretends that field values are
> > encoded in UTF-8.
> >
> > Have we agreed on this? If so, how do you forward existing header fields
> > that may not be UTF-8???
> >
> > Best regards, Julian
> >
> >
> >
> > On 2013-08-16 18:31, Roberto Peon wrote:
> >>
> >> Yup.
> >>
> >>
> >> On Fri, Aug 16, 2013 at 2:30 AM, Julian Reschke <julian.reschke@gmx.de>
> >> wrote:
> >>
> >>     On 2013-08-14 00:12, James M Snell wrote:
> >>
> >>         https://github.com/http2/http2-spec/issues/232
> >>
> >>         The current header compression draft states that header field
> >>         values are UTF-8. However, the spec says nothing about how to
> >>         deal with overlong encodings, invalid UTF-8 octet sequences or
> >>         valid UTF-8 sequences that encode invalid Unicode codepoints.
> >>
> >>         I recommend stating that any of these conditions ought to result
> >>         in an error. An encoder MUST NOT output any of these, and a
> >>         decoder ought to signal a connection error if one is encountered.
> >>
> >>
> >>     ...whether this is an issue or not depends on what we decide with
> >>     respect to whether field values are octet sequences or strings...
> >>
> >>     Best regards, Julian
> >>
> >>
> >>
> >
>
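If field values were instead defined as UTF-8, the check James proposes could look roughly like the Python sketch below. `validate_utf8` and `InvalidUtf8` are hypothetical names, and raising an exception stands in for whatever connection error the spec would define. Python's own `bytes.decode("utf-8")` already enforces the same RFC 3629 rules; the explicit loop just makes each of the three rejected cases visible.

```python
class InvalidUtf8(ValueError):
    """Hypothetical error that a decoder could map to a connection error."""


def validate_utf8(data: bytes) -> None:
    """Raise InvalidUtf8 unless data is well-formed UTF-8 (RFC 3629).

    Rejects the three cases from the proposal: invalid octet sequences,
    overlong encodings, and sequences that encode surrogates or values
    above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII, single octet
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            # 0x80-0xBF: stray continuation byte; 0xC0/0xC1: can only start
            # overlong forms; 0xF5-0xFF: beyond U+10FFFF or undefined.
            raise InvalidUtf8(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise InvalidUtf8(f"truncated sequence at offset {i}")
        for j in range(i + 1, i + length):
            byte = data[j]
            if byte & 0xC0 != 0x80:
                raise InvalidUtf8(f"bad continuation byte at offset {j}")
            cp = (cp << 6) | (byte & 0x3F)
        if cp < minimum:
            raise InvalidUtf8(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise InvalidUtf8(f"invalid code point U+{cp:04X} at offset {i}")
        i += length


samples = {
    "valid": "café".encode("utf-8"),
    "latin-1": b"caf\xe9",             # 0xE9 starts a 3-byte sequence
    "overlong": b"\xe0\x80\xaf",       # "/" padded to three octets
    "surrogate": b"\xed\xa0\x80",      # U+D800
    "too large": b"\xf4\x90\x80\x80",  # U+110000
}
for label, raw in samples.items():
    try:
        validate_utf8(raw)
        print(label, "ok")
    except InvalidUtf8 as err:
        print(label, "rejected:", err)
```

Note that the ISO-8859-1 value is rejected, which is exactly Julian's forwarding concern; treating values as opaque octets, as Roberto and Tatsuhiro suggest, avoids the check entirely.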

Received on Tuesday, 3 September 2013 13:11:50 UTC