
Re: New Version Notification for draft-kamp-httpbis-structure-00.txt (fwd)

From: Kari Hurtta <hurtta-ietf@elmme-mailer.org>
Date: Thu, 13 Oct 2016 06:34:30 +0300 (EEST)
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
CC: HTTP working group mailing list <ietf-http-wg@w3.org>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>
Message-Id: <20161013033431.B21C413FF4@welho-filter2.welho.com>
> Htmlized:       https://tools.ietf.org/html/draft-kamp-httpbis-structure-00

3.  HTTP/1 serialization of HTTP header Common Structure
https://tools.ietf.org/html/draft-kamp-httpbis-structure-00#section-3

|       h1_unicode_string = DQUOTE *(
|                       ( "\" DQUOTE )
|                       ( "\" "\" ) /
|                       ( "\" "u" 4*HEXDIG ) /
|                       0x20-21 /
|                       0x23-5B /
|                       0x5D-7E /
|                       0x80-F7
|                       ) DQUOTE
|               # XXX: how to say/import "UTF-8 encoding" ?
|               # HTTP1 unfriendly codepoints (00-1f, 7f) must be
|               # encoded with \uXXXX escapes
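The draft's comment asks for a particular serializer behaviour. Here is a minimal Python sketch of it. It assumes that DQUOTE and backslash are backslash-escaped, and that the HTTP/1-unfriendly code points 00-1F and 7F are written as \uXXXX with exactly four uppercase hex digits (the grammar itself permits 4*HEXDIG). Everything else is sent as raw UTF-8. `encode_h1_unicode_string` is a hypothetical name, not something the draft defines.

```python
def encode_h1_unicode_string(text: str) -> bytes:
    # Hypothetical serializer for the draft's h1_unicode_string.
    out = ['"']
    for ch in text:
        cp = ord(ch)
        if ch == '"' or ch == "\\":
            out.append("\\" + ch)           # ( "\" DQUOTE ) / ( "\" "\" )
        elif cp < 0x20 or cp == 0x7F:
            out.append("\\u%04X" % cp)      # HTTP/1-unfriendly code points as \uXXXX
        else:
            out.append(ch)                  # printable ASCII or non-ASCII, sent as UTF-8
    out.append('"')
    # Lone surrogates in `text` make this raise UnicodeEncodeError.
    return "".join(out).encode("utf-8")
```

For example, encode_h1_unicode_string('tab\there') gives b'"tab\\u0009here"'.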

How about

RFC 3629: UTF-8, a transformation format of ISO 10646
https://tools.ietf.org/html/rfc3629

4.  Syntax of UTF-8 Byte Sequences
https://tools.ietf.org/html/rfc3629#section-4

|   UTF8-octets = *( UTF8-char )
|   UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
|   UTF8-1      = %x00-7F
|   UTF8-2      = %xC2-DF UTF8-tail
|   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
|                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
|   UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
|                 %xF4 %x80-8F 2( UTF8-tail )
|   UTF8-tail   = %x80-BF
|
|   NOTE -- The authoritative definition of UTF-8 is in [UNICODE].  This
|   grammar is believed to describe the same thing Unicode describes, but
|   does not claim to be authoritative.  Implementors are urged to rely
|   on the authoritative source, rather than on this ABNF.
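The RFC 3629 grammar above translates fairly directly into a byte-level check. Here is a minimal Python sketch; the function names are hypothetical. The special cases for %xE0, %xED, %xF0 and %xF4 are what exclude overlong forms, UTF-16 surrogates and code points above U+10FFFF.

```python
def utf8_char_len(data: bytes, i: int) -> int:
    # Length of the UTF8-char starting at data[i], or 0 if it does not match.
    def tail(j: int, lo: int = 0x80, hi: int = 0xBF) -> bool:
        return j < len(data) and lo <= data[j] <= hi

    b = data[i]
    if b <= 0x7F:                           # UTF8-1
        return 1
    if 0xC2 <= b <= 0xDF:                   # UTF8-2
        return 2 if tail(i + 1) else 0
    if 0xE0 <= b <= 0xEF:                   # UTF8-3
        lo, hi = 0x80, 0xBF
        if b == 0xE0:
            lo = 0xA0                       # %xE0 %xA0-BF: no overlong forms
        elif b == 0xED:
            hi = 0x9F                       # %xED %x80-9F: no surrogates
        return 3 if tail(i + 1, lo, hi) and tail(i + 2) else 0
    if 0xF0 <= b <= 0xF4:                   # UTF8-4
        lo, hi = 0x80, 0xBF
        if b == 0xF0:
            lo = 0x90                       # %xF0 %x90-BF: no overlong forms
        elif b == 0xF4:
            hi = 0x8F                       # %xF4 %x80-8F: nothing above U+10FFFF
        ok = tail(i + 1, lo, hi) and tail(i + 2) and tail(i + 3)
        return 4 if ok else 0
    return 0                                # %x80-C1 and %xF5-FF never start a char


def is_utf8_octets(data: bytes) -> bool:
    # UTF8-octets = *( UTF8-char )
    i = 0
    while i < len(data):
        n = utf8_char_len(data, i)
        if n == 0:
            return False
        i += n
    return True
```

Python's strict bytes.decode("utf-8") should agree with this check, which makes it easy to test.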

This

|               # HTTP1 unfriendly codepoints (00-1f, 7f) must be
|               # encoded with \uXXXX escapes

means, however, that you cannot use UTF8-1 as is.

Do you mean the following?

h1_unicode_utf8 = h1_utf8_1 / UTF8-2 / UTF8-3 / UTF8-4
h1_utf8_1 = ( "\" DQUOTE ) /
            ( "\" "\" ) /
            ( "\" "u" 4*HEXDIG ) /
            %x20-21 /
            %x23-5B /
            %x5D-7E
UTF8-2 = <UTF8-2, defined in RFC 3629, Section 4>
UTF8-3 = <UTF8-3, defined in RFC 3629, Section 4>
UTF8-4 = <UTF8-4, defined in RFC 3629, Section 4>
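To check a serialized value against this idea, here is a minimal Python sketch. It assumes, as in the draft's rule, that an escaped DQUOTE is allowed inside the string. It also assumes that bytes above 0x7F are accepted only as complete RFC 3629 UTF8-2/3/4 sequences, never as lone octets; Python's strict UTF-8 decoder enforces that part. `is_h1_unicode_string` is a hypothetical name.

```python
import string

# ABNF HEXDIG; ABNF string literals are case-insensitive, so a-f is accepted too.
HEXDIGITS = frozenset(string.hexdigits)


def is_h1_unicode_string(data: bytes) -> bool:
    # Hypothetical validator for DQUOTE *( h1_utf8_1 / UTF8-2 / UTF8-3 / UTF8-4 ) DQUOTE
    if len(data) < 2 or data[:1] != b'"' or data[-1:] != b'"':
        return False
    try:
        # Strict decoding rejects anything that is not a well-formed
        # RFC 3629 sequence, which covers UTF8-2, UTF8-3 and UTF8-4.
        body = data[1:-1].decode("utf-8")
    except UnicodeDecodeError:
        return False
    i = 0
    while i < len(body):
        ch = body[i]
        cp = ord(ch)
        if ch == "\\":
            nxt = body[i + 1:i + 2]
            if nxt in ('"', "\\"):          # ( "\" DQUOTE ) / ( "\" "\" )
                i += 2
            elif nxt == "u":                # ( "\" "u" 4*HEXDIG )
                digits = body[i + 2:i + 6]
                if len(digits) < 4 or not all(d in HEXDIGITS for d in digits):
                    return False
                # Extra hex digits also match %x23-5B / %x5D-7E, so four is enough.
                i += 6
            else:
                return False
        elif cp >= 0x80:                    # non-ASCII: already checked by the decoder
            i += 1
        elif 0x20 <= cp <= 0x7E and ch != '"':  # %x20-21 / %x23-5B / %x5D-7E
            i += 1
        else:                               # bare DQUOTE, 00-1F or 7F
            return False
    return True
```

The design choice here is to let the UTF-8 decoder handle UTF8-2/3/4 and then check only the ASCII characters against h1_utf8_1. Because UTF-8 is self-synchronizing, every decoded ASCII character corresponds to exactly one UTF8-1 octet.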

/ Kari Hurtta
Received on Thursday, 13 October 2016 03:35:08 UTC
