- From: Kari Hurtta <hurtta-ietf@elmme-mailer.org>
- Date: Thu, 13 Oct 2016 06:34:30 +0300 (EEST)
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- CC: HTTP working group mailing list <ietf-http-wg@w3.org>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>
> Htmlized: https://tools.ietf.org/html/draft-kamp-httpbis-structure-00
3. HTTP/1 serialization of HTTP header Common Structure
https://tools.ietf.org/html/draft-kamp-httpbis-structure-00#section-3
| h1_unicode_string = DQUOTE *(
| ( "\" DQUOTE )
| ( "\" "\" ) /
| ( "\" "u" 4*HEXDIG ) /
| 0x20-21 /
| 0x23-5B /
| 0x5D-7E /
| 0x80-F7
| ) DQUOTE
| # XXX: how to say/import "UTF-8 encoding" ?
| # HTTP1 unfriendly codepoints (00-1f, 7f) must be
| # encoded with \uXXXX escapes
How about
RFC 3629: UTF-8, a transformation format of ISO 10646
https://tools.ietf.org/html/rfc3629
4. Syntax of UTF-8 Byte Sequences
https://tools.ietf.org/html/rfc3629#section-4
| UTF8-octets = *( UTF8-char )
| UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
| UTF8-1 = %x00-7F
| UTF8-2 = %xC2-DF UTF8-tail
| UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
| %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
| UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
| %xF4 %x80-8F 2( UTF8-tail )
| UTF8-tail = %x80-BF
|
| NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
| grammar is believed to describe the same thing Unicode describes, but
| does not claim to be authoritative. Implementors are urged to rely
| on the authoritative source, rather than on this ABNF.
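As a rough illustration (not part of RFC 3629 or the draft), the ABNF above can be transcribed into a byte-level regular expression. The names used here (UTF8_OCTETS, is_utf8_octets) are hypothetical and chosen only for this sketch.

```python
import re

# Byte-level patterns transcribed from the RFC 3629 Section 4 ABNF.
# All names here are illustrative, not taken from any specification.
TAIL = rb"[\x80-\xBF]"
UTF8_1 = rb"[\x00-\x7F]"
UTF8_2 = rb"[\xC2-\xDF]" + TAIL
UTF8_3 = (
    rb"(?:\xE0[\xA0-\xBF]" + TAIL
    + rb"|[\xE1-\xEC]" + TAIL + rb"{2}"
    + rb"|\xED[\x80-\x9F]" + TAIL
    + rb"|[\xEE-\xEF]" + TAIL + rb"{2})"
)
UTF8_4 = (
    rb"(?:\xF0[\x90-\xBF]" + TAIL + rb"{2}"
    + rb"|[\xF1-\xF3]" + TAIL + rb"{3}"
    + rb"|\xF4[\x80-\x8F]" + TAIL + rb"{2})"
)

# UTF8-octets = *( UTF8-char )
UTF8_OCTETS = re.compile(
    b"(?:" + b"|".join([UTF8_1, UTF8_2, UTF8_3, UTF8_4]) + b")*"
)


def is_utf8_octets(data: bytes) -> bool:
    """Return True if data matches the UTF8-octets rule in full."""
    return UTF8_OCTETS.fullmatch(data) is not None
```

This follows the ABNF only; as the NOTE says, [UNICODE] remains the authoritative definition.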
This
| # HTTP1 unfriendly codepoints (00-1f, 7f) must be
| # encoded with \uXXXX escapes
means that you cannot use UTF8-1, however.
Do you mean the following?
h1_unicode_utf8 = h1_utf8_1 / UTF8-2 / UTF8-3 / UTF8-4
h1_utf8_1       = ( "\" "\" ) /
                  ( "\" "u" 4*HEXDIG ) /
                  0x20-21 /
                  0x23-5B /
                  0x5D-7E /
                  0x80-F7
UTF8-2          = <UTF8-2, defined in RFC 3629, Section 4>
UTF8-3          = <UTF8-3, defined in RFC 3629, Section 4>
UTF8-4          = <UTF8-4, defined in RFC 3629, Section 4>
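To make the intent concrete, here is a minimal sketch of a serializer that produces such a string. It assumes the rules quoted from the draft: DQUOTE and backslash are backslash-escaped, the HTTP/1 unfriendly code points (00-1f, 7f) become \uXXXX escapes, and everything else is emitted as UTF-8. The function name h1_serialize is hypothetical, and the DQUOTE escape is taken from the draft's h1_unicode_string rule rather than from the grammar proposed above.

```python
def h1_serialize(text: str) -> bytes:
    """Serialize text as a DQUOTE-delimited string (illustrative sketch)."""
    out = bytearray(b'"')
    for ch in text:
        cp = ord(ch)
        if ch == '"':
            out += b'\\"'            # ( "\" DQUOTE )
        elif ch == "\\":
            out += b"\\\\"           # ( "\" "\" )
        elif cp <= 0x1F or cp == 0x7F:
            out += b"\\u%04X" % cp   # ( "\" "u" 4*HEXDIG )
        else:
            # Printable ASCII and all non-ASCII code points pass through
            # as UTF-8; a lone surrogate raises UnicodeEncodeError.
            out += ch.encode("utf-8")
    out += b'"'
    return bytes(out)
```

For example, h1_serialize("x\ny") escapes the newline as \u000A, while a non-ASCII character such as U+00E9 is written as its two UTF-8 octets.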
/ Kari Hurtta
Received on Thursday, 13 October 2016 03:35:08 UTC