- From: Francois Yergeau <FYergeau@alis.com>
- Date: Mon, 09 Jun 2003 16:09:37 -0400
- To: ietf-charsets@iana.org
...just submitted to secretariat. This revision addresses two
substantive issues raised by the IESG during post-last-call evaluation,
as well as a few minor points that have shown up since -04.

Changes from IESG review:

=============================================================================
One director requested that it be made clear that the ABNF in section 4
is not normative, both because it is new and untested -- added between
Draft and Standard -- and because RFC 2234 is only Proposed. Section 4
now begins with a new para:

   For the convenience of implementors using ABNF, a definition of
   UTF-8 in ABNF syntax is given here.

and ends with a new Note:

   NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
   grammar is believed to describe the same thing as what Unicode
   describes, but does not claim to be authoritative. Implementors are
   urged to rely on the authoritative source, rather than on this ABNF.

=============================================================================
One director requested additional material in Security Considerations
about the fact that octet-by-octet comparison is not sufficient (the
Unicode normalization issue). The following has been added at the end
of section 10:

   Security may also be impacted by a characteristic of several
   character encodings, including UTF-8: the "same thing" (as far as a
   user can tell) can be represented by several distinct character
   sequences. For instance, an e with acute accent can be represented
   by the precomposed U+00E9 E ACUTE character or by the canonically
   equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
   UTF-8 provides a single byte sequence for each character sequence,
   the existence of multiple character sequences for "the same thing"
   may have security consequences whenever string matching, indexing,
   searching, sorting, regular expression matching and selection are
   involved. An example would be string matching of an identifier
   appearing in a credential and in access control list entries. This
   issue is amenable to solutions based on Unicode Normalization Forms,
   see [UAX15].

together with a new entry in Informative references for "Unicode
Standard Annex #15: Unicode Normalization Forms".

Minor changes:

=============================================================================
In Introduction, add "code position" to "(the character number, a.k.a.
code point or Unicode scalar value)".

Rationale: "code position" is the 10646 term.

=============================================================================
In Introduction, change

   o  The octet values C0, C1, FE and FF never appear. If the range of
      character numbers is restricted to U+0000..U+10FFFF (the UTF-16
      accessible range), then the octet values F5..FD also never
      appear.

to

   o  The octet values C0, C1, and F5 to FF never appear.

Rationale: we do restrict to U+0000..U+10FFFF now, the "If" is
superfluous.

=============================================================================
In Introduction, add "byte-value" to "The lexicographic sorting order
of..."

Rationale: clarification, that's what it is.

=============================================================================
Add Chris Newman to Acknowledgments

Rationale: he had just slipped through the cracks. With apologies.

=============================================================================

-- 
François Yergeau
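
For illustration only, and not part of the draft: the normalization
issue described in the new Security Considerations text can be sketched
in Python with the standard unicodedata module. The helper name
same_text and the choice of NFC as the default form are assumptions made
for this sketch; [UAX15] defines the forms, and a real protocol would
have to choose one deliberately.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both
    to the same Unicode Normalization Form (NFC assumed by default)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e9"   # U+00E9, precomposed e with acute accent
decomposed = "e\u0301"   # U+0065 U+0301, e followed by combining acute

# Each character sequence has exactly one UTF-8 encoding, but the two
# sequences differ, so an octet-by-octet comparison treats them as unequal.
octets_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization the two canonically equivalent sequences match.
normalized_equal = same_text(precomposed, decomposed)
```

Comparing identifiers this way, for example a name in a credential
against access control list entries, is one possible approach to the
matching problem the added paragraph describes.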
Received on Monday, 9 June 2003 16:14:20 UTC