- From: Francois Yergeau <FYergeau@alis.com>
- Date: Mon, 09 Jun 2003 16:54:57 -0400
- To: ietf-charsets@iana.org
I forgot to mention that I also updated the [UNICODE] reference to Unicode 4.0.

--
François Yergeau

> -----Original Message-----
> From: Francois Yergeau [mailto:FYergeau@alis.com]
> Sent: June 9, 2003 16:10
> To: ietf-charsets@iana.org
> Subject: New draft-yergeau-rfc2279bis-05.txt
>
> ...just submitted to the secretariat.
>
> This revision addresses two substantive issues raised by the IESG during
> post-last-call evaluation, as well as a few minor points that have shown up
> since -04.
>
> Changes from IESG review:
> =============================================================================
>
> One director requested that it be made clear that the ABNF in section 4 is
> not normative, both because it is new and untested (added between Draft
> and Standard) and because RFC 2234 is only Proposed. Section 4 now begins
> with a new para:
>
>    For the convenience of implementors using ABNF, a definition of UTF-8
>    in ABNF syntax is given here.
>
> and ends with a new Note:
>
>    NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
>    grammar is believed to describe the same thing as what Unicode
>    describes, but does not claim to be authoritative. Implementors are
>    urged to rely on the authoritative source, rather than on this ABNF.
>
> =============================================================================
>
> One director requested additional material in Security Considerations about
> the fact that octet-by-octet comparison is not sufficient (the Unicode
> normalization issue). The following has been added at the end of section 10:
>
>    Security may also be impacted by a characteristic of several
>    character encodings, including UTF-8: the "same thing" (as far as a
>    user can tell) can be represented by several distinct character
>    sequences.
>    For instance, an e with acute accent can be represented by the
>    precomposed U+00E9 E ACUTE character or by the canonically
>    equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
>    UTF-8 provides a single byte sequence for each character sequence,
>    the existence of multiple character sequences for "the same thing"
>    may have security consequences whenever string matching, indexing,
>    searching, sorting, regular expression matching and selection are
>    involved. An example would be string matching of an identifier
>    appearing in a credential and in access control list entries. This
>    issue is amenable to solutions based on Unicode Normalization Forms,
>    see [UAX15].
>
> together with a new entry in Informative references for "Unicode Standard
> Annex #15: Unicode Normalization Forms".
>
>
> Minor changes:
> =============================================================================
>
> In Introduction, add "code position" to "(the character number, a.k.a. code
> point or Unicode scalar value)".
>
> Rationale: "code position" is the 10646 term.
>
> =============================================================================
>
> In Introduction, change
>
>    o  The octet values C0, C1, FE and FF never appear. If the range of
>       character numbers is restricted to U+0000..U+10FFFF (the UTF-16
>       accessible range), then the octet values F5..FD also never appear.
>
> to
>
>    o  The octet values C0, C1, and F5 to FF never appear.
>
> Rationale: we do restrict to U+0000..U+10FFFF now; the "If" is superfluous.
>
> =============================================================================
>
> In Introduction, add "byte-value" to "The lexicographic sorting order of..."
>
> Rationale: clarification, that's what it is.
>
> =============================================================================
>
> Add Chris Newman to Acknowledgments.
>
> Rationale: he had just slipped through the cracks. With apologies.
>
> =============================================================================
>
> --
> François Yergeau
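Three UTF-8 properties touched on in the changes above can be checked mechanically: the octet values that never appear once the range is U+0000..U+10FFFF, byte-value sorting agreeing with code point order, and the two spellings of e-acute being different byte sequences that are canonically equivalent. The sketch below assumes Python 3 and its standard `unicodedata` module; the function names are hypothetical, chosen only for illustration, and it is not taken from the draft.

```python
import unicodedata


def octets_never_used():
    """Octet values absent from the UTF-8 encoding of every scalar value
    in U+0000..U+10FFFF. Surrogates (U+D800..U+DFFF) are skipped because
    they are not scalar values and cannot be encoded."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        seen.update(chr(cp).encode("utf-8"))  # iterating bytes yields ints
    return sorted(set(range(256)) - seen)


def byte_order_matches_code_point_order(strings):
    """True if sorting by UTF-8 byte values gives the same order as
    sorting by code points (Python compares str by code point)."""
    by_bytes = sorted(strings, key=lambda s: s.encode("utf-8"))
    by_code_points = sorted(strings)
    return by_bytes == by_code_points


def equivalent_after_nfc(a, b):
    """Compare two strings after Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(" ".join(f"{b:02X}" for b in octets_never_used()))
    # C0 C1 F5 F6 F7 F8 F9 FA FB FC FD FE FF

    sample = ["z", "a", "\u00e9", "\u4e2d", "\uffff", "\U0001F600"]
    print(byte_order_matches_code_point_order(sample))  # True

    precomposed = "\u00e9"       # U+00E9
    decomposed = "\u0065\u0301"  # U+0065 U+0301
    print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
    print(equivalent_after_nfc(precomposed, decomposed))  # True
```

The last two lines show the gap the new Security Considerations text describes: an octet-by-octet comparison rejects the pair, while a comparison after normalization accepts it. Which normalization form a protocol should apply (NFC, NFD, NFKC or NFKD) is a separate design decision; NFC is used here only as an example.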
Received on Monday, 9 June 2003 17:32:16 UTC