- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Mon, 09 Jun 2003 14:11:00 -0700
- To: "'Francois Yergeau'" <FYergeau@alis.com>, ietf-charsets@iana.org
Hi Francois,

Which reminds me that the recently published RFC 3454 (December 2002)
is based on Unicode/3.2 (of course). But there are (I believe) some new
characters registered in Unicode/4.0.

Also, Markus Kuhn recently made a good point on the Linux I18N list:
the character class of SOFT-HYPHEN just changed in Unicode/4.0 (which
affects Stringprep).

Since a lot of IETF WGs are doing Stringprep profiles, it would be
desirable for them to reference Unicode/4.0 - thus new exclusion
tables are needed, for example.

Comments?

Cheers,
- Ira McDonald
  High North Inc

-----Original Message-----
From: Francois Yergeau [mailto:FYergeau@alis.com]
Sent: Monday, June 09, 2003 4:55 PM
To: ietf-charsets@iana.org
Subject: RE: New draft-yergeau-rfc2279bis-05.txt

I forgot to mention that I also updated the [UNICODE] reference to
Unicode 4.0.

--
François Yergeau

> -----Original Message-----
> From: Francois Yergeau [mailto:FYergeau@alis.com]
> Sent: June 9, 2003 16:10
> To: ietf-charsets@iana.org
> Subject: New draft-yergeau-rfc2279bis-05.txt
>
>
> ...just submitted to secretariat.
>
> This revision addresses two substantive issues raised by the IESG during
> post-last-call evaluation, as well as a few minor points that have shown up
> since -04.
>
> Changes from IESG review:
> =============================================================================
>
> One director requested that it be made clear that the ABNF in section 4 is
> not normative, both because it is new and untested -- added between Draft
> and Standard -- and because RFC 2234 is only Proposed. Section 4 now begins
> with a new para:
>
>    For the convenience of implementors using ABNF, a definition of UTF-8
>    in ABNF syntax is given here.
>
> and ends with a new Note:
>
>    NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
>    grammar is believed to describe the same thing as what Unicode
>    describes, but does not claim to be authoritative. Implementors are
>    urged to rely on the authoritative source, rather than on this ABNF.
>
> =============================================================================
>
> One director requested additional material in Security Considerations about
> the fact that octet-by-octet comparison is not sufficient (the Unicode
> normalization issue). The following has been added at the end of section
> 10:
>
>    Security may also be impacted by a characteristic of several
>    character encodings, including UTF-8: the "same thing" (as far as a
>    user can tell) can be represented by several distinct character
>    sequences. For instance, an e with acute accent can be represented by
>    the precomposed U+00E9 E ACUTE character or by the canonically
>    equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
>    UTF-8 provides a single byte sequence for each character sequence,
>    the existence of multiple character sequences for "the same thing"
>    may have security consequences whenever string matching, indexing,
>    searching, sorting, regular expression matching and selection are
>    involved. An example would be string matching of an identifier
>    appearing in a credential and in access control list entries. This
>    issue is amenable to solutions based on Unicode Normalization Forms,
>    see [UAX15].
>
> together with a new entry in Informative references for "Unicode Standard
> Annex #15: Unicode Normalization Forms".
>
>
> Minor changes:
> =============================================================================
>
> In Introduction, add "code position" to "(the character number, a.k.a. code
> point or Unicode scalar value)".
>
> Rationale: "code position" is the 10646 term.
>
> =============================================================================
>
> In Introduction, change
>
>    o  The octet values C0, C1, FE and FF never appear. If the range of
>       character numbers is restricted to U+0000..U+10FFFF (the UTF-16
>       accessible range), then the octet values F5..FD also never appear.
>
> to
>
>    o  The octet values C0, C1, and F5 to FF never appear.
>
> Rationale: we do restrict to U+0000..U+10FFFF now, the "If" is superfluous.
>
> =============================================================================
>
> In Introduction, add "byte-value" to "The lexicographic sorting order of..."
>
> Rationale: clarification, that's what it is.
>
> =============================================================================
>
> Add Chris Newman to Acknowledgments
>
> Rationale: he had just slipped through the cracks. With apologies.
>
> =============================================================================
>
> --
> François Yergeau
>
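
The Security Considerations text quoted above says that octet-by-octet comparison is not sufficient, because U+00E9 and the sequence U+0065 U+0301 look the same to a user but encode to different UTF-8 bytes. The following is only a minimal sketch of that point in Python, using the standard `unicodedata` module. The helper names `utf8_bytes_equal` and `nfc_equal` are illustrative assumptions, not something defined by the draft. The choice of NFC is also an assumption, since the draft only points to Unicode Normalization Forms in general via [UAX15].

```python
import unicodedata


def utf8_bytes_equal(a: str, b: str) -> bool:
    """Octet-by-octet comparison of the UTF-8 encodings (hypothetical helper)."""
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (hypothetical helper).

    NFC is one possible choice; the draft only refers to [UAX15] in general.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two representations named in the quoted text.
precomposed = "\u00e9"    # U+00E9, e with acute accent
decomposed = "e\u0301"    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8"))                 # bytes C3 A9
print(decomposed.encode("utf-8"))                  # bytes 65 CC 81
print(utf8_bytes_equal(precomposed, decomposed))   # False
print(nfc_equal(precomposed, decomposed))          # True
```

An access control check that compares identifiers byte for byte would treat these two strings as different, which is the kind of mismatch the quoted paragraph warns about.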
Received on Monday, 9 June 2003 17:32:16 UTC
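
The revised Introduction bullet quoted in the message states that the octet values C0, C1, and F5 to FF never appear in UTF-8 once character numbers are restricted to U+0000..U+10FFFF. The sketch below checks that claim by brute force in Python (3.9 or later for the type hints), relying on Python's built-in UTF-8 encoder rather than on the draft's ABNF. The names `FORBIDDEN_OCTETS`, `forbidden_octets_in` and `octets_used_by_all_scalar_values` are assumptions made for this illustration.

```python
# Octets that, per the quoted bullet, never appear in UTF-8.
FORBIDDEN_OCTETS = frozenset({0xC0, 0xC1}) | frozenset(range(0xF5, 0x100))


def forbidden_octets_in(data: bytes) -> list[int]:
    """Return the sorted forbidden octet values present in data (hypothetical helper)."""
    return sorted(set(data) & FORBIDDEN_OCTETS)


def octets_used_by_all_scalar_values() -> set[int]:
    """Encode every Unicode scalar value and collect the octet values seen."""
    used: set[int] = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not scalar values
        used.update(chr(cp).encode("utf-8"))
    return used


if __name__ == "__main__":
    used = octets_used_by_all_scalar_values()
    # Expect no overlap between the octets actually used and the forbidden set.
    print(sorted(used & FORBIDDEN_OCTETS))
    # A quick screen for obviously invalid input, e.g. the overlong form C0 80.
    print(forbidden_octets_in(b"\xc0\x80"))
```

Screening for these octets only catches some invalid input. It is not a substitute for a full UTF-8 validity check.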