
RE: New draft-yergeau-rfc2279bis-05.txt

From: McDonald, Ira <imcdonald@sharplabs.com>
Date: Mon, 09 Jun 2003 14:11:00 -0700
To: "'Francois Yergeau'" <FYergeau@alis.com>, ietf-charsets@iana.org
Message-id: <116DB56CD7DED511BC7800508B2CA53735D037@mailsrvnt02.enet.sharplabs.com>

Hi Francois,

Which reminds me that the recently published RFC 3454 (December 2002)
is based on Unicode/3.2 (of course).  But there are (I believe) some
new characters registered in Unicode/4.0.  Also, Markus Kuhn made a good
point recently on the Linux I18N list: the character class of
SOFT-HYPHEN just changed in Unicode/4.0 (which affects Stringprep).
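
A minimal sketch of the SOFT-HYPHEN change, assuming Python: its standard
unicodedata module carries a frozen Unicode 3.2 database (ucd_3_2_0, the
version Stringprep is tied to) next to the current one, so the two general
categories can be compared directly. The helper name category_in_both is
made up for this illustration.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def category_in_both(ch: str) -> tuple[str, str]:
    """Return (Unicode 3.2 category, current-database category) for ch."""
    return unicodedata.ucd_3_2_0.category(ch), unicodedata.category(ch)


old, new = category_in_both(SOFT_HYPHEN)
# unidata_version is whatever Unicode version this Python build ships with.
print(f"U+00AD: Unicode 3.2 = {old}, Unicode {unicodedata.unidata_version} = {new}")
```

A Stringprep profile whose tables were generated from the 3.2 data would
therefore classify this character differently from one regenerated from
4.0 or later.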

Since a lot of IETF WGs are doing Stringprep profiles, it would be
desirable for them to reference Unicode/4.0 - which means that new
exclusion tables, for example, are needed.

Comments?

Cheers,
- Ira McDonald
  High North Inc


-----Original Message-----
From: Francois Yergeau [mailto:FYergeau@alis.com]
Sent: Monday, June 09, 2003 4:55 PM
To: ietf-charsets@iana.org
Subject: RE: New draft-yergeau-rfc2279bis-05.txt


I forgot to mention that I also updated the [UNICODE] reference to Unicode
4.0.

-- 
François Yergeau

> -----Original Message-----
> From: Francois Yergeau [mailto:FYergeau@alis.com]
> Sent: June 9, 2003 16:10
> To: ietf-charsets@iana.org
> Subject: New draft-yergeau-rfc2279bis-05.txt
> 
> 
> ...just submitted to secretariat.
> 
> This revision addresses two substantive issues raised by the IESG during
> post-last-call evaluation, as well as a few minor points that have shown up
> since -04.
> 
> Changes from IESG review:
> ======================================================================
> 
> One director requested that it be made clear that the ABNF in section 4 is
> not normative, both because it is new and untested -- added between Draft
> and Standard -- and because RFC 2234 is only Proposed.  Section 4 now begins
> with a new para:
> 
>    For the convenience of implementors using ABNF, a definition of UTF-8
>    in ABNF syntax is given here.
> 
> and ends with a new Note:
> 
>    NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
>    grammar is believed to describe the same thing as what Unicode
>    describes, but does not claim to be authoritative. Implementors are
>    urged to rely on the authoritative source, rather than on this ABNF.
> 
> ======================================================================
> 
> One director requested additional material in Security Considerations about
> the fact that octet-by-octet comparison is not sufficient (the Unicode
> normalization issue).  The following has been added at the end of section
> 10:
> 
>    Security may also be impacted by a characteristic of several
>    character encodings, including UTF-8: the "same thing" (as far as a
>    user can tell) can be represented by several distinct character
>    sequences. For instance, an e with acute accent can be represented by
>    the precomposed U+00E9 E ACUTE character or by the canonically
>    equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
>    UTF-8 provides a single byte sequence for each character sequence,
>    the existence of multiple character sequences for "the same thing"
>    may have security consequences whenever string matching, indexing,
>    searching, sorting, regular expression matching and selection are
>    involved.  An example would be string matching of an identifier
>    appearing in a credential and in access control list entries.  This
>    issue is amenable to solutions based on Unicode Normalization Forms,
>    see [UAX15].
> 
> together with a new entry in Informative references for "Unicode Standard
> Annex #15: Unicode Normalization Forms".
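
The comparison issue in the added Security Considerations text can be
sketched as follows. This is an illustration only, not taken from the
draft; it assumes Python and its standard unicodedata.normalize, and the
helper name same_text and the choice of NFC are assumptions made for the
example.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # U+00E9, e with acute as a single character
decomposed = "e\u0301"   # U+0065 U+0301, e followed by combining acute

# Each character sequence has exactly one UTF-8 encoding, but the two
# sequences are different, so their octets differ.
octets_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")
normalized_equal = same_text(precomposed, decomposed)

print(octets_equal, normalized_equal)
```

An access control check that compares raw octets would treat the two
forms as different identifiers; normalizing both sides first is one way
to avoid that.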
> 
> 
> Minor changes:
> ======================================================================
> 
> In Introduction, add "code position" to "(the character number, a.k.a. code
> point or Unicode scalar value)".
> 
> Rationale: "code position" is the 10646 term.
> 
> ======================================================================
> 
> In Introduction, change
> 
>    o  The octet values C0, C1, FE and FF never appear. If the range of
>       character numbers is restricted to U+0000..U+10FFFF (the UTF-16
>       accessible range), then the octet values F5..FD also never appear.
> 
> to
> 
>    o  The octet values C0, C1, and F5 to FF never appear.
> 
> Rationale: we do restrict to U+0000..U+10FFFF now, the "If" is superfluous.
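
The revised bullet can be checked by brute force. The sketch below
assumes Python and its built-in UTF-8 codec; the function name
unused_utf8_octets is made up for the illustration. It encodes every
Unicode scalar value in U+0000..U+10FFFF (surrogates excluded) and lists
the octet values that never show up.

```python
def unused_utf8_octets() -> list[int]:
    """Return the octet values that occur in no UTF-8 encoded scalar value."""
    # U+D800..U+DFFF are surrogates, not scalar values, and cannot be encoded.
    text = "".join(
        chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
    )
    seen = set(text.encode("utf-8"))
    return sorted(set(range(256)) - seen)


unused = unused_utf8_octets()
print(" ".join(f"{octet:02X}" for octet in unused))
```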
> 
> ======================================================================
> 
> In Introduction, add "byte-value" to "The lexicographic sorting order of..."
> 
> Rationale: clarification, that's what it is.
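
What "byte-value" sorting means here can be sketched as follows,
assuming Python (which compares str values by code point) and a
hand-picked sample list. The sample strings and the UTF-16 contrast are
additions for the illustration, not something stated in the draft
excerpt above.

```python
samples = ["\uff5e", "\U00010000", "z", "\u00e9", "a"]

# Python compares str values code point by code point.
by_code_point = sorted(samples)

# Sorting on the raw encoded octets instead.
by_utf8_octets = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16_octets = sorted(samples, key=lambda s: s.encode("utf-16-be"))

print(by_utf8_octets == by_code_point)
print(by_utf16_octets == by_code_point)
```

In this sample the UTF-16 order differs because U+10000 is encoded with
surrogates (D800 DC00), which sort below FF5E, whereas the UTF-8 octet
order follows the character numbers.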
> 
> ======================================================================
> 
> Add Chris Newman to Acknowledgments
> 
> Rationale: he had just slipped through the cracks.  With apologies.
> 
> ======================================================================
> 
> -- 
> François Yergeau
> 
Received on Monday, 9 June 2003 17:32:16 GMT
