- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 18 Apr 2002 13:42:26 +0900
- To: Dan Oscarsson <Dan.Oscarsson@trab.se>, ietf-charsets@iana.org, FYergeau@alis.com
While I'm definitely an advocate of NFC, this isn't and should not be part of the definition of UTF-8. Maybe Francois can find a good place to put in a pointer to NFC and UAX #15, but it definitely shouldn't be part of the normative definition.

Regards,   Martin.

At 13:44 02/04/17 +0200, Dan Oscarsson wrote:
>I would also very much like UTF-8 to require that Unicode
>normalisation form C has been applied to the encoded UCS text.
>Otherwise, the same character sequence can have different
>UTF-8 encodings.
>While it is technically possible to create overlong UTF-8
>sequences, they are forbidden in the document. This makes it
>impossible to encode the same ASCII character sequence in
>several ways. The same should apply to all characters in
>UCS: only one form should be allowed.
>As form C does not destroy any data and is the most compact,
>it is the best choice.
>So UTF-8 should REQUIRE the characters to be normalised
>using form C. (Note: text normalised using form KC will also
>work; if it is then normalised using form C it will result
>in the same text.)
>
>Having both the BOM removed and form C required will make
>handling of UTF-8 in software much simpler, as well as less
>prone to errors and security problems.
>
>        Dan
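A minimal sketch in Python (not part of the original exchange; it assumes the standard-library unicodedata module) of the ambiguity Dan describes: the same visible text can have two different UTF-8 byte sequences unless it is first normalised to form C.

    import unicodedata

    composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

    # Visually identical text, but the UTF-8 byte sequences differ.
    print(composed.encode("utf-8"))    # b'\xc3\xa9'
    print(decomposed.encode("utf-8"))  # b'e\xcc\x81'

    # Normalising both to form C (NFC) yields a single canonical encoding.
    nfc_a = unicodedata.normalize("NFC", composed)
    nfc_b = unicodedata.normalize("NFC", decomposed)
    assert nfc_a == nfc_b
    print(nfc_a.encode("utf-8"))       # b'\xc3\xa9' in both cases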
Received on Thursday, 18 April 2002 01:44:35 UTC