Re: Comments on draft-yergeau-rfc2279bis-00.txt

While I'm definitely an advocate of NFC, this isn't and
should not be part of the definition of UTF-8.
Maybe Francois can find a good place to put in a pointer
to NFC and UAX #15, but it definitely shouldn't be part
of the normative definition.

Regards,   Martin.

At 13:44 02/04/17 +0200, Dan Oscarsson wrote:

>I would also very much like UTF-8 to require that Unicode
>normalisation form C has been applied to the UCS text being
>encoded. Otherwise the same character sequence can have
>different UTF-8 encodings.
>While it is no problem to use overlong UTF-8 sequences, they
>are forbidden in the document. This makes it impossible to
>encode the same ASCII character sequence in several ways.
>The same should be applied to all characters in UCS - only
>one form should be allowed.
>As form C does not destroy any data and is most compact, it is
>the best choice.
>So UTF-8 should REQUIRE the characters to be normalised
>using form C. (note: text normalised using form KC will
>also work; if it is normalised using form C it will result
>in the same text).
>
>Having both BOM removed and form C required will make handling
>of UTF-8 in software much simpler as well as less error-prone
>and less prone to security problems.
>
>     Dan
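
For readers following the thread, here is a minimal sketch of the
two points under discussion, using only Python's standard library.
The helper names utf8_nfc and is_valid_utf8 are illustrative
assumptions, not anything taken from the draft. The sketch treats
normalisation as a step applied before encoding rather than as part
of UTF-8 itself, and it relies on Python's decoder rejecting
overlong sequences.

```python
import unicodedata

# The same visible character, written two ways in UCS.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# UTF-8 encodes code points as given; it does not normalise,
# so the two spellings produce different byte sequences.
composed_bytes = composed.encode("utf-8")      # b"\xc3\xa9"
decomposed_bytes = decomposed.encode("utf-8")  # b"e\xcc\x81"


def utf8_nfc(text: str) -> bytes:
    """Hypothetical helper: normalise to form C, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the strict decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Text normalised with form KC is unchanged by form C.
nfkc_text = unicodedata.normalize("NFKC", "\ufb01ance\u0301")
nfkc_is_nfc_stable = unicodedata.normalize("NFC", nfkc_text) == nfkc_text

# b"\xc0\xaf" is an overlong two-byte form of "/" (U+002F).
overlong_slash = b"\xc0\xaf"

if __name__ == "__main__":
    print(composed_bytes, decomposed_bytes)
    print(utf8_nfc(composed) == utf8_nfc(decomposed))
    print(nfkc_is_nfc_stable)
    print(is_valid_utf8(b"/"), is_valid_utf8(overlong_slash))
```

Normalising first gives both spellings the same bytes, while the
overlong form is refused at the encoding layer, independently of
any normalisation step.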

Received on Thursday, 18 April 2002 01:44:35 UTC