- From: Tony Graham <tgraham@mentea.net>
- Date: Wed, 27 Feb 2013 20:06:40 -0000 (GMT)
- To: "public-microxml" <public-microxml@w3.org>
- Message-ID: <46163.94.197.127.177.1361995600.squirrel@mail3.webfaction.com>
FYI, though it so far hasn't made a ripple in XML circles...

---------------------------- Original Message ----------------------------
Subject: Corrigendum #9 clarifies noncharacter usage in Unicode
From: announcements@unicode.org
Date: Wed, February 20, 2013 8:49 pm
To: announcements@unicode.org
--------------------------------------------------------------------------

There has been confusion about whether noncharacters were permitted in Unicode text. The new Corrigendum #9: Clarification About Noncharacters <http://www.unicode.org/versions/corrigendum9.html> makes it clear that noncharacters are permissible even in open interchange, although their intended semantics may not be interpretable in such contexts. The UTF-8, UTF-16, UTF-32 & BOM FAQ <http://www.unicode.org/faq/utf_bom.html> has also been updated for clarity, and other informative text about noncharacters will be revised over time, including the Core Specification.

Background. There are 66 noncharacters permanently reserved for internal use, typically used for some sort of control function or sentinel value. They should be supported by APIs, components, and applications that handle (i.e., either process or pass through) all Unicode strings, such as a text editor or string class. Where an application does make internal use of a noncharacter, it should take some measures to sanitize input text from unknown sources. The best practice is to replace that particular noncharacter on input by U+FFFD. (The noncharacter should not be simply deleted, since that has security problems. For more information, see Section 3.5 Deletion of Code Points <http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters> in UTR #36, Unicode Security Guidelines <http://www.unicode.org/reports/tr36/>.)

http://unicode-inc.blogspot.com/2013/02/corrigendum-9-clarifies-noncharacter.html
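For anyone wanting to see what the recommended input sanitizing might look like, here is a minimal Python sketch. It is not taken from the Corrigendum or from UTR #36; the function names `is_noncharacter` and `sanitize_input` and the `reserved` parameter are made up for illustration. It assumes the 66 noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, and it replaces only the noncharacters the application itself uses internally, leaving all others untouched.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the 66 Unicode noncharacters."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True  # the contiguous block of 32 in the BMP
    # The last two code points of each plane: U+nFFFE and U+nFFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def sanitize_input(text: str, reserved: set) -> str:
    """Replace each internally used noncharacter in text with U+FFFD.

    `reserved` holds the noncharacters this (hypothetical) application
    uses as sentinels. They are replaced rather than deleted, so the
    text on either side of them is never joined together.
    """
    for ch in reserved:
        if not is_noncharacter(ord(ch)):
            raise ValueError(f"U+{ord(ch):04X} is not a noncharacter")
    return "".join(REPLACEMENT if ch in reserved else ch for ch in text)
```

With `reserved = {"\uFDD0"}`, the call `sanitize_input("a\uFDD0b", reserved)` gives `"a\uFFFDb"`, while a string containing U+FFFE passes through unchanged, since that noncharacter is not one the application reserves.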
Received on Wednesday, 27 February 2013 20:07:04 UTC