- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sun, 03 Feb 2013 23:59:14 +0100
- To: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>, "Lieske, Christian" <christian.lieske@sap.com>
Hi Christian,

I had tried to start a discussion on a solution for 3b at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0210.html
and
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0243.html
but we have not gotten to it yet. We now have an action for Shaun to work on a BP for normalization:
https://www.w3.org/International/multilingualweb/lt/track/actions/430

So I am asking explicitly here: would such a BP note also resolve issue-73, comment 3? It is repeated below:

[Input and output have to consider Unicode Normalization Forms/Unicode Equivalence (e.g. so that the algorithm does produce identical results for sentences that contain "Äffin" and "A\u0308ffin")]

Note that the i18n WG itself, which is pushing for normalization on the Web, is not asking to make it a normative requirement, but rather a recommendation. See this citation from
http://www.unicode.org/mail-arch/unicode-ml/y2013-m02/0007.html

[The current consensus is that early uniform normalization is not required for the generation of content, that "late normalization" (when comparing strings) is also not required, and that both of these cases are ingrained in the fabric of Web technologies in a way that makes it difficult to change them. Thus, content authors and users are cautioned to use a *consistent* character sequences in their content, with NFC being generally recommended as one way to ensure this. In point of fact, for most languages in most scripts, content tends to be in form NFC. But you can't count on it. And far from being dead, other normalization forms like NFD are useful for various kinds of processing. ]

Best,
Felix
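For illustration only, the equivalence case from issue-73, comment 3 can be sketched in Python with the standard `unicodedata` module. The helper name `nfc_equal` is hypothetical and not part of any ITS tooling; this is a minimal sketch of "late normalization" (normalizing at comparison time), not a proposal for what the BP note should require.

```python
import unicodedata

# The two spellings from the issue: precomposed vs. base letter + combining mark.
precomposed = "\u00C4ffin"  # "Äffin" using U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308ffin"  # "A" followed by U+0308 COMBINING DIAERESIS


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two spellings as different strings.
print(precomposed == decomposed)           # False
# After normalizing both sides to the same form, they compare equal.
print(nfc_equal(precomposed, decomposed))  # True
# The code-point lengths differ, which is why offsets can also diverge.
print(len(precomposed), len(decomposed))   # 5 6
```

Any single form would serve for the comparison as long as both sides use the same one; NFC is used here only because the cited text names it as the generally recommended form.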
Received on Sunday, 3 February 2013 22:59:37 UTC