
issue-73, comment 3b

From: Felix Sasaki <fsasaki@w3.org>
Date: Sun, 03 Feb 2013 23:59:14 +0100
Message-ID: <510EEBC2.4030602@w3.org>
To: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>, "Lieske, Christian" <christian.lieske@sap.com>

Hi Christian,

I tried to start a discussion of a solution for comment 3b at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0210.html
and
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0243.html
but we haven't gotten to it yet.

We now have an action for Shaun to work on a best practice (BP) note on normalization:
https://www.w3.org/International/multilingualweb/lt/track/actions/430

So I am asking explicitly: would such a BP note also resolve issue-73,
comment 3? The comment is repeated below:

[Input and output have to consider Unicode Normalization Forms/Unicode 
Equivalence (e.g. so that the algorithm does produce identical results 
for sentences that contain "Äffin" and "A\u0308ffin")]
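
To make the equivalence concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the helper name nfc_equal is hypothetical, and picking NFC as the comparison form is one possible choice, not something the comment or the spec prescribes. It shows that the two spellings differ as code point sequences but compare equal once both are normalized.

```python
import unicodedata

# The two spellings from the comment, written with escapes so the
# difference is visible regardless of how this mail is encoded.
precomposed = "\u00c4ffin"  # U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308ffin"  # "A" followed by U+0308 COMBINING DIAERESIS


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison sees two different code point sequences.
print(precomposed == decomposed)
# Comparing the normalized forms treats them as the same text.
print(nfc_equal(precomposed, decomposed))
```

An algorithm that normalizes its input this way before processing would see the same string for both variants.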

Note that the i18n WG itself, which is pushing for normalization on the Web,
is not asking to make it a normative requirement, but rather a
recommendation. See the following citation from
http://www.unicode.org/mail-arch/unicode-ml/y2013-m02/0007.html

[The current consensus is that early uniform normalization is not 
required for the generation of content, that "late normalization" (when 
comparing strings) is also not required, and that both of these cases 
are ingrained in the fabric of Web technologies in a way that makes it 
difficult to change them. Thus, content authors and users are cautioned 
to use a *consistent* character sequences in their content, with NFC 
being generally recommended as one way to ensure this. In point of fact, 
for most languages in most scripts, content tends to be in form NFC. But 
you can't count on it. And far from being dead, other normalization 
forms like NFD are useful for various kinds of processing. ]
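
As a rough illustration of the two points in that citation (content cannot be assumed to be in NFC, and NFD is still useful for processing), here is a small Python sketch. The function names is_nfc and strip_marks are hypothetical, and stripping combining marks is just one example of NFD-based processing that I picked; it is not taken from the cited mail.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Hypothetical check: True if the text is already in NFC."""
    return unicodedata.normalize("NFC", text) == text


def strip_marks(text: str) -> str:
    """Hypothetical NFD-based processing: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Precomposed input is already NFC; the decomposed spelling is not.
print(is_nfc("\u00c4ffin"))
print(is_nfc("A\u0308ffin"))
# Decomposing first makes it easy to separate base letters from marks.
print(strip_marks("\u00c4ffin"))
```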

Best,

Felix
Received on Sunday, 3 February 2013 22:59:37 GMT
