- From: Jonathan Kew <jonathan@jfkew.plus.com>
- Date: Thu, 5 Feb 2009 00:12:57 +0000
- To: Robert J Burns <rob@robburns.com>
- Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
On 4 Feb 2009, at 23:46, Robert J Burns wrote:

>
> A slight correction on this that isn't really all that germane to
> the present conversation, but I felt I should make nonetheless and
> this correction helps improve understanding of the various issues.
>
> On Feb 4, 2009, at 3:07 PM, I wrote:
>> Take for example the following three strings (NFD, NFC and
>> non-normalized):
>>
>> 〈this string〉
>> 〈this string〉
>> 〈this string〉
>
> Actually the first form is non-normalized too. The second string is
> conforming to both NFC and NFD. The third string is non-normalized
> as well
>
> Just to provide further clarification each line is a separate string
> where the interior "this string" is an identical code point sequence
> irrelevant for normalization purposes. The angle brackets themselves
> however, have been encoded repeatedly as different code points
> despite Unicode offering no semantically distinct interpretation
> between the two code points.

And as an illustration of just how unwise it would be for someone to use these distinct but canonically-equivalent characters to represent a significant distinction in markup: when I copy and paste those lines from my email client into a text editor, and examine the resulting codepoints, I find that all three lines are identical. Some process -- I'm not sure whether it is my mail client's Copy command, my text editor's Paste, or the operating system pasteboard in between -- has helpfully applied Unicode normalization to the data.

So if that was a semantically important distinction in the hypothetical markup language you're using, it just got destroyed. By processes that are fully Unicode-compliant.

(I know that you did indeed use different characters in the original mail, and they reached my mail client in that form, because I can examine the bytes in the message and see that this was the case. But simply copying the text to a plain-text editor changes that.)

JK
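The effect described above can be reproduced in a few lines. This is only a minimal sketch: the message does not say which code points were used for the angle brackets, so the example assumes U+2329/U+232A (LEFT-/RIGHT-POINTING ANGLE BRACKET), which have singleton canonical decompositions to U+3008/U+3009 (LEFT/RIGHT ANGLE BRACKET). The variable names and the helper `canonically_equal` are illustrative, not taken from the thread.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Assumed code points: the email does not state which ones were actually sent.
pointing = "\u2329this string\u232A"  # U+2329 / U+232A, not stable under normalization
cjk = "\u3008this string\u3009"       # U+3008 / U+3009, already in NFC and NFD

# Before normalization the two strings differ code point by code point.
assert pointing != cjk
assert canonically_equal(pointing, cjk)

# Any process that normalizes (to NFC or NFD) erases the difference.
for form in ("NFC", "NFD"):
    assert unicodedata.normalize(form, pointing) == cjk
    assert unicodedata.normalize(form, cjk) == cjk
```

A markup language that assigned different meanings to `pointing` and `cjk` would lose that distinction the moment any step in the pipeline called the equivalent of `unicodedata.normalize`, which is the situation described for the copy-and-paste path.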
Received on Thursday, 5 February 2009 00:13:46 UTC