On 4 Feb 2009, at 23:46, Robert J Burns wrote:

> A slight correction on this that isn't really all that germane to
> the present conversation, but I felt I should make nonetheless and
> this correction helps improve understanding of the various issues.
>
> On Feb 4, 2009, at 3:07 PM, I wrote:
>> Take for example the following three strings (NFD, NFC and non-
>> normalized):
>>
>> 〈this string〉
>> 〈this string〉
>> 〈this string〉
>
> Actually the first form is non-normalized too. The second string is
> conforming to both NFC and NFD. The third string is non-normalized
> as well
>
> Just to provide further clarification each line is a separate string
> where the interior "this string" is an identical code point sequence
> irrelevant for normalization purposes. The angle brackets themselves
> however, have been encoded repeatedly as different code points
> despite Unicode offering no semantically distinct interpretation
> between the two code points.

And as an illustration of just how unwise it would be for someone to
use these distinct but canonically-equivalent characters to represent
a significant distinction in markup: when I copy and paste those lines
from my email client into a text editor, and examine the resulting
codepoints, I find that all three lines are identical. Some process --
I'm not sure whether it is my mail client's Copy command, my text
editor's Paste, or the operating system pasteboard in between -- has
helpfully applied Unicode normalization to the data.

So if that was a semantically important distinction in the
hypothetical markup language you're using, it just got destroyed. By
processes that are fully Unicode-compliant.

(I know that you did indeed use different characters in the original
mail, and they reached my mail client in that form, because I can
examine the bytes in the message and see that this was the case. But
simply copying the text to a plain-text editor changes that.)

JK

Received on Thursday, 5 February 2009 00:13:46 GMT
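The effect described in the message above can be reproduced with a short Python sketch using the standard `unicodedata` module. This is only an illustration under assumptions: the archived text does not preserve which code points the original mail used, so the sketch assumes the brackets were U+2329/U+232A versus U+3008/U+3009 (a pair that Unicode defines as canonically equivalent). The names `variants`, `code_points` and `normalize_all` are hypothetical helpers, not anything from the thread.

```python
import unicodedata

# ASSUMPTION: these code points are chosen for illustration only; the
# archived message does not show which ones the original mail contained.
variants = {
    "U+2329/U+232A": "\u2329this string\u232A",
    "U+3008/U+3009": "\u3008this string\u3009",
    "mixed":         "\u2329this string\u3009",
}


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def normalize_all(form):
    """Return the set of distinct strings left after normalizing every variant."""
    return {unicodedata.normalize(form, s) for s in variants.values()}


# Before normalization the three strings are distinct code point sequences.
print(len(set(variants.values())), "distinct strings before normalization")

for form in ("NFC", "NFD"):
    result = normalize_all(form)
    print(form, "leaves", len(result), "distinct string(s):")
    for s in result:
        print("   ", code_points(s))
```

Any step in a pipeline that normalizes text in this way (a clipboard, an editor, a mail client) would collapse the three strings into one, which is why a markup distinction carried only by canonically equivalent characters would not survive.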