Re: Unicode Normalization

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Thu, 05 Feb 2009 13:50:27 +1100
Message-ID: <498A53F3.90609@vicnet.net.au>
To: Jonathan Kew <jonathan@jfkew.plus.com>
CC: Robert J Burns <rob@robburns.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>

Jonathan Kew wrote:
> And as an illustration of just how unwise it would be for someone to 
> use these distinct but canonically-equivalent characters to represent 
> a significant distinction in markup: when I copy and paste those lines 
> from my email client into a text editor, and examine the resulting 
> codepoints, I find that all three lines are identical. Some process -- 
> I'm not sure whether it is my mail client's Copy command, my text 
> editor's Paste, or the operating system pasteboard in between -- has 
> helpfully applied Unicode normalization to the data. So if that was a 
> semantically important distinction in the hypothetical markup language 
> you're using, it just got destroyed. By processes that are fully 
> Unicode-compliant.
> (I know that you did indeed use different characters in the original 
> mail, and they reached my mail client in that form, because I can 
> examine the bytes in the message and see that this was the case. But 
> simply copying the text to a plain-text editor changes that.)

Now the question is: which characters did you receive? U+003C and U+003E,
which weren't present in the example? Just curious.
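
For anyone who wants to check this on their own system, here is a minimal sketch of what canonical normalization would do to angle-bracket characters. It assumes the example lines used something like U+2329 and U+3008; the original characters are not reproduced in this message, so those code points, and the names `samples` and `codepoints`, are illustrative only. It uses Python's standard `unicodedata` module and NFC, though the form applied by a given clipboard or editor is not known here.

```python
import unicodedata

# Hypothetical stand-ins for the characters in the example; the real
# ones are not shown in this message, so these are assumptions.
samples = {
    "U+2329": "\u2329",  # LEFT-POINTING ANGLE BRACKET
    "U+3008": "\u3008",  # LEFT ANGLE BRACKET
    "U+003C": "\u003c",  # LESS-THAN SIGN
}


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Apply canonical normalization (NFC) to each sample character.
normalized = {
    name: unicodedata.normalize("NFC", ch) for name, ch in samples.items()
}

for name, result in normalized.items():
    print(f"{name} -> {codepoints(result)}")
```

Under these assumptions, U+2329 and U+3008 end up as the same code point after normalization, while U+003C is left alone, so a normalizing copy and paste should not by itself produce U+003C or U+003E.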

Andrew Cunningham
Senior Manager, Research and Development
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com


Received on Thursday, 5 February 2009 02:51:59 UTC
