Re: Unicode Normalization from Andrew Cunningham on 2009-02-05 (www-style@w3.org from February 2009)

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Thu, 05 Feb 2009 13:50:27 +1100
To: Jonathan Kew <jonathan@jfkew.plus.com>
CC: Robert J Burns <rob@robburns.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Message-ID: <498A53F3.90609@vicnet.net.au>

Hi,

Jonathan Kew wrote:
> And as an illustration of just how unwise it would be for someone to 
> use these distinct but canonically-equivalent characters to represent 
> a significant distinction in markup: when I copy and paste those lines 
> from my email client into a text editor, and examine the resulting 
> codepoints, I find that all three lines are identical. Some process -- 
> I'm not sure whether it is my mail client's Copy command, my text 
> editor's Paste, or the operating system pasteboard in between -- has 
> helpfully applied Unicode normalization to the data. So if that was a 
> semantically important distinction in the hypothetical markup language 
> you're using, it just got destroyed. By processes that are fully 
> Unicode-compliant.
>
> (I know that you did indeed use different characters in the original 
> mail, and they reached my mail client in that form, because I can 
> examine the bytes in the message and see that this was the case. But 
> simply copying the text to a plain-text editor changes that.)

Now the question is: which characters did you receive? U+003C and U+003E, 
which weren't present in the example? Just curious.
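
For anyone who wants to check what a copy-and-paste actually produced, here is a minimal sketch in Python that lists code points before and after normalization. The sample strings are an assumption chosen for illustration (three canonically equivalent spellings of A-with-ring), not the characters from the example in the earlier mail, and NFC is assumed as the form applied; the helper name `codepoints` is hypothetical.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Illustrative samples only (not the characters from the original mail):
# three distinct but canonically equivalent spellings.
samples = [
    "\u212B",   # ANGSTROM SIGN
    "\u00C5",   # LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u030A",  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
]

# Assumption: the intermediate process applies NFC. NFD would also make
# the three strings identical, just with a different result.
normalized = [unicodedata.normalize("NFC", s) for s in samples]

for raw, nfc in zip(samples, normalized):
    print(codepoints(raw), "->", codepoints(nfc))

# Plain ASCII U+003C and U+003E are left untouched by NFC.
ascii_brackets = unicodedata.normalize("NFC", "<>")
print(codepoints(ascii_brackets))
```

Pasting the received text into `codepoints()` in place of the samples would show which characters actually arrived.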

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Thursday, 5 February 2009 02:51:57 UTC