Re: Unicode Normalization

Hi,

Jonathan Kew wrote:
> And as an illustration of just how unwise it would be for someone to 
> use these distinct but canonically-equivalent characters to represent 
> a significant distinction in markup: when I copy and paste those lines 
> from my email client into a text editor, and examine the resulting 
> codepoints, I find that all three lines are identical. Some process -- 
> I'm not sure whether it is my mail client's Copy command, my text 
> editor's Paste, or the operating system pasteboard in between -- has 
> helpfully applied Unicode normalization to the data. So if that was a 
> semantically important distinction in the hypothetical markup language 
> you're using, it just got destroyed. By processes that are fully 
> Unicode-compliant.
>
> (I know that you did indeed use different characters in the original 
> mail, and they reached my mail client in that form, because I can 
> examine the bytes in the message and see that this was the case. But 
> simply copying the text to a plain-text editor changes that.)

Now the question is: which characters did you end up with after pasting? 
U+003C and U+003E, which weren't present in the example? Just curious.
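
For anyone who wants to check this on their own machine, here is a minimal Python sketch. It assumes the characters in question were the angle brackets U+2329 and U+3008 (the original example isn't quoted above, so that is a guess on my part), and the helper name codepoints is just for illustration. It uses the standard unicodedata module to show what normalization does to each candidate, next to U+003C for comparison.

```python
import unicodedata

def codepoints(text):
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

# Assumed candidates: two canonically-equivalent angle brackets,
# plus the ASCII less-than sign for comparison.
samples = ["\u2329", "\u3008", "\u003C"]

# Normalize each sample to NFC and record the result.
results = {s: unicodedata.normalize("NFC", s) for s in samples}

for original, normalized in results.items():
    print(codepoints(original), "->", codepoints(normalized))
```

If the pasted text had been normalized, U+2329 should come out as U+3008, since the former has a canonical singleton decomposition to the latter. U+003C is not canonically equivalent to either one, so normalization alone should never produce it.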

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Thursday, 5 February 2009 02:51:57 UTC