Hi,
Jonathan Kew wrote:
> And as an illustration of just how unwise it would be for someone to
> use these distinct but canonically-equivalent characters to represent
> a significant distinction in markup: when I copy and paste those lines
> from my email client into a text editor, and examine the resulting
> codepoints, I find that all three lines are identical. Some process --
> I'm not sure whether it is my mail client's Copy command, my text
> editor's Paste, or the operating system pasteboard in between -- has
> helpfully applied Unicode normalization to the data. So if that was a
> semantically important distinction in the hypothetical markup language
> you're using, it just got destroyed. By processes that are fully
> Unicode-compliant.
>
> (I know that you did indeed use different characters in the original
> mail, and they reached my mail client in that form, because I can
> examine the bytes in the message and see that this was the case. But
> simply copying the text to a plain-text editor changes that.)
Now the question is: which characters did you receive? U+003C and U+003E,
which weren't present in the example? Just curious.
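
For anyone who wants to check this kind of thing on their own data, here is a minimal Python sketch using the standard unicodedata module. The sample characters are my assumption, not the strings from the original mail: I picked U+2329 and U+3008 because they are canonically equivalent, and U+003C as a character that normalization should never produce from either of them. The helper names codepoints and normalized_forms are just illustrative.

```python
import unicodedata

# Hypothetical samples, NOT the strings from the original mail.
samples = {
    "U+2329 LEFT-POINTING ANGLE BRACKET": "\u2329",
    "U+3008 LEFT ANGLE BRACKET": "\u3008",
    "U+003C LESS-THAN SIGN": "\u003c",
}


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def normalized_forms(text):
    """Return the four standard normalization forms of text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, text) for form in forms}


for label, text in samples.items():
    forms = normalized_forms(text)
    shown = {form: codepoints(value) for form, value in forms.items()}
    print(label, "->", shown)
```

With these samples, U+2329 should come out as U+3008 in every form, while U+003C stays U+003C. So a normalizing copy/paste step could explain two distinct bracket characters collapsing into one, but it would not by itself turn either of them into U+003C.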
--
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Ph: +61-3-8664-7430
Fax: +61-3-9639-2175
Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com
http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au