Hi,

Jonathan Kew wrote:
> And as an illustration of just how unwise it would be for someone to
> use these distinct but canonically-equivalent characters to represent
> a significant distinction in markup: when I copy and paste those lines
> from my email client into a text editor, and examine the resulting
> codepoints, I find that all three lines are identical. Some process --
> I'm not sure whether it is my mail client's Copy command, my text
> editor's Paste, or the operating system pasteboard in between -- has
> helpfully applied Unicode normalization to the data. So if that was a
> semantically important distinction in the hypothetical markup language
> you're using, it just got destroyed. By processes that are fully
> Unicode-compliant.
>
> (I know that you did indeed use different characters in the original
> mail, and they reached my mail client in that form, because I can
> examine the bytes in the message and see that this was the case. But
> simply copying the text to a plain-text editor changes that.)

Now the question is which characters did you receive? U+003C and U+003E?
Which weren't present in the example?

Just curious

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Thursday, 5 February 2009 02:51:59 GMT
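
For anyone who wants to reproduce the effect described in the quoted text, here is a minimal sketch using Python's standard `unicodedata` module. The three sample strings are illustrative stand-ins chosen because they are canonically equivalent; they are an assumption, not the characters from the original mail. The last lines check what the angle-bracket character U+2329 normalizes to, which bears on the U+003C question above.

```python
import unicodedata

# Illustrative stand-ins (an assumption, not the characters from the
# original mail): three canonically-equivalent spellings of one letter.
variants = {
    "ANGSTROM SIGN": "\u212B",
    "A WITH RING ABOVE": "\u00C5",
    "A + COMBINING RING ABOVE": "A\u030A",
}


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Show each variant before and after NFC normalization.
for name, text in variants.items():
    nfc = unicodedata.normalize("NFC", text)
    print(f"{name}: {codepoints(text)} -> NFC {codepoints(nfc)}")

# After normalization the three inputs are no longer distinguishable.
distinct_after_nfc = {unicodedata.normalize("NFC", t) for t in variants.values()}
print("distinct strings after NFC:", len(distinct_after_nfc))

# U+2329 (LEFT-POINTING ANGLE BRACKET) has a canonical mapping to U+3008,
# not to the ASCII less-than sign U+003C.
angle_nfc = unicodedata.normalize("NFC", "\u2329")
print("U+2329 -> NFC", codepoints(angle_nfc))
```

Under NFC all three sample strings collapse to U+00C5, so any markup distinction carried by the choice among them would be lost. Canonical normalization alone maps U+2329 to U+3008 and never produces U+003C or U+003E.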