At 1:35 PM -0400 6/26/00, John Cowan wrote:
>Kevin Regan wrote:
>
>> If it is the usual case that documents are created in the normalized
>> form, then it does not seem like a big issue. What would happen
>> in the case of an editor or application written in Java (Unicode)?
>
>Most people do not have the capability of keyboarding separate accent
>marks anyhow (their keyboards generate the normalized forms).

But this is a gross oversimplification of how users might enter non-canonicalized characters in a document. An easy example from plane zero is U+00BC (VULGAR FRACTION ONE QUARTER). Microsoft Word (and other programs) will insert this into a document in its uncanonicalized form; Word will even do it behind your back unless you turn off its default "helpful" auto-correction feature. U+00BC canonicalizes into U+0031 followed by U+2044 followed by U+0034.

There are dozens of other common cases of easily entered non-canonical forms, and thousands of less common cases that could still be found without much effort.

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Monday, 26 June 2000 15:18:57 GMT
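As a rough illustration of the U+00BC example above, the following sketch uses Python's standard unicodedata module to show what normalization does to that character. Python is an assumption of this note; the thread itself does not mention it, and the helper name codepoints is made up for the example. One detail worth checking when reproducing this: in the Unicode Character Database that Python ships, the mapping to U+0031 U+2044 U+0034 is tagged as a compatibility (<fraction>) decomposition, so it is applied by the compatibility forms NFKD/NFKC and not by the canonical forms NFD/NFC.

```python
import unicodedata

# U+00BC VULGAR FRACTION ONE QUARTER, the character Word's auto-correct inserts.
quarter = "\u00bc"


def codepoints(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Canonical decomposition (NFD) leaves the character as it is.
nfd = unicodedata.normalize("NFD", quarter)

# Compatibility decomposition (NFKD) expands it to digit, fraction slash, digit.
nfkd = unicodedata.normalize("NFKD", quarter)

print("NFD: ", codepoints(nfd))
print("NFKD:", codepoints(nfkd))

# The raw decomposition field from the character database, including its tag.
print("UCD: ", unicodedata.decomposition(quarter))
```

A document checker along these lines could compare each string with its normalized form and flag any difference, which is one way to find the "easily entered" cases the message describes.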
This archive was generated by hypermail 2.2.0 + w3c-0.29 : Thursday, 13 January 2005 12:10:09 GMT