- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Mon, 26 Jun 2000 12:18:46 -0700
- To: w3c-ietf-xmldsig@w3.org
At 1:35 PM -0400 6/26/00, John Cowan wrote:
>Kevin Regan wrote:
>
>> If it is the usual case that documents are created in the normalized
>> form, then it does not seem like a big issue. What would happen
>> in the case of an editor or application written in Java (Unicode)?
>
>Most people do not have the capability of keyboarding separate accent
>marks anyhow (their keyboards generate the normalized forms).

But this is a gross oversimplification of how users might enter non-canonicalized characters in a document. An easy example from plane zero is U+00BC (VULGAR FRACTION ONE QUARTER). Microsoft Word (and other programs) will insert this into a document in its uncanonicalized form; Word will even do it behind your back unless you turn off its default "helpful" auto-correction feature. U+00BC canonicalizes into U+0031 followed by U+2044 followed by U+0034.

There are dozens of other common cases of easily entered non-canonical forms, and thousands of less common cases that could still be found without much effort.

--Paul Hoffman, Director
--Internet Mail Consortium
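
The U+00BC example above can be checked with a short sketch, here using Python's standard `unicodedata` module (the choice of Python and the helper name `codepoints` are assumptions for illustration, not part of the original message). Note that in the Unicode Character Database this mapping is a compatibility decomposition, so it appears under the NFKC and NFKD forms, while NFC and NFD leave the character unchanged.

```python
import unicodedata

quarter = "\u00bc"  # VULGAR FRACTION ONE QUARTER


def codepoints(s):
    """Render a string as space-separated U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Apply each of the four standard normalization forms to the single character.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, codepoints(unicodedata.normalize(form, quarter)))

# Prints:
# NFC U+00BC
# NFD U+00BC
# NFKC U+0031 U+2044 U+0034
# NFKD U+0031 U+2044 U+0034
```

Which form a signature scheme picks therefore decides whether a document containing an auto-corrected fraction is altered by normalization at all.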
Received on Monday, 26 June 2000 15:18:57 UTC