- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Wed, 21 May 1997 08:55:55 -0400
- To: jjc@jclark.com
- CC: w3c-sgml-wg@w3.org
>>If you count characters of the source content, why won't everyone count them >>the same way? The only problem I see would be with a system that silently >>"normalized" data somehow, and lost track of what the original data was. > >I think a system might quite reasonably choose normalize precomposed >characters into composite characters or vice-versa, since there's no real >difference in the semantics. If I've got a base character followed by a >non-spacing character, and the combination of the two is available in >Unicode as a precomposed character, and I've got a system that doesn't have >the machinery to render non-spacing characters, then it might well be >desirable for it to do the precomposition. We could get around this by requiring that data be normlised per the Unicode standard, and define a "word" quantum to be the result of the boundary discovery algorithm in Unicode 2.0. Not that I think this is a good idea though...
Received on Wednesday, 21 May 1997 08:57:23 UTC