- From: David Carlisle <davidc@nag.co.uk>
- Date: Tue, 2 May 2006 17:23:03 +0100
- To: juanrgonzaleza@canonicalscience.com
- CC: www-math@w3.org
> I thought that o + combining-diaeresis and ö were two different things in
> Unicode even when both are rendered equal. Of course, both are defined to
> be "canonically equivalent" via "canonical decomposition" but are not
> defined to be "equivalent".

You could not have a mathematical (or text) markup scheme that relied on both forms being available and inferred different semantics in the two cases. The set of characters for which precomposed forms are available is just a random, ad hoc list based mainly on historical accident.

In general, when designing the markup language, you have to assume that there is no precomposed Unicode character for the base+diacritic. If that is the case, no such precomposed character will be added to Unicode in later releases either:

http://www.unicode.org/faq/ligature_digraph.html#3

  At this point, the UTC has a default position: no new characters for
  digraphs or pre-composed diacritic letters should be accepted for
  encoding as individual characters.

So in practice any argument about the exact relationship between the precomposed character and the sequence using combining characters is irrelevant. In the vast majority of cases there is no precomposed character. The precomposed characters cover a reasonable proportion of the diacritic-base combinations used in European languages, but if you are using a dot-above to denote derivatives you need that diacritic (potentially) on any base letter.

David
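To make the distinction concrete, here is a minimal sketch using Python's standard-library `unicodedata` module. The choice of θ with a dot above (as in θ̇ for an angular velocity) is only an illustrative assumption. The sketch shows that o + U+0308 COMBINING DIAERESIS and the precomposed ö (U+00F6) are different code-point sequences that normalize to each other. It also shows that θ + U+0307 COMBINING DOT ABOVE has no precomposed form, so NFC leaves it as two code points.

```python
import unicodedata

# "o" followed by U+0308 COMBINING DIAERESIS, versus precomposed U+00F6 (ö)
decomposed = "o\u0308"
precomposed = "\u00f6"

# As raw code-point sequences the two strings differ...
print(decomposed == precomposed)  # False

# ...but they are canonically equivalent: each normalizes to the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Greek theta (U+03B8) + U+0307 COMBINING DOT ABOVE, e.g. a time derivative.
# Unicode has no precomposed character for this pair, so NFC cannot compose it.
theta_dot = "\u03b8\u0307"
nfc = unicodedata.normalize("NFC", theta_dot)
print(len(nfc), [f"U+{ord(c):04X}" for c in nfc])  # 2 ['U+03B8', 'U+0307']
```

The last case is the common one for mathematical notation: a markup scheme cannot rely on a single precomposed code point existing for an arbitrary base+diacritic pair.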
Received on Tuesday, 2 May 2006 17:16:02 UTC