- From: Asmus Freytag (t) <asmus-inc@ix.netcom.com>
- Date: Fri, 26 Jun 2015 14:35:34 -0700
- To: Murray Sargent <murrays@exchange.microsoft.com>, David Carlisle <davidc@nag.co.uk>, William F Hammond <hammond@csc.albany.edu>
- Cc: www-math@w3.org, Michel Suignard <michel@suignard.com>
On 6/23/2015 3:20 PM, Murray Sargent wrote:
> David Carlisle wrote that one could make definitions like
>
>> U+2102 DOUBLE-STRUCK CAPITAL C = Complex numbers
>>
>> Leaving U+1D53A free to be defined as a part of a generic alphabetic run as
>> MATHEMATICAL DOUBLE-STRUCK CAPITAL C
>
> One can't change the definitions of the math alphanumerics now since they are already encoded and Unicode has a stability guarantee. In addition they are widely used in technical documents as defined. We might have been able to get away with such definitions before the math alphanumerics were added to the Unicode Standard 3.1 back in March, 2001. For Microsoft Office apps, I wrote routines to work around the separation of the math alphabetics into the Letterlike Symbols and math alphanumerics blocks and it's complicated and even error-prone. So I really wish that we had done something along the lines David suggests. But it's clearly water over the dam at this point.
>
> +Asmus and Michel in case they want to defend Unicode's position of not duplicating characters. Someone may have to forward this reply to the list.
>
> I'd argue that simplicity of implementation should play an important role in this regard. This isn't the only place where Unicode is over-unified. But these complications do provide ways to keep programmers employed <grin>.

Unicode generally does not encode characters by usage. For example, there's no distinction between period, decimal point, abbreviation point, etc. This reflects the underlying situation, to wit, that this is a case of the *same* symbol being used in different conventions.

The downside is that it is thus not possible to use plain text to capture which convention is intended (but nothing prevents anyone from providing rich-text markup).

The upside is that data can't exhibit "random alternation" between identical-looking symbols; experience has shown that this is the most likely outcome if "the same" item is encoded several times, based merely on convention.
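For illustration only, the kind of workaround Murray describes in the quoted text (stitching one double-struck alphabet together from the Letterlike Symbols and Mathematical Alphanumeric Symbols blocks) might look like the minimal Python sketch below. The names `double_struck_capital` and `LETTERLIKE_DOUBLE_STRUCK` are hypothetical and are not taken from any Microsoft Office code; only the code points come from the Unicode code charts.

```python
import unicodedata

# Double-struck capitals that were encoded in the Letterlike Symbols block
# before the Mathematical Alphanumeric Symbols block was added.
LETTERLIKE_DOUBLE_STRUCK = {
    "C": 0x2102, "H": 0x210D, "N": 0x2115, "P": 0x2119,
    "Q": 0x211A, "R": 0x211D, "Z": 0x2124,
}

# U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A starts the sequential run.
DOUBLE_STRUCK_A = 0x1D538


def double_struck_capital(letter: str) -> str:
    """Map an ASCII capital letter to its double-struck counterpart."""
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected a single ASCII capital letter")
    # The positions for C, H, N, P, Q, R and Z in the sequential run are
    # reserved "holes", so those letters come from Letterlike Symbols.
    if letter in LETTERLIKE_DOUBLE_STRUCK:
        return chr(LETTERLIKE_DOUBLE_STRUCK[letter])
    return chr(DOUBLE_STRUCK_A + ord(letter) - ord("A"))


if __name__ == "__main__":
    for ch in "ABCD":
        ds = double_struck_capital(ch)
        print(f"{ch} -> U+{ord(ds):04X} {unicodedata.name(ds)}")
```

A plain offset calculation would land on the reserved code point U+1D53A for "C", which is why the exception table is needed.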
In the existing case, when 2102 and friends were encoded in the Letterlike Symbols block, they were clearly intended as a subset of the double-struck alphabet. The fact that some of the conventional meanings for characters from this subset are annotated in the names list does not detract from that.

It took a few versions of Unicode to better understand the best way to encode symbols and alphabets used for math. The unfortunate side effect of that is that the math alphabets are not sequential but have "holes".

In some cases, Unicode apparently does encode convention, for example the micro vs. Greek mu, or Ohm vs. Greek Omega. These have complicated histories. The desire to preserve the Latin-1 layout as an aid in migration overrode the normal reluctance to code by convention. The downside is that now users will use "random alternation" for the mu used as micro sign. Greek users will most certainly not use the Latin-1 code point for that purpose.

Some of the letterlike symbols should not have been coded in that block, but in the Squared abbreviations block. That is because their origin was fundamentally the special em-square set of units used in Asian standards. In the early versions of Unicode, there was this idea of filtering out from such sets any symbol that might be used "generically", that is, outside an Asian typographic environment. For most such usages, the standard Latin (or Greek, for Ohm) letters would have been the correct characters, leaving Kelvin, Ohm, and Angstrom as specifically "squared" characters.

So, while there are exceptions to what has by now become the principle for new encodings, I would not call the treatment of math alphabets "over-unified". It is rather an attempt to not needlessly repeat the "underunification" of micro, Kelvin, Ohm and Angstrom.

A./
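As a small illustration of the micro, Ohm, Kelvin and Angstrom cases mentioned above, the following Python sketch shows how text that alternates between the unit signs and the ordinary letters fails to compare equal until the signs are folded. The helper `fold_unit_signs` and its table are hypothetical names introduced for this example; the mappings themselves match what Unicode normalization does (canonical for Ohm, Kelvin and Angstrom; compatibility only for the micro sign).

```python
import unicodedata

# Unit signs and the letters they duplicate (hypothetical folding table).
UNIT_SIGN_TO_LETTER = {
    0x00B5: 0x03BC,  # MICRO SIGN    -> GREEK SMALL LETTER MU
    0x2126: 0x03A9,  # OHM SIGN      -> GREEK CAPITAL LETTER OMEGA
    0x212A: 0x004B,  # KELVIN SIGN   -> LATIN CAPITAL LETTER K
    0x212B: 0x00C5,  # ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
}


def fold_unit_signs(text: str) -> str:
    """Replace the four unit signs with the letters they duplicate."""
    return text.translate(UNIT_SIGN_TO_LETTER)


if __name__ == "__main__":
    with_sign = "5 \u00B5m"    # typed with the Latin-1 micro sign
    with_letter = "5 \u03BCm"  # typed with Greek mu
    print(with_sign == with_letter)                                    # False
    print(fold_unit_signs(with_sign) == fold_unit_signs(with_letter))  # True
    # NFC already unifies Ohm with Omega, but leaves the micro sign alone.
    print(unicodedata.normalize("NFC", "\u2126") == "\u03A9")          # True
    print(unicodedata.normalize("NFC", "\u00B5") == "\u03BC")          # False
```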
Received on Friday, 26 June 2015 21:36:07 UTC