- From: Deyan Ginev <deyan.ginev@gmail.com>
- Date: Tue, 13 Jul 2021 23:37:50 -0400
- To: Susan Jolly <easjolly@ix.netcom.com>
- Cc: Neil Soiffer <soiffer@alum.mit.edu>, David Farmer <farmer@aimath.org>, Sam Dooley <samdooley64@gmail.com>, "Hammond, William F" <whammond@albany.edu>, "Noble, Stephen" <steve.noble@pearson.com>, Murray Sargent <murrays@exchange.microsoft.com>, Louis Maher <ljmaher03@outlook.com>, www-math@w3.org
On Tue, Jul 13, 2021 at 10:09 PM Susan Jolly <easjolly@ix.netcom.com> wrote:
>
> Hi Deyan,

Hi Susan,

> I'm confused by what you wrote.

Apologies for the confusion. Assume that I write as a casual user of Unicode who has not been involved in any of the efforts you're describing here. I will try to improve my use of terminology, so please do correct me when I stray into misusing terms.

> My understanding of Unicode is that it distinguishes characters from
> glyphs and that a great deal of effort has gone into creating the
> Unicode set of over 100,000 unique characters. Characters in Unicode
> are distinguished by their numerical character codes, not by their
> visual appearance. Unicode decided back in 1993 that a colon
> punctuation mark and the mathematical symbol for ratio are two
> different characters. If my understanding up to this point is
> incorrect, please correct me.

I am no expert on the history, but the final outcome of over 100,000 unique characters is indeed something I am aware of. The great deal of effort that went into devising these characters has now been gracefully passed down to the developers who have to expect arbitrary Unicode inputs in their applications. Sometimes for good reason - sometimes one wonders.

[Aside] Actually, 1993 was likely when I wrote my first mathematical colon, as I must have been in first grade. And I see that to this day it is taught to primary school students as our preferred division sign. Here's a Bulgarian Khan Academy video to illustrate that:

https://youtu.be/d_Q8xICTFpQ?t=35

So add "divides" as the tenth notation in my list above.

> It is also my understanding that characters are displayed visually by
> glyphs, with the Unicode tables providing a typical or reference glyph
> for each character. However, the visual appearance of a given character
> is not going to be identical in all fonts.

Certainly.

> The use of Unicode character codes aids in the automatic translation
> of math to braille. Of course a given braille system cannot define
> easy-to-remember braille symbols for all of the Unicode characters, so
> it needs some method for dealing with this issue.

It certainly *could* aid that. But the world is not necessarily perfectly encoded in the correct Unicode characters; the many example notations above were meant to illustrate that. I can use the regular colon (in other words, U+003A) to encode all nine of the distinct mathematical notations above, and a reader of a web page would have no problem understanding what is written - all courtesy of the textual context surrounding the expressions, which usually suffices to obtain clarity.

My perspective is entirely that of an external onlooker here, but also of someone who wants to garden 700 million mathematical expressions from arXiv.org. And in the case of arXiv, they need to be tackled in the way people wrote them, from 1991 until today. If people used ASCII colons for their ratios, it is a lot more manageable to expect the AT tools to pass on a generic "colon" character to their readers as a baseline expectation, and then to let ratios be inferred by the reader based on context. At least I consider that a better design choice than trying to guess heuristically where ratios are to be inserted, using U+2236, and ending up with e.g. "the time is 14-to-10" for "the time is 14:10". Quite amusingly, "14-to-10" is a valid reading of a time in English, but it encodes a completely different minute, the one at 9:46. Imagine the poor reader who has to debug that mistake (hopefully they don't work at a transport station).

And then, if a willing author is ready to remediate, I would prefer that they annotate in natural language the mathematical concept they intended. Because while they can use U+2236 to explicitly designate "ratio" in Unicode, there is no character for "such-that", "coordinate-separator", "typing-judgement", "namespace-separator", "ruby-symbol", and so on.
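To make the colon/ratio distinction concrete, here is a minimal Python sketch using the standard-library unicodedata module (nothing here is specific to any math or braille tooling; it simply queries the Unicode character database that ships with CPython):

```python
import unicodedata

# U+003A and U+2236 are distinct code points that render almost
# identically in many fonts, yet carry different intended meanings.
for ch in ["\u003a", "\u2236"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+003A  COLON
# U+2236  RATIO

# The lookup also works in reverse, from a memorable formal name
# back to the character itself.
assert unicodedata.lookup("RATIO") == "\u2236"
```

Any heuristic that rewrites U+003A to U+2236 would have to distinguish the time "14:10" from a genuine ratio, which is exactly the context-dependence I am describing above.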
I am also looking forward to the new uses of colons that are yet to be invented in 2021 and beyond. To reuse an expression Sam Dooley threw my way when I joined the CG, the effort of enumerating all possible uses of the same visual glyph (say, by introducing a new Unicode character for each meaning) is akin to "trying to boil the ocean".

> One possibility is direct representation of hexadecimal character
> codes.

Is the burden on memory feasible, to remember a list as large as Unicode? I certainly like the ability to know the exact code - I have an active plugin in VSCode that shows the code points of the characters under my cursor in the editor status bar. But I would assume an unsuspecting user who stumbles on, say, "𝕬" would be a lot more capable of mentally working with "MATHEMATICAL BOLD FRAKTUR CAPITAL A" than of tackling U+1D56C in isolation. Of course, if there is a convenient interface for easily switching between the code and the name/description of a character, starting with the hexadecimals could be workable, as one will remember them when they're frequent and important, and keep jumping back to their text descriptions when they aren't. That said, thinking in hexadecimals is also a rather unusual burden; it may take quite some practice to do well.

I will keep thinking as I continue learning about braille and reading the rest of the replies.

Greetings,
Deyan

> Susan J.
Received on Wednesday, 14 July 2021 03:38:29 UTC