- From: Russell Galvin <russell@blissymbolics.org>
- Date: Tue, 17 Dec 2024 12:33:05 -0500
- To: public-adapt@w3.org
Hi all,

Below is the edit I have prepared for the Symbols Implementation tests wiki. It is primarily in response to ideas that have come out of Issue 240. Please review if interested; feedback is welcome.

Russell

---

## Converting symbols to fonts

To demonstrate displaying symbols as text annotation using <ruby>, a WOFF/WOFF2 SVG colour web test font was created. Because it contains vector SVG glyphs for each supported code point, the images scale to any size without loss of quality, and only one version of each image is needed, unlike with bitmap fonts. As a web font, it can be loaded from the local machine or hosted on a web server, which makes it available to any device connected to the web and eliminates the need for local installation.

The test font was created using Glyphs, a font editor for Apple Macintosh. The process is quite simple: existing SVG images are dragged and dropped into an SVG layer for a given code point, creating the scalable SVG glyph for that code point. Care must be taken to position each glyph properly. Other software provides this functionality, such as icomoon.io, fontastic.me, and fontello.com; however, the popular open source application FontForge does not support SVG colour fonts at this time.

### ARASAAC

ARASAAC does not at this time make SVG versions of their symbols freely available on their website. All images they provide are in PNG raster (colour bitmap) format. To demonstrate symbols with more complex and/or detailed graphics than Blissymbols, of the kind ARASAAC provides, CC-licensed SVG symbols from Jellow and Mulberry were used.

### Bliss

Blissymbols are freely available in SVG format from the BCI website at https://blissymbolics.org/index.php/symbol-files-2024. Some examples were included in the test font.

The current Bliss Unicode draft proposal by Michael Everson is published at https://www.unicode.org/wg2/docs/n5228-blissymbols.pdf.
This is not finalized, so Michael requests that people do not start using the code points published there until it is. This discussion shifts the code points into the private use area in order to comply with that request.

## Mapping code points within the font

This section documents examples of mapping the Unicode code points for Bliss word parts (Bliss-characters) to Bliss-words (Blissymbols) for the particular concepts encoded by those Bliss-words. It discusses the pros and cons of using the (proposed) Blissymbol Unicode code points as the method of identifying symbols and mapping between AAC symbol sets. To show where this approach runs into trouble, we will assume we are using these code points along with WOFF2 fonts for displaying symbols, and work through some use cases that illustrate the problems that arise.

1) Simplest case: Annotating the word "drink".

The Bliss-word "drink" is identified by the BCI-ID 13881. Here we use a demonstration Bliss Unicode code point of U+E20E for drink. A font containing glyphs for all Bliss-characters would correctly display the Blissymbol for drink in text strings containing this code point. In this case, the Blissymbol for drink is both a Bliss-character and a Bliss-word. An analogy would be the letter "a" in English, which is also a meaningful word on its own. An AAC symbol user who prefers ARASAAC symbols, for example, could have the same ruby annotation displayed as an ARASAAC symbol simply by using a font in which the glyph for the Bliss-character "drink" is replaced by the ARASAAC symbol for "drink". So far, so good.

2) Less simple case: Annotating the word "tea".

The Bliss-word "tea" is identified by the BCI-ID 17511. It is composed of two Bliss-characters, "drink" followed by "leaf". Using U+E451 as the code point for leaf, the Unicode representation of "tea" is a two-character string of the code points U+E20E and U+E451. In HTML it can be represented as "&#xE20E;&#xE451;".
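As a rough illustration of use cases 1 and 2, the following Python sketch builds the ruby markup for "drink" and "tea" from the demonstration code points given above. The helper names `char_refs` and `ruby` and the class name `symbol` are hypothetical and are not taken from the demonstration pages; the sketch only assumes that a stylesheet applies the symbol web font to that class.

```python
from html import escape

# Demonstration private-use code points quoted in the text above.
DRINK = 0xE20E
LEAF = 0xE451


def char_refs(code_points):
    """Return HTML hexadecimal character references for the code points."""
    return "".join(f"&#x{cp:X};" for cp in code_points)


def ruby(base_text, code_points, css_class="symbol"):
    """Wrap base_text in <ruby>, with the symbol string as its <rt> annotation.

    css_class is a hypothetical hook for a CSS rule that selects the
    symbol web font; it is an assumption of this sketch.
    """
    return (
        f"<ruby>{escape(base_text)}"
        f'<rt class="{css_class}">{char_refs(code_points)}</rt></ruby>'
    )


print(ruby("drink", [DRINK]))
print(ruby("tea", [DRINK, LEAF]))
```

With a Bliss font applied to the annotation, the second line of output would render the two Bliss-characters side by side, which is the Bliss spelling of "tea".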
The ARASAAC symbol set, in common with all AAC symbol sets other than Bliss, has a single pictographic symbol for tea. As an aside, non-Bliss AAC symbol sets often have multiple alternate symbols for the same concept, but a particular one would normally be selected for a particular user. So in an ARASAAC font, how do we map a single image to a sequence of two or more code points?

Option a) A mechanism exists in Unicode for character composition, whereby a base character followed by one or more combining characters is equivalent to, and may be replaced by, what is called a precomposed character. This is typically used for diacritical marks, superscripts, subscripts, etc. At first glance this may seem a plausible solution, but it would require almost every Bliss-character to be usable as a combining character as well as a non-combining one. That is not how the Bliss encoding is designed, and it would not work without extensive modifications to the current proposed encoding. In addition, precomposed characters would have to be added to fonts with every new addition to the Bliss vocabulary. In short, this is not how this Unicode mechanism is intended to be used.

Option b) Another Unicode mechanism, used in the implementation of Emoji combinations, is the zero width joiner (ZWJ) sequence. This approach could conceivably work for AAC symbol annotation. Inserting a ZWJ code point between the elements of a sequence instructs the rendering engine to treat the code points in the sequence as a group and to display an image that it retrieves from a lookup table or through a similar technique. The natural way to support this in a font is with ligatures, which replace the sequence with a single glyph. Conventionally, a ligature is a glyph used in some languages for a certain sequence of characters that become joined or overlapped in some way when written together.
Examples in Latin-script languages are character combinations such as OE becoming Œ in Old English or IJ becoming Ĳ in Dutch. But there is no reason this could not be used to convert Bliss-words composed of multiple characters to ARASAAC or other AAC symbol sets that have a single image for the concept being represented. In this use case, the Bliss-character sequence "drink"+ZWJ+"leaf" would be replaced with a ligature glyph associated with that sequence in the particular font being used. In an ARASAAC font, for example, the sequence would be replaced with the image of a cup of tea with a tea bag in it.

There are about 1,200 Bliss-characters and currently about 6,400 Blissymbols. So approximately 5,200 Blissymbols (Bliss-words) are composed of sequences of two or more Bliss-characters and would need ligatures provided for them for the full current vocabulary to be covered in a non-Bliss AAC annotation font. Just over 600 characters serve as initial characters in multi-character symbols, giving an average of about 8.5 ligatures per base character. About 2,000 symbols are composed of three characters, 2,000 have four characters and 900 have five characters. With WAI-Adapt targeting core annotation (i.e. not expecting 100% coverage), this could be reduced to a core vocabulary, but if there are no practical limitations it may be easiest simply to implement the entire vocabulary.

3) More troublesome case: Annotating the expression "chocolate drink".

In Blissymbolics there is a Bliss-word for "chocolate drink" with BCI-ID 20772 that is made up of the characters "drink" + "bean" + "up". Using HTML hexadecimal notation this would be written as "‍‍" where x200D is the hex value of the ZWJ non-spacing code point. So, other than having an extra character in the string, why is this a more troublesome case? The reason is not that it is more difficult to create a ligature with a single glyph to be displayed in place of the string of glyphs.
The reason has to do with the nature of Bliss as a living language. The symbol for "chocolate drink" illustrates a relatively common occurrence in Bliss that will cause problems in the future: Bliss spellings can change over time. In this case, "chocolate" was formerly spelled completely differently, as "powder" + "brown". The language undergoes constant review and revision by users, teachers, and other participants in its development. For backward compatibility the previous spellings are retained and marked with _OLD to indicate that they are deprecated. However, the implication for non-Bliss symbol set developers is that whenever a Bliss spelling changes, they would have to modify their font to support the new spelling in addition to the deprecated one. Also, due to the nature of Bliss, there would be a cascading effect: every other symbol that includes "chocolate", such as "chocolate sauce", "chocolate spread", etc., would also have to be modified.

If an alternate approach were used in which the identifier for each concept is independent of any particular representation of that concept, then no symbol set would be dependent on (or, in software design terminology, "coupled to") any other symbol set. Coupling all other AAC symbol sets to Blissymbolics would be a design mistake, in this writer's opinion.

4) Most troublesome case: Annotating a word that does not exist in the BCI AV

The BCI Authorized Vocabulary currently contains about 6,400 entries. It is being expanded all the time, but coming up with new Blissymbols is a time-consuming process. The quality of the language depends on consistent, well thought out strategies for representing increasingly complex concepts. ARASAAC has over 13,000 symbols. Other symbol sets likely have more. At first glance you might think that one of these sets would contain all Blissymbol concepts, with many others as well, but this is not the case.
Bliss tends to have a single symbol for a concept (with the exception of deprecated symbols), whereas most other, purely pictographic symbol sets often have many alternative images for the same concept, with different alternatives representing the idea in a way that is more relatable for a particular user. This is much like the alternative Emoji of humans with different skin tones. Accurate counts of how much duplication there is within a symbol set, and how much overlap there is between symbol sets, are not available, but it can be safely assumed that there are significant percentages of both.

So what happens when an ARASAAC symbol is required for a user and a corresponding Blissymbol does not exist? What gets coded into the HTML? How long will the ARASAAC user have to wait for the corresponding Blissymbol to be created, so that there is a spelling to enter in the ruby annotation and the ARASAAC providers can supply a ligature for it in their font? That could be a very long wait. The people developing Blissymbols naturally respond to the Blissymbol user community with priority.

This is the ultimate example of how coupling to the Blissymbolics Unicode encoding would create barriers for other symbol set providers. They would essentially be blocked from providing a particular symbol if an equivalent did not exist in the BCI AV. Browser vendors could provide a workaround, but with accessibility already a low priority, and the tiny subset of AAC users an even lower one, it is in the best interests of those users to design a standard from the outset without such barriers to overcome.

### Conclusion

Although at face value it may seem attractive to use the proposed Blissymbol Unicode encoding instead of a W3C registry as the reference for AAC symbol annotation, I don't believe it is the right thing to do and cannot recommend it for this purpose.
Also, it is important to note that the problem illustrated in use case 4) would exist with any AAC symbol set being used as a reference.

If a registry were used, it would ideally start as a superset of all existing AAC symbol set concepts, using an identifier that is independent of any one symbol set. Whether this ID is created for the purpose, borrowed from other widely used lexical database systems such as WordNet (wordnet.princeton.edu) or ConceptNet (conceptnet.io), or some combination thereof is a matter for further discussion. For example, one way to define the ID would be to use code points in the private use area, which would then be displayed directly by the rendering engine without any need to resort to ligatures. Or perhaps ConceptNet IDs could be used, which are then mapped directly to private use area code points. ConceptNet is an interesting option because it is already very extensive in scope but also relatively open to adding entries.

To be fair, the advantage of using the proposed Bliss Unicode encoding along with ruby and fonts for display is that most of the work is already done. SVG fonts with ligatures would have to be created, but for the AAC annotation display mechanism that's about it.

## Rendering: use of ruby element

The demonstration HTML, which uses ruby elements to annotate text with symbols implemented as a web font, works well. There could be a practical issue with layout when symbols are used to annotate text that was not expected to be annotated when the original layout was done. This could possibly also be an issue for ruby annotation of Japanese text for pronunciation, but that use of ruby is more likely to be planned into the layout from the start, rather than added after the fact as will be the case for symbol annotation.
In addition, instead of being smaller than the base text, as is usually the case for ruby, the symbols will usually need to be larger than the base text, creating a higher probability of layout and formatting issues.

### In languages that don't use ruby

If ruby is not typically used in a language, then there is no potential collision between using ruby as originally intended and using it to display symbol annotation as suggested.

### In languages that already use ruby

In languages that do use ruby, usually for pronunciation annotation, there is the potential for conflict with simultaneously using ruby for symbol annotation. However, ruby has been designed to support multiple annotations, so there may very well not be a conflict. Further testing is required, but rudimentary initial experimentation gave reasonably satisfactory results. Multiple annotations are definitely possible; the only questions are what counts as acceptable layout and how an algorithm would achieve it when annotating with symbols.

### Hiding ruby elements containing symbol content

This is needed so that users who are not using symbols do not see such content. This raises the question: are all pages to be pre-annotated with symbols? Or are pages being viewed by symbol users to be pre-processed by the browser in order to provide annotation for the particular user?
Received on Tuesday, 17 December 2024 17:33:22 UTC