Symbols Implementation Wiki edit from Russell Galvin on 2024-12-17 (public-adapt@w3.org from December 2024)

From: Russell Galvin <russell@blissymbolics.org>
Date: Tue, 17 Dec 2024 12:33:05 -0500
To: public-adapt@w3.org
Message-ID: <0c0cabe7-c773-498e-87f5-deb1c82a204a@blissymbolics.org>
Hi all,

Below is the edit I have prepared for the Symbols Implementation tests 
wiki. It is primarily in response to ideas that have come out of Issue 
240. Please review if interested; feedback is welcome.

Russell

---

## Converting symbols to fonts

To demonstrate displaying symbols as text annotation using <ruby>, a 
WOFF/WOFF2 SVG colour web test font was created. Because its glyphs are 
vector SVG images, one for each supported code point, they are 
effectively infinitely scalable without loss of image quality, and, 
unlike bitmap fonts, no more than one version of each image is needed. 
As a web font, it can be loaded from the local machine but can also be 
hosted on a web server, making it available to any device connected to 
the web and eliminating the need for local installation.

The test font was created using Glyphs, a font editor available for the 
Apple Macintosh. The process is quite simple: existing SVG images can be 
dragged and dropped into the SVG layer for a given code point, creating 
the scalable SVG glyph for that code point. Proper positioning of each 
glyph must be attended to. Other software provides this functionality, 
such as icomoon.io, fontastic.me, and fontello.com; however, the popular 
open source application FontForge does not support SVG colour fonts at 
this time.

### ARASAAC

ARASAAC does not at this time make SVG versions of its symbols freely 
available on its website; all images it provides are in PNG raster 
(colour bitmap) format. To demonstrate symbols with more complex and/or 
detailed graphics than Blissymbols, as ARASAAC's are, CC-licensed SVG 
symbols from Jellow and Mulberry were used instead.

### Bliss

Blissymbols are freely available in SVG format from the BCI website at 
https://blissymbolics.org/index.php/symbol-files-2024. Some examples 
were included in the test font.

The current Bliss Unicode draft proposal by Michael Everson is published 
at https://www.unicode.org/wg2/docs/n5228-blissymbols.pdf. It is not 
finalized, so Michael requests that people not start using the code 
points published there until it is. To comply with that request, this 
discussion shifts the code points into the private use area.
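As a quick sanity check on the demonstration values, the sketch below confirms that the code points used on this page (U+E20E drink, U+E451 leaf, U+E44A bean, U+E265 up) lie in the Basic Multilingual Plane Private Use Area, U+E000 to U+F8FF. The name-to-code-point table is our own illustration, not part of the BCI data or the draft proposal; Python's `unicodedata.category` reports private-use characters as "Co".

```python
import unicodedata

# BMP Private Use Area, as fixed by the Unicode Standard.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Demonstration code points used on this page (illustrative, not the draft's values).
DEMO_CODE_POINTS = {
    "drink": 0xE20E,
    "leaf": 0xE451,
    "bean": 0xE44A,
    "up": 0xE265,
}

def in_bmp_pua(cp: int) -> bool:
    """Return True if cp lies in the BMP Private Use Area."""
    return PUA_START <= cp <= PUA_END

for name, cp in DEMO_CODE_POINTS.items():
    # Range check plus the Unicode general category ("Co" = private use).
    assert in_bmp_pua(cp) and unicodedata.category(chr(cp)) == "Co"
    print(f"{name}: U+{cp:04X}")
```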

## Mapping code points within the font

This section documents examples of mapping from the Unicode code points 
of Bliss word parts (Bliss-characters) to Bliss-words (Blissymbols), 
i.e. to the particular concepts encoded by those Bliss-words.

The pros and cons of using (proposed) Blissymbol Unicode code points as 
the method of identifying symbols and mapping between AAC symbol sets 
are discussed below. To illustrate the problems that will arise, we 
assume these code points are used along with WOFF2 fonts for displaying 
symbols and work through some use cases.

1) Simplest case: Annotating the word "drink".

The Bliss-word "drink" is identified by the BCI-ID 13881. Here we use a 
demonstration Bliss Unicode code point of U+E20E for drink. A font 
containing glyphs for all Bliss-characters would correctly display the 
Blissymbol for drink in text strings containing this code point. In this 
case, the Blissymbol for drink is both a Bliss-character and a 
Bliss-word. An analogy would be the letter "a" in English, which is also 
a meaningful word on its own.

An AAC symbol user who prefers ARASAAC symbols, for example, could have 
the same ruby annotation displayed as an ARASAAC symbol simply by using 
a font with the glyph for the Bliss-character "drink" replaced by the
ARASAAC symbol for "drink". So far, so good.
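The font swap can be modelled in a few lines of Python. In this sketch the two dictionaries stand in for the character-to-glyph (cmap) tables of a hypothetical Bliss font and a hypothetical ARASAAC font; the glyph file names and the `ruby` helper are illustrative assumptions. The point is that the HTML is identical in both cases and only the font decides what the user sees.

```python
from html import escape

def ruby(base: str, annotation_cp: int) -> str:
    """Wrap base text in <ruby> with one code point as the <rt> annotation."""
    return f"<ruby>{escape(base)}<rt>&#x{annotation_cp:X};</rt></ruby>"

# Hypothetical cmaps: code point -> glyph (file names are illustrative only).
BLISS_FONT = {0xE20E: "bliss/drink.svg"}
ARASAAC_FONT = {0xE20E: "arasaac/drink.svg"}

def glyph_for(cmap: dict[int, str], cp: int) -> str:
    # Fall back to a .notdef-style placeholder when the font lacks the code point.
    return cmap.get(cp, ".notdef")

markup = ruby("drink", 0xE20E)
print(markup)                            # <ruby>drink<rt>&#xE20E;</rt></ruby>
print(glyph_for(BLISS_FONT, 0xE20E))     # bliss/drink.svg
print(glyph_for(ARASAAC_FONT, 0xE20E))   # arasaac/drink.svg
```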

2) Less simple case: Annotating the word "tea".

The Bliss-word "tea" is identified by the BCI-ID 17511. It is composed 
of two Bliss-characters, "drink" followed by "leaf". Using U+E451 as the 
code point for leaf, the Unicode representation of "tea" is a 
two-character string of the code points U+E20E and U+E451. In HTML it can be 
represented as "&#xE20E;&#xE451;".
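A small sketch of the same string in Python, assuming the demonstration code points above; `to_html_refs` is a hypothetical helper, not part of any library. Without a joiner, "tea" is simply two adjacent code points.

```python
def to_html_refs(text: str) -> str:
    """Encode every character as an uppercase hexadecimal HTML character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)

DRINK, LEAF = "\ue20e", "\ue451"   # demonstration PUA code points
tea = DRINK + LEAF                  # Bliss-word "tea" = drink + leaf

print(len(tea))            # 2 (two code points, no joiner)
print(to_html_refs(tea))   # &#xE20E;&#xE451;
```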

The ARASAAC symbol set, in common with all AAC symbol sets other than 
Bliss, has a single pictographic symbol for tea. (As an aside, non-Bliss 
AAC symbol sets often have multiple alternate symbols for the same 
concept, but a particular one would normally be selected for a 
particular user.) So in an ARASAAC font, how do we map a single image to 
a sequence of two or more code points?

Option a) Unicode has a character composition mechanism whereby a base 
character followed by one or more combining characters is canonically 
equivalent to, and may be replaced by, a single precomposed character. 
This is typically used for letters with diacritical marks; for example, 
"e" followed by U+0301 COMBINING ACUTE ACCENT is equivalent to "é". At 
first glance this may seem a plausible solution, but it would require 
almost every Bliss-character to exist as a combining character as well 
as a non-combining one. That is not how the Bliss encoding is designed, 
and it would not work without extensive modifications to the current 
proposed encoding. In addition, precomposed characters would have to be 
encoded, and added to fonts, with every new addition to the Bliss 
vocabulary. In short, this is not how this Unicode mechanism is intended 
to be used.
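The contrast can be checked with Python's standard `unicodedata` module. Canonical composition is driven by decomposition mappings in the Unicode Character Database, so it works for "e" plus a combining acute accent but can never apply to private-use characters, which have no such mappings. The PUA values are the demonstration code points used on this page.

```python
import unicodedata

# Latin: "e" + COMBINING ACUTE ACCENT composes to precomposed "é" under NFC.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "\u00e9"

# Demo Bliss "tea" (drink + leaf) in the PUA: private-use characters have no
# decomposition mappings and combining class 0, so NFC leaves them untouched.
tea = "\ue20e\ue451"
assert unicodedata.combining("\u0301") == 230   # a real combining mark
assert unicodedata.combining("\ue451") == 0     # not a combining mark
assert unicodedata.normalize("NFC", tea) == tea
print(len(decomposed), "->", len(composed), "| PUA sequence unchanged")
```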

Option b) Another Unicode mechanism, used to implement Emoji 
combinations, is the zero width joiner (ZWJ) sequence. This approach 
could conceivably work for AAC symbol annotation. Inserting a ZWJ code 
point between the elements of a sequence signals to the rendering engine 
that the code points should be treated as a group and displayed as a 
single image, retrieved from a lookup table or through a similar 
technique. The natural way for a font to support this is with ligatures, 
which replace the sequence with a single glyph. Usually a ligature is a 
glyph used in some languages when a certain sequence of characters, 
written together, becomes joined or overlapped in some way. Examples in 
Latin-script languages are character combinations such as OE becoming Œ 
in Old English or IJ becoming Ĳ in Dutch. But there is no reason this 
could not be used to convert Bliss-words composed of multiple characters 
to ARASAAC or other AAC symbol sets that have a single image for the 
concept being represented. In this use case example, the 
Bliss-character sequence "drink"+ZWJ+"leaf" would be replaced with the 
ligature glyph associated with that sequence in the particular font 
being used. In an ARASAAC font, for example, the sequence would be 
replaced with the image of a cup of tea with a tea bag in it.
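The ligature behaviour described above can be modelled as a greedy longest-match substitution over code points. This is only a sketch: real fonts do this with their own substitution tables (e.g. OpenType ligature lookups), and the `LIGATURES` and `CMAP` tables here are hypothetical stand-ins for an ARASAAC font, using the demonstration code points from this page.

```python
ZWJ = 0x200D

# Hypothetical ligature table: ZWJ sequence -> single glyph name.
LIGATURES: dict[tuple[int, ...], str] = {
    (0xE20E, ZWJ, 0xE451): "tea",                                # drink+ZWJ+leaf
    (0xE20E, ZWJ, 0xE44A, ZWJ, 0xE265): "chocolate drink",       # drink+bean+up
}
# Hypothetical one-to-one glyphs for the individual characters.
CMAP = {0xE20E: "drink", 0xE451: "leaf", 0xE44A: "bean", 0xE265: "up"}
MAX_LEN = max(len(seq) for seq in LIGATURES)

def shape(text: str) -> list[str]:
    """Replace the longest matching ligature sequence at each position."""
    cps = [ord(ch) for ch in text]
    glyphs: list[str] = []
    i = 0
    while i < len(cps):
        # Try the longest candidate first, down to two code points.
        for n in range(min(MAX_LEN, len(cps) - i), 1, -1):
            lig = LIGATURES.get(tuple(cps[i:i + n]))
            if lig is not None:
                glyphs.append(lig)
                i += n
                break
        else:
            # No ligature here: map the single code point (an unmatched ZWJ shows nothing).
            if cps[i] != ZWJ:
                glyphs.append(CMAP.get(cps[i], ".notdef"))
            i += 1
    return glyphs

print(shape("\ue20e\u200d\ue451"))   # ['tea']
print(shape("\ue20e\ue451"))         # ['drink', 'leaf']
```

The greedy longest match matters once sequences share a prefix: "drink"+ZWJ+"bean"+ZWJ+"up" must not be cut short by a shorter entry starting with "drink".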

There are about 1,200 Bliss-characters and currently about 6,400 
Blissymbols, so approximately 5,200 Blissymbols (Bliss-words) are 
composed of sequences of two or more Bliss-characters. Each of these 
would need a ligature for the full current vocabulary to be covered in a 
non-Bliss AAC annotation font. Just over 600 characters serve as initial 
characters in multi-character symbols, giving an average of about 8.5 
ligatures per base character. About 2,000 symbols are composed of three 
characters, 2,000 have four characters and 900 have five characters. 
Since WAI-Adapt targets core annotation, i.e. 100% coverage is not 
expected, this could be reduced to a core vocabulary, but if there are 
no practical limitations it may be easiest to implement the entire 
vocabulary.
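With the page's own figures, 6,400 - 1,200 gives roughly 5,200 ligatures, and 5,200 divided by "just over 600" initial characters gives about 8.5 per initial. Given a machine-readable vocabulary, the same statistics could be computed as in the sketch below. The input format (Bliss-word label mapped to a list of Bliss-character labels) is an assumption; the toy data uses only spellings mentioned on this page.

```python
from collections import Counter

def ligature_stats(vocab: dict[str, list[str]]) -> dict[str, object]:
    """Summarise how many ligatures a non-Bliss annotation font would need.

    vocab maps a Bliss-word label to its spelling as Bliss-character labels.
    """
    multi = {word: sp for word, sp in vocab.items() if len(sp) > 1}
    initials = {sp[0] for sp in multi.values()}
    return {
        "ligatures_needed": len(multi),
        "distinct_initials": len(initials),
        "avg_per_initial": len(multi) / len(initials) if initials else 0.0,
        "length_histogram": Counter(len(sp) for sp in multi.values()),
    }

# Toy data from this page; the real input would be the full BCI vocabulary.
toy = {
    "drink": ["drink"],
    "tea": ["drink", "leaf"],
    "chocolate drink": ["drink", "bean", "up"],
}
print(ligature_stats(toy))
```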

3) More troublesome case: Annotating the expression "chocolate drink".

In Blissymbolics there is a Bliss-word for "chocolate drink" with BCI-ID 
20772 that is made up of the characters "drink" + "bean" + "up". Using 
HTML hexadecimal character references, with a ZWJ between each pair of 
characters, this would be written as 
"&#xE20E;&#x200D;&#xE44A;&#x200D;&#xE265;", where 200D is the 
hexadecimal value of the non-spacing ZWJ code point.
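A hedged sketch of building the ZWJ-joined string and its HTML form in Python, using the demonstration code points; `zwj_sequence` and `to_html_refs` are illustrative helpers of our own.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def zwj_sequence(*chars: str) -> str:
    """Join Bliss-characters with a ZWJ between each adjacent pair."""
    return ZWJ.join(chars)

def to_html_refs(text: str) -> str:
    """Encode every character as an uppercase hexadecimal HTML character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)

DRINK, BEAN, UP = "\ue20e", "\ue44a", "\ue265"   # demonstration PUA code points
chocolate_drink = zwj_sequence(DRINK, BEAN, UP)
print(to_html_refs(chocolate_drink))
# &#xE20E;&#x200D;&#xE44A;&#x200D;&#xE265;
```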

So, other than having an extra character in the string, why is this a 
more troublesome case? It is not because a ligature with a single glyph 
is harder to create for this string. The reason has to do with the 
nature of Bliss as a living language. The symbol for "chocolate drink" 
illustrates a relatively common occurrence in Bliss that will cause 
problems in the future: Bliss spellings can change over time. In this 
case, "chocolate" was formerly spelled completely differently, as 
"powder" + "brown". The language undergoes constant review and revision 
by users, teachers, and other participants in its development. For 
backward compatibility, previous spellings are retained and marked with 
_OLD to indicate that they are deprecated. The implication for non-Bliss 
symbol set developers, however, is that whenever a Bliss spelling 
changes, they would have to modify their font to support the new 
spelling in addition to the deprecated one. Also, due to the 
compositional nature of Bliss, there would be a cascading effect: every 
other symbol that includes "chocolate", such as "chocolate sauce" and 
"chocolate spread", would also have to be modified.
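The cascading effect amounts to finding every vocabulary entry whose spelling contains the changed sequence. The sketch below assumes a simple label-based vocabulary; the spellings given for "chocolate sauce" and "chocolate spread" and the placeholder `NEW_CHOCOLATE` are hypothetical and do not reflect the actual BCI spellings.

```python
# Hypothetical vocabulary: Bliss-word label -> spelling as Bliss-character labels.
# "chocolate" appears with its former spelling, "powder" + "brown".
VOCAB: dict[str, list[str]] = {
    "chocolate": ["powder", "brown"],
    "chocolate sauce": ["powder", "brown", "sauce"],
    "chocolate spread": ["powder", "brown", "spread"],
    "tea": ["drink", "leaf"],
}

def respelled_entries(vocab: dict[str, list[str]], old: list[str],
                      new: list[str]) -> dict[str, list[str]]:
    """Return {word: new spelling} for every entry whose spelling contains `old`.

    Each returned entry is a new ligature a non-Bliss font would have to add,
    while keeping the ligature for the deprecated (_OLD) spelling.
    """
    n = len(old)
    result: dict[str, list[str]] = {}
    for word, spelling in vocab.items():
        out: list[str] = []
        i, changed = 0, False
        while i < len(spelling):
            if spelling[i:i + n] == old:
                out.extend(new)
                i += n
                changed = True
            else:
                out.append(spelling[i])
                i += 1
        if changed:
            result[word] = out
    return result

# "NEW_CHOCOLATE" is a placeholder, not the actual current Bliss spelling.
changes = respelled_entries(VOCAB, ["powder", "brown"], ["NEW_CHOCOLATE"])
for word, spelling in changes.items():
    print(word, "->", " + ".join(spelling))
```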

If an alternate approach were used, where the identifier for each 
concept is independent of any particular representation of that 
concept, then no symbol set would be dependent on (or, in software 
design terminology, "coupled to") any other symbol set. Coupling all 
other AAC symbol sets to Blissymbolics would be a design mistake, in 
this writer's opinion.
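A minimal sketch of such a decoupled registry, assuming concept IDs of our own invention (`concept:...`) and illustrative asset names. Each symbol set owns its own entry, so a Bliss respelling touches only the Bliss field, and a symbol set can provide a concept that has no Bliss equivalent, the situation discussed in use case 4) below.

```python
from typing import Optional

# Hypothetical registry. Concept IDs belong to no symbol set; each set
# contributes its own representation, or nothing at all.
REGISTRY: dict[str, dict[str, object]] = {
    "concept:chocolate-drink": {
        "bliss": ["drink", "bean", "up"],           # Bliss spelling (may change)
        "arasaac": "arasaac/chocolate_drink.svg",   # single pictograph
    },
    "concept:only-in-arasaac": {                    # no Bliss equivalent yet
        "arasaac": "arasaac/example.svg",
    },
}

def representation(concept_id: str, symbol_set: str) -> Optional[object]:
    """Look up one symbol set's representation of a concept, or None."""
    return REGISTRY.get(concept_id, {}).get(symbol_set)

def respell_bliss(concept_id: str, new_spelling: list[str]) -> None:
    """A Bliss spelling change updates only the Bliss entry for that concept."""
    REGISTRY[concept_id]["bliss"] = new_spelling

print(representation("concept:only-in-arasaac", "arasaac"))  # arasaac/example.svg
print(representation("concept:only-in-arasaac", "bliss"))    # None
```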

4) Most troublesome case: Annotating a word that does not exist in the BCI AV.

The BCI Authorized Vocabulary (AV) currently contains about 6,400 
entries. It is being expanded all the time, but coming up with new 
Blissymbols is a time-consuming process; the quality of the language 
depends on consistent, well thought out strategies for representing 
increasingly complex concepts. ARASAAC has over 13,000 symbols, and 
other symbol sets likely have more. At first glance you might think that 
one of these sets would contain all Blissymbol concepts and many others 
as well, but this is not the case. Bliss tends to have a single symbol 
for a concept (deprecated symbols excepted), whereas most other, purely 
pictographic symbol sets often have many alternative images for the 
same concept, with different alternatives representing the idea in a 
way that is more relatable for a particular user. This is much like the 
alternative Emoji of humans with different skin tones. Accurate counts 
of how much duplication exists within a symbol set, and how much overlap 
exists between symbol sets, are not available, but it can be safely 
assumed that there are significant percentages of both.

So what happens when an ARASAAC symbol is required for a user and a 
corresponding Blissymbol does not exist? What gets coded into the HTML? 
How long will the ARASAAC user have to wait for the corresponding 
Blissymbol to be created, so that there is a spelling to enter in the 
ruby annotation and the ARASAAC providers can provide a ligature for it 
in their font? That could be a very long wait, since the people 
developing Blissymbols naturally give priority to the Blissymbol user 
community. This is the ultimate example of how coupling to the 
Blissymbolics Unicode encoding would create barriers for other symbol 
set providers: they would essentially be blocked from providing a 
particular symbol if no equivalent existed in the BCI AV. Browser 
vendors could provide a workaround, but with accessibility already a low 
priority, and the tiny subset of AAC users an even lower one, it is in 
the best interests of those users to design a standard without such 
barriers from the outset.

### Conclusion

Although at face value it may seem attractive to use the proposed 
Blissymbol Unicode encoding instead of a W3C registry as the reference 
for AAC symbol annotation, I don't believe it is the right thing to do 
and cannot recommend it for this purpose.

It is also important to note that the problem illustrated in use case 
4) would exist with any AAC symbol set used as the reference. If a 
registry were used, it would ideally initially consist of a superset of 
all existing AAC symbol set concepts, using an identifier that is 
independent of any one symbol set. Whether this ID is created for the 
purpose, borrowed from other widely used lexical database systems such 
as WordNet (wordnet.princeton.edu) or ConceptNet (conceptnet.io), or a 
combination thereof is a matter for further discussion. For example, one 
way to define the ID would be to use code points in the private use 
area, which the rendering engine would then display directly without 
any need to resort to ligatures. Alternatively, ConceptNet IDs could be 
used and mapped directly to private use area code points. ConceptNet is 
an interesting option because it is already very extensive in scope but 
also relatively open to adding entries.
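One note on the private use area option: the BMP Private Use Area holds exactly 6,400 code points (U+E000 to U+F8FF), fewer than ARASAAC's 13,000-plus symbols alone, so a superset registry would also need a supplementary private use plane such as Plane 15 (U+F0000 to U+FFFFD). The allocation scheme below, which assigns code points to ConceptNet-style URIs in order, is purely hypothetical.

```python
from itertools import chain

BMP_PUA = range(0xE000, 0xF900)        # U+E000..U+F8FF, 6,400 code points
PLANE15_PUA = range(0xF0000, 0xFFFFE)  # Supplementary Private Use Area-A

def allocate(concept_ids: list[str]) -> dict[str, str]:
    """Assign each concept ID one private-use character, in order (hypothetical scheme)."""
    pool = chain(BMP_PUA, PLANE15_PUA)
    return {cid: chr(cp) for cid, cp in zip(concept_ids, pool)}

ids = allocate(["/c/en/tea", "/c/en/drink"])
# One code point per concept: no ligature needed to display the symbol.
print(f"<ruby>tea<rt>&#x{ord(ids['/c/en/tea']):X};</rt></ruby>")
```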

To be fair, the advantage of using the proposed Bliss Unicode encoding 
along with ruby and fonts for display is that most of the work is 
already done. SVG fonts with ligatures would have to be created, but for 
the AAC annotation display mechanism that's about it.

## Rendering: use of ruby element

The demonstration HTML, using ruby elements to annotate text with 
symbols implemented as a web font, works well. There could be a 
practical layout issue when annotating, with symbols, text that was not 
expected to be annotated when the original layout was done. This could 
also be an issue for ruby annotation of Japanese text for pronunciation, 
but that use of ruby is more likely to be planned as part of the design, 
as opposed to added after the fact as will be the case for symbol 
annotation. In addition, rather than being smaller than the base text, 
as is usual for ruby, the symbols will usually need to be larger than 
the base text, increasing the likelihood of layout and formatting 
issues.

### In languages that don't use ruby

If ruby is not typically used in a language, there is no potential for 
collision between ruby's original purpose and its suggested use for 
displaying symbol annotation.

### In languages that already use ruby

In languages that do use ruby, usually for pronunciation annotation, 
there is the potential for conflict with simultaneously using ruby for 
symbol annotation. However, ruby has been designed to support multiple 
annotations, so there may very well not be a conflict. Further testing 
is required, but rudimentary initial experimentation produced reasonably 
satisfactory results. Multiple annotations are definitely possible; the 
only questions are what layout is acceptable and how an algorithm would 
achieve it when annotating with symbols.

### Hiding ruby elements containing symbol content

This is needed so that users who are not using symbols do not see such 
content.

This raises the question: are all pages to be pre-annotated with 
symbols, or are pages viewed by symbol users to be pre-processed by the 
browser to provide annotation for the particular user?
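For the pre-processing option, a browser or extension would need to wrap known words in ruby markup for a particular user. The sketch below works on plain text rather than a DOM, and the per-user `LEXICON` is a hypothetical mapping from words to code points in the user's symbol font; real code would also have to walk text nodes, skip existing ruby, and handle multi-word expressions such as "chocolate drink".

```python
import re
from html import escape

# Hypothetical per-user lexicon: lower-case word -> code point in the user's symbol font.
LEXICON: dict[str, int] = {"drink": 0xE20E, "leaf": 0xE451}

def annotate(text: str, lexicon: dict[str, int]) -> str:
    """Wrap each word found in the lexicon in <ruby>; escape everything else."""
    # With a capturing group, re.split places the matched words at odd indices.
    parts = re.split(r"(\w+)", text)
    out = []
    for i, part in enumerate(parts):
        cp = lexicon.get(part.lower()) if i % 2 == 1 else None
        if cp is None:
            out.append(escape(part))
        else:
            out.append(f"<ruby>{escape(part)}<rt>&#x{cp:X};</rt></ruby>")
    return "".join(out)

print(annotate("I drink tea.", LEXICON))
# I <ruby>drink<rt>&#xE20E;</rt></ruby> tea.
```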
Received on Tuesday, 17 December 2024 17:33:22 UTC