- From: r12a <ishida@w3.org>
- Date: Thu, 9 Apr 2026 18:01:10 +0100
- To: Andrew Cunningham <lang.support@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>
- Cc: public-i18n-core@w3.org, Fuqiao Xue <xfq@w3.org>
- Message-ID: <96ab34a4-a52e-32a2-7d08-946addec30fe@w3.org>
Thanks to the fragmentation associated with email threads (a thing i had
long forgotten about), i don't think that the (personal) response i gave
to the WCAG folks has had any visibility to people on this thread. I'm
therefore going to copy it here, for posterity. However, i recommend that
we now wait for them to create issues in a WCAG repo, where we can
continue that discussion with, hopefully, a little more light shed on
what they are looking for. They have already agreed to do that.

So here we go:

> Steven, before this goes much further, please switch the discussion to
> a GH issue. We find it difficult to follow and share information from
> emails, and we cannot fit them into our review management system. You
> could raise an issue in the WCAG repo of your choice and add the
> i18n-track label; that will alert us to the issue automatically.
>
> I suggest you add your questions to the initial comment, and then add
> what i'm going to say in another comment.
>
> Rather than just generally talk about diacritics, i think you may need
> to be more specific about what diacritics are relevant here. Leaving
> aside the Hawaiian example for now (which i'd like to understand
> better), the diacritics you seem to be mostly concerned with are those
> that indicate vowel sounds in scripts of type 'abjad'. These scripts
> include Arabic, Hebrew, and Classical Syriac. Note carefully that
> these are scripts, not languages. The Arabic script is used to write
> the largest number of languages in the world, after Latin script. You
> can find lists of languages using the Arabic and Hebrew scripts at
> https://www.w3.org/International/questions/qa-scripts.en.html#how-many-people
>
> However, it is important to note that not all language orthographies
> using the Arabic script hide their diacritics. Languages that hide
> the diacritics include Arabic (and its dialects), Pashto, Persian,
> Saraiki, Sindhi, and Urdu, etc.
> Other languages using the Arabic script that always show all
> diacritics include Fulfulde, Hausa, Kashmiri, and Wolof, etc. Other
> languages using the Arabic script don't actually use diacritics at all
> to represent vowel sounds; these include Sorani, Uighur, etc.
>
> It is important to also recognise that not all marks on the page that
> look like diacritics are ignorable, even in abjads. See this
> description of the difference between ijam and tashkil:
> https://r12a.github.io/scripts/arab/homographs#ijam_tashkil. (It's the
> tashkil, only, that get hidden in Arabic script.)
>
> It's also worth noting that Hebrew has 3 spelling variants, one of
> which is possibly worth considering as an alternative to flipping
> diacritics on and off. See
> https://r12a.github.io/scripts/hebr/he.html#spelling. There are also
> some diacritics that are more useful than others for disambiguating
> sounds. If you have time, you can read more about that in the article
> just pointed to.
>
> Classical Syriac is an abjad, but is not much used these days. The
> Syriac script is also used by the Assyrian Neo-Aramaic and Turoyo
> communities. Their orthographies generally preserve the diacritics,
> and so are not abjads.
>
> Fine points on your Google doc:
>
> in the text
>
> * ملاك, pronounced malaak, means “angel”
> * مَلك, pronounced malak, is an archaic version that means “angel”
>   mostly in religious texts
> * ملك, pronounced malik, means “king”
>
> the difference between the first bullet and the second involves letter
> changes, not diacritic changes, so it's not a good example. I may be
> able to come up with something better, if you need.
>
> The 3rd bullet doesn't even show the kasra diacritic that indicates
> the i. To contrast those properly you'd need to show:
>
> • مَلَك
> • مَلِك
>
> hope that helps
> ri

Btw, i also looked into the Hawaiian issue a little.
The examples in the Google Doc don't make it clear to me from a quick
reading whether the apostrophe used for the glottal stop is one of the
things they consider to be a 'diacritic'. But the vowel lengthening
macron does seem to be one such. Since the macron is phonemically
distinctive, i'm dubious that it's sensible to propose that it be
dropped for readers.

> Fuqiao Xue <mailto:xfq@w3.org>
> 9 April 2026 at 02:13
> Update: They’re reviewing what Richard sent, and are awaiting some
> survey results and need to integrate those. They will then compose a
> set of GitHub issues to formalize the remaining questions they have
> and share them with us (so we can track), and won't attend our meeting
> today.
>
> Fuqiao
>
> Martin J. Dürst <mailto:duerst@it.aoyama.ac.jp>
> 9 April 2026 at 00:22
> Hello everybody,
>
> On 2026-04-08 20:08, Andrew Cunningham wrote:
>> On Tue, 7 Apr 2026 at 03:56, Addison Phillips <addisoni18n@gmail.com>
>> wrote:
>>
>>> It would be useful to know what they are actually trying to achieve.
>>> Sometimes "removing diacritics" is a naive thing that (for example)
>>> English speakers try to do (because, generally speaking, they are
>>> affectations in English).
>
> Exactly. I think many on this list get somewhat confused because of
> the word 'diacritics'. My assumption would be that they were looking
> at a phenomenon (languages that in their written form have more or
> less information, where the form with less information is the 'usual'
> form, but the form with more information is helpful in an
> accessibility context because it makes it easier for some people to
> read). The actual graphical expression of the information was mostly
> in form of additional marks, and they then called that 'diacritics'
> because that's a word they were familiar with.
>
> Examples that come to my mind that haven't yet been mentioned:
> - Stress marks in Russian (used for learners who don't know on which
>   syllable the stress is, potentially helpful in an accessibility
>   context).
> - Lengthening marks in Japanese written in Latin script (e.g. Taro
>   Sato vs. Tarō Satō), similar to Hawai'ian. They may help foreigners
>   with a bit of knowledge of Japanese.
> - Ruby in Japanese (these are extremely far from diacritics,... but
>   nevertheless can be very helpful in accessibility contexts)
>
> So the main point in the common discussion should be to look at the
> purpose. Terminology should be cleaned up, but that should be
> secondary.
>
>> I'd assume they are referring to languages that normally aren't
>> marked, but can be marked for pedagogical reasons or to add clarity.
>> Arabic, Lithuanian and a range of African languages come to mind.
>>
>> There are no lists of such languages. It would also have to be
>> orthography specific not just language specific.
>>
>> The only language independent way of achieving this that would also
>> work with any tech stack would be having both versions of the text
>> stored and switching between them.
>
> Fully agree.
>
> Regards, Martin.
>
>>> The meaning of "diacritic" itself is complex. Some diacritics alter or
>>> hint the pronunciation of the base letter. Other diacritics are used to
>>> form an entirely different letter. Diacritics are not just used with
>>> the Latin script. There is also the tendency to confuse "combining
>>> mark" with "diacritic". Without knowing what or why, it's difficult to
>>> make progress--and there might be better approaches than removing
>>> information from the text.
>>>
>>> Look forward to the conversation.
>>>
>>> Addison
>>>
>>> On 4/6/2026 5:39 AM, Fuqiao Xue wrote:
>>>> The WCAG 3 Text & Wording subgroup is defining use of diacritics for
>>>> languages "where they are optional".
>>>> Here's their current draft/working document for that provision:
>>>>
>>>> https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing
>>>>
>>>> They are asking us to help them on principles or practices that may
>>>> guide this work.
>>>>
>>>> Some of the specific concerns are around:
>>>>
>>>> 1. Identifying the applicable languages. Is there a list, or
>>>> especially some programmatic standard to identify those?
>>>> 2. How assistive technology actually handles (or should handle!) cases
>>>> like this. Is requiring the full-diacritic version the right answer?
>>>> 3. Expectations around burden/effort. It was brought up that having
>>>> both versions in a datastore, and a user-visible toggle, is a big
>>>> change.
>>>>
>>>> They are happy to answer questions, or have a joint call to talk about
>>>> this.
>>>>
>>>> Any thoughts?
>>>>
>>> --
>>> Internationalization is not a feature.
>>> It is an architecture.
>
> Andrew Cunningham <mailto:lang.support@gmail.com>
> 8 April 2026 at 12:08
>
> On Tue, 7 Apr 2026 at 03:56, Addison Phillips <addisoni18n@gmail.com
> <mailto:addisoni18n@gmail.com>> wrote:
>
>     It would be useful to know what they are actually trying to achieve.
>     Sometimes "removing diacritics" is a naive thing that (for example)
>     English speakers try to do (because, generally speaking, they are
>     affectations in English).
>
> I'd assume they are referring to languages that normally aren't
> marked, but can be marked for pedagogical reasons or to add clarity.
> Arabic, Lithuanian and a range of African languages come to mind.
>
> There are no lists of such languages. It would also have to be
> orthography specific not just language specific.
>
> The only language independent way of achieving this that would also
> work with any tech stack would be having both versions of the text
> stored and switching between them.
>
>     The meaning of "diacritic" itself is complex.
>     Some diacritics alter or hint the pronunciation of the base
>     letter. Other diacritics are used to form an entirely different
>     letter. Diacritics are not just used with the Latin script. There
>     is also the tendency to confuse "combining mark" with "diacritic".
>     Without knowing what or why, it's difficult to make progress--and
>     there might be better approaches than removing information from
>     the text.
>
>     Look forward to the conversation.
>
>     Addison
>
>     On 4/6/2026 5:39 AM, Fuqiao Xue wrote:
>     > The WCAG 3 Text & Wording subgroup is defining use of diacritics
>     > for languages "where they are optional". Here's their current
>     > draft/working document for that provision:
>     >
>     > https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing
>     >
>     > They are asking us to help them on principles or practices that
>     > may guide this work.
>     >
>     > Some of the specific concerns are around:
>     >
>     > 1. Identifying the applicable languages. Is there a list, or
>     > especially some programmatic standard to identify those?
>     > 2. How assistive technology actually handles (or should handle!)
>     > cases like this. Is requiring the full-diacritic version the
>     > right answer?
>     > 3. Expectations around burden/effort. It was brought up that
>     > having both versions in a datastore, and a user-visible toggle,
>     > is a big change.
>     >
>     > They are happy to answer questions, or have a joint call to talk
>     > about this.
>     >
>     > Any thoughts?
>     >
>     --
>     Internationalization is not a feature.
>     It is an architecture.
>
> --
> Andrew Cunningham
> lang.support@gmail.com <mailto:lang.support@gmail.com>
>
> Addison Phillips <mailto:addisoni18n@gmail.com>
> 6 April 2026 at 18:55
> It would be useful to know what they are actually trying to achieve.
> Sometimes "removing diacritics" is a naive thing that (for example)
> English speakers try to do (because, generally speaking, they are
> affectations in English).
>
> The meaning of "diacritic" itself is complex. Some diacritics alter or
> hint the pronunciation of the base letter. Other diacritics are used
> to form an entirely different letter. Diacritics are not just used
> with the Latin script. There is also the tendency to confuse
> "combining mark" with "diacritic". Without knowing what or why, it's
> difficult to make progress--and there might be better approaches than
> removing information from the text.
>
> Look forward to the conversation.
>
> Addison
>
> Fuqiao Xue <mailto:xfq@w3.org>
> 6 April 2026 at 13:39
> The WCAG 3 Text & Wording subgroup is defining use of diacritics for
> languages "where they are optional". Here's their current
> draft/working document for that provision:
>
> https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing
>
> They are asking us to help them on principles or practices that may
> guide this work.
>
> Some of the specific concerns are around:
>
> 1. Identifying the applicable languages. Is there a list, or
> especially some programmatic standard to identify those?
> 2. How assistive technology actually handles (or should handle!) cases
> like this. Is requiring the full-diacritic version the right answer?
> 3. Expectations around burden/effort. It was brought up that having
> both versions in a datastore, and a user-visible toggle, is a big
> change.
>
> They are happy to answer questions, or have a joint call to talk about
> this.
>
> Any thoughts?
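
As a rough illustration of two points made in the thread above (that
"combining mark" and "diacritic" are easy to conflate, and that storing
both versions of the text is the language-independent option), here is a
minimal Python sketch. The names strip_combining_marks and TextVariants
are hypothetical and not part of any proposal, and the filter is
deliberately naive: it only shows what a generic "remove the marks" step
would do, not how anyone in the thread suggests doing it.

```python
import unicodedata
from dataclasses import dataclass


def strip_combining_marks(text: str) -> str:
    """Naive filter: decompose, then drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


@dataclass(frozen=True)
class TextVariants:
    """Hypothetical record holding both authored spellings of one string."""
    unmarked: str
    marked: str

    def render(self, show_marks: bool) -> str:
        # No stripping or guessing: pick the spelling the author stored.
        return self.marked if show_marks else self.unmarked


# Code points for the two words contrasted in the message above.
MEEM, LAM, KAF = "\u0645", "\u0644", "\u0643"
FATHA, KASRA = "\u064E", "\u0650"
malak = MEEM + FATHA + LAM + FATHA + KAF  # fatha on the lam
malik = MEEM + FATHA + LAM + KASRA + KAF  # kasra on the lam

# Stripping the tashkil makes the two words indistinguishable.
assert strip_combining_marks(malak) == strip_combining_marks(malik) == MEEM + LAM + KAF

# Ijam (e.g. the dot of BEH, U+0628) belong to the letter's own code
# point, so a combining-mark filter never sees them.
BEH = "\u0628"
assert strip_combining_marks(BEH) == BEH

# The same filter also drops the length-marking macron in romanized Japanese.
assert strip_combining_marks("Tar\u014D Sat\u014D") == "Taro Sato"

# Storing both spellings avoids any per-language stripping rule.
king = TextVariants(unmarked=MEEM + LAM + KAF, marked=malik)
assert king.render(show_marks=True) == malik
assert king.render(show_marks=False) == MEEM + LAM + KAF
```

The filter treats the Arabic vowel marks and the Latin macron alike,
which is why a per-orthography decision, or simply storing both
spellings, seems safer than a generic stripping step.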
Received on Thursday, 9 April 2026 17:01:15 UTC