Re: agenda+ Diacritics in WCAG from r12a on 2026-04-09 (public-i18n-core@w3.org from April to June 2026)

From: r12a <ishida@w3.org>
Date: Thu, 9 Apr 2026 18:01:10 +0100
To: Andrew Cunningham <lang.support@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>
Cc: public-i18n-core@w3.org, Fuqiao Xue <xfq@w3.org>
Message-ID: <96ab34a4-a52e-32a2-7d08-946addec30fe@w3.org>
Thanks to the fragmentation associated with email threads (a thing I had
long forgotten about), I don't think the (personal) response I gave to
the WCAG folks has been visible to people on this thread. I'm therefore
going to copy it here, for posterity. However, I recommend that we now
wait for them to create issues in a WCAG repo, where we can continue the
discussion with, hopefully, a little more light shed on what they are
looking for. They have already agreed to do that.

So here we go:

> Steven, before this goes much further, please switch the discussion to
> a GitHub issue. We find it difficult to follow and share information
> from emails, and we cannot fit them into our review management system.
> You could raise an issue in the WCAG repo of your choice and add the
> i18n-tracker label; that will alert us to the issue automatically.
>
> I suggest you add your questions to the initial comment, and then add
> what I'm going to say in another comment.
>
> Rather than talking about diacritics in general, I think you may need
> to be more specific about which diacritics are relevant here. Leaving
> aside the Hawaiian example for now (which I'd like to understand
> better), the diacritics you seem to be mostly concerned with are those
> that indicate vowel sounds in scripts of the 'abjad' type. These
> scripts include Arabic, Hebrew, and Classical Syriac. Note carefully
> that these are scripts, not languages. After the Latin script, the
> Arabic script is used to write more languages than any other script in
> the world. You can find lists of languages that use the Arabic and
> Hebrew scripts at
> https://www.w3.org/International/questions/qa-scripts.en.html#how-many-people
>
> However, it is important to note that not all language orthographies
> using the Arabic script hide their diacritics. Languages that hide the
> diacritics include Arabic (and its dialects), Pashto, Persian,
> Saraiki, Sindhi, and Urdu. Languages using the Arabic script that
> always show all diacritics include Fulfulde, Hausa, Kashmiri, and
> Wolof. Still other languages using the Arabic script don't use
> diacritics at all to represent vowel sounds; these include Sorani and
> Uighur.
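Any list of "applicable languages" would therefore have to be keyed by orthography (language plus script), not by language alone. A hypothetical sketch of such a lookup: it encodes only the Arabic-script examples named above, and the choice of ISO 639 language codes plus ISO 15924 script codes as keys is an assumption about how entries might be identified. It is not an existing registry.

```python
from enum import Enum
from typing import Optional


class VowelMarking(Enum):
    """How an orthography treats vowel diacritics (categories from the text above)."""
    HIDDEN = "normally omitted"
    ALWAYS_SHOWN = "always written"
    NOT_USED = "vowels written without diacritics"


# Hypothetical table covering only the Arabic-script examples named above.
# Keys are (ISO 639 language code, ISO 15924 script code) pairs, because the
# behaviour belongs to a particular orthography, not to a language as such.
ARABIC_SCRIPT_ORTHOGRAPHIES = {
    ("ar", "Arab"): VowelMarking.HIDDEN,        # Arabic
    ("ps", "Arab"): VowelMarking.HIDDEN,        # Pashto
    ("fa", "Arab"): VowelMarking.HIDDEN,        # Persian
    ("skr", "Arab"): VowelMarking.HIDDEN,       # Saraiki
    ("sd", "Arab"): VowelMarking.HIDDEN,        # Sindhi
    ("ur", "Arab"): VowelMarking.HIDDEN,        # Urdu
    ("ff", "Arab"): VowelMarking.ALWAYS_SHOWN,  # Fulfulde
    ("ha", "Arab"): VowelMarking.ALWAYS_SHOWN,  # Hausa
    ("ks", "Arab"): VowelMarking.ALWAYS_SHOWN,  # Kashmiri
    ("wo", "Arab"): VowelMarking.ALWAYS_SHOWN,  # Wolof
    ("ckb", "Arab"): VowelMarking.NOT_USED,     # Sorani
    ("ug", "Arab"): VowelMarking.NOT_USED,      # Uighur
}


def vowel_marking(language: str, script: str) -> Optional[VowelMarking]:
    """Return the recorded behaviour, or None if the orthography is not listed."""
    return ARABIC_SCRIPT_ORTHOGRAPHIES.get((language, script))


print(vowel_marking("ha", "Arab"))  # VowelMarking.ALWAYS_SHOWN
print(vowel_marking("ha", "Latn"))  # None: Hausa in Latin script is not covered here
```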
>
> It is also important to recognise that not all marks on the page that
> look like diacritics are ignorable, even in abjads. See this
> description of the difference between ijam and tashkil:
> https://r12a.github.io/scripts/arab/homographs#ijam_tashkil. (Only the
> tashkil are hidden in the Arabic script.)
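In Unicode text the ijam/tashkil distinction falls out conveniently: ijam dots are part of the letters themselves (BEH, TEH and THEH are separate code points), while tashkil are separate combining marks. A minimal sketch of hiding only the tashkil, assuming the set is limited to the eight marks U+064B to U+0652 (tanwin, fatha, damma, kasra, shadda, sukun); other Arabic-script marks, such as U+0670 SUPERSCRIPT ALEF, would need a deliberate decision and are left untouched here.

```python
# Assumed tashkil set: U+064B ARABIC FATHATAN .. U+0652 ARABIC SUKUN (8 marks).
TASHKIL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_tashkil(text: str) -> str:
    """Drop tashkil marks only; letters (including their ijam dots) are kept."""
    return "".join(ch for ch in text if ch not in TASHKIL)


beh, teh, theh = "\u0628", "\u062A", "\u062B"  # ب ت ث differ only by ijam
fatha, kasra, damma = "\u064E", "\u0650", "\u064F"

vocalised = beh + fatha + teh + kasra + theh + damma
print(strip_tashkil(vocalised) == beh + teh + theh)  # True: dots survive, vowels go
```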
>
> It's also worth noting that Hebrew has three spelling variants, one of
> which may be worth considering as an alternative to toggling
> diacritics on and off. See
> https://r12a.github.io/scripts/hebr/he.html#spelling. Some diacritics
> are also more useful than others for disambiguating sounds; if you
> have time, you can read more about that in the same article.
>
> Classical Syriac is written as an abjad, but is not much used these
> days. The Syriac script is also used by Assyrian Neo-Aramaic and
> Turoyo communities, whose orthographies generally preserve the
> diacritics and so are not abjads.
>
> Fine points on your Google Doc:
>
> In the text:
>
> * ملاك, pronounced malaak, means “angel”
> * مَلك, pronounced malak, is an archaic version that means “angel”
>   mostly in religious texts
> * ملك, pronounced malik, means “king”
>
> The difference between the first and second bullets involves letter
> changes, not diacritic changes, so it's not a good example. I may be
> able to come up with something better, if you need.
>
> The third bullet doesn't even show the kasra diacritic that indicates
> the i. To contrast them properly, you'd need to show:
>
> • مَلَك
> • مَلِك
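The point about the bullets can be checked mechanically. This self-contained sketch builds the words from code points (so it does not depend on how an editor renders right-to-left text) and assumes tashkil are the marks U+064B to U+0652. It shows that مَلَك and مَلِك collapse to the same unvocalised ملك, while ملاك stays distinct because it differs by a letter.

```python
# Assumed tashkil set: U+064B .. U+0652 (tanwin, fatha, damma, kasra, shadda, sukun).
TASHKIL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_tashkil(text: str) -> str:
    return "".join(ch for ch in text if ch not in TASHKIL)


MEEM, LAM, KAF, ALEF = "\u0645", "\u0644", "\u0643", "\u0627"
FATHA, KASRA = "\u064E", "\u0650"

malak = MEEM + FATHA + LAM + FATHA + KAF   # مَلَك
malik = MEEM + FATHA + LAM + KASRA + KAF   # مَلِك
malaak = MEEM + LAM + ALEF + KAF           # ملاك

# Without tashkil, malak and malik are the same string, ملك:
# the vowel marks were the only thing telling them apart.
print(strip_tashkil(malak) == strip_tashkil(malik) == MEEM + LAM + KAF)  # True
# malaak still differs: the contrast there is a letter (alef), not a mark.
print(strip_tashkil(malaak) == strip_tashkil(malak))  # False
```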
>
> hope that helps
> ri


Btw, I also looked into the Hawaiian issue a little. From a quick
reading, the examples in the Google Doc don't make it clear to me
whether the apostrophe used for the glottal stop is one of the things
they consider to be a 'diacritic'. The vowel-lengthening macron,
however, does seem to be one. Since the macron is phonemically
distinctive, I'm doubtful that it's sensible to propose dropping it for
readers.


> Fuqiao Xue <xfq@w3.org>
> 9 April 2026 at 02:13
> Update: They’re reviewing what Richard sent, and are waiting for some
> survey results that they need to integrate. They will then compose a
> set of GitHub issues to formalize their remaining questions and share
> them with us (so we can track them). They won't attend our meeting
> today.
>
> Fuqiao
>
>
>
> Martin J. Dürst <duerst@it.aoyama.ac.jp>
> 9 April 2026 at 00:22
> Hello everybody,
>
> On 2026-04-08 20:08, Andrew Cunningham wrote:
>> On Tue, 7 Apr 2026 at 03:56, Addison Phillips <addisoni18n@gmail.com> 
>> wrote:
>>
>>> It would be useful to know what they are actually trying to achieve.
>>> Sometimes "removing diacritics" is a naive thing that (for example)
>>> English speakers try to do (because, generally speaking, they are
>>> affectations in English).
>
> Exactly. I think many on this list get somewhat confused because of
> the word 'diacritics'. My assumption is that they are looking at a
> phenomenon: languages whose written form can carry more or less
> information, where the form with less information is the 'usual'
> form, but the form with more information is helpful in an
> accessibility context because it makes the text easier for some people
> to read. The extra information is mostly expressed graphically as
> additional marks, and they then called those 'diacritics' because
> that's a word they were familiar with.
>
> Examples that come to mind and haven't been mentioned yet:
> - Stress marks in Russian (used for learners who don't know which
> syllable is stressed; potentially helpful in an accessibility context).
> - Lengthening marks in Japanese written in Latin script (e.g. Taro Sato
> vs. Tarō Satō), similar to Hawaiian. They may help foreigners with a
> bit of knowledge of Japanese.
> - Ruby in Japanese (these are very far from being diacritics, but can
> nevertheless be very helpful in accessibility contexts).
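Russian stress marks and Japanese lengthening macrons are both combining marks once text is in Unicode Normalization Form D, so removing them is mechanical. Restoring them is not, because distinct words collapse together: Russian за́мок ('castle') and замо́к ('lock') both become замок. A minimal sketch using Python's standard unicodedata module; the helper name drop_marks is hypothetical, and it removes only the marks it is given rather than every combining character.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"   # used as the Russian stress mark
COMBINING_MACRON = "\u0304"  # long-vowel mark in romanized Japanese


def drop_marks(text: str, marks: str) -> str:
    """Remove only the given combining marks (hypothetical helper).

    NFD splits precomposed letters such as "ō" (U+014D) into base letter
    plus mark; NFC afterwards recomposes anything that was not removed.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


print(drop_marks("Tarō Satō", COMBINING_MACRON))  # Taro Sato

castle = "\u0437\u0430\u0301\u043c\u043e\u043a"   # за́мок
lock = "\u0437\u0430\u043c\u043e\u0301\u043a"     # замо́к
print(drop_marks(castle, COMBINING_ACUTE) == drop_marks(lock, COMBINING_ACUTE))  # True
```

The NFC step matters for letters like Russian й, which NFD splits into и plus a breve; since the breve is not in the removal set, it is recomposed rather than lost.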
>
> So the main point in our joint discussion should be to look at the
> purpose. The terminology should be cleaned up, but that is secondary.
>
>
>> I'd assume they are referring to languages that normally aren't
>> marked, but can be marked for pedagogical reasons or to add clarity.
>> Arabic, Lithuanian and a range of African languages come to mind.
>>
>> There are no lists of such languages. It would also have to be
>> orthography-specific, not just language-specific.
>>
>> The only language-independent way of achieving this that would also
>> work with any tech stack would be to store both versions of the text
>> and switch between them.
>
> Fully agree.
>
> Regards,    Martin.
>
>
>
> Andrew Cunningham <lang.support@gmail.com>
> 8 April 2026 at 12:08
>
>
> On Tue, 7 Apr 2026 at 03:56, Addison Phillips <addisoni18n@gmail.com>
> wrote:
>
>     It would be useful to know what they are actually trying to achieve.
>     Sometimes "removing diacritics" is a naive thing that (for example)
>     English speakers try to do (because, generally speaking, they are
>     affectations in English).
>
>
>
> I'd assume they are referring to languages that normally aren't
> marked, but can be marked for pedagogical reasons or to add clarity.
> Arabic, Lithuanian and a range of African languages come to mind.
>
> There are no lists of such languages. It would also have to be
> orthography-specific, not just language-specific.
>
> The only language-independent way of achieving this that would also
> work with any tech stack would be to store both versions of the text
> and switch between them.
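A minimal sketch of what storing both versions and switching might look like. The AlternateText type, its fields and render() are hypothetical names, not from any W3C or WCAG specification. Nothing in the logic depends on the script or language, which is what makes the approach language-independent; the cost, as raised in the original question about burden, is that each string needs a second, fully marked form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlternateText:
    """Hypothetical record holding both written forms of one piece of text."""
    lang: str                            # BCP 47 language tag, e.g. "ar"
    usual: str                           # the form normally written
    fully_marked: Optional[str] = None   # the form with all marks, if authored

    def render(self, prefer_marked: bool) -> str:
        # Use the fully marked form only when the user asked for it and it
        # exists; otherwise fall back to the usual form.
        if prefer_marked and self.fully_marked is not None:
            return self.fully_marked
        return self.usual


king = AlternateText("ar", usual="\u0645\u0644\u0643",
                     fully_marked="\u0645\u064e\u0644\u0650\u0643")  # ملك / مَلِك
print(king.render(prefer_marked=True) == king.fully_marked)   # True
print(king.render(prefer_marked=False) == king.usual)         # True
```

Keeping the marked form optional would let content be upgraded incrementally, with the usual form as the fallback wherever no marked version has been authored.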
>
> -- 
> Andrew Cunningham
> lang.support@gmail.com
>
>
> Addison Phillips <addisoni18n@gmail.com>
> 6 April 2026 at 18:55
> It would be useful to know what they are actually trying to achieve. 
> Sometimes "removing diacritics" is a naive thing that (for example) 
> English speakers try to do (because, generally speaking, they are 
> affectations in English).
>
> The meaning of "diacritic" itself is complex. Some diacritics alter or 
> hint the pronunciation of the base letter. Other diacritics are used 
> to form an entirely different letter. Diacritics are not just used 
> with the Latin script. There is also the tendency to confuse 
> "combining mark" with "diacritic". Without knowing what or why, it's 
> difficult to make progress--and there might be better approaches than 
> removing information from the text.
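The confusion between "combining mark" and "diacritic" is easy to demonstrate. This sketch shows the naive approach, using Python's standard unicodedata module: decompose to NFD and drop every character in general category Mn (nonspacing mark). It removes far more than optional pronunciation hints: Spanish ñ is a letter in its own right, and Devanagari vowel signs such as U+0941 (the u in कुछ) are Mn characters too.

```python
import unicodedata


def strip_all_combining(text: str) -> str:
    """Naive 'remove diacritics': drop every nonspacing mark (category Mn) after NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_all_combining("año"))  # ano: a different Spanish word
print(strip_all_combining("\u0915\u0941\u091b") == "\u0915\u091b")  # True: कुछ loses its vowel sign
```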
>
> Look forward to the conversation.
>
> Addison
>
>
> Fuqiao Xue <xfq@w3.org>
> 6 April 2026 at 13:39
> The WCAG 3 Text & Wording subgroup is defining the use of diacritics
> for languages "where they are optional". Here's their current
> draft/working document for that provision:
>
> https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing 
>
>
> They are asking us to help them with principles or practices that may
> guide this work.
>
> Some of the specific concerns are around:
>
> 1. Identifying the applicable languages. Is there a list, or 
> especially some programmatic standard to identify those?
> 2. How assistive technology actually handles (or should handle!) cases 
> like this. Is requiring the full-diacritic version the right answer?
> 3. Expectations around burden/effort. It was brought up that having 
> both versions in a datastore, and a user-visible toggle, is a big change.
>
> They are happy to answer questions, or have a joint call to talk about 
> this.
>
> Any thoughts?
>
Received on Thursday, 9 April 2026 17:01:15 UTC