Re: agenda+ Diacritics in WCAG

Hello Richard,

Many thanks for re-posting. There may indeed have been people who missed 
your earlier post. However, it went to the list and is archived at 
https://www.w3.org/mid/dfaccfb9-1132-9288-6825-48b43aa83551@w3.org. And 
it was in part what prompted my response.

Re-reading, I found one additional issue. You write (with respect to 
Hawaiian):
 > Since the macron is phonemically distinctive, i'm
 > dubious that it's sensible to propose that it be dropped for readers.

You could just as well write "Since the short vowels in Arabic are 
phonemically distinctive, I'm dubious that it's sensible to propose that 
they be dropped for readers." But dropping them is exactly what everyday 
Arabic writing does, and it is what some variant(s) of Hawaiian 
orthography, probably the more popular one(s), do.
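
To make the parallel concrete, here is a minimal Python sketch (my own 
illustration, not anything taken from the WCAG draft). It assumes that 
the marks omitted in everyday Arabic writing are the tashkil at U+064B 
to U+0652, and that vowel length in Latin-script text is marked with the 
combining macron U+0304. The function names are hypothetical.

```python
import unicodedata

# Assumption: the "hidden" Arabic marks are the tashkil U+064B..U+0652
# (fathatan .. sukun). Ijam (the dots that distinguish letters) are part
# of the base letter's code point, so they are never touched here.
TASHKIL = {chr(cp) for cp in range(0x064B, 0x0653)}
MACRON = "\u0304"  # combining macron


def strip_tashkil(text: str) -> str:
    """Drop Arabic vowel marks, as everyday Arabic writing does."""
    return "".join(ch for ch in text if ch not in TASHKIL)


def strip_macrons(text: str) -> str:
    """Drop vowel-length macrons from Latin-script text."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


malak = "\u0645\u064E\u0644\u064E\u0643"  # meem+fatha, lam+fatha, kaf
malik = "\u0645\u064E\u0644\u0650\u0643"  # meem+fatha, lam+kasra, kaf
bare = "\u0645\u0644\u0643"               # meem, lam, kaf

# Both vocalized spellings collapse to the same three letters.
assert strip_tashkil(malak) == strip_tashkil(malik) == bare
assert strip_macrons("Tar\u014D Sat\u014D") == "Taro Sato"
```

In both cases the reader has to restore the distinction from context, 
which is the trade-off under discussion.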

Regards,   Martin.

On 2026-04-10 02:01, r12a wrote:
> Thanks to the fragmentation associated with email threads (a thing i had 
> long forgotten about)  i don't think that the (personal) response i gave 
> to the WCAG folks has had any visibility to people on this thread.  I'm 
> therefore going to copy it here, for posterity.  However, i recommend 
> that we now wait for them to create issues in a WCAG repo, where we can 
> continue that discussion with hopefully a little more light shed on what 
> they are looking for.  They have already agreed to do that.
> 
> So here we go:
> 
>> Steven, before this goes much further, please switch the discussion to 
>> a GH issue.  We find it difficult to follow and share information from 
>> emails, and we cannot fit them into our review management system.  You 
>> could raise an issue in the WCAG repo of your choice and add the 
>> i18n-track label; that will alert us to the issue automatically.
>>
>> I suggest you add your questions to the initial comment, and then add 
>> what i'm going to say in another comment.
>>
>>
>>
>>
>> Rather than just generally talk about diacritics, i think you may need 
>> to be more specific about which diacritics are relevant here. Leaving 
>> aside the Hawaiian example for now (which i'd like to understand 
>> better), the diacritics you seem to be mostly concerned with are those 
>> that indicate vowel sounds in scripts of type 'abjad'. These scripts 
>> include Arabic, Hebrew, and Classical Syriac.  Note carefully that 
>> these are scripts, not languages.  The Arabic script is used to write 
>> the largest number of languages in the world after the Latin script.  
>> You can find lists of languages that use the Arabic and Hebrew scripts at 
>> https://www.w3.org/International/questions/qa-scripts.en.html#how-many-people
>>
>> However, it is important to note that not all language orthographies 
>> using the Arabic script hide their diacritics.  Languages that hide 
>> the diacritics include Arabic (and its dialects), Pashto, Persian, 
>> Saraiki, Sindhi, and Urdu.  Languages using the Arabic script that 
>> always show all diacritics include Fulfulde, Hausa, Kashmiri, and 
>> Wolof.  Still other languages using the Arabic script don't use 
>> diacritics at all to represent vowel sounds; these include Sorani and 
>> Uighur.
>>
>> It is important to also recognise that not all marks on the page that 
>> look like diacritics are ignorable, even in abjads.  See this 
>> description of the difference between ijam and tashkil: 
>> https://r12a.github.io/scripts/arab/homographs#ijam_tashkil 
>> (It's only the tashkil that get hidden in Arabic script.)
>>
>> It's also worth noting that Hebrew has 3 spelling variants, one of 
>> which is possibly worth considering as an alternative to flipping 
>> diacritics on and off. See 
>> https://r12a.github.io/scripts/hebr/he.html#spelling. There are also 
>> some diacritics that are more useful than others for disambiguating 
>> sounds. If you have time, you can read more about that in the article 
>> just pointed to.
>>
>> Classical Syriac is an abjad, but is not much used these days.  The 
>> Syriac script is also used by the Assyrian Neo-Aramaic and Turoyo 
>> communities. Their orthographies generally preserve the diacritics, 
>> and so are not abjads.
>>
>>
>>
>>
>>
>> Fine points on your Google doc:
>>
>> in the text
>>
>>  * ملاك, pronounced malaak, means “angel”
>>
>>  * مَلك, pronounced malak, is an archaic version that means “angel”
>>    mostly in religious texts
>>
>>  * ملك, pronounced malik, means “king”
>>
>> the difference between the first bullet and the second involves letter 
>> changes, not diacritic changes — so it's not a good example.  I may be 
>> able to come up with something better, if you need.
>>
>> The 3rd bullet doesn't even show the kasra diacritic that indicates 
>> the i.  To contrast those properly you'd need to show:
>>
>> • مَلَك
>> • مَلِك
>>
>> hope that helps
>> ri
> 
> 
> Btw, i also looked into the Hawaiian issue a little.  The examples in 
> the Google Doc don't make it clear to me from a quick reading whether 
> the apostrophe used for the glottal stop is one of the things they 
> consider to be a 'diacritic'.  But the vowel lengthening macron does 
> seem to be one such. Since the macron is phonemically distinctive, i'm 
> dubious that it's sensible to propose that it be dropped for readers.
> 
> 
>> Fuqiao Xue <mailto:xfq@w3.org>
>> 9 April 2026 at 02:13
>> Update: They’re reviewing what Richard sent, and are awaiting some 
>> survey results and need to integrate those. They will then compose a 
>> set of GitHub issues to formalize the remaining questions they have 
>> and share them with us (so we can track), and won't attend our meeting 
>> today.
>>
>> Fuqiao
>>
>>
>>
>> Martin J. Dürst <mailto:duerst@it.aoyama.ac.jp>
>> 9 April 2026 at 00:22
>> Hello everybody,
>>
>> On 2026-04-08 20:08, Andrew Cunningham wrote:
>>> On Tue, 7 Apr 2026 at 03:56, Addison Phillips <addisoni18n@gmail.com> 
>>> wrote:
>>>
>>>> It would be useful to know what they are actually trying to achieve.
>>>> Sometimes "removing diacritics" is a naive thing that (for example)
>>>> English speakers try to do (because, generally speaking, they are
>>>> affectations in English).
>>
>> Exactly. I think many on this list get somewhat confused because of 
>> the word 'diacritics'. My assumption would be that they were looking 
>> at a phenomenon (languages that in their written form have more or 
>> less information, where the form with less information is the 'usual' 
>> form, but the form with more information is helpful in an 
>> accessibility context because it makes it easier for some people to 
>> read). The actual graphical expression of the information was mostly 
>> in the form of additional marks, and they then called that 'diacritics' 
>> because that's a word they were familiar with.
>>
>> Examples that come to my mind that haven't yet been mentioned:
>> - Stress marks in Russian (used for learners who don't know on which 
>> syllable the stress is, potentially helpful in an accessibility context).
>> - Lengthening marks in Japanese written in Latin (e.g. Taro Sato vs. 
>> Tarō Satō), similar to Hawaiian. They may help foreigners with a bit 
>> of knowledge of Japanese.
>> - Ruby in Japanese (these are extremely far from diacritics,... but 
>> nevertheless can be very helpful in accessibility contexts)
>>
>> So the main point in the common discussion should be to look at the 
>> purpose. Terminology should be cleaned up, but that should be 
>> secondary.
>>
>>
>>> I'd assume they are referring to languages that normally aren't marked,
>>> but can be marked for pedagogical reasons or to add clarity. Arabic,
>>> Lithuanian and a range of African languages come to mind.
>>>
>>> There are no lists of such languages. It would also have to be
>>> orthography specific not just language specific.
>>>
>>> The only language independent way of achieving this that would also work
>>> with any tech stack would be having both versions of the text stored and
>>> switching between them.
>>
>> Fully agree.
>>
>> Regards,    Martin.
>>
>>>> The meaning of "diacritic" itself is complex. Some diacritics alter or
>>>> hint the pronunciation of the base letter. Other diacritics are used to
>>>> form an entirely different letter. Diacritics are not just used with
>>>> the Latin script. There is also the tendency to confuse "combining
>>>> mark" with "diacritic". Without knowing what or why, it's difficult to
>>>> make progress--and there might be better approaches than removing
>>>> information from the text.
>>>>
>>>> Look forward to the conversation.
>>>>
>>>> Addison
>>>>
>>>> On 4/6/2026 5:39 AM, Fuqiao Xue wrote:
>>>>> The WCAG 3 Text & Wording subgroup is defining use of diacritics for
>>>>> languages "where they are optional". Here's their current
>>>>> draft/working document for that provision:
>>>>>
>>>>>
>>>> https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing
>>>>>
>>>>>
>>>>> They are asking us to help them on principles or practices that may
>>>>> guide this work.
>>>>>
>>>>> Some of the specific concerns are around:
>>>>>
>>>>> 1. Identifying the applicable languages. Is there a list, or
>>>>> especially some programmatic standard to identify those?
>>>>> 2. How assistive technology actually handles (or should handle!) cases
>>>>> like this. Is requiring the full-diacritic version the right answer?
>>>>> 3. Expectations around burden/effort. It was brought up that having
>>>>> both versions in a datastore, and a user-visible toggle, is a big
>>>>> change.
>>>>>
>>>>> They are happy to answer questions, or have a joint call to talk about
>>>>> this.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>> -- 
>>>> Internationalization is not a feature.
>>>> It is an architecture.
>>
>>
>> Addison Phillips <mailto:addisoni18n@gmail.com>
>> 6 April 2026 at 18:55
>> It would be useful to know what they are actually trying to achieve. 
>> Sometimes "removing diacritics" is a naive thing that (for example) 
>> English speakers try to do (because, generally speaking, they are 
>> affectations in English).
>>
>> The meaning of "diacritic" itself is complex. Some diacritics alter or 
>> hint the pronunciation of the base letter. Other diacritics are used 
>> to form an entirely different letter. Diacritics are not just used 
>> with the Latin script. There is also the tendency to confuse 
>> "combining mark" with "diacritic". Without knowing what or why, it's 
>> difficult to make progress--and there might be better approaches than 
>> removing information from the text.
>>
>> Look forward to the conversation.
>>
>> Addison
>>
>>
>> Fuqiao Xue <mailto:xfq@w3.org>
>> 6 April 2026 at 13:39
>> The WCAG 3 Text & Wording subgroup is defining use of diacritics for 
>> languages "where they are optional". Here's their current 
>> draft/working document for that provision:
>>
>> https://docs.google.com/document/d/1z_Xuava_GS-Fwfk4Hg8KYDr1WcjgcuswKmTELukzvwo/edit?usp=sharing
>>
>> They are asking us to help them on principles or practices that may 
>> guide this work.
>>
>> Some of the specific concerns are around:
>>
>> 1. Identifying the applicable languages. Is there a list, or 
>> especially some programmatic standard to identify those?
>> 2. How assistive technology actually handles (or should handle!) cases 
>> like this. Is requiring the full-diacritic version the right answer?
>> 3. Expectations around burden/effort. It was brought up that having 
>> both versions in a datastore, and a user-visible toggle, is a big change.
>>
>> They are happy to answer questions, or have a joint call to talk about 
>> this.
>>
>> Any thoughts?
>>
> 
> 

-- 
Prof. Dr.sc. Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan

Received on Friday, 10 April 2026 00:34:35 UTC