
Re: Macrolanguage subtags are not recognized by validator.w3.org!

From: Mark Rogers <mark.rogers@powermapper.com>
Date: Thu, 2 Aug 2018 06:39:27 +0000
To: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <4DD79118-EF08-4B65-AABE-3094DED67732@powermapper.com>
HTML 5 uses the BCP47 standard for lang attributes.

BCP47 defines the handling of macrolanguages here:

Worth noting that BCP47 defines special handling for Chinese ('zh') and Arabic ('ar') then goes on to say “Two different languages encompassed by the same macrolanguage may differ from one another more than, say, French and Spanish do.”

The actual language registry used by the validator is here:

Best Regards

Mark Rogers - mark.rogers@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

From: Human Rights Activist <humanrightsactivist@pm.me>
Reply-To: Human Rights Activist <humanrightsactivist@pm.me>
Date: Wednesday, 1 August 2018 at 18:30
To: "www-validator@w3.org" <www-validator@w3.org>
Subject: Macrolanguage subtags are not recognized by validator.w3.org!
Resent-From: "www-validator@w3.org" <www-validator@w3.org>
Resent-Date: Wednesday, 1 August 2018 at 18:29


The validator prefers "ar-arb" for Standard Arabic and does not accept the "arb" subtag alone. This is a problem!


Type: language
Subtag: arb
Description: Standard Arabic
Added: 2009-07-29
Macrolanguage: ar
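
To make the point concrete, here is a minimal sketch of how a checker could read records like the one above and accept both the bare subtag and the prefixed form. Only the "arb" language record is quoted from the registry in this message; the abbreviated "ar" record and the "arb" extlang record in the sample, and the function names parse_records and check_tag, are assumptions made for illustration. This is not how validator.w3.org is implemented.

```python
# Sample data in the registry's record layout (records separated by "%%").
# Only the "arb" language record is quoted in this message; the other two
# records are abbreviated assumptions added to keep the sketch self-contained.
SAMPLE_REGISTRY = """\
Type: language
Subtag: ar
Description: Arabic
Scope: macrolanguage
%%
Type: language
Subtag: arb
Description: Standard Arabic
Added: 2009-07-29
Macrolanguage: ar
%%
Type: extlang
Subtag: arb
Description: Standard Arabic
Preferred-Value: arb
Prefix: ar
Macrolanguage: ar
"""


def parse_records(text):
    """Split the text on '%%' separators into a list of field dicts."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def check_tag(tag, records):
    """Accept 'language' or 'language-extlang' tags found in the records."""
    languages = {r["Subtag"] for r in records if r["Type"] == "language"}
    extlangs = {r["Subtag"]: r["Prefix"] for r in records if r["Type"] == "extlang"}
    parts = tag.lower().split("-")
    if len(parts) == 1:
        # A bare primary language subtag such as "arb" or "ar".
        return parts[0] in languages
    if len(parts) == 2:
        # Prefixed form such as "ar-arb": the extlang must list this prefix.
        primary, ext = parts
        return primary in languages and extlangs.get(ext) == primary
    return False  # scripts, regions and variants are out of scope here
```

With this sample data, check_tag("arb", parse_records(SAMPLE_REGISTRY)) and check_tag("ar-arb", parse_records(SAMPLE_REGISTRY)) both return True, which is the behaviour I am asking for.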



The W3C validator does not recognize any macrolanguage subtag without the prefix.

W3C says: "The macrolanguage subtag can be used on its own" at https://www.w3.org/International/articles/language-tags/#extlang

"ar is a macrolanguage<http://www.w3.org/International/articles/language-tags/#extlang> that encompasses the following more specific primary language subtags: aao, abh, abv, acm, acq, acw, acx, acy, adf, aeb, aec, afb, ajp, apc, apd, arb, arq, ars, ary, arz, auz, avl, ayh, ayl, ayn, ayp, bbz, pga, shu, and ssh. If it doesn't break legacy usage for your application, you should use one of these more specific language subtags instead."

It would be nice to see a fix; otherwise we are going to be stuck when trying to implement language subtags correctly. I personally don't want to use "ar-arb" when "arb" is clearly a valid tag. The sad part is that the validator reports an error for all macrolanguages and does not allow you to use the subtag without the prefix. This contradicts the IANA language subtag registry.

Another notable example is that the validator recognizes neither "fa-pes" nor "pes"; only "fa" (Persian), which is a macrolanguage, is recognized.

Overall, I believe that the use of "arb" for Standard Arabic and "pes" for Iranian Persian is perfectly valid!

W3C validator MUST accept the subtags without the prefix!

It is better to use "gsw" for Swiss German than "de-CH-1996"!
I strongly encourage everyone to visit https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry and take steps to fix the W3C Validator accordingly.

Recommendation: Accept IANA language subtags without a prefix, which will allow us to use very specific subtags like "gsw" to describe the language of the content correctly.

Thank you!
Useful tool recommended by W3C:


(A lookup for Swiss German "gsw", Iranian Persian "pes" and Standard Arabic "arb" returns no warning.) The validator clearly needs a fix.

Received on Thursday, 2 August 2018 06:39:54 UTC
