
IANA Language Subtag Values in HTML5 lang Attribute

From: Steven Turner <suibhne@cyberscotia.com>
Date: Fri, 26 Apr 2013 08:36:18 +0100
Message-ID: <1D3ACA40627C4D2FA20114EF2AC2589F@pc1>
To: <www-validator@w3.org>
I have many pages of HTML5 that validate perfectly with the following fragment -

      <div id="la_page_01" class="la_page la_black" lang="cy" dir="ltr">

- where lang="cy" indicates Cymraeg (Modern Welsh).  However, the <div> in question actually contains text in 
(medieval) Middle Welsh, but as soon as I attempt this perfectly-correct markup with lang="wlm" (Middle Welsh) -

      <div id="la_page_01" class="la_page la_black" lang="wlm" dir="ltr">

- the Validator says -

===========================
Bad value wlm for attribute lang on element div: The language subtag wlm is not a valid ISO language part of a language 
tag.
      <div id="la_page_01" class="la_page la_black" lang="wlm" dir="ltr">
Syntax of language tag: An RFC 5646 language tag consists of hyphen-separated ASCII-alphanumeric subtags. There is a 
primary tag identifying a natural language by its shortest ISO 639 language code (e.g. en for English) and zero or more 
additional subtags adding precision. The most common additional subtag type is a region subtag which most commonly is a 
two-letter ISO 3166 country code (e.g. GB for the United Kingdom). IANA maintains a registry of permissible subtags.
===========================

But aside from the obvious fact that hyphen-separated values are not quite as important as this report suggests, the 
IANA "registry of permissible subtags" that the report itself links to 
(http://www.iana.org/assignments/language-subtag-registry) clearly defines the following -

Type: language
Subtag: wlm
Description: Middle Welsh
Added: 2009-07-29

In other words, lang="wlm" is indeed valid, and has been for nearly 4 years now!
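
The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how the Validator itself works: the function names parse_registry and is_registered_language are hypothetical, and SAMPLE_REGISTRY is a tiny hand-copied excerpt (the wlm record quoted above plus the cy record) standing in for the full registry file, which uses the same "%%"-separated record layout.

```python
import re

# Tiny excerpt in the registry's record format (assumption: the real file
# is far longer; only these two records are reproduced here).
SAMPLE_REGISTRY = """\
Type: language
Subtag: cy
Description: Welsh
Added: 2005-10-16
%%
Type: language
Subtag: wlm
Description: Middle Welsh
Added: 2009-07-29
"""


def parse_registry(text):
    """Split registry text into a list of {field: value} dicts."""
    records = []
    # Records are separated by lines consisting of "%%".
    for chunk in re.split(r"^%%[ \t]*$", text, flags=re.MULTILINE):
        record = {}
        last_key = None
        for line in chunk.splitlines():
            if not line.strip():
                continue
            if line[0].isspace() and last_key is not None:
                # Folded continuation of the previous field's value.
                record[last_key] += " " + line.strip()
                continue
            key, sep, value = line.partition(":")
            if not sep:
                continue
            last_key = key.strip()
            # Simplification: a repeated field keeps only its last value.
            record[last_key] = value.strip()
        if record:
            records.append(record)
    return records


def is_registered_language(subtag, records):
    """True if subtag appears as a 'language' record (case-insensitive)."""
    wanted = subtag.lower()
    return any(
        r.get("Type") == "language" and r.get("Subtag", "").lower() == wanted
        for r in records
    )


records = parse_registry(SAMPLE_REGISTRY)
print(is_registered_language("wlm", records))
```

Run against the full registry text rather than the excerpt, the same check would show whether any given primary subtag is registered, independently of what the Validator reports.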

I've tested with various other IANA-permissible subtags for historical languages from Western Europe, particularly the 
Celtic and Germanic ones from Britain & Ireland, and found that the Validator's accuracy is in fact generally *erratic* 
on this issue.  For example, the Validator doesn't seem to have a problem with the Irish analogue to my Welsh situation 
above - both Modern Irish (lang="ga") and Middle Irish (lang="mga") validate exactly as they should.  Switching 
the lang attribute's value between Modern Cornish ("kw") and Middle Cornish ("cnx"), however, gives the same results as 
with Welsh and Middle Welsh: the modern subtag validates and the medieval one is rejected.
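
To reproduce this comparison, one minimal test page per subtag can be generated and pasted into the Validator by hand. The sketch below is a hedged illustration only: make_test_page and the page contents are hypothetical, and the subtag list is simply the six values mentioned in this message.

```python
from html import escape

# The six lang values discussed in this message.
SUBTAGS = ["cy", "wlm", "ga", "mga", "kw", "cnx"]


def make_test_page(lang):
    """Return a minimal HTML5 document whose only variable is the lang value."""
    safe = escape(lang, quote=True)  # keep the attribute value well-formed
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>lang test: {safe}</title>\n"
        "</head>\n"
        "<body>\n"
        f'<div lang="{safe}" dir="ltr">test</div>\n'
        "</body>\n"
        "</html>\n"
    )


# One document per subtag, keyed by the subtag itself.
pages = {tag: make_test_page(tag) for tag in SUBTAGS}
print(pages["wlm"])
```

Because the pages differ only in the lang value, any difference in the Validator's verdict can be attributed to the subtag alone.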

So, the Validator's behaviour seems remarkably inconsistent on this issue.  I can only assume that its list of 
IANA-permissible subtags requires an update?  As it currently stands, it's a rather irritating wee bug for historical 
researchers!

Thanks in advance for your time and consideration.  :-)


yours aye,

dr. steve sweeney-turner
http://steve-sweeney-turner.com/
 
Received on Saturday, 27 April 2013 17:52:58 UTC
