Re: IANA Language Subtag Values in HTML5 lang Attribute from Jukka K. Korpela on 2013-04-27 (www-validator@w3.org from April 2013)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 27 Apr 2013 23:12:16 +0300
To: Steven Turner <suibhne@cyberscotia.com>
CC: www-validator@w3.org
Message-ID: <517C3120.8050309@cs.tut.fi>

2013-04-26 10:36, Steven Turner wrote:

> In other words, lang="wlm" is indeed valid, and has been for nearly 4
> years now!

Yes, see my answer to a recent question on the same topic:
http://lists.w3.org/Archives/Public/www-validator/2013Apr/0075.html

However, I need to add that for XHTML serialization, XML rules apply, 
and XML 1.0 normatively refers to
“IETF BCP 47
     IETF (Internet Engineering Task Force). BCP 47, consisting of RFC 
4646: Tags for Identifying Languages, and RFC 4647: Matching of Language 
Tags, A. Phillips, M. Davis. 2006.”
which might be interpreted as referring to a specific version of BCP 47.

The point is that if specifications (or draft specifications) refer to a 
specific version of an external document, they are at risk of becoming 
obsolete when that version becomes obsolete. And on the other hand, by 
referring generically to the latest BCP or RFC or spec or whatever of 
something, you are passing an open cheque and making the content of your 
spec depend on something external. So a document might conform to your 
spec this morning and fail to conform in the afternoon.

In this issue, there is the additional complexity that HTML and XHTML 
syntax might be interpreted differently. I honestly don’t know what a 
validator should do in a case like this.

> For example, the Validator
> doesn't seem to have a problem with the Irish analogue to my Welsh
> situation above - both Modern Irish (lang="ga") and Middle Irish
> (lang="mga") validate exactly as they should.  Whereas switching the
> lang attribute's value between Modern Cornish ("kw") and Middle Cornish
> ("cnx") gives the same results as with Welsh and Middle Welsh.

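“mga” is defined in ISO 639-2, hence valid under the old version of BCP 47 
(RFC 4646). “wlm” and “cnx” come from ISO 639-3, which was admitted only 
in the newer version (RFC 5646).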
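
To make the difference concrete, here is a minimal Python sketch, assuming 
a checker that looks only at the primary language subtag and compares it 
against a fixed list. The two sets are small hand-picked excerpts, not the 
real IANA registry, and the function name is made up for illustration. It 
is only a guess at what might be going on, not a description of how the 
Validator actually works.

```python
# Illustrative excerpts only; the real IANA Language Subtag Registry is far larger.
# Primary subtags drawn from ISO 639-1 and ISO 639-2 (the RFC 4646 era).
OLD_SUBTAGS = {"en", "fr", "sv", "ga", "cy", "kw", "mga"}

# The same list extended with ISO 639-3 codes (the RFC 5646 era).
NEW_SUBTAGS = OLD_SUBTAGS | {"wlm", "cnx"}


def primary_subtag_known(tag, subtags):
    """Hypothetical helper: is the primary subtag of `tag` in `subtags`?"""
    # Language tags are case-insensitive; the primary subtag precedes the first hyphen.
    primary = tag.split("-")[0].lower()
    return primary in subtags


if __name__ == "__main__":
    for tag in ("ga", "mga", "kw", "cnx", "cy", "wlm", "en-US"):
        old = primary_subtag_known(tag, OLD_SUBTAGS)
        new = primary_subtag_known(tag, NEW_SUBTAGS)
        print(f"{tag}: old list {old}, new list {new}")
```

With these excerpts, "mga" passes against both lists, whereas "wlm" and 
"cnx" pass only against the extended one, which matches the behaviour 
described in the quoted message.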

> As it currently stands, it's a rather irritating
> wee bug for historical researchers!

I would be very surprised if any web browser or general indexing robot 
or other software that regularly consumes HTML documents paid the least 
attention to attributes like lang="wlm". The few programs that actually 
make use of lang attributes recognize things like lang="en" and 
lang="fr", and maybe lang="sv" and even lang="en-US" if we’re lucky, but 
for most languages, there just isn’t any language-specific processing to 
be triggered. In this sense, the question is rather theoretical.

Yucca

Received on Saturday, 27 April 2013 20:12:41 UTC