- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sat, 27 Apr 2013 23:12:16 +0300
- To: Steven Turner <suibhne@cyberscotia.com>
- CC: www-validator@w3.org
2013-04-26 10:36, Steven Turner wrote:

> In other words, lang="wlm" is indeed valid, and has been for nearly 4
> years now!

Yes, see my answer to a recent question on the same topic:
http://lists.w3.org/Archives/Public/www-validator/2013Apr/0075.html

However, I need to add that for XHTML serialization, XML rules apply, and XML 1.0 normatively refers to “IETF BCP 47 IETF (Internet Engineering Task Force). BCP 47, consisting of RFC 4646: Tags for Identifying Languages, and RFC 4647: Matching of Language Tags, A. Phillips, M. Davis. 2006.” which might be interpreted as referring to a specific version of BCP 47.

The point is that if specifications (or draft specifications) refer to a specific version of an external document, they are at risk of becoming obsolete when that version becomes obsolete. On the other hand, by referring generically to the latest BCP or RFC or spec or whatever, you are passing an open cheque and making the content of your spec depend on something external. So a document might conform to your spec this morning and fail to conform in the afternoon.

In this issue, there is the additional complexity that HTML and XHTML syntax might be interpreted differently. I honestly don’t know what a validator should do in a case like this.

> For example, the Validator
> doesn't seem to have a problem with the Irish analogue to my Welsh
> situation above - both Modern Irish (lang="ga") and Middle Irish
> (lang="mga") validate exactly as they should. Whereas switching the
> lang attribute's value between Modern Cornish ("kw") and Middle Cornish
> ("cnx") gives the same results as with Welsh and Middle Welsh.

“mga” is defined in ISO 639-2, hence valid by the old version of BCP 47.

> As it currently stands, it's a rather irritating
> wee bug for historical researchers!
I would be very surprised if any web browser or general indexing robot or other software that regularly consumes HTML documents paid the least attention to attributes like lang="wlm". The few programs that actually make use of lang attributes recognize things like lang="en" and lang="fr", and maybe lang="sv" and even lang="en-US" if we’re lucky, but for most languages, there just isn’t any language-specific processing to be triggered. In this sense, the question is rather theoretical.

Yucca
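
To make the pinned-versus-latest reference problem concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two sets are tiny hand-picked samples built from the tags mentioned in this thread, not the real language subtag registry, and the names `OLD_BCP47_SAMPLE`, `LATER_ADDITIONS_SAMPLE`, `primary_subtag` and `is_known` are hypothetical. It does not describe how the W3C validator is actually implemented.

```python
# Illustrative samples only; NOT the real language subtag registry.
# Assumption: the older BCP 47 accepts the ISO 639-1 / ISO 639-2 codes
# mentioned in the thread ("mga" is in ISO 639-2, as the message notes).
OLD_BCP47_SAMPLE = {"en", "fr", "sv", "ga", "kw", "mga"}

# Assumption: these two tags from the thread became valid only under
# the later version of BCP 47.
LATER_ADDITIONS_SAMPLE = {"wlm", "cnx"}


def primary_subtag(tag: str) -> str:
    """Return the lowercased primary language subtag, e.g. 'en' for 'en-US'."""
    return tag.split("-")[0].lower()


def is_known(tag: str, pinned: bool) -> bool:
    """Check a tag against the pinned (old) sample or the latest sample.

    pinned=True  models a spec that cites one specific version of BCP 47.
    pinned=False models a spec that cites "the latest BCP 47".
    """
    known = OLD_BCP47_SAMPLE if pinned else OLD_BCP47_SAMPLE | LATER_ADDITIONS_SAMPLE
    return primary_subtag(tag) in known


if __name__ == "__main__":
    for tag in ("ga", "mga", "kw", "cnx", "wlm", "en-US"):
        print(tag, "pinned:", is_known(tag, pinned=True),
              "latest:", is_known(tag, pinned=False))
```

With the pinned reading, "mga" passes while "wlm" and "cnx" are rejected, which matches the behaviour reported in the thread; with the generic reading, all of them pass, but the answer can change whenever the external document is updated.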
Received on Saturday, 27 April 2013 20:12:41 UTC