[Bug 14709] lang tag validation is insufficiently specified

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #18 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-08 02:12:30 UTC ---
One of the ways in which the *use* of language tags is insufficiently specified
in HTML5 is that the spec currently only operates with the term "language tag",
whereas BCP47 distinguishes between "subtags" and "tags", where the former are
the building blocks of the latter. When the spec discusses invalid language
tags, it also uses a very simple example where the entire tag is made up of a
single, invalid subtag. Let's consider something more complicated:

Example: The invalid language tag "en-UB".

In that example, the region subtag 'UB' is invalid (not registered). It seems
that HTML5 says that the entire language tag 'en-UB' therefore "is not a
recognized language tag" and thus "MUST be treated as an unknown language".
This in turn means that, according to HTML5, there is no requirement with
regard to passing the tag through, as there is only a SHOULD.
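To illustrate the tag/subtag distinction, here is a minimal Python sketch that splits a tag into subtags and checks each one separately. It assumes tiny, hypothetical stand-ins for the IANA Language Subtag Registry, and the function name `classify_subtags` is hypothetical; it only recognises the language, optional script and optional region subtags, not the full BCP47 grammar.

```python
from typing import List, Optional, Tuple

# Hypothetical, tiny stand-ins for the IANA Language Subtag Registry;
# a real validator would load the full registry.
KNOWN_LANGUAGES = {"en", "fr", "de", "nb", "nn", "no"}
KNOWN_SCRIPTS = {"Latn", "Cyrl"}
KNOWN_REGIONS = {"US", "GB", "NO", "FR"}


def classify_subtags(tag: str) -> List[Tuple[str, str, Optional[bool]]]:
    """Split a tag into (subtag, kind, registered) triples.

    Simplified: only the language, optional script and optional region
    subtags are recognised and checked; anything after them is reported
    as kind "other" with registered=None (not checked).
    """
    parts = tag.split("-")
    # The primary language subtag always comes first.
    result = [(parts[0], "language", parts[0].lower() in KNOWN_LANGUAGES)]
    rest = parts[1:]
    # Script subtag: exactly four letters.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result.append((rest[0], "script", rest[0].title() in KNOWN_SCRIPTS))
        rest = rest[1:]
    # Region subtag: two letters or three digits.
    if rest and (
        (len(rest[0]) == 2 and rest[0].isalpha())
        or (len(rest[0]) == 3 and rest[0].isdigit())
    ):
        result.append((rest[0], "region", rest[0].upper() in KNOWN_REGIONS))
        rest = rest[1:]
    # Remaining subtags (variants, extensions, ...) are not checked here.
    result.extend((p, "other", None) for p in rest)
    return result
```

With these assumptions, classify_subtags("en-UB") returns [("en", "language", True), ("UB", "region", False)]: only one subtag is unregistered, while the primary language subtag is fine.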

Does that make sense? Is it in accordance with BCP47? Hardly.

After all, BCP47 represents a system where it is possible to combine registered
and unregistered subtags into language tags that are:

 a) invalid, but still make some sense, e.g. "en-UB"
 b) valid, but (http://tools.ietf.org/html/rfc5646#section-4.2)
    "unlikely to represent a useful combination of language attributes"

Thus, it seems that HTML5 should operate with a MUST w.r.t. passing through the
language tag, even if parts of the tag are invalid, at least as long as the
first subtag (the primary language subtag) is valid.
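A minimal Python sketch contrasting the reading of the current text described above with the proposed rule. The registry subsets and the function names `strict_reading` and `proposed_reading` are hypothetical, tags are assumed to be just language or language-region, and the empty string stands for "unknown language" (as with lang="" in HTML).

```python
# Hypothetical subsets of the IANA Language Subtag Registry.
KNOWN_LANGUAGES = {"en", "fr", "de", "nb", "nn", "no"}
KNOWN_REGIONS = {"US", "GB", "NO", "FR"}

UNKNOWN = ""  # stands for "unknown language", like lang="" in HTML


def strict_reading(tag: str) -> str:
    """Current reading: the whole tag is unknown if any subtag is unregistered.

    Simplified: assumes the tag is just language or language-region.
    """
    language, _, region = tag.partition("-")
    if language.lower() not in KNOWN_LANGUAGES:
        return UNKNOWN
    if region and region.upper() not in KNOWN_REGIONS:
        return UNKNOWN
    return tag


def proposed_reading(tag: str) -> str:
    """Proposed rule: pass the tag through if the primary language subtag is registered."""
    language = tag.split("-", 1)[0]
    return tag if language.lower() in KNOWN_LANGUAGES else UNKNOWN
```

Under these assumptions, strict_reading("en-UB") yields the unknown language, whereas proposed_reading("en-UB") passes "en-UB" through unchanged; both still reject a tag whose primary language subtag is unregistered.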


Received on Tuesday, 8 November 2011 02:12:32 UTC