
[Bug 14709] lang tag validation is insufficiently specified

From: <bugzilla@jessica.w3.org>
Date: Tue, 08 Nov 2011 20:35:14 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RNsO2-0005kU-RE@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #23 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-08 20:35:12 UTC ---
(In reply to comment #21)
> To me, it seems like a bad idea to help legacy language tags proliferate.

'mya' (comment #0) is clearly invalid as a BCP47 language tag. However, AFAICS,
it would have to be BCP47 that defines what a "legacy language tag" is, and
"legacy" is a word that does not occur in BCP47.

BCP47 uses the term 'Deprecated' instead, and in the Language Subtag Registry
there appear to be 90 entries which have the 'Deprecated:' field. Since 'mya'
does not appear in the Language Subtag Registry at all, it has no other status
than invalid.

It appears that Validator.nu treats most of the deprecated tags and subtags as
valid, with a warning.
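
To make the three statuses concrete (valid, deprecated with a warning,
invalid), here is a minimal Python sketch. It assumes a hand-abbreviated
excerpt written in the registry's record format, showing only the fields needed
here; the names REGISTRY_EXCERPT, parse_records and classify_primary_language
are made up for illustration, and the real registry should be consulted for
actual data. It only looks at the primary language subtag and ignores folded
lines, repeated fields, private use and grandfathered tags.

```python
# Hand-abbreviated excerpt in the registry's record format.
# Assumption: only a few records and fields are shown; check the live registry.
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: my
Description: Burmese
%%
Type: language
Subtag: brm
Description: Barambu
%%
Type: language
Subtag: iw
Description: Hebrew
Deprecated: 1989-01-01
Preferred-Value: he
%%
Type: language
Subtag: he
Description: Hebrew
"""


def parse_records(text):
    """Split the text on '%%' and return one dict of fields per record."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


# Index the language records by subtag.
LANGUAGES = {
    record["Subtag"]: record
    for record in parse_records(REGISTRY_EXCERPT)
    if record.get("Type") == "language"
}


def classify_primary_language(tag):
    """Return 'valid', 'deprecated' or 'invalid' for the primary subtag."""
    primary = tag.split("-")[0].lower()
    record = LANGUAGES.get(primary)
    if record is None:
        return "invalid"      # not in the registry at all, like 'mya'
    if "Deprecated" in record:
        return "deprecated"   # registered, but carries a 'Deprecated:' field
    return "valid"


print(classify_primary_language("mya"))  # invalid
print(classify_primary_language("iw"))   # deprecated
print(classify_primary_language("my"))   # valid
```

With this excerpt, 'mya' comes out as invalid because it has no record, while
'iw' is registered but deprecated, which is the case where a validator could
warn instead of reporting an error.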

> I
> think document conformance should require strict RFC 4646 validity and,

I agree that 'document conformance'/validation should require conformance to
BCP47 - nothing more or less. No one has, so far, suggested that e.g. 'mya'
should be seen as valid.

If we read John's comment #0, then it appears that this is more about UA
handling than about validation. I therefore suggest that John refine the
subject line of this bug. This is clearly not as much about validation as it is
about *handling* of invalid language tags.

> furthermore, OpenType values shouldn't leak to HTML. That is, I think we should
> require lang=my in HTML and leave it to OpenType implementations to map my to
> BRM. This way, the burden of dealing with legacy would be contained to
> implementations that deal with OpenType instead of burdening all kinds of
> implementations.

Agreed 100%

Note though, that as far as 'brm' is concerned, the answer is pretty simple:
'brm' is a registered language subtag for the Barambu language. So it would be
destructive to interpret it as Burmese.
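
The containment idea from the quoted comment can be sketched in Python as
follows. This is only an illustration under assumptions: the table
BCP47_TO_OPENTYPE and the function opentype_language_system are hypothetical
names, the table lists only the single pair discussed in this bug, and the
OpenType tag is written as 'BRM' as in the quoted comment. The point is that
the mapping lives in the font layer, keyed on the BCP47 subtag, so that
lang="brm" is never read as Burmese.

```python
# Hypothetical mapping table owned by the OpenType/font layer.
# Assumption: only the one pair discussed in this bug is listed.
BCP47_TO_OPENTYPE = {"my": "BRM"}


def opentype_language_system(lang):
    """Map the primary subtag of an HTML lang value to an OpenType tag.

    Returns None when this sketch knows no mapping for the subtag.
    """
    primary = lang.split("-")[0].lower()
    return BCP47_TO_OPENTYPE.get(primary)


print(opentype_language_system("my"))     # BRM
print(opentype_language_system("my-MM"))  # BRM
print(opentype_language_system("brm"))    # None
```

Here 'brm' stays Barambu on the HTML side; it gets no Burmese mapping just
because it happens to look like the OpenType value.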

But I think John's question is "What if 'foo' has not been registered, but
someone uses lang="foo" anyway because API X supports 'foo'?" That 'foo'
should be invalid is a given, as long as there is a registered language subtag
'bar' that one can use instead and which has the same meaning. But the question
is: Should HTML5 *also* require that 'foo' does not work? I suppose the
motivation for such a thing would be to avoid vendor-specific coding.

BCP47 says that one must not use non-registered values, since what is
non-registered today could become registered in the future, in an incompatible
way.

'''
   Users MUST NOT assign language tags that
   use subtags that do not appear in the registry
   [snip]
   Besides not being valid, the user also risks collision
   with a future possible assignment or registrations.
'''
http://tools.ietf.org/html/rfc5646#page-20

Maybe it would be enough to quote/reference that part from BCP47.

Received on Tuesday, 8 November 2011 20:35:17 GMT
