
[Bug 14709] user agent lang tag handling is insufficiently specified

From: <bugzilla@jessica.w3.org>
Date: Sat, 12 Nov 2011 03:03:21 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RP3sH-0007ad-39@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #36 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-12 03:03:20 UTC ---
(In reply to comment #35)

> This is only true if internal API's support *only* BCP47 tags and not an
> amalgamation of tag formats (e.g. BCP47 *and* ISO 639-3 tags).

Let's talk numbers: you want special-casing, in HTML5, of the ISO 639-3 tags.
(I fail to see that other erroneous BCP47 tags matter to OpenType.) Thus,
we are talking about 184 2-letter primary language subtags for which there
is a 3-letter double. (There are 8000 other primary language subtags in BCP47.)
Most, if not nearly all, of the other 3-letter subtags are part of BCP47.
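
To make the scale of that special-casing concrete, here is a minimal sketch in
Python of mapping a 3-letter primary subtag to its 2-letter double. The table
is a small hypothetical excerpt (a full one would carry all 184 pairs), and
the name normalize_primary_subtag is my own illustration, not something from
HTML5, BCP47 or OpenType. Tags without a known double are passed through
unmodified.

```python
# Hypothetical excerpt of the 3-letter -> 2-letter pairs; a complete table
# would list every ISO 639-3 code that has a 2-letter double.
ISO639_3_TO_2_LETTER = {
    "mya": "my",  # Burmese
    "eng": "en",  # English
    "fra": "fr",  # French
    "deu": "de",  # German
}


def normalize_primary_subtag(tag: str) -> str:
    """Replace a 3-letter primary subtag with its 2-letter double, if any."""
    primary, sep, rest = tag.partition("-")
    short = ISO639_3_TO_2_LETTER.get(primary.lower())
    if short is None:
        # No 2-letter double known: pass the tag through unmodified.
        return tag
    return short + sep + rest
```

With this excerpt, "mya" becomes "my", while "brm" and "Burmese" are left
untouched because neither has a 2-letter double in the table.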

> > *For HTML's purposes*, invalid language tags are to be treated as
> > unknown languages. For all other purposes, the language is passed
> > through unmodified. I don't understand the difficulty here.


> <p lang="my">BCP47 language subtag for Burmese</p>
> <p lang="Burmese">Human readable language name</p>

lang='Burmese' can't be a problem for OpenType.

> <p lang="mya">ISO 639-3 three-letter tag for Burmese</p>
> <p lang="BRM">OpenType language system tag for Burmese</p>

lang="BRM" can't possibly be a problem. OpenType historically expects 3-letter
codes, which it internally converts to its own codes. 'BRM' is a BCP47 code not
for Burmese but for another language.

> If user agents pass through all four of these language tags without
> validating them as BCP47 language subtags, then OpenType API's will
> recognize 'BRM' but hyphenation API's won't.

This is not true. If the OpenType API recognizes 'BRM', then you have to fix the
OpenType API since, as noted, 'brm' does not mean 'Burmese' in BCP47. Please
don't use that example anymore.

>  Inconsistency would also
> be possible across users agents; if vendor X uses a hyphenation API that
> matches OSX language tags but vendor Y uses a hyphenation API that
> matches Windows language tags, then the rendering of content will vary
> due to this purely internal inconsistency.

I don't have experience with OpenType on Web pages. But can you point to a Web
browser on Mac OS X which allows you to use e.g. lang="Burmese" and get any
effect from it? 

> I think it would make a lot more sense if user agents simply treat
> non-BCP47 language tags as "unknown" and interpret what "unknown" means
> in the context of specific API's, rather than passing through the
> unmodified tags.

Or maybe we should, eventually, just focus on the closed list of roughly 200
languages for which this matters.

> > > No one commenting on this bug is arguing that validation should
> > > occur when matching CSS selectors.
> > 
> > CSS is no different than OpenType or any other technology. If we say
> > that you have to do validation for one, it follows that validation
> > would apply to the other.
> 
> CSS is not interpreting what the meaning of a language tag is, it's
> simply matching it against content that is labeled as such.  This is
> completely different from inferring language-specific rules based on the
> *meaning* of those tags.

Actually, via CSS you can add hyphenation (e.g. in Prince XML). Thus it is
entirely possible to write <p lang="leif"> and p:lang(leif){/* Burmese
hyphenation */}. So the author has the possibility of mapping as he/she likes.
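
The reason such a made-up tag still works is that :lang() only compares
labels and never interprets them. As a rough sketch in Python, assuming the
usual rule of a case-insensitive exact match or a prefix followed by a hyphen
(the function name is hypothetical and not taken from any browser or from
Prince):

```python
def lang_selector_matches(selector_range: str, element_lang: str) -> bool:
    """Approximate :lang() matching: equal, or a prefix followed by '-'."""
    wanted = selector_range.lower()
    actual = element_lang.lower()
    if not wanted or not actual:
        # An empty range or an unlabelled element never matches here.
        return False
    return actual == wanted or actual.startswith(wanted + "-")
```

Under this sketch "leif" matches an element labelled "leif" or "LEIF-x-foo",
and "my" matches "my-MM" but not "mya", since nothing here knows that both
tags are meant as Burmese.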

Received on Saturday, 12 November 2011 03:03:22 GMT
