[Bug 14709] user agent lang tag handling is insufficiently specified from bugzilla@jessica.w3.org on 2011-11-12 (public-html-bugzilla@w3.org from November 2011)

From: <bugzilla@jessica.w3.org>
Date: Sat, 12 Nov 2011 01:56:50 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RP2pu-0004uG-VV@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #35 from John Daggett <jdaggett@mozilla.com> 2011-11-12 01:56:47 UTC ---
> Whatever spec defines the mapping is the spec that should say how the
> mapping is to work. It's not up to the HTML spec to define every
> mapping. 

How BCP47 maps to some other language tag scheme is not the problem
here (nor does HTML even need to consider it).  The problem is that you
seem to want to have it both ways: authors are required to use valid
BCP47 language tags, yet user agents pass through whatever the author
specifies, which allows huge inconsistencies.

> For technologies that use BCP47, there's no mapping necessary; HTML
> requires that the values be passed through unmodified.

This is only true if internal APIs support *only* BCP47 tags and not an
amalgamation of tag formats (e.g. BCP47 *and* ISO 639-3 tags).

> *For HTML's purposes*, invalid language tags are to be treated as
> unknown languages. For all other purposes, the language is passed
> through unmodified. I don't understand the difficulty here.

The practical problem with this statement is that it's not possible to
distinguish "unknown" languages from "known languages using a different
language tag format".  Passing through the contents of language tags
allows these two cases to be conflated and that will be a source of
author and implementor confusion as more and more language-specific
behaviors are added to user agents.

<p lang="my">BCP47 language subtag for Burmese</p>
<p lang="Burmese">Human readable language name</p>
<p lang="mya">ISO 639-3 three-letter tag for Burmese</p>
<p lang="BRM">OpenType language system tag for Burmese</p>

If user agents pass through all four of these language tags without
validating them as BCP47 language subtags, then OpenType APIs will
recognize 'BRM' but hyphenation APIs won't.  Inconsistency would also
be possible across user agents; if vendor X uses a hyphenation API that
matches OS X language tags but vendor Y uses a hyphenation API that
matches Windows language tags, then the rendering of content will vary
due to this purely internal inconsistency.
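
To make the pass-through problem concrete, here is a rough sketch.  The
two lookup tables are hypothetical stand-ins for what an OpenType layer
and a hyphenation library might each accept; they are assumptions for
illustration, not the contents of any real API.

```python
# Hypothetical tables: what two internal consumers happen to recognize.
# Neither is taken from a real library; they only illustrate the mismatch.
OPENTYPE_LANGSYS = {"BRM"}        # assumed: OpenType layer keyed by langsys tag
HYPHENATION_LOCALES = {"my"}      # assumed: hyphenation keyed by BCP47 subtag


def passthrough_results(lang_attr):
    """Hand the raw lang attribute to each consumer, with no validation."""
    return {
        "opentype": lang_attr in OPENTYPE_LANGSYS,
        "hyphenation": lang_attr in HYPHENATION_LOCALES,
    }


# The four lang values from the markup example above.
TAGS = ["my", "Burmese", "mya", "BRM"]
RESULTS = {tag: passthrough_results(tag) for tag in TAGS}

for tag, result in RESULTS.items():
    print(tag, result)
```

Under these assumptions no single tag is recognized by both consumers,
so the same content gets language-specific behavior in one subsystem
and not in the other.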

I think it would make a lot more sense if user agents simply treated
non-BCP47 language tags as "unknown" and interpreted what "unknown"
means in the context of specific APIs, rather than passing through the
unmodified tags.
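
A minimal sketch of that approach might look like the following.  The
registry excerpt, the simplified shape check, and the function name
effective_language are all assumptions made for illustration; a real
implementation would consult the full IANA Language Subtag Registry and
the complete BCP47 grammar.

```python
import re

# Tiny hand-picked excerpt standing in for the IANA Language Subtag
# Registry (assumption: a real user agent would load the full registry).
REGISTERED_LANGUAGES = {"en", "fr", "ja", "my"}

# Simplified shape check, much looser than the real BCP47 grammar:
# an alphabetic primary subtag, then hyphen-separated alphanumeric subtags.
_SHAPE = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def effective_language(lang_attr):
    """Return the tag if its primary subtag is registered, else None.

    None stands for "unknown"; each consumer (fonts, hyphenation, ...)
    then decides what "unknown" means for its own purposes.
    """
    if not _SHAPE.match(lang_attr):
        return None
    primary = lang_attr.split("-", 1)[0].lower()
    if primary not in REGISTERED_LANGUAGES:
        return None
    return lang_attr


for tag in ["my", "my-MM", "Burmese", "mya", "BRM"]:
    print(repr(tag), "->", effective_language(tag))
```

Note that a shape check alone is not enough: "mya" and "BRM" both look
like three-letter language subtags, so the sketch has to look the
primary subtag up in the registry to reject them.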

> > No one commenting on this bug is arguing that validation should
> > occur when matching CSS selectors.
> 
> CSS is no different than OpenType or any other technology. If we say
> that you have to do validation for one, it follows that validation
> would apply to the other.

CSS is not interpreting the meaning of a language tag; it is simply
matching it against content that is labeled as such.  This is
completely different from inferring language-specific rules based on the
*meaning* of those tags.
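
As a sketch of why selector matching needs no validation: the CSS 2.1
:lang() rule, as I read it, is a case-insensitive comparison that
matches when the element's language equals the selector argument or
begins with it followed by "-".  The function name below is
hypothetical.

```python
def lang_selector_matches(selector_arg, element_lang):
    """Approximate CSS 2.1 :lang() matching as a pure string comparison.

    No registry lookup happens here, so invalid tags match themselves
    just as well as valid ones do.
    """
    sel = selector_arg.lower()
    lang = element_lang.lower()
    return lang == sel or lang.startswith(sel + "-")


print(lang_selector_matches("my", "my-MM"))   # prefix match on a valid tag
print(lang_selector_matches("BRM", "brm"))    # invalid tag still matches itself
print(lang_selector_matches("my", "mya"))     # no "-" boundary, so no match
```

Nothing in this comparison depends on what the tag means, which is why
matching can stay unvalidated while language-specific behavior cannot.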


Received on Saturday, 12 November 2011 01:56:56 UTC