[Bug 14709] user agent lang tag handling is insufficiently specified from bugzilla@jessica.w3.org on 2011-11-09 (public-html-bugzilla@w3.org from November 2011)

From: <bugzilla@jessica.w3.org>
Date: Wed, 09 Nov 2011 13:03:47 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RO7oh-0004CN-Be@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #27 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-09 13:03:43 UTC ---
(In reply to comment #24)
> (In reply to comment #23)

> The 'mya' language subtag is a registered ISO 639-3 language subtag but
> there's an equivalent ISO 639-1 two-letter code, 'my', so the valid
> BCP47 form is 'my' and not 'mya'.

Or instead of deduction, just look up BCP47's Language Subtag registry.
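
A minimal sketch of such a lookup, assuming a tiny hand-abridged excerpt in the
registry's record format (records separated by "%%"). The excerpt, the helper
name load_language_subtags and the simplified parsing (no continuation lines,
one Description per record) are illustrative assumptions; a real tool would
load the full registry file.

```python
# Abridged, illustrative excerpt; the real registry has many more
# records and more fields per record.
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: my
Description: Burmese
%%
Type: language
Subtag: und
Description: Undetermined
%%
Type: region
Subtag: MM
Description: Myanmar
"""


def load_language_subtags(text):
    """Return {subtag: description} for records whose Type is 'language'."""
    languages = {}
    for record in text.split("%%"):
        fields = {}
        for line in record.strip().splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields.get("Type") == "language":
            languages[fields["Subtag"]] = fields.get("Description", "")
    return languages


LANGUAGES = load_language_subtags(REGISTRY_EXCERPT)
print("my" in LANGUAGES)   # True
print("mya" in LANGUAGES)  # False
```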

>  I don't think "deprecated" really is
> a fitting description, it's simply invalid in the context of a BCP47 tag.

Yup. That's basically what I tried to say.

> I agree that invalid language subtags should not be mutated in the DOM
> but invalid BCP47 language subtags must be interpreted as being
> equivalent to null lang subtags.

As Glenn said, there is the question of what "null lang subtag" means: it might
not be the same as the empty string. Consider a spelling checker: how should
it behave if it saw this:

<div lang="en">English <span lang="mya">Some other language</span></div>

My thought is that it should not spell-check the <span> as if it were English.
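
A minimal sketch of that behaviour, assuming a checker that receives the lang
attribute values from the element outward to the root. The function name
spellcheck_language and the small KNOWN_PRIMARY set are hypothetical stand-ins,
not any browser's actual logic.

```python
# Hypothetical stand-in for the primary language subtags in the registry.
KNOWN_PRIMARY = {"en", "my", "und"}


def spellcheck_language(lang_chain):
    """Pick the dictionary language for an element.

    lang_chain lists lang attribute values from the element outward to
    the root, with None where an element has no lang attribute.
    Returns a primary language subtag, or None for "do not guess".
    """
    for value in lang_chain:
        if value is None:
            continue  # no lang attribute here: inherit from the parent
        primary = value.split("-")[0].lower()
        # The nearest lang attribute decides, valid or not. An invalid
        # value stops inheritance instead of falling back to the ancestor.
        return primary if primary in KNOWN_PRIMARY else None
    return None


print(spellcheck_language(["mya", "en"]))  # None
print(spellcheck_language([None, "en"]))   # en
```

With the markup above, the <span> gets no dictionary at all, while an element
without its own lang attribute still inherits English from the <div>.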

One primary language subtag in the language subtag registry that means
something close to "null" is 'und' (Undetermined). So one option could perhaps
be to convert illegal primary language subtags to that subtag - 'und'?

If this would also happen in the DOM, then it could become a nice way to check
that one did not use any invalid primary language subtags.
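
A minimal sketch of the 'und' conversion, assuming a hypothetical
normalize_primary helper and a small stand-in set of known primary subtags.
It only illustrates the idea; nothing here is specified behaviour.

```python
# Hypothetical stand-in for the primary language subtags in the registry.
KNOWN_PRIMARY = {"en", "my", "und"}


def normalize_primary(tag):
    """Replace an unrecognized primary language subtag with 'und'."""
    primary, sep, rest = tag.partition("-")
    if primary.lower() in KNOWN_PRIMARY:
        return tag
    # Keep any following subtags; only the primary subtag is replaced.
    return "und" + sep + rest


print(normalize_primary("mya"))     # und
print(normalize_primary("mya-MM"))  # und-MM
print(normalize_primary("en-GB"))   # en-GB
```

Comparing input and output gives the check described above: a tag whose
normalized form differs from the original had an invalid primary subtag.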

But perhaps this would be against the intentions of 'und'? An alternative would
then be to register a primary language subtag for this purpose. But note that
'und' or this new tag could only be used when the primary language is invalid.

Another alternative could be to use the private use subtag (the 'x') and
transform it to 'x-error'. I guess one could also do 'x-error-myab', so that it
became possible to differentiate the errors. 'x-error-myab' would be a legal
tag, with only an entirely private meaning.

If the error occurred somewhere other than in the primary language subtag - e.g.
"en-UB" - then one could transform it into "en-x-error-UB", which would be a
legal language tag for English.

However, because this would attach meaning to the 'x-error-' string, it would
perhaps be best to use something other than the x-/-x-, like a special
extension for this purpose. Let's call it the -e- extension (e for error). Then
'mya' could be transformed to 'e-mya' and 'en-UB' could be transformed into
'en-e-UB'. Etc.
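
The two marking schemes above can be sketched side by side. This is only an
illustration under stated assumptions: mark_errors is a hypothetical helper,
the known-subtag sets are tiny stand-ins for the registry, every subtag after
the primary one is treated as a region, and no 'e' extension is registered.

```python
# Hypothetical stand-ins for the registry's language and region subtags.
KNOWN_PRIMARY = {"en", "my", "und"}
KNOWN_REGIONS = {"GB", "US", "MM"}


def mark_errors(tag, marker="x-error"):
    """Move unrecognized subtags behind `marker`.

    marker="x-error" gives the private use variant; marker="e" gives the
    proposed (unregistered) error-extension variant.
    """
    subtags = tag.split("-")
    if subtags[0].lower() not in KNOWN_PRIMARY:
        # Invalid primary subtag: the whole tag goes behind the marker.
        return marker + "-" + tag
    # Simplification: every later subtag is checked as a region.
    good = [s for s in subtags[1:] if s.upper() in KNOWN_REGIONS]
    bad = [s for s in subtags[1:] if s.upper() not in KNOWN_REGIONS]
    if not bad:
        return tag
    return "-".join([subtags[0]] + good + [marker] + bad)


print(mark_errors("mya"))                # x-error-mya
print(mark_errors("en-UB"))              # en-x-error-UB
print(mark_errors("mya", marker="e"))    # e-mya
print(mark_errors("en-UB", marker="e"))  # en-e-UB
```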

Does any of this sound like something?

There is already one 'u' extension to BCP 47: http://tools.ietf.org/html/rfc6067
And a 't' extension is in the works:
http://unicode.org/repos/cldr/trunk/docs/rfc/draft-davis-t-langtag-ext.html

> For background, below is the bug discussion that led me to file this
> bug: https://bugzilla.mozilla.org/show_bug.cgi?id=631479#c92

Thanks.

> Basically Gecko has several backends for handling fonts, one for
> OpenType fonts and another for Graphite fonts under development, and the
> language tag format is different between these.  My feeling is that
> only valid BCP47 language subtags should be mapped or passed down to
> these font backends, invalid tags should be treated as if the lang subtag
> was not specified.

Received on Wednesday, 9 November 2011 13:03:54 UTC