[Bug 14709] user agent lang tag handling is insufficiently specified

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #32 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-11 05:48:52 UTC ---
(In reply to comment #31)
> > How OpenType behaves is up to OpenType.

> so the question here is what should
> tags that are not valid BCP47 tags map to?  A null mapping?  Or should
> invalid values simply be passed through?

OK: You don't care so much about invalid tags that happen to match OpenType
tags (because that would lead e.g. 'brm' to be mapped incorrectly) as you care
about whether invalid BCP47 tags that match 3-letter ISO tags can be
interpreted as ISO tags, since OpenType already has a mapping for these.

> Given the language highlighted in comment 11, the spec can be read
> either way.  The lang attribute specifies a valid BCP47 tag so the
> mapping should be a null mapping.  But it's unknown so it should be
> passed through.

When the spec says 'an unknown language', it does not mean 'a null mapping'.
What the spec describes is an inability to know the meaning of a tag (which
thus denotes 'an unknown language' in that sense) because the tag is invalid.
That is something other than stating that the language of a text is unknown.
In the latter case, you could have used the 'und' subtag to tag the text as
'undetermined'. Whereas in the former case - the one the spec speaks about -
it is known that the language is known/classified; it is just that you
yourself don't know what it is known/classified as.

W.r.t. the spec text, I would go so far as to say that the word
'unknown' is unimportant. The text could have said "must be treated as a
language having the given tag", and it would have had the same meaning. And
then, after the comma, the spec text says: "distinct from all other languages",
which means that the (unknown) language which the invalid tag represents
cannot be one of the 8000 languages listed in the Language Subtag Registry.
Hence, there is no justification for mapping it to the 3-letter ISO codes
either.
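
To make that reading concrete, here is a minimal sketch in Python. The
function names are hypothetical and it compares primary subtags only, which is
a simplifying assumption. It illustrates the "distinct from all other
languages" reading, not the code of any actual user agent.

```python
def primary_subtag(tag: str) -> str:
    """Return the lowercased primary language subtag of a lang value."""
    return tag.split("-", 1)[0].lower()


def same_language(a: str, b: str) -> bool:
    """Compare two lang values without guessing at invalid tags.

    A tag is treated as a language having exactly that tag: no aliasing
    from 3-letter ISO 639 codes to 2-letter subtags is attempted, so an
    invalid tag such as 'ara' is distinct from the valid tag 'ar'.
    """
    return primary_subtag(a) == primary_subtag(b)
```

Under these assumptions, same_language("ara", "ar") is false, while
same_language("ar-EG", "ar") is true.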

It is OpenType's duty to have a language tag interpreter that interprets the
tag to have the same meaning as in HTML and XML. However, it is the author's
responsibility to use correct tags.

Thus, if OpenType interprets e.g. 'ara' to mean the same as 'ar', then there
would be no sanction for doing so in HTML5. So, if OpenType did not map 'ara'
to anything, then that would be correct.

And I suppose that OpenType already knows what to do with unknown tags that are
thrown at it. Or that it knows what to do if you throw an "und" at it - etc.

> No one commenting on this bug is arguing that validation should occur
> when matching CSS selectors.

Actually, what Glenn said in comment #24 (and I said in comment #23), would
also have impacted on CSS.

Question: Are you in doubt about what the spec says, yourself? 

I think that what Ian said confirms that for, for example, lang='ara', 'ara'
should be passed on to OpenType, just as 'ar' should be passed on to
OpenType. And then it would be up to OpenType to interpret 'ara' the
way that BCP47 requires it to be interpreted.

It would be correct if OpenType treated 'ara' as unknown. But as long as it
doesn't cause authors to start using 'ara', then it doesn't matter much if
OpenType also understands 'ara'.
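
A rough sketch of the two font-side behaviours, in Python. The table, the
alias map and the function name are all hypothetical; the two language system
tags are only illustrative entries, and nothing here is claimed to be how any
OpenType implementation actually works. The user agent is assumed to hand over
the lang value unchanged.

```python
from typing import Optional

# Hypothetical font-side table from BCP47 primary subtags to OpenType
# language system tags (assumption: a real table is far larger).
LANGSYS_BY_SUBTAG = {"ar": "ARA ", "en": "ENG "}

# Optional leniency: 3-letter ISO 639 codes the font side chooses to accept.
ISO639_3LETTER_ALIASES = {"ara": "ar", "eng": "en"}


def langsys_for(tag: str, lenient: bool = False) -> Optional[str]:
    """Resolve a passed-through lang value on the font side.

    A strict lookup treats anything not in the table as unknown (None).
    A lenient lookup first folds known 3-letter codes onto their
    2-letter equivalents.
    """
    primary = tag.split("-", 1)[0].lower()
    if lenient:
        primary = ISO639_3LETTER_ALIASES.get(primary, primary)
    return LANGSYS_BY_SUBTAG.get(primary)
```

With the strict lookup, 'ara' resolves to nothing, which is the behaviour I
called correct. With lenient=True it resolves like 'ar', which is the
behaviour that risks encouraging authors to use 'ara'.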

I think all the 3-letter codes in BCP47 stem from the ISO 639 registries. As
such - provided that the 3-letter code really is - and is meant to be - a
3-letter ISO 639 code - the risk of doing something very incompatible when
interpreting 'ara' as 'ar' should be pretty low.

The greatest risk I see is thus that authors start to use 3-letter tags -
because it works in OpenType. This, in turn, could lead to problems in *other*
areas than OpenType.  E.g. there is no guarantee that a screenreader
understands that code.

Received on Friday, 11 November 2011 05:49:01 UTC