- From: <bugzilla@jessica.w3.org>
- Date: Sun, 06 Nov 2011 19:52:38 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709
Summary: lang tag validation is insufficiently specified
Product: HTML WG
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec (editor: Ian Hickson)
AssignedTo: ian@hixie.ch
ReportedBy: jdaggett@mozilla.com
QAContact: public-html-bugzilla@w3.org
CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
public-html@w3.org
In section "The lang and xml:lang attributes" describing the behavior
of language tags in HTML elements, there's wording that makes it difficult
to determine exactly if/when some form of language tag validation should occur.
The spec currently contains this wording:
If the resulting value is not a recognized language tag, then
it must be treated as an unknown language having the given
language tag, distinct from all other languages. For the
purposes of round-tripping or communicating with other services
that expect language tags, user agents should pass unknown
language tags through unmodified.
Thus, for instance, an element with lang="xyzzy" would be
matched by the selector :lang(xyzzy) (e.g. in CSS), but it
would not be matched by :lang(abcde), even though both are
equally invalid. Similarly, if a Web browser and screen reader
working in unison communicated about the language of the
element, the browser would tell the screen reader that the
language was "xyzzy", even if it knew it was invalid, just in
case the screen reader actually supported a language with that
tag after all.
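As a rough illustration of the selector behavior in the quoted text, here is a minimal Python sketch. It assumes the CSS 2.1 rule for :lang(), which matches when the element's language equals the argument or begins with the argument followed by "-", compared case-insensitively. The function name lang_matches is hypothetical and is not taken from the spec or from any implementation. Note that no validity check is involved at any point.

```python
def lang_matches(element_lang: str, selector_arg: str) -> bool:
    """Approximate :lang() matching under the CSS 2.1 prefix rule (assumed here).

    The element's language matches when it equals the selector argument or
    starts with the argument followed by "-". No validity check is made.
    """
    lang = element_lang.lower()
    arg = selector_arg.lower()
    if not arg:
        # An empty argument is treated as matching nothing in this sketch.
        return False
    return lang == arg or lang.startswith(arg + "-")
```

Under this sketch, lang="xyzzy" is matched by :lang(xyzzy) but not by :lang(abcde), exactly because the tag is compared as an opaque string.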
To give a concrete example of where this leads to fuzzy interpretation
in implementations, consider the language tag 'mya', the ISO 639-3
language code for Burmese. ISO 639-1 already defines the two-letter
code 'my' for Burmese, so the valid BCP47 language tag is 'my', not
'mya'. So what's the exact behavior for user agents that call APIs
that make use of language tag information, for example OpenType APIs
that use OpenType language tags? Should the language tag be validated
and a default used if no valid tag exists? Or should 'mya' be passed
through to these APIs just in case it might be a supported OpenType
tag? The spec can be read
either way, especially given the example of a screen reader which
"actually supported a language with that tag after all".
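To make the two readings concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and REGISTERED_PRIMARY_SUBTAGS is a deliberately tiny stand-in for the registered primary language subtags that a real implementation would look up. Returning None stands for "unknown language, let the API use its default".

```python
from typing import Optional

# Hypothetical, deliberately tiny stand-in for the set of registered primary
# language subtags. A real implementation would consult the full registry.
REGISTERED_PRIMARY_SUBTAGS = {"en", "ja", "my"}


def reading_pass_through(lang_attr: str) -> Optional[str]:
    """Reading 1: hand the author's tag to the API unmodified."""
    # Only an empty attribute value is reported as "no language".
    return lang_attr or None


def reading_validate(lang_attr: str) -> Optional[str]:
    """Reading 2: pass the tag on only if its primary subtag is registered.

    Anything else is reported as None, meaning an unknown language for which
    the API should fall back to its default behavior.
    """
    primary = lang_attr.split("-", 1)[0].lower()
    return lang_attr if primary in REGISTERED_PRIMARY_SUBTAGS else None
```

With lang="mya", the first reading hands 'mya' to the API while the second hands it nothing; with lang="my" both readings agree.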
I think the wording needs to be stronger than this. The spec
specifically needs to say that when the language is used, if it
isn't a valid BCP47 language tag (as with 'mya'), then the only
interpretation is that it's the equivalent of an unknown language when
passed along to an API. As is, the spec merely defines the
*expectation* that the language code is a BCP47 code but allows for an
entirely different language tag format to be used in its place.
Received on Sunday, 6 November 2011 19:54:44 UTC