- From: <bugzilla@jessica.w3.org>
- Date: Wed, 09 Nov 2011 19:36:36 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14709

--- Comment #28 from Glenn Adams <glenn@skynav.com> 2011-11-09 19:36:31 UTC ---
(In reply to comment #27)
> Like Glenn said, there is a question what "null lang subtag" means: It could
> not be equal to the empty string. Let's consider a spelling checker: how should
> it behave in case it saw this:

Presumably your reasoning for why it (null lang subtag) could not be equal to the empty string is based on the point that the empty string is not a valid BCP47 tag. Is this correct?

Looking back at HTML4.0 [1], I see that lang was defined to be an RFC1766 Language-Tag [2], which, to be well formed, must consist of at least one character (in the Primary-tag) [3][4]. There is no discussion in HTML4.0 or RFC1766 about a default "unknown" or "undetermined" language.

[1] http://www.w3.org/TR/1998/REC-html40-19980424/
[2] http://www.ietf.org/rfc/rfc1766.txt
[3] http://www.w3.org/TR/1998/REC-html40-19980424/struct/dirlang.html#langcodes
[4] http://www.w3.org/TR/1998/REC-html40-19980424/types.html#h-6.8
[5] http://www.w3.org/TR/1998/REC-html40-19980424/struct/dirlang.html#h-8.1.3

HTML4.0 also defines semantics for inheritance of language [6], wherein the language that applies to a parent element is inherited by its child elements unless the child specifies a language attribute.

[6] http://www.w3.org/TR/1998/REC-html40-19980424/struct/dirlang.html#h-8.1.2

HTML4.0 does NOT specify a means for a child to block inheritance except by specifying a valid RFC1766 language in its lang attribute. That is, HTML4.0 does not define the use of the empty string (or any other value) as a way to reset the child's language to "unknown" or "undetermined" or "default".

Notwithstanding the above, the language tag "i-default" was registered with IANA in March 1998 [7], making it a valid language tag that means 'default' language. This tag is also included in BCP47 as a valid grandfathered tag.
[7] http://www.iana.org/assignments/lang-tags/i-default

Curiously, 'i-default' is defined in terms of the recipient's language preferences, and not in terms of the language of the message being transmitted:

"It is not a specific language, but rather identifies the condition where the language preferences of the user cannot be established."

Furthermore, it is required that:

"Messages in Default Language MUST be understandable by an English-speaking person..."

In essence, 'i-default' is like a weak form of 'en'. My conclusion is that 'i-default' is NOT the same as stating that the language of the marked content is unknown or undetermined. So it should not be used for this purpose.

XML 1.0 1998 [1st Edition] also defines xml:lang [8] in terms of RFC1766, and does not mention a default or unknown/undetermined language value, and does NOT specify the use of the empty string as a way of denoting a default or unknown language value.

[8] http://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag

Subsequently, in XML 1.0 2004 [3rd Edition] [9], the use of RFC1766 is updated to the use of RFC3066 [10] AND the null / empty string is introduced as a legal value [11]:

"The values of the attribute are language identifiers as defined by [IETF RFC 3066], Tags for the Identification of Languages, or its successor; in addition, the empty string may be specified."

and

"The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content. In particular, the empty value of xml:lang is used on an element B to override a specification of xml:lang on an enclosing element A, without specifying another language. Within B, it is considered that there is no language information available, just as if xml:lang had not been specified on B or any of its ancestors."
[9] http://www.w3.org/TR/2004/REC-xml-20040204/
[10] http://www.ietf.org/rfc/rfc3066.txt
[11] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag

The last paragraph quoted above is expanded in XML 1.0 2006 [4th Edition] [12] to read as:

"The language specified by xml:lang applies to the element where it is specified (including the values of its attributes), and to all elements in its content unless overridden with another instance of xml:lang. In particular, the empty value of xml:lang is used on an element B to override a specification of xml:lang on an enclosing element A, without specifying another language. Within B, it is considered that there is no language information available, just as if xml:lang had not been specified on B or any of its ancestors. Applications determine which of an element's attribute values and which parts of its character content, if any, are treated as language-dependent values described by xml:lang."

[12] http://www.w3.org/TR/2006/REC-xml-20060816/#sec-lang-tag

This language remains unchanged in the current XML 1.0 2008 [5th Edition] [13].

[13] http://www.w3.org/TR/REC-xml/#sec-lang-tag

> One primary language subtags in the language subtag registry that means
> something close to "null", is 'und' (Undtermined). So one option could perhaps
> be to convert illegal primary language subtags to that subtag - 'und'?

To be consistent with XML 1.0 3rd Edition and later, we need to use the empty (null) string to both (1) specify the absence of language information and (2) override inheritance of language information from the parent.

For invalid language tags, I would now conclude that they should receive the same treatment, i.e., be treated as if the empty string had been specified.

Note that a language tag may be valid according to BCP47 but not listed in the IANA registry. This is due to the possible use of private-use subtags.
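To make the intended behavior concrete, here is a minimal Python sketch of how a consumer (say, a spelling checker) might resolve the language that applies to an element under the treatment described above. It is an illustration only, not text from any specification: the names effective_lang and looks_like_language_tag are hypothetical, the markup is parsed as XML for convenience, and the regular expression is a rough shape check standing in for real BCP47 validation.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: a rough shape check, not BCP47 validation. It accepts an
# alphabetic first subtag followed by hyphen-separated alphanumeric
# subtags, each 1 to 8 characters long.
_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(value):
    """Return True if value has the rough shape of a language tag."""
    return _TAG_SHAPE.fullmatch(value) is not None


def effective_lang(root, target):
    """Return the language that applies to target, or None if no
    language information is available.

    The nearest lang attribute (on target or an ancestor) decides.
    An empty value blocks inheritance, and a value that fails the
    shape check is treated as if it were the empty string.
    """
    # ElementTree elements do not know their parent, so build a map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get("lang")
        if value is not None:
            if value == "" or not looks_like_language_tag(value):
                return None  # explicit "no language information"
            return value
        node = parents.get(node)
    return None  # no lang attribute on target or any ancestor


doc = ET.fromstring(
    '<div lang="en">'
    '<p id="a">inherits en</p>'
    '<p id="b" lang=""><span id="c">inheritance blocked</span></p>'
    '<p id="d" lang="not a tag!">treated as empty</p>'
    '<p id="e" lang="fr-CA">overridden</p>'
    '</div>'
)
by_id = {el.get("id"): el for el in doc.iter()}
for key in ("a", "c", "d", "e"):
    print(key, effective_lang(doc, by_id[key]))
```

Returning None in both the empty and the invalid case mirrors the XML wording that there is "no language information available", so a caller cannot tell the two cases apart; that is the point of giving them the same treatment.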
So given the above, I would now propose the language of HTML5 be changed as follows:

In 3.2.3.3

In 1st paragraph, remove last sentence (this gets moved to the 13th paragraph described below):

"Setting the attribute to the empty string indicates that the primary language is unknown."

In 11th paragraph, change

"If the resulting value is not a recognized language tag, then it must be treated as an unknown language having the given language tag, distinct from all other languages. For the purposes of round-tripping or communicating with other services that expect language tags, user agents should pass unknown language tags through unmodified."

to read as:

"If the resulting value is non-empty and is not valid according to BCP47 §2.2.9, then it must be treated as if the empty string had been specified."

Remove 12th paragraph starting with "Thus, for instance, an element with lang="xyzzy" ..."

In 13th paragraph, change:

"If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown."

to read:

"If the resulting value is the empty string, then it must be interpreted as meaning no language information is available, just as if the lang attribute had not been specified on the element or any of its ancestors."
Received on Wednesday, 9 November 2011 19:36:40 UTC