- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 18 Mar 2010 10:27:55 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9263

Summary: Incorrect language determination algorithm
Product: HTML WG
Version: unspecified
Platform: PC
URL: http://dev.w3.org/html5/spec/Overview.html#the-lang-and-xml:lang-attributes
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec bugs
AssignedTo: dave.null@w3.org
ReportedBy: xn--mlform-iua@xn--mlform-iua.no
QAContact: public-html-bugzilla@w3.org
CC: ian@hixie.ch, mike@w3.org, public-html@w3.org

Section '3.2.3.3 The lang and xml:lang attributes' says:

]] Setting the attribute to the empty string indicates that the primary language is unknown. [BCP47] [[

General comment: Please look through the text in this section and remove the ambiguities in the use of the wordings "unknown", "absence of any language information", etc. Please specify what it means that the lang is unknown. Should the user agent accept that the lang is unknown? Or should it go looking for a language?

Note that the last step of the language determination algorithm of the same section says:

]] In the absence of any language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown (the empty string). [[

Should a user agent consider an empty lang="" as "absence of any language information"? Or should it consider that it means that the language is "unknown"? The above sentence should say that the language is also "unknown" when the lang attribute is set to the empty string. The user agent should then abort the language determination algorithm and set the language of the node to "unknown".

Proposal: I think that user agents, internally, should distinguish between an empty lang="" that sets the language to "unknown" and "no language information can be found".
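To illustrate the proposal, here is a minimal sketch of how a user agent might keep the two cases apart internally. The names `LangState` and `classify_lang` are hypothetical and do not come from the spec; the sketch assumes that an absent attribute is represented as `None`.

```python
from enum import Enum
from typing import Optional


class LangState(Enum):
    """Hypothetical internal states for a single element's lang attribute."""
    NO_INFORMATION = "no language information"  # attribute is absent
    EXPLICITLY_UNKNOWN = "explicitly unknown"   # lang=""
    EXPLICIT = "explicit language"              # lang="en", lang="nb", ...


def classify_lang(attr_value: Optional[str]) -> LangState:
    """Classify a lang attribute value; None stands for 'attribute absent'."""
    if attr_value is None:
        return LangState.NO_INFORMATION
    if attr_value == "":
        return LangState.EXPLICITLY_UNKNOWN
    return LangState.EXPLICIT
```

With this distinction, only `NO_INFORMATION` would allow the user agent to keep looking elsewhere for a language.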
Comments in more detail, on the language determination algorithm:

]] To determine the language of a node, … [[

PROBLEM: What is the language of a node *before* the user agent starts looking for its language? Is it "unknown"? If it is "unknown", what should then happen when the user agent detects that the nearest lang attribute contains the empty string? Should it go looking for the next non-empty lang attribute and/or for a Content-Language header? Or should it stop looking? (Answer: It should stop looking.) Please make clear(er) what the user agent should do when the lang attribute contains the empty string.

]] If no explicit language is given for any ancestors of the node, including the root element, but there is a pragma-set default language set, then that is the language of the node. [[

Comment: If the @lang attribute is set to the empty string, does this then count as "no explicit language is given"? Or does it mean that an explicit "unknown language" has been set? (I suggest that it should be the latter.)

]] If there is no pragma-set default language, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language. In the absence of any language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown (the empty string). [[

Please make clear that the pragma-set language and/or the higher-level protocol MUST NOT be used as the fallback language whenever the lang attribute has been set to the empty string. (Currently, Firefox and Safari violate this.)
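The behaviour requested above can be sketched roughly as follows. This is an illustration of the suggested reading, not the spec's algorithm text: `Node`, `language_of`, `pragma_default` and `protocol_languages` are hypothetical names, xml:lang is left out for brevity, and an absent lang attribute is assumed to be modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

UNKNOWN = ""  # the spec represents the unknown language as the empty string


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a DOM element."""
    lang: Optional[str] = None       # None: no lang attribute; "": explicitly unknown
    parent: Optional["Node"] = None


def language_of(node: Node,
                pragma_default: Optional[str] = None,
                protocol_languages: Sequence[str] = ()) -> str:
    # Walk from the node up through its ancestors. The nearest lang
    # attribute wins, even when its value is the empty string.
    current: Optional[Node] = node
    while current is not None:
        if current.lang is not None:
            return current.lang  # lang="" stops the search: no fallback
        current = current.parent

    # No lang attribute anywhere: fall back to the pragma-set default.
    if pragma_default:
        return pragma_default

    # Final fallback: the higher-level protocol, but only when it
    # reports exactly one language.
    if len(protocol_languages) == 1:
        return protocol_languages[0]

    return UNKNOWN
```

For example, `language_of(Node(lang="", parent=Node()), pragma_default="en")` returns the empty string under this reading, whereas the same call with `lang=None` on the child returns `"en"`.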
I concretely suggest saying something like "then the language of the node is equal to unknown (equal to the empty string)" instead of the current "the language of the node is unknown (the empty string)".

Test case showing that Mozilla and WebKit wrongly ignore a lang attribute set to the empty string, and instead go looking for the pragma and/or the HTTP header:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/406

--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Thursday, 18 March 2010 10:27:57 UTC