[Bug 9263] New: Incorrect language determination algorithm

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9263

           Summary: Incorrect language determination algorithm
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/spec/Overview.html#the-lang-and-xml:lang-attributes
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec bugs
        AssignedTo: dave.null@w3.org
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


Section '3.2.3.3 The lang and xml:lang attributes' says:

]]
Setting the attribute to the empty string indicates that the primary language
is unknown. [BCP47]
[[

General comment: Please look through the text in this section and remove the
ambiguities in the use of the wordings "unknown", "absence of any language
information", etc.

Please specify what it means that the lang is unknown. Should the user agent
accept that the lang is unknown? Or should it go looking for a language? Note
that the last step of the language determination algorithm of the same section
says:

]]
 In the absence of any language information, and in cases where the
higher-level protocol reports multiple languages, the language of the node is
unknown (the empty string).
[[

Should a user agent consider an empty lang="" as "absence of any language
information"? Or should it consider that it means that the language is
"unknown"? The above sentence should say that the language is "unknown" also
when the lang="" attribute is set to the empty string. The user agent should
then abort the language detection algorithm and set the language of the node to
"unknown".

Proposal: I think that user agents should, internally, distinguish between an
empty lang="" that sets the language to "unknown" and the case where "no
language information can be found".
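
To illustrate the proposal, here is a minimal sketch in Python. It is not taken
from the spec or from any browser; the names LangState and classify_lang are
hypothetical, and None is assumed to stand for "the attribute is not present".

```python
from enum import Enum
from typing import Optional


class LangState(Enum):
    """Three states a user agent could keep apart internally (hypothetical)."""
    EXPLICIT = "explicit"                  # e.g. lang="en"
    EXPLICIT_UNKNOWN = "explicit-unknown"  # lang="" (author says: unknown)
    NO_INFORMATION = "no-information"      # no lang attribute at all


def classify_lang(attr_value: Optional[str]) -> LangState:
    """Classify a raw lang attribute value; None means the attribute is absent."""
    if attr_value is None:
        return LangState.NO_INFORMATION
    if attr_value == "":
        return LangState.EXPLICIT_UNKNOWN
    return LangState.EXPLICIT
```

With such a distinction, only NO_INFORMATION would allow the user agent to
continue to the fallback steps; EXPLICIT_UNKNOWN would end the search.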


Comments in more detail, on the language determination algorithm:

]] To determine the language of a node, … [[

PROBLEM: What is the language of a node *before* the user agent starts looking
for its language? Is it "unknown"? If it is "unknown", what should then happen
when the user agent detects that the nearest lang attribute contains the
empty string? Should it go looking for the next non-empty lang attribute and/or
for a content-language header? Or should it stop looking? (Answer: It should
stop looking.)
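
The "stop looking" behaviour can be sketched roughly as follows. This is an
illustration only, under the assumption that each node exposes its own lang
attribute value (None when absent) and its parent; Node and
nearest_lang_attribute are hypothetical names, not spec terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Simplified stand-in for an element (hypothetical)."""
    lang: Optional[str] = None      # None: no lang attribute; "": lang=""
    parent: Optional["Node"] = None


def nearest_lang_attribute(node: Optional[Node]) -> Optional[str]:
    """Return the value of the nearest lang attribute, walking up the ancestors.

    The walk stops at the first attribute that is present, even if its value
    is the empty string. None is returned only when no attribute exists.
    """
    current = node
    while current is not None:
        if current.lang is not None:
            return current.lang  # may be "", which still ends the search
        current = current.parent
    return None
```

For a tree such as <html lang="en"><p lang=""><span>, the span would get the
empty string (unknown) from the p element, not "en" from the root.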

Please make clear(er) what the user agent should do when the lang attribute
contains the empty string.

]] 
If no explicit language is given for any ancestors of the node, including the
root element, but there is a pragma-set default language set, then that is the
language of the node.
[[

Comment: If the @lang attribute is set to the empty string, does this then
count as "no explicit language is given"? Or does it mean that an explicit
"unknown language" has been set? (I suggest that it should be the latter.)

]]
If there is no pragma-set default language, then language information from a
higher-level protocol (such as HTTP), if any, must be used as the final
fallback language. In the absence of any language information, and in cases
where the higher-level protocol reports multiple languages, the language of the
node is unknown (the empty string).
[[

Please make clear that the pragma-set language and/or the higher-level protocol
MUST NOT be used as a fallback language whenever the lang attribute has been
set to the empty string. (Currently, Firefox and Safari violate this.)

I concretely suggest saying something like "then the language of the node is
equal to unknown (equal to the empty string)" instead of the current "the
language of the node is unknown (the empty string)"
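
The requested fallback order could look roughly like this. It is a sketch of
the behaviour asked for in this bug, not of the current spec text or of any
implementation; determine_language and its parameters are hypothetical, with
ancestor_langs listing the lang attribute values from the node up to the root
(None where the attribute is absent).

```python
from typing import Optional, Sequence


def determine_language(
    ancestor_langs: Sequence[Optional[str]],
    pragma_default: Optional[str] = None,
    http_languages: Sequence[str] = (),
) -> str:
    """Return the language of a node; "" means unknown."""
    # Step 1: the nearest lang attribute wins, even when it is empty.
    for value in ancestor_langs:
        if value is not None:
            return value  # "" is an explicit "unknown": no fallback is used

    # Step 2: no lang attribute anywhere, so try the pragma-set default.
    if pragma_default:
        return pragma_default

    # Step 3: the higher-level protocol, only if it reports exactly one language.
    if len(http_languages) == 1:
        return http_languages[0]

    # Step 4: no language information at all, or multiple languages reported.
    return ""
```

Under this sketch, determine_language([None, ""], "en", ["fr"]) gives the empty
string, whereas determine_language([None, None], "en", ["fr"]) gives "en".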

Test case showing that Mozilla and WebKit wrongly ignore a lang attribute set
to the empty string and instead go looking for the pragma and/or the HTTP
header:

 http://software.hixie.ch/utilities/js/live-dom-viewer/saved/406



Received on Thursday, 18 March 2010 10:27:57 UTC