- From: Internationalization Core Working Group Issue Tracker <sysbot+tracker@w3.org>
- Date: Wed, 23 Mar 2011 19:47:41 +0000
- To: public-i18n-core@w3.org
I18N-ISSUE-23: [Bug 12278] [polyglot] i18n: Make lang and xml:lang required on the root element [HTML5-mail]

http://www.w3.org/International/track/issues/23

Raised by: Richard Ishida
On product: HTML5-mail

Bugzilla: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12278

Summary: [polyglot] i18n: Make lang and xml:lang required on the root element.
Product: HTML WG
Version: unspecified
Platform: PC
URL: http://www.w3.org/TR/2010/WD-html-polyglot-20100624/#attributes
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
AssignedTo: eliotgra@microsoft.com
ReportedBy: xn--mlform-iua@xn--mlform-iua.no
QAContact: public-html-bugzilla@w3.org
CC: ishida@w3.org, mike@w3.org, public-html-wg-issue-tracking@w3.org, public-html@w3.org, xn--mlform-iua@xn--mlform-iua.no, public-i18n-core@w3.org, eliotgra@microsoft.com

PROBLEM:

XML and HTML differ w.r.t. whether the HTTP Content-Language: header MUST or MAY change the language of an element from 'unset' to a specific language. For http-equiv="Content-Language", HTML has clear rules, whereas XML is silent. These differences can cause the language to be set on the HTML side while it remains unset on the XML side.

HOW TO SOLVE:

EITHER require authors to create polyglot markup that is immune to the possibility that the Content-Language value (from either the http-equiv pragma or the HTTP header) changes the language from 'unset' to some specific language in an asymmetric way (that is, only on the HTML side). Basically: make @xml:lang/lang required on the root element, at least in some situations.

OR accept the differences and document, in the Polyglot Markup specification, how XML and HTML differ.
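The asymmetry described under PROBLEM can be sketched in Python. This is a simplified, non-normative model, not the HTML5 or XML algorithm itself. The function names are hypothetical. An element's language chain is given as a list of lang values from the node up to the root, with None meaning unset. A Content-Language value counts as a single language tag when it contains no comma. On the XML side, reusing that single-tag rule is an assumption, because XML 1.0 does not say how multiple values are handled.

```python
from typing import List, Optional


def single_language_tag(value: Optional[str]) -> Optional[str]:
    """Return the tag if `value` holds exactly one language tag, else None.

    Simplification: a comma means "several values", which HTML5 ignores.
    """
    if value is None:
        return None
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 1 or not parts[0]:
        return None
    return parts[0]


def nearest_declared(langs: List[Optional[str]]) -> Optional[str]:
    """First explicitly set lang value, walking from the node to the root.

    An empty string is a real declaration (language explicitly unknown).
    """
    for lang in langs:
        if lang is not None:
            return lang
    return None


def html_language(langs: List[Optional[str]],
                  pragma: Optional[str],
                  http_header: Optional[str]) -> Optional[str]:
    """HTML side: the pragma, then HTTP, MUST be used as fallbacks."""
    declared = nearest_declared(langs)
    if declared is not None:
        return declared
    pragma_lang = single_language_tag(pragma)
    if pragma_lang is not None:
        return pragma_lang
    return single_language_tag(http_header)


def xml_language(langs: List[Optional[str]],
                 http_header: Optional[str],
                 use_transport: bool = False) -> Optional[str]:
    """XML side: http-equiv is not modelled; the transport MAY be consulted."""
    declared = nearest_declared(langs)
    if declared is not None:
        return declared
    if use_transport:
        # Assumption: reuse the HTML5 single-tag rule; XML 1.0 is silent here.
        return single_language_tag(http_header)
    return None


# No lang/xml:lang anywhere, server sends "Content-Language: nb".
chain = [None, None]  # e.g. <p> inside <html>, neither declares a language
print(html_language(chain, None, "nb"))
print(xml_language(chain, "nb"))
```

With these inputs the HTML side ends up with "nb", while an XML application that does not use the MAY leaves the language unset. As soon as the root element declares a language, both functions return that declaration and the Content-Language value no longer matters.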
PROBLEM IN DETAIL:

A) http-equiv="Content-Language"

HTML5: MUST be used in the absence of @lang:
]] If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. [[
http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes

XML 1.0: is silent w.r.t. http-equiv. However, some common XHTML user agents DO use http-equiv="content-language", while others don't. If it is considered equal to HTTP, then it is correct to respect it. HTML5 does not consider it equal. Does it, in XML, depend on a DTD?

B) Higher-level protocols

HTML5: higher protocols MUST be used as a backup:
]] If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. [[
http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes

XML 1.0: an external transport protocol MAY be used as a backup (we must ASSUME that 'Content-Language' is what is meant):
]] Language information may also be provided by external transport protocols (e.g. HTTP or MIME). When available, this information may be used by XML applications, but the more local information provided by xml:lang should be considered to override it. [[
http://www.w3.org/TR/xml/#sec-lang-tag

C) MULTIPLE Content-Language VALUES

HTML5 specifies that Content-Language (HTTP or http-equiv) only affects the language when its value is a single language tag. There is no general clarification of this when it comes to XML.

SOLUTIONS ON THE TABLE - IN DETAIL:

(1) Conditional: REQUIRE @xml:lang/@lang on the root when there is a Content-Language (http-equiv pragma or HTTP header) whose value is exactly a single language tag.
PRO: Polyglot Markup would follow the same rules as HTML5, except with a stricter conformance requirement.
CON: Complexity. Such a rule is complex for authors to administer.
For example, it would mean that if the HTTP server sends out a Content-Language header with a single language tag without the author's awareness, then the document is assigned a language, which in turn only HTML user agents would be REQUIRED to detect.
ISSUE-88: My Change Proposal for ISSUE-88 suggests that validators pick up the HTTP Content-Language header and warn whenever it causes the language to be set.

(2) Always REQUIRE @xml:lang/@lang on the root element.
PRO: Simple rule.
CON: Less flexibility. The fact that the language can be inherited from the higher protocol can also be an advantage. Also, for XML, if one combines several documents into a bigger one (for example by the use of XInclude), then each <html> element of the new, combined document might end up with the language explicitly defined. (In contrast, if the root element language was unset, then the <html> elements would inherit the language from the parent element in the new document.)
CON: PERHAPS it could increase the tendency to use bogus language declarations. (Many templates come with "en" as the default.)
CON: PERHAPS it could increase the use of the empty string declaration, which is equal to explicitly declaring the language as unknown: <html xml:lang="" lang="" xmlns="*">. Is that bad? If so, why? And when?

(3) Accept and document the differences: In the absence of an element-level language declaration, XML apps MAY, and HTML UAs MUST, make use of Content-Language for setting the language. However, many (or most?) popular Web browsers that are also capable of handling XHTML *DO* seem to pick up the language from Content-Language too (from the HTTP header and from http-equiv alike).
PRO: Could trigger vendors to align XHTML user agents with HTML5.
CON: Left out in the cold would be specialized non-Web parsers, such as XSLT processors, and other parsers that respect the MAY in the XML spec.

(4) Forbidding HTTP Content-Language headers for polyglot markup: NOT A RELEVANT OPTION.
(5) Forbidding http-equiv=Content-Language in polyglot markup: Possible, but it only limits the problem; it doesn't remove it. Thus one must still choose among options (1), (2) and (3).

PREFERENCE: My preference is option (2) because it is the simplest and because it seems safest.

CAN ISSUE-88 AFFECT THIS BUG? In short, yes. But ISSUE-88 is only about what syntax is permitted inside http-equiv. It is not about how HTML user agents should *react* to Content-Language, whether it comes from http-equiv or from HTTP.
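A validator enforcing option (1) or option (2) might check the root element along the lines of the following Python sketch. It is hypothetical: `check_root_language` and its parameters are illustrative and not part of any existing validator, and root attributes are passed as a plain dict. Requiring both `lang` and `xml:lang` is an assumption based on the "@xml:lang/@lang" wording. A Content-Language value counts as a single tag when it contains no comma, which is a simplification.

```python
from typing import Dict, List, Optional


def single_language_tag(value: Optional[str]) -> Optional[str]:
    """Return the tag if `value` holds exactly one language tag, else None."""
    if value is None:
        return None
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 1 or not parts[0]:
        return None
    return parts[0]


def check_root_language(root_attrs: Dict[str, str], option: int,
                        pragma: Optional[str] = None,
                        http_header: Optional[str] = None) -> List[str]:
    """Return conformance errors for the root element under option (1) or (2)."""
    if option == 1:
        # Conditional rule: only a single-tag Content-Language (pragma or
        # HTTP header) makes the root-level declaration mandatory.
        required = (single_language_tag(pragma) is not None
                    or single_language_tag(http_header) is not None)
    elif option == 2:
        # Unconditional rule: the declaration is always mandatory.
        required = True
    else:
        raise ValueError("only options (1) and (2) are modelled here")

    # Hypothetical: both attributes must be present on the root element.
    missing = [name for name in ("lang", "xml:lang") if name not in root_attrs]
    if required and missing:
        return ["root element lacks " + ", ".join(missing)]
    return []


# Server sends "Content-Language: nb" but <html> declares nothing.
print(check_root_language({}, option=1, http_header="nb"))
# Several values: option (1) no longer requires a declaration.
print(check_root_language({}, option=1, http_header="nb, en"))
```

Under option (1), the outcome depends on a header the author may never see, which is the complexity CON. Under option (2), `lang=""` with `xml:lang=""` still passes, so the rule guarantees a declaration but not a meaningful one.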
Received on Wednesday, 23 March 2011 19:47:42 UTC