- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Sat, 8 May 2010 03:15:48 +0200
- To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Cc: "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, www-international@w3.org
Lachlan Hunt, Fri, 07 May 2010 11:34:48 +0200:
> On 2010-05-05 20:22, Leif Halvard Silli wrote:
>> Let multiple language tags continue to be legal.
>> (http://www.w3.org/html/wg/wiki/ChangeProposals/ContentLanguages)
....
> the i18n WG. Both proposals present similarly flawed arguments, and so I will refute them together.

The i18n WG does not pursue that proposal anymore.

> http://www.w3.org/International/wiki/Htmlissue88
>
> For this issue, we have 3 options presented:
>
> 1. Make Content-Language non-conforming.
> 2. Leave Content-Language as Obsolete but Conforming, permitting only a single language tag. (Current spec)

A fundamental flaw in both option 1 and option 2 is that they try to ride two horses at once: on one side, they seek to redefine the semantics of the Content-Language http-equiv, aligning it with the @lang attribute, while at the same time they define it as invalid. Do I really need to say more?

> 3. Leave Content-Language as Obsolete but Conforming, permitting a comma separated list of language tags.
....
> This [the pragma] use case differs significantly from the primary use case for the HTTP Content-Language header, which is to indicate the languages of the intended audience for the document.

Regardless: HTML5 defines the same fallback behavior whether Content-Language comes from http-equiv or from HTTP.

> For the server, this use case makes some sense because it can be used for content negotiation based on language. [...]
> However, this use case does not make any sense as in-document metadata because once the user agent has the document, it's already too late for such negotiation to occur.

Since an HTML5 parser *will* derive the fallback language from Content-Language regardless of where it comes from, it makes sense to allow Content-Language both inside the document and on the server, the same way that authors may set the encoding both on the server side and in the document itself.

> The reality is that the in-document Content-Language directive only shares its name with the HTTP header field, while, in practice, its functionality is closer to that of the lang attribute. The solution chosen for addressing this issue must take this into account.

That issue is taken into account in all 3 options, by HTML5's fallback language algorithm. The issue which the *proposals* seek to resolve is the author conformance requirements and, ultimately, also the semantics.

> Although this is unnecessarily duplicated functionality, the problem being solved is that it is already used on a relatively large number of legacy pages and its use in this way is harmless.

None of the 3 proposals on the table consider this "harmless".

> This means that authors who are migrating existing pages to HTML5 do not have to be too concerned about the presence of an innocuous element. This is why the spec currently makes it obsolete, but conforming when a single value is used.

In an HTML5 parser, multiple tags would literally be harmless, as they would not have any fallback language effect at all.

> Summary from Leif:
>> == Summary ==
>> * Multiple language tags (a comma separated list) in @http-equiv Content-Language continue to be legal.
>
> Summary from I18N WG: [ snipped, since they don't support it ]
> Neither of these summaries describe any use case for why authors would want to specify multiple languages in the meta element.

Use cases are provided in the Rationale section, not in the Summary.

> The only reason given simply states that it should be allowed because HTML 4.01 and XHTML 1.0 permitted it. That in itself is not a valid reason.

Such a stamp inside the document might, for example, be used for checking that the correct page is served to the correct audience.

Forget about the effect; focus on the semantics. Let us say that we could define the *semantics* of Content-Language freely.
Then, of course, we could say that Content-Language has the same semantics as @lang. However, as long as the HTTP spec defines the semantics, we create less confusion by simply accepting the fact that it is the HTTP spec's domain.

For example, if you live in Norway and visit the Web site of a Japanese computer producer, then the Web site may direct you to their pages for their Norwegian customers, which may nevertheless typically be in English. If one such page says "Content-Language: no" and lacks any @lang attribute, then the UA will think that the language of the page is Norwegian ... which would not be true.

> In fact, HTML4 did not say anything explicit about the use of Content-Language in the meta element. [...]

Incorrect. HTML4 says that http-equiv is governed by RFC 2616:

]] The http-equiv attribute [...] Please see the HTTP specification ([RFC2616]) for details on valid HTTP headers. [[
http://www.w3.org/TR/html4/struct/global#h-7.4.4.2
...
> The spec does, however, imply some client side processing for the Content-Language HTTP header [...] [5]:
...
> * The HTTP "Content-Language" header (which may be configured in a server). For example:
> But note that it makes no mention of the meta http-equiv, [...]

That reading is not without virtue. However, another reading is that the text merely makes the reader aware that Content-Language does not need to be defined in the document: it could be defined in the server. In any case, HTML4 does describe the order of precedence between HTTP and HTTP-EQUIV when it comes to the Content-Style-Type header, so it is absolutely not foreign to HTML4 that http-equiv affects the UA directly. See: http://www.w3.org/TR/html4/present/styles#default-style
....

> We also have some observational evidence [6] that indicates that a vast majority of authors only use a single value,

If someone does use Content-Language according to the HTTP specification, then it would be correct to use "Content-Language: sv" for an English page aimed at a Swedish audience. And then, according to HTML5, regardless of whether this value is found in HTTP or in HTTP-EQUIV, it *will* be used as the language of that document. This is why the 'Let multiple language tags continue to be legal' proposal says that Content-Language should trigger a warning every time the fallback language effect kicks in.

> Summary continued from Leif:
>> * Conformance checkers will emit a warning whenever – and only if – the fallback language algorithm kicks in.
>> * The fallback warning will kick in regardless of whether the fallback comes from HTTP or Content-Language.
>
> Summary continued from I18N WG: [ snipped again, since they don't pursue it ...]
> It's difficult to understand why you are arguing for multiple languages to be considered conforming, while suggesting that the defined implementation requirement is to ignore the value if multiple languages are specified. Your rationale in this case is self-defeating.

a) The multilang proposal does not touch HTML5 parsing.
b) The rationale is stated in the Rationale section.

[ snipped another reference to the I18N WG proposal ]

> Rationale from Leif:
>> == Rationale ==
>> The problems with the current specification are
>>
>> 1. That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in an HTTP header.
>
> The element language fallback behaviour when taken from an HTTP Content-Language header containing multiple languages is to default to unknown. This is not useful behaviour for authors to explicitly choose by using multiple languages in the meta element.
> They get the same result by omitting the Content-Language pragma from the document.

HTTP also defines what the lack of a Content-Language header means: it means that the document is intended for any audience. So not using Content-Language is not equivalent to using multiple language tags inside Content-Language. The way to get authors to use Content-Language and @lang according to their semantics is to provide a warning whenever Content-Language's observable side effect kicks in.

>> * That no language gets set, as HTML5 requires from multiple tags whether they occur in HTTP or in @http-equiv, is still an effect. The spec is therefore incorrect in claiming about the latter that “[for instance it only supports one language]”.
>
> Your claim here does not make sense. The HTTP Content-Language header does allow multiple language tags, whereas the current HTML5 spec only allows one. So that claim quoted from the spec is indeed correct, as it currently stands.

I use the word "support" not about conformance requirements, but about observable effects. If a web browser is programmed to have one behavior when Content-Language contains a single tag, and another behavior when it contains multiple tags, then clearly both single and multiple language tags are supported. (The current status is that only Gecko supports multiple language tags, although it supports them in a different way from what the HTML5 spec requires. The other UAs typically treat multiple tags as a single (illegal) language tag.)

The biggest problem with *both* the current spec and the proposal to make Content-Language non-conforming is that both of them regard Content-Language as a way to define the language. Hence, both proposals suggest that the validator should say "Please use lang instead". And the validator sends this warning *even when* the author has used @lang correctly. So it is just confusing.

The 'multilang' proposal, OTOH, puts the burden on the *validators* to analyze both the HTTP header and HTTP-EQUIV, and compare them with the use of @lang: if the document has a lang attribute on the root element, then there is no reason to send any warning. As a result, the 'multilang' proposal will create a warning in validators for a smaller percentage of existing pages with a Content-Language header or pragma. The proposal quite precisely singles out the potentially harmful *(side) effects* of http and http-equiv.

>> 2. That it prevents @http-equiv from being used as a reference to what the HTTP Content-Language is/was meant to be.
>> * Consider Firefox’ Page Info panel.
>
> Firefox's Page Info panel is not a compelling use case for this information. It's just a diagnostic tool that outputs the specified values.

It is what it is.

>> Consider some CMSes.
>
> CMSs use out of band information for determining the language of the documents they send, if any. This is more likely to come from configuration settings, rather than the meta element specified somewhere in the HTML, like in a page template.

CMSes would use Content-Language *not* for setting the language, but for setting the Content-Language.

>> Consider simply authors themselves.
>
> What real benefit do authors themselves gain from using multiple language values?

If you declare the page encoding in an HTTP header, then <meta charset="*"> is also of no use. However, as we know, pages are authored off-line, and then Content-Type and/or <meta charset="*"> matters. As an author, one may also consult the meta element to check what the encoding is. It is the same with Content-Language. It can be set on the server side, and there are many advantages to doing so. But it may still be useful to also set it inside the document. The author may do so for the fallback effect, or for the semantics defined in the HTTP spec.

>> 3. That it underlines the confusion that may exist today, about the nature of @lang versus Content-Language, by requiring:
>> * different syntax rules for features that are expected to be identical (HTTP and @http-equiv )
>
> In reality, as mentioned above, the HTTP header field and meta element pragma directive only share a common name, while sharing very little functionality. And the little functionality that they do share is as a secondary fallback for use in the absence of the lang attribute.

This is an exaggeration: they share semantics, and they share the language fallback effect. Of course, the language negotiation effect is not shared. Nor can one use any of HTML5's global attributes on the server side, even though one can use them on the META element ... So there are things that are naturally shared and things that are naturally different.

By the way: content negotiation may involve more than negotiation between different languages. In theory, it could also involve, e.g., negotiation between different page encodings. And thus, once again, I want to emphasize how Content-Type and Content-Language are similar: one can set them either inside the document or on the server side. The semantics are the same. But the effects may not be the same.

>> * similar syntax rules for features that are different (http-equiv and lang)
>
> In practice, <meta http-equiv="Content-Language" content="en"> and <html lang="en"> effectively share the same functionality.

No, they don't. Not unless you make a private decision to use Content-Language that way. And so one can never take for granted that it has been used that way. To create a specification which says that it has the same semantics as @lang and at the same time forbids it is a very confusing way to change its semantics ...

>> * a warning message which asks authors to “use @lang instead” – as if they were juxtaposable alternatives.
>
> Use of the pragma directive is obsolete in HTML5.
> Using a warning to tell authors to use the better alternative is a good thing.

Incorrect. There is no warning about obsoleteness if one uses the Content-Type http-equiv.

Apart from that, you missed the point: if a page uses "Content-Language: no-no" because the page is for a Norwegian audience, even though the text is in English, then it would be incorrect to tell the author to use @lang instead. That is: you cannot tell them to move the "no-no" tag to the lang attribute. And if the Content-Language contains multiple language tags, what do you say to them then? "Please use lang instead"? That is: move your multiple tags to @lang? And what is the logic of telling authors to use @lang instead of multiple languages inside Content-Language, when multiple languages inside Content-Language would have no language fallback effect?

>> Conformance checking and warnings are in place, but should be about the correct things.
>>
>> 1. The current warning about using @lang instead of Content-Language should be changed into a warning which informs that a fallback language measure has kicked in, and recommends that authors create a language declaration (via @lang) rather than relying on the fallback feature.
>
> Looking at the cases where the fallback behaviour will or won't kick in, we find the following:
>
> Case 1:
> <html lang="en">
> <head>
> <title>Example</title>
> <meta http-equiv="Content-Language" content="en">
> </head>
>
> The meta element here is completely useless. The default language for every element will be obtained from the lang attribute, either on the html element or a nearer ancestor. Warning about it being useless seems completely reasonable.

Whether it is useless depends on the author/user/CMS/Firefox. The above says that the page is in English and for an English audience. Another page might be in English but for a Norwegian-speaking audience.

But if, for a moment, we accept that it is useless, then there are many other useless things the validator could warn about. E.g., it is also useless to put lang="en" on all of the child elements of <html>, since, as you explain, they inherit it from <html lang="en">. Still, the validator does not warn about such use ...

> Case 2:
> <html>
> <head>
> <title>Example</title>
> <meta http-equiv="Content-Language" content="en">
> </head>
>
> Regardless of the presence of any other lang attributes anywhere else in the document, the lack of the lang attribute on the html element means that the fallback behaviour will kick in to determine the language from the meta element. I agree that a warning in this case makes sense.

Cool! :-) It is only in Case 2 that I want any warning. In Case 1, a warning is useless. Also, in Case 1, it is likely that the author did understand the difference between Content-Language and lang.

>> This warning should be shown regardless of whether the fallback comes from @http-equiv or from the higher level (HTTP). Justification: since it is a fallback feature, and with other semantics, there is no guarantee that the author has used it for the language effect.
>
>> 2. To hold the syntax rules of HTTP (which permits multiple language tags) as the conforming ones (rather than those of @lang, which forbids multiple languages), will have the effect of underlining that @lang and Content-Language have different purposes.
>
> Again, use of the Content-Language in the document has no other purpose.

You are allowed to have that opinion. But this is still not the semantics of Content-Language. Thus, as explained, one cannot take for granted that <html lang="*"> can replace Content-Language.

>> For instance, since the fallback algorithm doesn't kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.
>
> I do not understand what you are trying to say here.

The entire change proposal is strictly built around how Ian has defined the way the pragma interacts with @lang. In that algorithm, if the pragma *or* the HTTP header contains multiple languages, then no fallback language is taken from it. Thus, in an HTML5 parser, such Content-Language pragmas/headers will not affect the document in any way. And thus, there is no reason to warn authors that a fallback language measure has kicked in, since, in these cases, no fallback language measure will kick in.

This is a very important point: the multilang change proposal is not identical to the original proposal from the i18n WG. The multilang proposal says that there should be a warning when the *fallback* effect kicks in. What the correct lang attribute is, is up to the author to find out. Hopefully the author understands that he used Content-Language for the wrong reason, and removes it. No need to tell him that it is useless. He understands.

>> == Details ==
>> Proposed spec changes, to section [4.2.5.3 Pragma directives]:
>>
>> Replace the following text
>> ]] Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the @lang attribute instead.[HTTP]
>> [[
>>
>> with the following
>> ]] The semantics of this pragma, as well as of the HTTP Content-Language header, are different from the semantics of the @lang attribute. [HTTP] Thus, there is no guarantee that the author consciously used either of them for setting the language. Therefore, conformance checkers will include a warning whenever HTML5’s fallback language algorithm is activated, whether it is the higher-level protocol or this pragma that kicks in. Authors are informed about which language the document falls back to, and are encouraged not to rely on the fallback feature but to instead explicitly use the @lang attribute on the root element.
>> [[
>
> It's not clear exactly what you're referring to as the "fallback language algorithm", and what it means for it to be activated. But I assume you are referring to the requirement that states:

Yes, that's about right. I should probably find another wording.

> "If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string."
>
> This effectively defines the following order of preference for obtaining the language information:
>
> 1. lang attribute on the element
> 2. lang attribute on ancestor element
> 3. pragma-set default language (<meta>)
> 4. HTTP Content-Language header field if only one language is specified
> 5. Unknown language, the corresponding language tag is the empty string.

For step 3, you forgot to say "if only one language is specified". That is: the same behavior as for HTTP.

> Based on your above rationale, you seem to want the warning to apply if #3 or #4 is used, even though that's in the middle of the algorithm that you are referring to.

Steps 1 to 5 are conditional steps. If there is a lang attribute on the element or on an ancestor, the algorithm ends there (at step 1 or 2), and steps 3, 4 and 5 never happen. If there is no lang attribute, but there is a pragma-set default language, then step 3 is the end of the algorithm. If there is no pragma-set default language (that is: the pragma is empty, contains multiple languages, or is simply lacking), then we jump to step 4, which is treated the same way as step 3. So I don't understand what you say about "middle of the algorithm".
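A minimal Python sketch of the order of preference quoted above, including the correction that the pragma, like the HTTP header, only supplies a fallback when it holds a single language tag. It assumes a simplified model: only the lang attribute is considered (not xml:lang), the lang values from the node up to the root are passed in as a list, and a comma anywhere in the pragma's content is taken to mean multiple languages. The function names are hypothetical.

```python
from typing import Optional, Sequence


def pragma_default_language(content: Optional[str]) -> Optional[str]:
    """Step 3 input: the pragma-set default language, or None if none is set."""
    if content is None or "," in content:
        # Missing pragma, or multiple languages: no pragma-set default language.
        return None
    tokens = content.split()
    # An empty or whitespace-only value sets no default language either.
    return tokens[0] if tokens else None


def http_language(header: Optional[str]) -> Optional[str]:
    """Step 4 input: the HTTP Content-Language value, used only if it holds one tag."""
    if header is None:
        return None
    tags = [tag.strip() for tag in header.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


def node_language(lang_chain: Sequence[Optional[str]],
                  pragma: Optional[str],
                  http_header: Optional[str]) -> str:
    """Return the node's language tag; the empty string means 'unknown'.

    lang_chain holds the lang attribute of the node, then of each ancestor
    up to the root element, with None wherever the attribute is absent.
    """
    # Steps 1 and 2: the nearest lang attribute wins and the algorithm ends.
    for value in lang_chain:
        if value is not None:
            return value
    # Step 3: a single-tag pragma ends the algorithm.
    pragma_lang = pragma_default_language(pragma)
    if pragma_lang is not None:
        return pragma_lang
    # Step 4: otherwise a single-tag HTTP header is the final fallback.
    header_lang = http_language(http_header)
    if header_lang is not None:
        return header_lang
    # Step 5: unknown language.
    return ""
```

For example, node_language([None], "en,ru", "no") returns "no": the multi-tag pragma sets no default language, so the single-tag HTTP header is used.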

All that a validator needs to do is to check whether the root element has a lang attribute. If it doesn't, and either step 3 or step 4 results in a fallback language, then a warning should be shown.

> It's not clear why you want the warning if HTTP Content-Language is used with no lang attribute.

??? The Content-Language HTTP header cannot be empty.

> And based on the way you phrased the proposed requirement, the algorithm will have "kicked in" before it gets to #5, but it doesn't seem like you actually want a warning in that case.

The algorithm is defined not by me but by HTML5, and it only kicks in/affects the document if the root element doesn't have a lang attribute. The only algorithm that *I* have tried to define is the algorithm for when a validator should show a warning. See above.

> We can conclude from this that your proposed replacement text would be inappropriate for use in the spec, even if the group decides to permit multiple language values (despite the lack of convincing rationale for doing so).

Fantasai has suggested some improvements that I will probably incorporate. But the most important thing is to agree on the direction ... then we will find the words.

>> Delete the following text:
>> ]] This pragma is not exactly equivalent to the HTTP header, for instance it only supports one language. [[
>
> As explained above, this note is entirely accurate. In practice, the pragma directive in the meta element is in effect functionally the same as the lang attribute, with little in common with the HTTP header. Removing that note would not be useful.

The semantics of Content-Language are exactly the same in http-equiv and in HTTP. The differences in effect are only side effects of the format: the http-equiv pragma being an HTML element and HTTP Content-Language being an HTTP header. Thus we agree that "it is not exactly equivalent". However, not everything that is true is useful to express.
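The validator rule described above (warn only when the root element has no lang attribute and the pragma or, failing that, the HTTP header supplies a single language) can be sketched in Python as follows. This is an illustration under stated assumptions, not actual validator code: fallback_warning and single_tag are hypothetical names, the root element's lang attribute is passed in directly (None when absent), and any comma in a Content-Language value is taken to mean multiple languages.

```python
from typing import Optional


def single_tag(value: Optional[str]) -> Optional[str]:
    """Return the tag from a single-valued Content-Language, else None.

    Simplifying assumption: any comma means multiple languages.
    """
    if value is None or "," in value:
        return None
    tokens = value.split()
    return tokens[0] if tokens else None


def fallback_warning(root_lang: Optional[str],
                     pragma: Optional[str],
                     http_header: Optional[str]) -> Optional[str]:
    """Return a warning message if the fallback language takes effect, else None."""
    if root_lang is not None:
        # The root element declares a language, so the fallback never kicks in.
        return None
    # The pragma (step 3) takes precedence over the HTTP header (step 4).
    fallback = single_tag(pragma) or single_tag(http_header)
    if fallback is None:
        # Multiple, empty or missing values: no fallback language, so no warning.
        return None
    return (f'The document language falls back to "{fallback}" from '
            'Content-Language; consider declaring it with lang on the root element.')
```

Applied to the two cases above, Case 1 (root lang="en" plus the pragma) yields no warning, while Case 2 (pragma only) yields a warning naming "en". A pragma such as "en,ru" with no usable HTTP header yields no warning, because no fallback language is set.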

As for the specific thing that you focus on, the fallback language effect, you are plain wrong, as explained above, since 70 percent of browsers in use today do not treat multiple languages the way that HTML5 says they should. Hence legacy user agents do not support multiple languages, while HTML5 user agents will support them. Please note that Gecko's treatment of multiple language tags and HTML5's treatment of multiple language tags are just two different ways of "supporting" multiple language tags.

>> == Impact ==
>> === Positive Effects ===
>> 1. More stable: same syntax as before continues to be permitted.
>
> In the face of the above evidence against your proposal, it's not clear that that is a positive effect.

I fail to see that you have provided any evidence that I did not consider when writing that proposal.

>> 2. More permissive: authors, CMS-es and browsers can continue to take advantage of @http-equiv ’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.
>
> Given the practical effect of the directive, it has no relevance to what the HTTP header is/was supposed to be.

If what you say here were true, then what HTML5 specifies for parsers w.r.t. Content-Language would be useless: when user agents implement the HTML5 fallback language behavior, then it *will* have a practical effect. Whereas today they treat <meta http-equiv="Content-Language" content="en,ru"> as if the language of the document were a language whose language tag is the five characters "en,ru", in the future they will not take any language from it, but will instead look at the HTTP header and use the language from the HTTP header, in case it contains a single tag. Thus "no effect" is also an effect. We are looking forward to the day when multiple language tags inside Content-Language will not have any fallback language effect! Currently that is not the case.

>> 3. More correct: the difference between @lang and Content-Language is pointed out, while the link between @http-equiv and HTTP is emphasized.
>
> Wrong again, for reasons explained above.

How can you be of that opinion? Clearly, if http-equiv and HTTP use the same syntax, as HTML4 evidently expects, despite your claims to the contrary, then of course the link between the two is underlined. You might be of the opinion that this link should *not* be underlined; however, that doesn't affect the logic of my argument.

>> 4. More useful: a warning that a fallback feature has kicked in is more useful than a warning which focuses on one of the places where the fallback language could potentially kick in from. Why tell authors to “use @lang instead” if the author has already made sure that the @lang attribute is in place?
>
> The warning from the validator could be phrased in any way the implementer likes. If the lang attribute is detected, the validator could simply state that the Content-Language is unnecessary. Otherwise, the validator could advise to use the lang attribute instead. But this is an implementation decision and no spec change is needed to attain the desired behaviour in this particular case.

Feel free to suggest what you consider improvements to the two other proposals; I am not the correct person to contact about them ... Currently, however, the spec is both very clear and 100% in line with your emphasis on reinterpreting the semantics of Content-Language. It says:

]] Authors are encouraged to use the lang attribute instead. [[

The multilang proposal, however, does not speak about using lang instead of Content-Language, but about using lang instead of relying on the *fallback effect* of Content-Language.

>> === Negative Effects ===
>> none
>
> Actually, there are negative effects with your change proposal:
>
> 1. Perpetuates the myth that the HTTP Content-Language header field and the in-document pragma directive are equivalent, when they are not.

Absolutely untrue. The effect is the opposite: each time the fallback effect kicks in, there will be a warning. There is no such warning today. Thus it seems very far-fetched to say that it perpetuates any myths. Both of the other two proposals, however, create a new myth: the myth that if you only remove the Content-Language pragma, then you are safe. The multilang proposal instead treats Content-Language the same way regardless of whether it comes from HTTP or from http-equiv. Thus validators will show a warning *every* time the fallback language kicks in, instead of blindly forbidding the http-equiv version of Content-Language without offering the author any help to understand what is going on.

> 2. Fails to warn against the use of a useless and obsolete feature in all cases.

The reason it so-called "fails" is that one of the focuses of the proposal is correct language declaration, just as much as correct Content-Language declaration. If the language declaration is not affected (that is: if there is no fallback language effect), then there should be no warning. (Validators should, however, also perform syntax checking, but that is partly another issue.)

> 3. Your proposed replacement text is entirely inappropriate for use in the spec, for the reasons explained above.

If you can help me express my intentions better, then you are welcome to. ;-)

> IMHO, this now eliminates option #3 from the list I gave at the top of this post, and leaves us with a decision between 2 valid alternatives:

We probably disagree about how convincing your arguments were.

> 1. Make Content-Language non-conforming.

One problem with this option that I have not mentioned is that it is in many ways identical to the status in HTML4: there will be no syntax checking of the content of the pragma. [...]
--
leif halvard silli
Received on Saturday, 8 May 2010 01:16:32 UTC