- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 29 Aug 2014 11:39:32 +0100
- To: "Jens O. Meiert" <jens@meiert.com>, W3C WAI GL <w3c-wai-gl@w3.org>
Hello Jens,

Here are some comments on your blog post, in reply to your question "Hence, what data and evidence did I miss? And what else could, or should, we do?".

You say:

"Still, the W3C I18N Activity advises against using HTTP headers[1], at least alone: “Use language attributes rather than HTTP to declare the default language for text processing.” (There seem to be no strong reasons given, then, as the language declarations document referenced is rather neutral about HTTP headers.)"

[1] http://www.w3.org/TR/i18n-html-tech-lang/#overall
[2] http://www.w3.org/International/questions/qa-http-and-lang#http

Actually, the reasons are given if you follow the link to more detail from [1], and from there the link to http://www.w3.org/International/questions/qa-http-and-lang.

One major issue is that the HTTP information is not available if the page is saved to disk, read from a CD, processed by XSLT, AJAX, etc. So you really need to use @lang in anticipation of those situations. But if you're using @lang as a backup for those situations, then declaring the language on the server as well certainly reduces efficiency.

Another is that many authors can't access server settings easily, or at all.

Another is that there is potential for confusion, because Content-Language allows multiple language values, and browsers do not handle such values interoperably. Multiple values are valid as metadata about a resource, but are not helpful for language processing.

And of course, Content-Language sets no more than the overall default language for a page. When you need to be more specific about a language change, you'll need to use @lang.

Bear in mind, also, that @lang on the html tag overrides the Content-Language information in all browsers, so ignoring @lang when it's provided automatically by an editor, say, could undo the value of an HTTP declaration.

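To make the precedence described above concrete, here is a minimal Python sketch. It is an illustration only, not how any particular browser is implemented. The function name default_language and the choice to return None for a multi-value Content-Language header are assumptions made for this example, the latter picked only because such values are not handled interoperably.

```python
from html.parser import HTMLParser


class _RootLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.root_lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            self.root_lang = dict(attrs).get("lang")


def default_language(html_text, content_language=None):
    """Hypothetical helper: pick the default language for text processing.

    Assumed order: @lang on the html tag wins; a single-value
    Content-Language header is the fallback; anything else is unknown.
    """
    parser = _RootLangParser()
    parser.feed(html_text)
    parser.close()
    if parser.root_lang:
        return parser.root_lang.strip()
    if content_language:
        tags = [t.strip() for t in content_language.split(",") if t.strip()]
        # Assumption for this sketch: treat multi-value headers as unusable.
        if len(tags) == 1:
            return tags[0]
    return None
```

For a page saved to disk there is no header at all, so only the first branch can ever supply a language, which is the situation the @lang declaration is meant to cover.
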
Also, there are tools that can be deployed on content while it is being written, such as spellcheckers and grammar checkers. I write plenty of multilingual documents where an authoring tool that recognises language changes saves a lot of frustrating false positives. However, the language information needs to be in the document rather than on the server for that to work. Similar situations apply when working with language-specific styling in an editor that provides a WYSIWYG interface.

Using @lang on the html or other elements, however, resolves all these issues, and in addition provides consistency between the way you mark up the default language of the page and language changes in the document.

"Next, and here it gets more interesting, it is completely unclear what tools actually use the information of inner-document language changes. Granted, this may be a knowledge gap on my end—being corrected is one reason why I write all of this down—, but from what I’ve seen so far, what I specifically understand some services like Google not to be doing, and even from my fading memories testing assistive tools, there’s not a great value in marking up changes in language."

I and others on this list have already sent answers to this question. See http://lists.w3.org/Archives/Public/w3c-wai-gl/2014JulSep/0137.html and the following emails in that thread. See also http://www.w3.org/International/questions/qa-lang-why

"On my mind—and matching Google doctrine—, language detection should be automated. It should be a software responsibility."

There may be places where auto-detection of language helps, but it becomes more and more difficult the smaller the available sample. There is also the possibility of ambiguity in phrases such as "The French for bread is pain.", which auto-detection is not at all likely to catch, but where you as an author may still want to prevent spell-check errors or incorrect voice browser renderings.

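As a rough illustration of how a tool could use in-document language markup for a phrase like the one above, here is a small Python sketch that splits marked-up text into (language, text) runs. The names language_runs, LangRuns and VOID_TAGS are hypothetical, and the sketch assumes well-formed markup in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Elements with no end tag; they must not push onto the language stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LangRuns(HTMLParser):
    """Collects (language, text) pairs, inheriting lang from ancestors."""

    def __init__(self, default_lang=None):
        super().__init__()
        self._stack = [default_lang]
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        lang = dict(attrs).get("lang")
        # An element without its own lang inherits its parent's language.
        self._stack.append(lang if lang else self._stack[-1])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self._stack[-1], data))


def language_runs(html_text, default_lang=None):
    """Hypothetical helper: return the text runs with their languages."""
    parser = LangRuns(default_lang)
    parser.feed(html_text)
    parser.close()
    return parser.runs
```

Given `<p lang="en">The French for bread is <span lang="fr">pain</span>.</p>`, the word "pain" comes back tagged as "fr", so a spell-checker or voice browser would not need to guess its language from a four-letter sample.
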
It is also unlikely that you'll see ubiquitous deployment of those services in all the places HTML is used, and they are certainly not ubiquitously available today, so stopping the use of @lang now is rather premature.

I think every page should specify a default language (in the html tag), if only because it future-proofs the page for new language-specific technologies which are currently on the horizon but which will soon be commonplace (many such are on the way with CSS3, others will come with language technology developments, etc.).

On the other hand, identification of language changes should be based on the author's expectation of usefulness. For example, will it help with spell-checking, with styling, with identification of fragments, with text-to-speech, etc.? There's certainly no need to mark up words like 'status quo'. If you don't want to mark up every change, that may be OK, but note that I think it's far easier to mark up all significant language changes than to debate with yourself each time whether an automated approach will or won't succeed.

I have to say that I don't agree with your starting point that use of @lang poses significant problems of efficiency for content developers. I don't think it's any more problematic than a bunch of other attributes you would add as a matter of course to provide useful semantics in your markup. In fact, it's pretty easy to add, as markup goes.

One last thought: I think it would help to separate the discussion of the default language for the page (html @lang or Content-Language) from the discussion of markup for language changes, since on the whole different criteria apply*.

Hope that helps,
RI

* It's a slightly less clear distinction in the case of a multilingual document that uses different languages for large sections of a document, such as a French Canadian page that puts English on the left and French on the right, but it holds generally.

On 25/08/2014 16:42, Jens O. Meiert wrote:
> For who’s is interested in the topic, I’ve presented my view again in
> a different form and in more detail under
> http://meiert.com/en/blog/20140825/html-and-language/.
>
> TL;DR: Compared to current practices, it seems Content-Language could
> be preferred over @lang to denote document language, and—more
> importantly—detecting changes in language should probably be made a
> software responsibility.
>
> To avoid parallel list discussions (my bad) it may be more convenient
> to collect counter-arguments on the post.
>
Received on Friday, 29 August 2014 10:40:01 UTC