- From: Glenn Adams <glenn@skynav.com>
- Date: Tue, 28 Aug 2012 08:13:23 +0800
- To: Koji Ishii <kojiishi@gluesoft.co.jp>
- Cc: W3C Style <www-style@w3.org>, "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>, "ML public-i18n-core (public-i18n-core@w3.org)" <public-i18n-core@w3.org>
- Message-ID: <CACQ=j+ffs34f85pa2i4SM1=GwZVMhvpqW6nyXwJdeJurKzV9Bg@mail.gmail.com>
On Mon, Aug 27, 2012 at 10:16 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> +public-i18n-core@w3.org as the discussion is no longer only for CJK.
>
> >>>>> The phrases "known to be X [language]" are completely undefined as far as
> >>>>> the current text is concerned. If you want to have one note that covers all X,
> >>>>> then by all means do so, but don't just leave it in such an undefined state.
> >>>>
> >>>> Did you follow the link? I think it's well-defined in Terminology section. The
> >>>> section also has examples you requested.
> >>>
> >>> yes; my problem is the phrase "known to be Japanese or Chinese" does not map
> >>> to "if the content language contains 'ja' or 'zh' or equivalent as its primary language
> >>> subtag". same for the phrase "known to be Turkish" which also appears in another
> >>> context in this document
> >>
> >> I agree that your suggested wording is easier to understand for HTML authors, but
> >> it's not accurate because CSS does not define what the content document format is
> >> and how content document determines the language. CSS Selectors Level 3[1]
> >> informatively recommends content document to use BCP47, but it's still content
> >> document that defines language syntax of the content document.
> >>
> >> The wording in our Terminology section[2] looks almost the same as the one in CSS
> >> Selectors Level 3 to me; it defines our syntax, but does not define content
> >> document syntax. It's hard for me to find good wording to improve this without
> >> being incorrect.
> >>
> >> If you have suggested wording, I can run it by fantasai to put into the spec.
> >
> > I'm fine with the definition under the terminology section. I'm not fine with the
> > "known to be X [language]" phrases. In the case of "known to be Japanese", one
> > might expect a UA to interpret <p lang="en">この段落は日本語です</p> as
> > Japanese, since you and I "know" it to be Japanese regardless of the @lang attribute.
> >
> > I would like "known to be X" to be revised to tie it to @lang (or equivalent), and not
> > a textual/linguistic analysis of the text that determines the actual language of the
> > content.
>
> I'm afraid that if we say so, questions arise like, is HTTP
> Content-Language header "@lang or its equivalent"? IE falls back to
> Tools/Options setting if no language is specified in HTML, meta, nor in
> HTTP, of which initial value is set by system language. Is it included to
> "@lang or its equivalent"?
>
> I understand your motivation to make it easier to understand, and I agree
> it's good. But in my understanding, if we try that, we'll be less accurate.
> If this is an easy thing, i18n WG won't need to write up a long best
> practice notes[1].
>
> I'll ask i18n WG for any better wording suggestion. If you have good
> suggestion, that's appreciated too. If nobody can come up with better
> suggestion, I think we should conclude that the current wording is the best
> one. Does this sound reasonable?
>

The current language is unacceptable and misleading without further
clarification, as it implies textual/linguistic analysis.

If the following informative text were added in a new Section 1.4
"Conventions", then I would be satisfied:

<quote>
A phrase of the form "known to be X" where X is a language name, e.g.,
"known to be Japanese", is intended to be determined using markup alone,
and does not imply a requirement to perform linguistic analysis (i.e.,
language recognition) of associated text content.
</quote>

>
> [1] http://www.w3.org/TR/i18n-html-tech-lang/
>
> Regards,
> Koji
>
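
For illustration only, the distinction drawn in this message (language decided from markup alone, not from analysis of the text) can be sketched roughly as follows. The function names `primary_subtag` and `known_to_be` are hypothetical and come from no specification; the sketch also assumes the declared language is a BCP 47-style tag, which, as noted in the thread, CSS itself does not mandate.

```python
def primary_subtag(lang_tag):
    """Return the lowercased primary language subtag of a BCP 47-style tag, or None.

    Hypothetical helper: assumes subtags are separated by '-'.
    """
    if not lang_tag:
        return None
    return lang_tag.strip().split("-", 1)[0].lower() or None


def known_to_be(declared_lang, primary_subtags):
    """True only if the declared (markup-level) language matches one of the subtags.

    The text content is deliberately not a parameter: no language
    recognition is performed, only the declared tag is consulted.
    """
    return primary_subtag(declared_lang) in primary_subtags


# <p lang="en">この段落は日本語です</p>: declared English, so not "known to be Japanese"
print(known_to_be("en", {"ja"}))
# A declared tag with region or script subtags still matches on its primary subtag
print(known_to_be("zh-Hant-TW", {"ja", "zh"}))
```

Under this reading, how the declared language is obtained (an attribute, an HTTP header, or a UA fallback) stays outside the check, which only consumes whatever tag the document format supplies.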
Received on Tuesday, 28 August 2012 00:14:12 UTC