
[css3-text] Better wording than "known to be language X" (was line-break questions/comments)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Mon, 27 Aug 2012 10:16:47 -0400
To: Glenn Adams <glenn@skynav.com>
CC: W3C Style <www-style@w3.org>, "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>, "ML public-i18n-core (public-i18n-core@w3.org)" <public-i18n-core@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0D5E63C286@MAILR001.mail.lan>
+public-i18n-core@w3.org as the discussion is no longer only for CJK.

>>>>> The phrases "known to be X [language]" are completely undefined as far as
>>>>> the current text is concerned. If you want to have one note that covers all X,
>>>>> then by all means do so, but don't just leave it in such an undefined state.
>>>>
>>>> Did you follow the link? I think it's well-defined in the Terminology section. The
>>>> section also has the examples you requested.
>>>
>>> Yes; my problem is that the phrase "known to be Japanese or Chinese" does not map
>>> to "if the content language contains 'ja' or 'zh' or equivalent as its primary language
>>> subtag". The same goes for the phrase "known to be Turkish", which also appears in another
>>> context in this document.
>>
>> I agree that your suggested wording is easier for HTML authors to understand, but
>> it's not accurate, because CSS does not define what the content document format is
>> or how the content document determines its language. CSS Selectors Level 3[1]
>> informatively recommends that content documents use BCP 47, but it is still the
>> content document format that defines the language syntax of the content document.
>>
>> The wording in our Terminology section[2] looks almost the same as the one in CSS
>> Selectors Level 3 to me; it defines our syntax, but does not define the content
>> document's syntax. It's hard for me to find wording that improves this without
>> being incorrect.
>>
>> If you have a suggested wording, I can run it by fantasai to put into the spec.
>
> I'm fine with the definition under the terminology section. I'm not fine with the
> "known to be X [language]" phrases. In the case of "known to be Japanese", one
> might expect a UA to interpret <p lang="en">この段落は日本語です</p> ("This paragraph is in Japanese") as
> Japanese, since you and I "know" it to be Japanese regardless of the @lang attribute.
>
> I would like "known to be X" to be revised to tie it to @lang (or equivalent), rather than
> to a textual/linguistic analysis of the text that determines the actual language of the
> content.
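As a rough illustration of the rule Glenn proposes (matching on the declared language tag rather than on the text itself), the Python sketch below checks whether a BCP 47 tag's primary language subtag is `ja` or `zh`. The function names are hypothetical, and the "or equivalent" part is deliberately not modelled: a tag such as `cmn` (Mandarin) would need extra registry knowledge to be treated as Chinese.

```python
from typing import Optional


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag of a BCP 47 tag, lowercased."""
    # Subtags are hyphen-separated and compared case-insensitively.
    return tag.split("-", 1)[0].lower()


def is_ja_or_zh(declared_lang: Optional[str]) -> bool:
    """Hypothetical check modelled on the proposed wording.

    Only the declared tag is consulted; the text content is never analysed.
    Equivalents such as 'cmn' are NOT recognised (assumption of this sketch).
    """
    if not declared_lang:
        return False  # no declared language: nothing to match
    return primary_subtag(declared_lang) in {"ja", "zh"}
```

Under this rule, the `<p lang="en">` paragraph in Glenn's example is not treated as Japanese, because only the declared tag `en` is consulted, while a tag like `zh-Hant-TW` matches through its primary subtag `zh`.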

I'm afraid that if we say so, questions will arise, such as: is the HTTP Content-Language header "@lang or its equivalent"? IE falls back to its Tools/Options setting, whose initial value is taken from the system language, if no language is specified in the HTML markup, in a meta element, or in HTTP. Is that included in "@lang or its equivalent"?
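To make the question concrete, here is a minimal Python sketch of one possible resolution chain, assuming the precedence listed above: the nearest `lang` attribute, then a `meta` Content-Language pragma, then the HTTP `Content-Language` header, then a UA setting like IE's. The function and its parameters are hypothetical; values are used verbatim (comma-separated lists are not parsed), and the last step is UA-specific behaviour rather than anything HTML defines.

```python
from typing import Optional, Sequence, Tuple


def resolve_language(
    lang_chain: Sequence[Optional[str]],
    meta_content_language: Optional[str],
    http_content_language: Optional[str],
    ua_setting: Optional[str],
) -> Tuple[Optional[str], str]:
    """Return (language tag or None, name of the source that decided it).

    lang_chain holds the lang attribute of the element and its ancestors,
    nearest first; None means the attribute is absent on that element.
    """
    # 1. Nearest lang attribute on the element or an ancestor.
    for lang in lang_chain:
        if lang is not None:
            # In HTML, lang="" means "language unknown" and still ends the search.
            return (lang or None), "lang attribute"
    # 2. Document-wide default from <meta http-equiv="Content-Language">.
    if meta_content_language:
        return meta_content_language, "meta pragma"
    # 3. Language supplied by the protocol (HTTP Content-Language header).
    if http_content_language:
        return http_content_language, "HTTP Content-Language"
    # 4. UA-specific fallback, e.g. an option seeded from the system language.
    if ua_setting:
        return ua_setting, "UA setting"
    return None, "unknown"
```

Which of the returned sources a spec should accept as "@lang or its equivalent" is exactly the ambiguity raised here.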

I understand your motivation to make the text easier to follow, and I agree that's a good goal. But as far as I can tell, if we try that, the wording will be less accurate. If this were easy, the i18n WG wouldn't have needed to write long best practice notes[1].

I'll ask the i18n WG for better wording suggestions. If you have a good suggestion, that's appreciated too. If nobody can come up with a better one, I think we should conclude that the current wording is the best we have. Does this sound reasonable?

[1] http://www.w3.org/TR/i18n-html-tech-lang/


Regards,
Koji

Received on Monday, 27 August 2012 14:17:46 GMT
