Re: [css3-text] Better wording than "known to be language X" (was line-break questions/comments

On Mon, Aug 27, 2012 at 10:16 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> +public-i18n-core@w3.org as the discussion is no longer only for CJK.
>
> >>>>> The phrases "known to be X [language]" are completely undefined as far as
> >>>>> the current text is concerned. If you want to have one note that covers all X,
> >>>>> then by all means do so, but don't just leave it in such an undefined state.
> >>>>
> >>>> Did you follow the link? I think it's well-defined in Terminology section. The
> >>>> section also has examples you requested.
> >>>
> >>> yes; my problem is the phrase "known to be Japanese or Chinese" does not map
> >>> to "if the content language contains 'ja' or 'zh' or equivalent as its primary language
> >>> subtag". same for the phrase "known to be Turkish" which also appears in another
> >>> context in this document
> >>
> >> I agree that your suggested wording is easier to understand for HTML authors, but
> >> it's not accurate because CSS does not define what the content document format is
> >> and how content document determines the language. CSS Selectors Level 3[1]
> >> informatively recommends content document to use BCP47, but it's still content
> >> document that defines language syntax of the content document.
> >>
> >> The wording in our Terminology section[2] looks almost the same as the one in CSS
> >> Selectors Level 3 to me; it defines our syntax, but does not define content
> >> document syntax. It's hard for me to find good wording to improve this without
> >> being incorrect.
> >>
> >> If you have suggested wording, I can run it by fantasai to put into the spec.
> >
> > I'm fine with the definition under the terminology section. I'm not fine with the
> > "known to be X [language]" phrases. In the case of "known to be Japanese", one
> > might expect a UA to interpret <p lang="en">この段落は日本語です</p> as
> > Japanese, since you and I "know" it to be Japanese regardless of the @lang attribute.
> >
> > I would like "known to be X" to be revised to tie it to @lang (or equivalent), and not
> > a textual/linguistic analysis of the text that determines the actual language of the
> > content.
>
> I'm afraid that if we say so, questions arise like, is HTTP
> Content-Language header "@lang or its equivalent"? IE falls back to
> Tools/Options setting if no language is specified in HTML, meta, nor in
> HTTP, of which initial value is set by system language. Is it included to
> "@lang or its equivalent"?
>
> I understand your motivation to make it easier to understand, and I agree
> it's good. But in my understanding, if we try that, we'll be less accurate.
> If this is an easy thing, i18n WG won't need to write up a long best
> practice notes[1].
>
> I'll ask i18n WG for any better wording suggestion. If you have good
> suggestion, that's appreciated too. If nobody can come up with better
> suggestion, I think we should conclude that the current wording is the best
> one. Does this sound reasonable?
>

The current language is unacceptable and misleading without further
clarification, as it implies textual/linguistic analysis. If the following
informative text were added in a new Section 1.4 "Conventions", then I
would be satisfied:

<quote>
A phrase of the form "known to be X" where X is a language name, e.g.,
"known to be Japanese", is intended to be determined using markup alone,
and does not imply a requirement to perform linguistic analysis (i.e.,
language recognition) of associated text content.
</quote>
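
To illustrate the reading I have in mind, here is a minimal sketch in
Python. It assumes the content language has already been obtained from
markup (e.g., @lang) as a BCP 47-style tag; how the document format supplies
that value is out of scope here. The function names are hypothetical and
appear in no spec. The point is only that the declared primary language
subtag is consulted and the text itself is never inspected.

```python
from typing import Optional


def primary_subtag(content_language: Optional[str]) -> Optional[str]:
    """Return the lowercased primary language subtag, or None if undeclared."""
    if not content_language:
        return None
    # Subtags are separated by hyphens and compared case-insensitively.
    first = content_language.strip().split("-", 1)[0].lower()
    return first or None


def known_to_be(content_language: Optional[str], *primaries: str) -> bool:
    """True if the declared content language has one of the given primary subtags.

    Only the declared language is consulted; the text is never analysed.
    """
    return primary_subtag(content_language) in {p.lower() for p in primaries}


# The paragraph from the earlier example: declared "en", written in Japanese.
declared, text = "en", "この段落は日本語です"
print(known_to_be(declared, "ja", "zh"))  # False: the markup says English
print(known_to_be("ja-JP", "ja", "zh"))   # True
print(known_to_be(None, "ja", "zh"))      # False: no declared language
```

Under this reading, a paragraph marked lang="en" is not "known to be
Japanese", whatever its text contains.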



>
> [1] http://www.w3.org/TR/i18n-html-tech-lang/
>
> Regards,
> Koji
>
>

Received on Tuesday, 28 August 2012 00:14:12 UTC