- From: Dan Chiba <dan.chiba@oracle.com>
- Date: Thu, 19 Jun 2008 15:23:17 -0700
- To: Felix Sasaki <fsasaki@w3.org>
- CC: "Phillips, Addison" <addison@amazon.com>, "www-international@w3.org" <www-international@w3.org>
Felix Sasaki wrote:
>
> Dan Chiba wrote:
>>
>> Phillips, Addison wrote:
>>>>> On the other hand, does it make sense to advertise that a Web
>>>>> service supports a locale that it has no messages for? If the
>>>>> service normally has no user interface ("formatDate", "addInts",
>>>>> "sortStrings"), then the list of available locales might very well
>>>>> match the complete set available in the API. At the other end of
>>>>> the spectrum are AJAX interactions that build the UI in real time.
>>>>> Then only the messages that you actually have available are useful
>>>>> to advertise.
>>>>>
>>>> I think it makes sense to advertise the set of supported locales.
>>>>
>>>
>>> That would tend to be the point of this work: we provide a way to
>>> say any of the following:
>>>
>>> - this service is locale-neutral; you may specify a locale, but it
>>> doesn't do anything to the service
>>> - this service has a specific default locale that it uses ("it is
>>> always in German"); the user can specify whatever they want, but
>>> the service always uses this one
>>> - this service has some specific (and specified) list of available
>>> locales (and by inference some default); the user may specify the
>>> locale to use and the service will do its best to match it from the
>>> specified list
>>> - this service is locale sensitive; the user may specify the locale
>>> to use and the service will do its best to match it, noting that a
>>> list is not provided
>>>
>> I think it is very desirable to provide a way to discover supported
>> locales as well. Then it would be possible for the service consumer
>> to specify the desired locale, knowing the locale will be used for
>> the service operation. Generally, the locale should be determined
>> based on the policy
>
> the WS Policy framework does not provide a policy negotiation
> mechanism. I would be very reluctant to spend time on developing such
> a mechanism. Although I understand your desire, I don't think that we
> should spend time on this (see my remark on timing below).
>
I agree, let's not cover this point at this time.
Regards,
-Dan
>
>> defined by the consumer, not by the provider. Otherwise the resulting
>> behavior would become unpredictable; likely to result in user
>> experiences with mixed languages.
>>>> It may
>>>> be the list of available translation languages, formatting locales,
>>>> those locales for which linguistic sorting behavior is supported,
>>>> or something alike.
>>>>
>>>
>>> Yes, and we need to support the service implementer making the
>>> decision about which pattern to advertise and/or use. You and I
>>> might choose entirely different criteria for choosing how we
>>> advertise locale support for a given service.
>>>
>>>
>>>> Because a service cannot determine the appropriate
>>>> locale for the locale sensitive service operation, it needs to be
>>>> made possible for the service consumer to discover what locale is
>>>> supported, in order for the application to produce the desired UI
>>>> behavior.
>>>>
>>>
>>> I agree, with a nit:
>>>
>>> - sometimes it doesn't make sense to list everything that is
>>> available. Sometimes it is better (consumes less bandwidth,
>>> processing, etc.) to say: "I'll do my best to match your request".
>>> This can even make sense when the list is quite short.
>>>
>> I agree it is sometimes unnecessary for consumers to know what
>> locales are supported. In other cases, as mentioned, people may
>> dislike mixed languages on UI and an application needs to control the
>> locales in which the service operates. Suppose an application UI had
>> three sections each presenting text information from different
>> services, the user experience may be better if their language is the
>> same. If the information is dated, the date format would be expected
>> to be consistent.
>>>>> My concern here is that many services fall into a sort of middle
>>>>> category: they can service many locales, but only have a limited
>>>>> set of localizations. Messages from the services are necessarily
>>>>> constrained to the smaller set, while the service might actually be
>>>>> useful for a larger set.
>>>>>
>>>> Yes that is why we think the translation locale and the locale for
>>>> other purposes should be identified separately.
>>>>
>>>
>>> I understand that's your intent. However I think this will confuse
>>> the vast preponderance of developers who have only a very rough idea
>>> what a "locale" or a "language" is. There are also different ways
>>> that services can be provisioned. It may not be possible to
>>> enumerate one list or the other easily. Having two things that do
>>> roughly the same thing doesn't seem that useful to me. How often do
>>> you actually set LC_MESSAGES separately from LC_ALL?
>>>
>> Whenever a user's preferred locale is supported but preferred
>> language is not, he or she would set LC_MESSAGES explicitly. This is
>> often needed because the set of supported translation languages is
>> usually small. I wonder if you also mean LC_MESSAGES is confusing or
>> not needed because it won't be set often and does not seem so useful.
>>
>> Because the sets of supported languages and locales are usually
>> different, they are practically different. I do agree they are
>> conceptually roughly the same. However, in reality, service consumers
>> are usually interested in serving the user in their most preferred
>> available language and locale, but this is hard to achieve without
>> specifying the locale and language separately. Both of users'
>> preferred locale and language should be honored, but too often
>> language resources are not available and an alternative language
>> different from the language deduced from the preferred locale ought
>> to be used instead. This alternative language needs to be identified
>> and this is why #3 language (and LC_MESSAGES) is needed.
>>>>> My tendency is still to think that this is "locale" and not
>>>>> "language". It looks like a bug to get a message like: "There were
>>>>> « 1 234 » entries sorted on 14 juin." Where the locale was clearly
>>>>> one thing and the messages in another language.
>>>>>
>>>> Having both #1 locale and #3 language does not mean that would
>>>> produce the odd message. If using the same locale for the message
>>>> formatting is a requirement, the component can use #3 language
>>>> alone to make the message locale consistent.
>>>>
>>>
>>>
>>> But this is inconsistent with the design of WS-I18N, where "locale"
>>> is the "big knob". I tend to think that relatively few people would
>>> know how to write an application like this.
>>>
>> WS-I18N needs a little knob to deal with the fact that translation
>> resources are missing in many use cases. Usually "locale" identifies
>> the user's preferred locale, which is usually supported. "language"
>> may be deduced from "locale", however, support for the preferred
>> language is often not available, so the alternative language must be
>> identified.
>>> A better solution might be: if we provide a list of available
>>> locales, we can provide an additional attribute to indicate which
>>> ones have been provisioned with messages. For example:
>>>
>>> <i18n:locale>
>>> <i18n:option default="true" localized="true">en-GB</i18n:option>
>>> <i18n:option localized="true">de</i18n:option>
>>> <i18n:option>fr</i18n:option>
>>> </i18n:locale>
>>>
>>> Here the default locale is "en-GB". German ("de") is also available,
>>> with localizations, as is French ("fr"), sans localization.
>>>
>>> A request could come in as something like:
>>>
>>> <i18n:locale>en-US,de-CH-1994,fr</i18n:locale> <!-- in this case, it
>>> matches "de" -->
>>>
>>> Or perhaps:
>>>
>>> <i18n:locale>en,zh-yue,ja-JP</i18n:locale> <!-- in this case you get
>>> en-GB as the default -->
>>>
>>> And finally:
>>>
>>> <i18n:locale>fr-FR</i18n:locale> <!-- you get French locale
>>> behavior, but probably en-GB messages; no "fr" is available -->
>>>
>>>
>>>
>>>>> What is missing in the current version is that we don't provide:
>>>>>
>>>>> - a way to enumerate the available items
>>>>> - a way to specify the complete set of preferences
>>>>> - a reference to RFC 4647 Lookup (that is, locale-based resource
>>>>> negotiation)
>>>>>
>>>> I agree. Again my understanding is that these are to be provided as
>>>> a separate document or a future revision.
>>>>
>>>
>>> Note that WS-I18N in its current incarnation is exactly the second
>>> draft. W3C's first version (2005-09-14) was taken from a trial
>>> balloon I wrote. At that time there was no Lookup algorithm, no LTLI
>>> (okay, there still isn't an LTLI, but that's something to fix), not
>>> much in the way of LDML, and RFC 4646 was still an Internet-Draft
>>> (with several to follow). With these items available to us, we
>>> should do the work to get WS-I18N right (it's actually a fairly
>>> minor set of revisions required, IMO).
>>>
>> I thought that may be months of work. If a comprehensive solution
>> could be included in the next version, that would be great.
>
>
> I don't see a comprehensive solution yet, although there seems to be
> some rough consensus coming up in this thread. So it's hard to make
> time planning at the moment.
>
> Felix
>
>>>
>>>>> I don't say that Unicode is forced upon people (although using
>>>>> SOAP is mighty close to forcing UTF-8). What I'm saying is that, as
>>>>> a parameter, it usually doesn't make a lot of sense. The data often
>>>>> has to be transcoded for the benefit of (for example) the XML
>>>>> processor anyway. The fact that data exists as some legacy encoding
>>>>> affects the results or operation of the service itself (you still
>>>>> can't store Japanese character data in a WE8ISO8859P1 database even
>>>>> if the Web service layer permits you to send it some). But it's not
>>>>> necessarily something that one can usefully specify at the service
>>>>> layer.
>>>>>
>>>>> Anyway, I don't want to sound completely absolutist here. I know
>>>>> what kinds of cases you're thinking of and think they have merit.
>>>>>
>>>> I do agree character set is generally not so useful as other
>>>> elements and not encouraged to use. I just think a character set is
>>>> considered as one of the elements of a locale and some people may
>>>> find it useful if WS-I18N defines how to indicate it.
>>>>
>>>
>>> Character set is considered one of the elements of *some* locale
>>> systems. The question is: what does this parameter do or mean? If I
>>> have a <i18n:charset>ISO8859_1</i18n:charset> in my service's
>>> WS-Policy, does that mean I should transcode my SOAP request to
>>> Latin-1? Am I limited to Latin-1 characters in my request? Will I
>>> only receive Latin-1 characters in the response? The charset
>>> limitation may occur on several different levels of the system or it
>>> may simply be an assertion about the data.
>>>
>>> Since most developers wouldn't know what an encoding was if it grew
>>> legs and bit them, that makes me wary. If nothing else, we need to
>>> put a big Health Warning sticker on it :-).
>>>
>>>
>>>> If a requester is only interested in getting responses in a
>>>> specific native character set (e.g. the response will be processed
>>>> in a component that can only process a native encoding, or it will
>>>> be stored in a database that can only store a specific native
>>>> character set), the service could filter the response based on this
>>>> information.
>>>>
>>>>
>>>
>>> Degrading the data early is usually a bad option :-). Converting the
>>> data from the UTF-8 used in the transport layer to the local
>>> encoding is usually an effective enough filter---and YOUR code did
>>> it, not my beautiful, pristine service <g>. This, for example, is
>>> true when you find out that "ISO 8859-1" sometimes means
>>> "windows-1252"... but sometimes it doesn't.
>>>
>>> Anyway, I digress. We can probably find a way to accommodate
>>> 'charset'. All I'm saying is: how we do it is important.
>>>
>> All right, I think how i18n:charset can be useful needs to be
>> examined and I am not sure whether it is truly valuable. Due to this
>> lack of support, users who need to deal with a native encoding may
>> decide to migrate to Unicode and that can be a good thing. :-) Please
>> let me withdraw the idea of <charset> element so we can better focus
>> on the other items.
>>
>> Regards,
>> -Dan
>>> Addison
>>>
>>
>>
>>
>
>
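
For reference, the matching behaviour in the quoted <i18n:locale> examples
can be sketched roughly as below. This is only an illustration, not part of
WS-I18N: it assumes the request value is a comma-separated priority list,
that matching follows RFC 4647 Lookup (section 3.4), and that messages are
chosen by running the same lookup over the options marked localized="true".
The names lookup, negotiate, OPTIONS and DEFAULT are hypothetical.

```python
# Hypothetical model of the quoted <i18n:option> list: (tag, localized).
OPTIONS = [("en-GB", True), ("de", True), ("fr", False)]
DEFAULT = "en-GB"  # the option carrying default="true"


def lookup(ranges, available, default):
    """RFC 4647 Lookup: take each range in priority order and truncate it
    subtag by subtag until it equals an available tag; otherwise return
    the default."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in ranges:
        subtags = language_range.strip().lower().split("-")
        if subtags == ["*"]:
            continue  # the bare wildcard is skipped by Lookup
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A trailing single-character subtag is removed as well.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def negotiate(requested):
    """Return (locale, message_language) for a requested priority list."""
    ranges = requested.split(",")
    all_tags = [tag for tag, _ in OPTIONS]
    localized_tags = [tag for tag, localized in OPTIONS if localized]
    locale = lookup(ranges, all_tags, DEFAULT)
    # Assumption: messages come from the localized subset only.
    messages = lookup(ranges, localized_tags, DEFAULT)
    return locale, messages


if __name__ == "__main__":
    for requested in ("en-US,de-CH-1994,fr", "en,zh-yue,ja-JP", "fr-FR"):
        print(requested, "->", negotiate(requested))
```

Under these assumptions the three quoted requests resolve to "de", to the
default "en-GB", and to "fr" with "en-GB" messages. Note that Lookup never
matches the range "en" to the more specific tag "en-GB", which is why the
first request falls through to "de".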
Received on Thursday, 19 June 2008 22:23:52 UTC