Re: [Comment on WS-I18N WD] from Felix Sasaki on 2008-06-18 (www-international@w3.org from April to June 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 18 Jun 2008 10:03:15 +0900
To: Dan Chiba <dan.chiba@oracle.com>
CC: "Phillips, Addison" <addison@amazon.com>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <48585ED3.4050306@w3.org>
Dan Chiba wrote:
>
> Phillips, Addison wrote:
>>>> On the other hand, does it make sense to advertise that a Web
>>>> service supports a locale that it has no messages for? If the
>>>> service normally has no user interface ("formatDate", "addInts",
>>>> "sortStrings"), then the list of available locales might very well
>>>> match the complete set available in the API. At the other end of
>>>> the spectrum are AJAX interactions that build the UI in real time.
>>>> Then only the messages that you actually have available are useful
>>>> to advertise.
>>>
>>> I think it makes sense to advertise the set of supported locales.
>>>
>>
>> That would tend to be the point of this work: we provide a way to say 
>> any of the following:
>>
>> - this service is locale-neutral; you may specify a locale, but it 
>> doesn't do anything to the service
>> - this service has a specific default locale that it uses ("it is 
>> always in German"); the user can specify whatever they want, but the 
>> service always uses this one
>> - this service has some specific (and specified) list of available 
>> locales (and by inference some default); the user may specify the 
>> locale to use and the service will do its best to match it from the 
>> specified list
>> - this service is locale sensitive; the user may specify the locale 
>> to use and the service will do its best to match it, noting that a 
>> list is not provided
>>   
> I think it is very desirable to provide a way to discover supported 
> locales as well. Then it would be possible for the service consumer to 
> specify the desired locale, knowing the locale will be used for the 
> service operation. Generally, the locale should be determined based on 
> the policy

The WS-Policy framework does not provide a policy negotiation mechanism. 
I would be very reluctant to spend time on developing such a mechanism. 
Although I understand your desire, I don't think that we should spend 
time on this (see my remark on timing below).


> defined by the consumer, not by the provider. Otherwise the resulting 
> behavior would become unpredictable; likely to result in user 
> experiences with mixed languages.
>>> It may
>>> be the list of available translation languages, formatting locales,
>>> those locales for which linguistic sorting behavior is supported,
>>> or something alike.
>>>     
>>
>> Yes, and we need to support the service implementer making the 
>> decision about which pattern to advertise and/or use. You and I might 
>> choose entirely different criteria for choosing how we advertise 
>> locale support for a given service.
>>
>>  
>>> Because a service cannot determine the appropriate
>>> locale for the locale sensitive service operation, it needs to be
>>> made possible for the service consumer to discover what locale is
>>> supported, in order for the application to produce the desired UI
>>> behavior.
>>>
>>
>> I agree, with a nit:
>>
>> - sometimes it doesn't make sense to list everything that is 
>> available. Sometimes it is better (consumes less bandwidth, 
>> processing, etc.) to say: "I'll do my best to match your request". 
>> This can even make sense when the list is quite short.
>>   
> I agree it is sometimes unnecessary for consumers to know what locales 
> are supported. In other cases, as mentioned, people may dislike mixed 
> languages in a UI and an application needs to control the locales in 
> which the service operates. If an application UI has three 
> sections, each presenting text from a different service, the 
> user experience may be better if their language is the same. If the 
> information is dated, the date format would be expected to be consistent.
>>>> My concern here is that many services fall into a sort of middle
>>>> category: they can service many locales, but only have a limited
>>>> set of localizations. Messages from the services are necessarily
>>>> constrained to the smaller set, while the service might actually be
>>>> useful for a larger set.
>>>
>>> Yes that is why we think the translation locale and the locale for
>>> other purposes should be identified separately.
>>>
>>
>> I understand that's your intent. However I think this will confuse 
>> the vast preponderance of developers who have only a very rough idea 
>> what a "locale" or a "language" is. There are also different ways 
>> that services can be provisioned. It may not be possible to enumerate 
>> one list or the other easily. Having two things that do roughly the 
>> same thing doesn't seem that useful to me. How often do you actually 
>> set LC_MESSAGES separately from LC_ALL?
>>   
> Whenever a user's preferred locale is supported but preferred language 
> is not, he or she would set LC_MESSAGES explicitly. This is often 
> needed because the set of supported translation languages is usually 
> small. I wonder if you also mean LC_MESSAGES is confusing or not 
> needed because it won't be set often and does not seem so useful.
>
> Because the sets of supported languages and locales are usually 
> different, they are practically different. I do agree they are 
> conceptually roughly the same. However, in reality, service consumers 
> are usually interested in serving the user in their most preferred 
> available language and locale, but this is hard to achieve without 
> specifying the locale and language separately. Both the user's 
> preferred locale and language should be honored, but too often 
> language resources are not available and an alternative language 
> different from the language deduced from the preferred locale ought to 
> be used instead. This alternative language needs to be identified and 
> this is why #3 language (and LC_MESSAGES) is needed.
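
For readers less familiar with the POSIX side of this point: the locale environment variables are resolved per category, with LC_ALL overriding everything, then the category variable (such as LC_MESSAGES), then LANG. A minimal Python sketch of that precedence follows. The function name `effective_locale` and the final fallback to "C" are assumptions made for illustration (POSIX leaves the last-resort default implementation-defined).

```python
def effective_locale(env, category):
    """Resolve one locale category using the POSIX precedence order.

    `env` is a plain dict standing in for the process environment.
    Empty values are treated as unset, as POSIX specifies.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed fallback for this sketch


# A user who prefers Japanese formatting but needs English messages
# because no Japanese translation is available.
env = {"LANG": "ja_JP.UTF-8", "LC_MESSAGES": "en_US.UTF-8"}
print(effective_locale(env, "LC_MESSAGES"))
print(effective_locale(env, "LC_TIME"))
```

With this environment the message category resolves to en_US.UTF-8 while date formatting stays on ja_JP.UTF-8. Setting LC_ALL as well would override both, which is why the two are only useful separately when LC_ALL is left unset.
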
>>>> My tendency is still to think that this is "locale" and not
>>>> "language". It looks like a bug to get a message like: "There were
>>>> « 1 234 » entries sorted on 14 juin." Where the locale was clearly
>>>> one thing and the messages in another language.
>>>
>>> Having both #1 locale and #3 language does not mean that would
>>> produce the odd message. If using the same locale for the message
>>> formatting is a requirement, the component can use #3 language
>>> alone to make the message locale consistent.
>>>
>>
>>
>> But this is inconsistent with the design of WS-I18N, where "locale" 
>> is the "big knob". I tend to think that relatively few people would 
>> know how to write an application like this.
>>   
> WS-I18N needs a little knob to deal with the fact that translation 
> resources are missing in many use cases. Usually "locale" identifies 
> the user's preferred locale, which is usually supported. "language" 
> may be deduced from "locale", however, support for the preferred 
> language is often not available, so the alternative language must be 
> identified.
>> A better solution might be: if we provide a list of available 
>> locales, we can provide an additional attribute to indicate which 
>> ones have been provisioned with messages. For example:
>>
>> <i18n:locale>
>>   <i18n:option default="true" localized="true">en-GB</i18n:option>
>>   <i18n:option localized="true">de</i18n:option>
>>   <i18n:option>fr</i18n:option>
>> </i18n:locale>
>>
>> Here the default locale is "en-GB". German ("de") is also available, 
>> with localizations, as is French ("fr"), sans localization.
>>
>> A request could come in as something like:
>>
>> <i18n:locale>en-US,de-CH-1994,fr</i18n:locale> <!-- in this case, it 
>> matches "de" -->
>>
>> Or perhaps:
>>
>> <i18n:locale>en,zh-yue,ja-JP</i18n:locale> <!-- in this case you get 
>> en-GB as the default -->
>>
>> And finally:
>>
>> <i18n:locale>fr-FR</i18n:locale> <!-- you get French locale behavior, 
>> but probably en-GB messages; no "fr" is available -->
>>
>>
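
The matching behavior in the three sample requests above can be sketched in Python as a simplified RFC 4647 Lookup. This is only an illustration under assumptions: the function names `lookup` and `negotiate`, the dict representation of the advertised options, and the idea of running a second lookup over the localized="true" subset to choose the message language are all hypothetical and not part of the WS-I18N draft.

```python
def lookup(ranges, available, default):
    """Simplified RFC 4647 Lookup: try each range, truncating from the end."""
    by_lower = {tag.lower(): tag for tag in available}
    for lang_range in ranges:
        subtags = lang_range.strip().lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # Lookup also drops a single-character subtag left at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def negotiate(header, options, default):
    """Return (locale, message_language) for a comma-separated request.

    `options` maps each advertised tag to True if it has localized messages.
    """
    ranges = header.split(",")
    locale = lookup(ranges, list(options), default)
    # Assumption: messages are chosen by matching the selected locale
    # against only those options that were provisioned with messages.
    localized = [tag for tag, has_messages in options.items() if has_messages]
    messages = lookup([locale], localized, default)
    return locale, messages


# The advertised list from the sample policy: en-GB (default), de, fr.
OPTIONS = {"en-GB": True, "de": True, "fr": False}

print(negotiate("en-US,de-CH-1994,fr", OPTIONS, "en-GB"))
print(negotiate("en,zh-yue,ja-JP", OPTIONS, "en-GB"))
print(negotiate("fr-FR", OPTIONS, "en-GB"))
```

The third call is the interesting one: the locale resolves to "fr" but the messages fall back to "en-GB", which is the mixed result described for the fr-FR request. A different design could instead match the message language against the original priority list, so that a request such as "fr,de" would get German messages.
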
>>  
>>>> What is missing in the current version is that we don't provide:
>>>>
>>>> - a way to enumerate the available items
>>>> - a way to specify the complete set of preferences
>>>> - a reference to RFC 4647 Lookup (that is, locale-based resource
>>>> negotiation)
>>>
>>> I agree. Again my understanding is that these are to be provided as
>>> a separate document or a future revision.
>>>
>>
>> Note that WS-I18N in its current incarnation is exactly the second 
>> draft. W3C's first version (2005-09-14) was taken from a trial 
>> balloon I wrote. At that time there was no Lookup algorithm, no LTLI 
>> (okay, there still isn't an LTLI, but that's something to fix), not 
>> much in the way of LDML, and RFC 4646 was still an Internet-Draft 
>> (with several to follow). With these items available to us, we should 
>> do the work to get WS-I18N right (it's actually a fairly minor set of 
>> revisions required, IMO).
>>   
> I thought that might be months of work. If a comprehensive solution 
> could be included in the next version, that would be great.


I don't see a comprehensive solution yet, although some rough 
consensus seems to be emerging in this thread. So it's hard to plan 
a schedule at the moment.

Felix

>>  
>>>> I don't say that Unicode is forced upon people (although using
>>>> SOAP is mighty close to forcing UTF-8). What I'm saying is that, as
>>>> a parameter, it usually doesn't make a lot of sense. The data often
>>>> has to be transcoded for the benefit of (for example) the XML
>>>> processor anyway. The fact that data exists as some legacy encoding
>>>> affects the results or operation of the service itself (you still
>>>> can't store Japanese character data in a WE8ISO8859P1 database even
>>>> if the Web service layer permits you to send it some). But it's not
>>>> necessarily something that one can usefully specify at the service
>>>> layer.
>>>>
>>>> Anyway, I don't want to sound completely absolutist here. I know
>>>> what kinds of cases you're thinking of and think they have merit.
>>>
>>> I do agree character set is generally not as useful as other
>>> elements and its use is not encouraged. I just think a character set
>>> is considered as one of the elements of a locale and some people may
>>> find it useful if WS-I18N defines how to indicate it.
>>>
>>
>> Character set is considered one of the elements of *some* locale 
>> systems. The question is: what does this parameter do or mean? If I 
>> have a <i18n:charset>ISO8859_1</i18n:charset> in my service's 
>> WS-Policy, does that mean I should transcode my SOAP request to 
>> Latin-1? Am I limited to Latin-1 characters in my request? Will I 
>> only receive Latin-1 characters in the response? The charset 
>> limitation may occur on several different levels of the system or it 
>> may simply be an assertion about the data.
>>
>> Since most developers wouldn't know what an encoding was if it grew 
>> legs and bit them, that makes me wary. If nothing else, we need to 
>> put a big Health Warning sticker on it :-).
>>
>>  
>>> If a requester is only interested in getting responses in a
>>> specific native character set (e.g. the response will be processed
>>> in a component that can only process a native encoding, or it will
>>> be stored in a database that can only store a specific native
>>> character set), the service could filter the response based on this
>>> information.
>>>
>>
>> Degrading the data early is usually a bad option :-). Converting the 
>> data from the UTF-8 used in the transport layer to the local encoding 
>> is usually an effective enough filter---and YOUR code did it, not my 
>> beautiful, pristine service <g>. This, for example, is true when you 
>> find out that "ISO 8859-1" sometimes means "windows-1252"... but 
>> sometimes it doesn't.
>>
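
The ambiguity between "ISO 8859-1" and "windows-1252", and the idea of the consumer's own conversion acting as the filter, can be shown with a short Python sketch. The helper name `fits` and the sample strings are made up for illustration; the codec names are standard Python aliases.

```python
def fits(text, charset):
    """Return True if every character of `text` can be encoded in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The same byte means different things under the two labels.
raw = b"\x80"
as_latin1 = raw.decode("iso-8859-1")     # a C1 control character, U+0080
as_cp1252 = raw.decode("windows-1252")   # the euro sign, U+20AC
print(hex(ord(as_latin1)), hex(ord(as_cp1252)))

# Converting from the transport encoding to the local one is the filter.
print(fits("Grüße", "iso-8859-1"))
print(fits("日本語", "iso-8859-1"))
print("日本語".encode("iso-8859-1", errors="replace"))
```

The last line shows the degradation happening on the consumer side: the three Japanese characters become question marks only when the receiving code converts them, not in the service response.
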
>> Anyway, I digress. We can probably find a way to accommodate 
>> 'charset'. All I'm saying is: how we do it is important.
>>   
> All right, I think how i18n:charset can be useful needs to be examined 
> and I am not sure whether it is truly valuable. Due to this lack of 
> support, users who need to deal with a native encoding may decide to 
> migrate to Unicode and that can be a good thing. :-) Please let me 
> withdraw the idea of <charset> element so we can better focus on the 
> other items.
>
> Regards,
> -Dan
>> Addison
>>   
>
>
>
Received on Wednesday, 18 June 2008 01:04:09 UTC