Re: [Comment on WS-I18N WD] from Dan Chiba on 2008-06-19 (www-international@w3.org from April to June 2008)

From: Dan Chiba <dan.chiba@oracle.com>
Date: Thu, 19 Jun 2008 15:23:17 -0700
To: Felix Sasaki <fsasaki@w3.org>
CC: "Phillips, Addison" <addison@amazon.com>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <485ADC55.8020407@oracle.com>
Felix Sasaki wrote:
>
> Dan Chiba wrote:
>>
>> Phillips, Addison wrote:
>>>>> On the other hand, does it make sense to advertise that a Web
>>>>>       
>>>> service supports a locale that it has no messages for? If the
>>>> service normally has no user interface ("formatDate", "addInts",
>>>> "sortStrings"), then the list of available locales might very well
>>>> match the complete set available in the API. At the other end of
>>>> the spectrum are AJAX interactions that build the UI in real time.
>>>> Then only the messages that you actually have available are useful
>>>> to advertise.
>>>>     I think it makes sense to advertise the set of supported locales.
>>>>     
>>>
>>> That would tend to be the point of this work: we provide a way to 
>>> say any of the following:
>>>
>>> - this service is locale-neutral; you may specify a locale, but it 
>>> doesn't do anything to the service
>>> - this service has a specific default locale that it uses ("it is 
>>> always in German"); the user can specify whatever they want, but 
>>> the service always uses this one
>>> - this service has some specific (and specified) list of available 
>>> locales (and by inference some default); the user may specify the 
>>> locale to use and the service will do its best to match it from the 
>>> specified list
>>> - this service is locale sensitive; the user may specify the locale 
>>> to use and the service will do its best to match it, noting that a 
>>> list is not provided
>>>   
>> I think it is very desirable to provide a way to discover supported 
>> locales as well. Then it would be possible for the service consumer 
>> to specify the desired locale, knowing the locale will be used for 
>> the service operation. Generally, the locale should be determined 
>> based on the policy
>
> the WS Policy framework does not provide a policy negotiation 
> mechanism. I would be very reluctant to spend time on developing such 
> a mechanism. Although I understand your desire, I don't think that we 
> should spend time on this (see my remark on timing below).
>
I agree, let's not cover this point at this time.

Regards,
-Dan
>
>> defined by the consumer, not by the provider. Otherwise the resulting 
>> behavior would become unpredictable; likely to result in user 
>> experiences with mixed languages.
>>>> It may
>>>> be the list of available translation languages, formatting locales,
>>>> those locales for which linguistic sorting behavior is supported,
>>>> or something alike.
>>>>     
>>>
>>> Yes, and we need to support the service implementer making the 
>>> decision about which pattern to advertise and/or use. You and I 
>>> might choose entirely different criteria for choosing how we 
>>> advertise locale support for a given service.
>>>
>>>  
>>>> Because a service cannot determine the appropriate
>>>> locale for the locale sensitive service operation, it needs to be
>>>> made
>>>> possible for the service consumer to discover what locale is
>>>> supported,
>>>> in order for the application to produce the desired UI behavior.
>>>>     
>>>
>>> I agree, with a nit:
>>>
>>> - sometimes it doesn't make sense to list everything that is 
>>> available. Sometimes it is better (consumes less bandwidth, 
>>> processing, etc.) to say: "I'll do my best to match your request". 
>>> This can even make sense when the list is quite short.
>>>   
>> I agree it is sometimes unnecessary for consumers to know what 
>> locales are supported. In other cases, as mentioned, people may 
>> dislike mixed languages on UI and an application needs to control the 
>> locales in which the service operates. If an application UI had 
>> three sections each presenting text information from different 
>> services, the user experience may be better if their language is the 
>> same. If the information is dated, the date format would be expected 
>> to be consistent.
>>>>> My concern here is that many services fall into a sort of middle
>>>>>       
>>>> category: they can service many locales, but only have a limited
>>>> set of localizations. Messages from the services are necessarily
>>>> constrained to the smaller set, while the service might actually be
>>>> useful for a larger set.
>>>>     Yes that is why we think the translation locale and the locale for
>>>> other purposes should be identified separately.
>>>>     
>>>
>>> I understand that's your intent. However I think this will confuse 
>>> the vast preponderance of developers who have only a very rough idea 
>>> what a "locale" or a "language" is. There are also different ways 
>>> that services can be provisioned. It may not be possible to 
>>> enumerate one list or the other easily. Having two things that do 
>>> roughly the same thing doesn't seem that useful to me. How often do 
>>> you actually set LC_MESSAGES separately from LC_ALL?
>>>   
>> Whenever a user's preferred locale is supported but preferred 
>> language is not, he or she would set LC_MESSAGES explicitly. This is 
>> often needed because the set of supported translation languages is 
>> usually small. I wonder if you also mean LC_MESSAGES is confusing or 
>> not needed because it won't be set often and does not seem so useful.
>>
>> Because the sets of supported languages and locales are usually 
>> different, they are practically different. I do agree they are 
>> conceptually roughly the same. However, in reality, service consumers 
>> are usually interested in serving the user in their most preferred 
>> available language and locale, but this is hard to achieve without 
>> specifying the locale and language separately. Both the user's 
>> preferred locale and language should be honored, but too often 
>> language resources are not available and an alternative language 
>> different from the language deduced from the preferred locale ought 
>> to be used instead. This alternative language needs to be identified 
>> and this is why #3 language (and LC_MESSAGES) is needed.
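A minimal sketch of the LC_MESSAGES point above, assuming the POSIX
precedence for locale environment variables (LC_ALL first, then the
category variable, then LANG). The function name effective_locale and
the sample values are hypothetical and used only for illustration.

```python
def effective_locale(category: str, env: dict) -> str:
    """Return the locale name in effect for one category.

    Assumes the POSIX precedence: LC_ALL, then the category variable
    (e.g. LC_MESSAGES), then LANG. Empty values count as unset.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    # The real fallback is implementation-defined; "C" is assumed here.
    return "C"


# A user whose preferred locale is supported but whose preferred
# language is not: formatting stays Swiss French, messages use German.
user_env = {"LANG": "fr_CH.UTF-8", "LC_MESSAGES": "de_CH.UTF-8"}
messages_locale = effective_locale("LC_MESSAGES", user_env)
time_locale = effective_locale("LC_TIME", user_env)
```

Note that setting LC_ALL would override LC_MESSAGES as well, which is
why the two have to be set separately for this case.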
>>>>> My tendency is still to think that this is "locale" and not
>>>>>       
>>>> "language". It looks like a bug to get a message like: "There were
>>>> « 1 234 » entries sorted on 14 juin." Where the locale was clearly
>>>> one thing and the messages in another language.
>>>>     Having both #1 locale and #3 language does not mean that would
>>>> produce
>>>> the odd message. If using the same locale for the message
>>>> formatting is
>>>> a requirement, the component can use #3 language alone to make the
>>>> message locale consistent.
>>>>     
>>>
>>>
>>> But this is inconsistent with the design of WS-I18N, where "locale" 
>>> is the "big knob". I tend to think that relatively few people would 
>>> know how to write an application like this.
>>>   
>> WS-I18N needs a little knob to deal with the fact that translation 
>> resources are missing in many use cases. Usually "locale" identifies 
>> the user's preferred locale, which is usually supported. "language" 
>> may be deduced from "locale", however, support for the preferred 
>> language is often not available, so the alternative language must be 
>> identified.
>>> A better solution might be: if we provide a list of available 
>>> locales, we can provide an additional attribute to indicate which 
>>> ones have been provisioned with messages. For example:
>>>
>>> <i18n:locale>
>>>   <i18n:option default="true" localized="true">en-GB</i18n:option>
>>>   <i18n:option localized="true">de</i18n:option>
>>>   <i18n:option>fr</i18n:option>
>>> </i18n:locale>
>>>
>>> Here the default locale is "en-GB". German ("de") is also available, 
>>> with localizations, as is French ("fr"), sans localization.
>>>
>>> A request could come in as something like:
>>>
>>> <i18n:locale>en-US,de-CH-1994,fr</i18n:locale> <!-- in this case, it 
>>> matches "de" -->
>>>
>>> Or perhaps:
>>>
>>> <i18n:locale>en,zh-yue,ja-JP</i18n:locale> <!-- in this case you get 
>>> en-GB as the default -->
>>>
>>> And finally:
>>>
>>> <i18n:locale>fr-FR</i18n:locale> <!-- you get French locale 
>>> behavior, but probably en-GB messages; no "fr" is available -->
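A minimal sketch of how the three requests above could be resolved,
assuming the service applies RFC 4647 Lookup to the comma-separated
list and falls back to the default locale's messages when the matched
locale is not marked localized="true". The names AVAILABLE, DEFAULT,
lookup and resolve are hypothetical; none of this is defined by WS-I18N.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of the <i18n:locale> list quoted above:
# language tag -> whether messages are provisioned ("localized").
AVAILABLE: Dict[str, bool] = {"en-GB": True, "de": True, "fr": False}
DEFAULT = "en-GB"


def lookup(priority_list: str) -> str:
    """RFC 4647 Lookup: truncate each requested range until a tag matches."""
    by_lower = {tag.lower(): tag for tag in AVAILABLE}
    for language_range in priority_list.split(","):
        subtags = language_range.strip().lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag cannot end a tag, so drop it too.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return DEFAULT


def resolve(priority_list: str) -> Tuple[str, str]:
    """Return (locale for behavior, locale for messages)."""
    locale = lookup(priority_list)
    # Assumption: a locale without messages uses the default's messages.
    messages = locale if AVAILABLE[locale] else DEFAULT
    return locale, messages
```

With these assumptions, resolve("en-US,de-CH-1994,fr") gives
("de", "de"), resolve("en,zh-yue,ja-JP") gives ("en-GB", "en-GB"), and
resolve("fr-FR") gives ("fr", "en-GB"). Lookup does not match "en-US"
against "en-GB", which is why the first request lands on "de".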
>>>
>>>
>>>  
>>>>> What is missing in the current version is that we don't provide:
>>>>>
>>>>> - a way to enumerate the available items
>>>>> - a way to specify the complete set of preferences
>>>>> - a reference to RFC 4647 Lookup (that is, locale-based resource
>>>>>       
>>>> negotiation)
>>>>     I agree. Again my understanding is that these are to be 
>>>> provided as
>>>> a separate document or a future revision.
>>>>     
>>>
>>> Note that WS-I18N in its current incarnation is exactly the second 
>>> draft. W3C's first version (2005-09-14) was taken from a trial 
>>> balloon I wrote. At that time there was no Lookup algorithm, no LTLI 
>>> (okay, there still isn't an LTLI, but that's something to fix), not 
>>> much in the way of LDML, and RFC 4646 was still an Internet-Draft 
>>> (with several to follow). With these items available to us, we 
>>> should do the work to get WS-I18N right (it's actually a fairly 
>>> minor set of revisions required, IMO).
>>>   
>> I thought that may be months of work. If a comprehensive solution 
>> could be included in the next version, that would be great.
>
>
> I don't see a comprehensive solution yet, although there seems to be 
> some rough consensus coming up in this thread. So it's hard to plan 
> the timing at the moment.
>
> Felix
>
>>>  
>>>>> I don't say that Unicode is forced upon people (although using
>>>>>       
>>>> SOAP is mighty close to forcing UTF-8). What I'm saying is that, as
>>>> a parameter, it usually doesn't make a lot of sense. The data often
>>>> has to be transcoded for the benefit of (for example) the XML
>>>> processor anyway. The fact that data exists as some legacy encoding
>>>> affects the results or operation of the service itself (you still
>>>> can't store Japanese character data in a WE8ISO8859P1 database even
>>>> if the Web service layer permits you to send it some). But it's not
>>>> necessarily something that one can usefully specify at the service
>>>> layer.
>>>>   
>>>>> Anyway, I don't want to sound completely absolutist here. I know
>>>>>       
>>>> what kinds of cases you're thinking of and think they have merit.
>>>>     I do agree character set is generally not so useful as other
>>>> elements
>>>> and its use is not encouraged. I just think a character set is
>>>> considered as
>>>> one of the elements of a locale and some people may find it useful
>>>> if WS-I18N defines how to indicate it.
>>>>     
>>>
>>> Character set is considered one of the elements of *some* locale 
>>> systems. The question is: what does this parameter do or mean? If I 
>>> have a <i18n:charset>ISO8859_1</i18n:charset> in my service's 
>>> WS-Policy, does that mean I should transcode my SOAP request to 
>>> Latin-1? Am I limited to Latin-1 characters in my request? Will I 
>>> only receive Latin-1 characters in the response? The charset 
>>> limitation may occur on several different levels of the system or it 
>>> may simply be an assertion about the data.
>>>
>>> Since most developers wouldn't know what an encoding was if it grew 
>>> legs and bit them, that makes me wary. If nothing else, we need to 
>>> put a big Health Warning sticker on it :-).
>>>
>>>  
>>>> If a requester is only interested in getting responses in a
>>>> specific
>>>> native character set (e.g. the response will be processed in a
>>>> component
>>>> that can only process a native encoding, or it will be stored in a
>>>> database that can only store a specific native character set), the
>>>> service could filter the response based on this information.
>>>>
>>>>     
>>>
>>> Degrading the data early is usually a bad option :-). Converting the 
>>> data from the UTF-8 used in the transport layer to the local 
>>> encoding is usually an effective enough filter---and YOUR code did 
>>> it, not my beautiful, pristine service <g>. This, for example, is 
>>> true when you find out that "ISO 8859-1" sometimes means 
>>> "windows-1252"... but sometimes it doesn't.
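A small sketch of the filtering and the "ISO 8859-1" versus
"windows-1252" point, using only Python's built-in codecs. The helper
name fits is hypothetical; it checks on the consumer side whether text
received as Unicode survives conversion to a given legacy encoding.

```python
def fits(text: str, encoding: str) -> bool:
    """Report whether every character in text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Western European text fits Latin-1; Japanese text does not.
german_ok = fits("Grüße", "iso-8859-1")
japanese_ok = fits("日本語", "iso-8859-1")

# The euro sign exists in windows-1252 but not in ISO 8859-1.
euro_latin1 = fits("€", "iso-8859-1")
euro_cp1252 = fits("€", "windows-1252")

# The same byte decodes differently under the two labels.
as_cp1252 = b"\x80".decode("windows-1252")  # euro sign
as_latin1 = b"\x80".decode("iso-8859-1")    # C1 control character U+0080
```

Data labelled "ISO 8859-1" that was really produced as windows-1252
therefore decodes without error but with the wrong characters, which is
one reason a charset declaration at the service layer needs care.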
>>>
>>> Anyway, I digress. We can probably find a way to accommodate 
>>> 'charset'. All I'm saying is: how we do it is important.
>>>   
>> All right, I think how i18n:charset can be useful needs to be 
>> examined and I am not sure whether it is truly valuable. Due to this 
>> lack of support, users who need to deal with a native encoding may 
>> decide to migrate to Unicode and that can be a good thing. :-) Please 
>> let me withdraw the idea of <charset> element so we can better focus 
>> on the other items.
>>
>> Regards,
>> -Dan
>>> Addison
>>>   
>>
>>
>>
>
>
Received on Thursday, 19 June 2008 22:23:52 UTC