Re: [Comment on WS-I18N WD] from Dan Chiba on 2008-06-17 (www-international@w3.org from April to June 2008)

From: Dan Chiba <dan.chiba@oracle.com>
Date: Tue, 17 Jun 2008 07:46:34 -0700
To: "Phillips, Addison" <addison@amazon.com>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <4857CE4A.6040605@oracle.com>

Phillips, Addison wrote:
>>> On the other hand, does it make sense to advertise that a Web
>>> service supports a locale that it has no messages for? If the
>>> service normally has no user interface ("formatDate", "addInts",
>>> "sortStrings"), then the list of available locales might very well
>>> match the complete set available in the API. At the other end of
>>> the spectrum are AJAX interactions that build the UI in real time.
>>> Then only the messages that you actually have available are useful
>>> to advertise.
>>>
>> I think it makes sense to advertise the set of supported locales.
>>     
>
> That would tend to be the point of this work: we provide a way to say any of the following:
>
> - this service is locale-neutral; you may specify a locale, but it doesn't do anything to the service
> - this service has a specific default locale that it uses ("it is always in German"); the user can specify whatever they want, but the service always uses this one
> - this service has some specific (and specified) list of available locales (and by inference some default); the user may specify the locale to use and the service will do its best to match it from the specified list
> - this service is locale sensitive; the user may specify the locale to use and the service will do its best to match it, noting that a list is not provided
>   
I think it is very desirable to provide a way to discover supported
locales as well. Then the service consumer can specify the desired
locale, knowing that locale will be used for the service operation.
Generally, the locale should be determined by the policy defined by the
consumer, not by the provider. Otherwise the resulting behavior becomes
unpredictable and is likely to produce a user experience with mixed
languages.
>> It may
>> be the list of available translation languages, formatting locales,
>> those locales for which linguistic sorting behavior is supported,
>> or something alike.
>>     
>
> Yes, and we need to support the service implementer making the decision about which pattern to advertise and/or use. You and I might choose entirely different criteria for choosing how we advertise locale support for a given service.
>
>   
>> Because a service cannot determine the appropriate locale for the
>> locale sensitive service operation, it needs to be made possible
>> for the service consumer to discover what locale is supported, in
>> order for the application to produce the desired UI behavior.
>>     
>
> I agree, with a nit:
>
> - sometimes it doesn't make sense to list everything that is available. Sometimes it is better (consumes less bandwidth, processing, etc.) to say: "I'll do my best to match your request". This can even make sense when the list is quite short.
>   
I agree it is sometimes unnecessary for consumers to know what locales
are supported. In other cases, as mentioned, people may dislike mixed
languages in the UI, and an application needs to control the locales in
which the service operates. Suppose an application UI has three
sections, each presenting text from a different service; the user
experience is likely better if all three use the same language. If the
information carries dates, the date format would be expected to be
consistent as well.
>>> My concern here is that many services fall into a sort of middle
>>> category: they can service many locales, but only have a limited
>>> set of localizations. Messages from the services are necessarily
>>> constrained to the smaller set, while the service might actually be
>>> useful for a larger set.
>>>
>> Yes that is why we think the translation locale and the locale for
>> other purposes should be identified separately.
>>     
>
> I understand that's your intent. However I think this will confuse the vast preponderance of developers who have only a very rough idea what a "locale" or a "language" is. There are also different ways that services can be provisioned. It may not be possible to enumerate one list or the other easily. Having two things that do roughly the same thing doesn't seem that useful to me. How often do you actually set LC_MESSAGES separately from LC_ALL?
>   
Whenever a user's preferred locale is supported but the preferred
language is not, he or she would set LC_MESSAGES explicitly. This is
often needed because the set of supported translation languages is
usually small. Do you also mean that LC_MESSAGES itself is confusing or
unnecessary because it is rarely set and does not seem very useful?

Because the sets of supported languages and locales usually differ,
language and locale are different in practice, even though I agree they
are conceptually roughly the same. Service consumers usually want to
serve the user in the most preferred available language and locale, and
that is hard to achieve without specifying the two separately. Both the
user's preferred locale and preferred language should be honored, but
too often language resources are not available, and a language other
than the one deduced from the preferred locale has to be used instead.
That alternative language needs to be identified, which is why #3
language (and LC_MESSAGES) is needed.
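
A minimal Python sketch of the two-knob idea described above, offered as an illustration only and not as anything WS-I18N defines. The sets FORMAT_LOCALES and MESSAGE_LANGUAGES and the functions first_supported and resolve are hypothetical names invented for this example; the matching is a simple case-sensitive prefix truncation.

```python
# Hypothetical service capabilities: many formatting locales, few translations.
FORMAT_LOCALES = {"fr-FR", "fr-CA", "en-US", "ja-JP", "de-DE"}
MESSAGE_LANGUAGES = {"en", "ja"}


def first_supported(preferences, supported):
    """Return the first preference, or a truncated prefix of it, found in supported."""
    for tag in preferences:
        subtags = tag.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in supported:
                return candidate
            subtags.pop()
    return None


def resolve(locale_prefs, language_prefs):
    """Resolve the formatting locale and the message language separately."""
    locale = first_supported(locale_prefs, FORMAT_LOCALES)
    # Try the language implied by the locale first, then the alternatives
    # the user named explicitly (the LC_MESSAGES-style override).
    language = first_supported(list(locale_prefs) + list(language_prefs),
                               MESSAGE_LANGUAGES)
    return locale, language
```

With these assumed sets, resolve(["fr-FR"], ["ja", "en"]) keeps French formatting but picks Japanese messages, because no French translation exists. With no alternative language given, the message language cannot be determined from the locale alone.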
>>> My tendency is still to think that this is "locale" and not
>>> "language". It looks like a bug to get a message like: "There were
>>> « 1 234 » entries sorted on 14 juin." Where the locale was clearly
>>> one thing and the messages in another language.
>>>
>> Having both #1 locale and #3 language does not mean that would
>> produce the odd message. If using the same locale for the message
>> formatting is a requirement, the component can use #3 language
>> alone to make the message locale consistent.
>>     
>
>
> But this is inconsistent with the design of WS-I18N, where "locale" is the "big knob". I tend to think that relatively few people would know how to write an application like this.
>   
WS-I18N needs a little knob to deal with the fact that translation
resources are missing in many use cases. Usually "locale" identifies the
user's preferred locale, which is usually supported. "language" may be
deduced from "locale"; however, support for the preferred language is
often not available, so an alternative language must be identified.
> A better solution might be: if we provide a list of available locales, we can provide an additional attribute to indicate which ones have been provisioned with messages. For example:
>
> <i18n:locale>
>   <i18n:option default="true" localized="true">en-GB</i18n:option>
>   <i18n:option localized="true">de</i18n:option>
>   <i18n:option>fr</i18n:option>
> </i18n:locale>
>
> Here the default locale is "en-GB". German ("de") is also available, with localizations, as is French ("fr"), sans localization.
>
> A request could come in as something like:
>
> <i18n:locale>en-US,de-CH-1994,fr</i18n:locale> <!-- in this case, it matches "de" -->
>
> Or perhaps:
>
> <i18n:locale>en,zh-yue,ja-JP</i18n:locale> <!-- in this case you get en-GB as the default -->
>
> And finally:
>
> <i18n:locale>fr-FR</i18n:locale> <!-- you get French locale behavior, but probably en-GB messages; no "fr" is available -->
>
>
>   
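
The three requests quoted above can be traced with a small Python sketch of RFC 4647 Lookup-style matching. This is an illustration under assumptions, not part of WS-I18N: the names AVAILABLE, DEFAULT, parse, lookup and message_language are hypothetical, and the "localized" flag is modeled simply as a boolean per advertised tag.

```python
# Advertised list from the <i18n:locale> example: tag -> has localized messages.
AVAILABLE = {"en-GB": True, "de": True, "fr": False}
DEFAULT = "en-GB"


def parse(value):
    """Split a comma-separated priority list of language ranges."""
    return [r.strip() for r in value.split(",") if r.strip()]


def lookup(ranges, available, default):
    """Truncate each range from the right until an advertised tag matches."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in ranges:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # RFC 4647 section 3.4 also removes a trailing single-character
            # subtag (such as "x") left behind by the truncation.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def message_language(locale_tag, available, default):
    """Use the matched locale for messages only if it is marked localized."""
    return locale_tag if available.get(locale_tag) else default
```

Under these assumptions, "en-US,de-CH-1994,fr" matches "de", "en,zh-yue,ja-JP" falls through to the default "en-GB", and "fr-FR" matches "fr" for locale behavior while message_language falls back to "en-GB".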
>>> What is missing in the current version is that we don't provide:
>>>
>>> - a way to enumerate the available items
>>> - a way to specify the complete set of preferences
>>> - a reference to RFC 4647 Lookup (that is, locale-based resource
>>>   negotiation)
>>>
>> I agree. Again my understanding is that these are to be provided as
>> a separate document or a future revision.
>>     
>
> Note that WS-I18N in its current incarnation is exactly the second draft. W3C's first version (2005-09-14) was taken from a trial balloon I wrote. At that time there was no Lookup algorithm, no LTLI (okay, there still isn't an LTLI, but that's something to fix), not much in the way of LDML, and RFC 4646 was still an Internet-Draft (with several to follow). With these items available to us, we should do the work to get WS-I18N right (it's actually a fairly minor set of revisions required, IMO).
>   
I thought that might be months of work. If a comprehensive solution
could be included in the next version, that would be great.
>   
>>> I don't say that Unicode is forced upon people (although using
>>> SOAP is mighty close to forcing UTF-8). What I'm saying is that, as
>>> a parameter, it usually doesn't make a lot of sense. The data often
>>> has to be transcoded for the benefit of (for example) the XML
>>> processor anyway. The fact that data exists as some legacy encoding
>>> affects the results or operation of the service itself (you still
>>> can't store Japanese character data in a WE8ISO8859P1 database even
>>> if the Web service layer permits you to send it some). But it's not
>>> necessarily something that one can usefully specify at the service
>>> layer.
>>>
>>> Anyway, I don't want to sound completely absolutist here. I know
>>> what kinds of cases you're thinking of and think they have merit.
>>>
>> I do agree character set is generally not so useful as other
>> elements and not encouraged to use. I just think a character set
>> is considered as one of the elements of a locale and some people
>> may find it useful if WS-I18N defines how to indicate it.
>>     
>
> Character set is considered one of the elements of *some* locale systems. The question is: what does this parameter do or mean? If I have a <i18n:charset>ISO8859_1</i18n:charset> in my service's WS-Policy, does that mean I should transcode my SOAP request to Latin-1? Am I limited to Latin-1 characters in my request? Will I only receive Latin-1 characters in the response? The charset limitation may occur on several different levels of the system or it may simply be an assertion about the data.
>
> Since most developers wouldn't know what an encoding was if it grew legs and bit them, that makes me wary. If nothing else, we need to put a big Health Warning sticker on it :-).
>
>   
>> If a requester is only interested in getting responses in a
>> specific native character set (e.g. the response will be processed
>> in a component that can only process a native encoding, or it will
>> be stored in a database that can only store a specific native
>> character set), the service could filter the response based on this
>> information.
>>
>>     
>
> Degrading the data early is usually a bad option :-). Converting the data from the UTF-8 used in the transport layer to the local encoding is usually an effective enough filter---and YOUR code did it, not my beautiful, pristine service <g>. This, for example, is true when you find out that "ISO 8859-1" sometimes means "windows-1252"... but sometimes it doesn't.
>
> Anyway, I digress. We can probably find a way to accommodate 'charset'. All I'm saying is: how we do it is important.
>   
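
The two points above (the label ambiguity, and transcoding at the consumer side as the filter) can be illustrated with a short Python sketch using only the standard codecs. The helper name to_legacy is hypothetical and exists only for this example.

```python
# Bytes 0x93 and 0x94 are C1 control characters in ISO 8859-1 but curly
# quotation marks in windows-1252, so the charset label changes the text.
raw = b"\x93quoted\x94"
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("windows-1252")


def to_legacy(text, encoding):
    """Transcode on the consumer side and report whether anything was lost."""
    encoded = text.encode(encoding, errors="replace")
    lossy = encoded.decode(encoding) != text
    return encoded, lossy
```

Here to_legacy("\u65e5\u672c\u8a9e", "iso-8859-1") replaces all three Japanese characters and reports the loss, while text that fits in Latin-1 passes through unchanged. The degradation happens in the consumer's code, after the service has returned intact UTF-8.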
All right. How i18n:charset could be useful needs to be examined, and I
am not sure it is truly valuable. Without such support, users who need
to deal with a native encoding may decide to migrate to Unicode, and
that can be a good thing. :-) Please let me withdraw the idea of a
<charset> element so we can better focus on the other items.

Regards,
-Dan
> Addison
>
Received on Tuesday, 17 June 2008 14:48:06 UTC