
Re: [i18n review comment] BP3 should recommend locale-neutral representation #187

From: Phil Archer <phila@w3.org>
Date: Mon, 22 Aug 2016 10:33:36 +0100
To: Deirdre Lee <deirdre@derilinx.com>, "Phillips, Addison" <addison@lab126.com>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Annette Greiner <amgreiner@lbl.gov>
Cc: "ishida@w3.org" <ishida@w3.org>, "public-dwbp-comments@w3.org" <public-dwbp-comments@w3.org>, www International <www-international@w3.org>
Message-ID: <8d248fb7-5148-fd01-26fb-0b6ad9415093@w3.org>
Dear all,

I have taken further steps on this. The result can be seen at

1. Addison's text used more or less verbatim;
1a. taken account of Annette's suggestion;
1b. replaced inline links to BCP47 and CLDR with references;
2. title of the BP changed to "Use locale-neutral data representations";
3. moved to Data Formats section as resolved in WG meeting on Friday;
4. added R-FormatMachineRead to list of evidence and thereby updated the 
UCR cross matching;
5. updated the Challenges SVG diagram;
6. updated my Pull request.

NB, I *retained* the old ID for the BP so that any links to 
#LocaleParametersMetadata will still work. I know there are some of 
these, for example, in the Share-PSI project.



On 22/08/2016 08:52, Deirdre Lee wrote:
> Hi,
> Thank you for your comments, Addison. I think they make sense and should
> be straightforward to incorporate.
> The title of the BP should probably also be updated to something like
> 'Provide locale-neutral data'.
> Phil and DWBP editors, in Friday's meeting we also agreed to move BP3 to
> the Data Formats section from the Metadata section, which would make it
> BP14, right?
> Kind regards,
> Deirdre
> On 19/08/2016 17:39, Phillips, Addison wrote:
>> Hi Phil,
>> Thanks for starting on this. I think the pull request is a good start.
>> I have some comments on it.
>> My main concern is that this BP is really backwards. It recommends providing
>> "locale parameter metadata" and then says that the simplest way to do
>> this is to use locale-neutral formats. The recommendation should be
>> more like "use locale-neutral formats or provide locale/language
>> information where that's not possible". The pull request captures the
>> use of locale-neutral, but doesn't really explain about when to
>> provide locale and language information.
>> I would change this:
>> --
>> <p class="practicedesc">Provide metadata about locale parameters
>> (date, time, and number formats, language).</p>
>> --
>> To say:
>> --
>> <p class="practicedesc">Use locale-neutral data structures and values,
>> or, where that is not possible, provide metadata about the locale used
>> by data values.</p>
>> --
>> I would change:
>> --
>> <p>The simplest method is to use local-neutral representations of the
>> actual data, and then add metadata to provide relevant locale
>> information. For example, rather than storing "€2000.00" as a string,
>> it's strongly preferred to exchange a data structure such as:</p>
>> --
>> To say:
>> --
>> <p>Most common data representations are locale neutral. For example,
>> XML Schema types such as xsd:integer and xsd:date are intended for
>> locale-neutral data interchange. Using locale-neutral representations
>> allows the data values to be processed accurately without complex
>> parsing or misinterpretation and also allows the data to be presented
>> in the format most comfortable for the consumer of the data. For
>> example, rather than storing "€2000,00" as a string, it's strongly
>> preferred to exchange a data structure such as:</p>
>> --
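To illustrate the separation described in the proposed text above (values exchanged in a locale-neutral form, presentation left to the consumer), here is a minimal Python sketch. It assumes a hypothetical, hand-written table of display patterns (`DATE_PATTERNS`) standing in for real locale data such as CLDR, and handles only the plain `YYYY-MM-DD` form of xsd:date; the function names are illustrative, not part of any W3C specification.

```python
import re
from datetime import date

# Hypothetical, simplified display patterns; a real application would
# take these from CLDR-based locale data rather than a hand-written table.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
}

def parse_xsd_date(lexical: str) -> date:
    """Parse an xsd:date value; only plain YYYY-MM-DD (no timezone) is handled here."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical):
        raise ValueError(f"unsupported xsd:date lexical value: {lexical!r}")
    return date.fromisoformat(lexical)

def parse_xsd_integer(lexical: str) -> int:
    """Parse an xsd:integer value: optional sign and ASCII digits, no separators."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an xsd:integer lexical value: {lexical!r}")
    return int(lexical)

def present_date(value: date, locale: str) -> str:
    """Render a parsed date in the display pattern the consumer prefers."""
    return value.strftime(DATE_PATTERNS[locale])

# The interchanged value is the same for everyone; only the display differs.
published = parse_xsd_date("2016-08-22")
for loc in DATE_PATTERNS:
    print(loc, present_date(published, loc))
```

Note that a localized string such as "1,000" is rejected by `parse_xsd_integer`; that is the kind of parsing ambiguity the locale-neutral form avoids.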
>> Also, note the misspelling of "locale-neutral" in the pull request.
>> I would then go on to add some text about when locale parameters are
>> needed. Something like:
>> --
>> Some datasets contain values that are not or cannot be rendered into a
>> locale-neutral format. This is particularly true of any natural
>> language text values. For each data field that can contain
>> locale-affected or natural language text, there should be an associated
>> language tag used to indicate the language and locale of the data.
>> This locale information can be used in parsing the data or to ensure
>> proper presentation and processing of the value by the consumer.
>> --
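A minimal Python sketch of the pairing suggested above, assuming a hypothetical record layout in which each natural-language field carries its value together with a language tag. The regular expression is a deliberately simplified check (language, optional script, optional region) and not a full BCP 47 validator; `LangString` and `text_for` are illustrative names only.

```python
import re
from dataclasses import dataclass

# Simplified pattern covering only language[-Script][-REGION]; real BCP 47
# tags can also contain variants, extensions and private-use subtags.
_SIMPLE_TAG = re.compile(r"[a-zA-Z]{2,3}(-[a-zA-Z]{4})?(-([a-zA-Z]{2}|[0-9]{3}))?")

@dataclass(frozen=True)
class LangString:
    """A natural-language text value plus the language/locale it is written in."""
    value: str
    lang: str

    def __post_init__(self):
        if not _SIMPLE_TAG.fullmatch(self.lang):
            raise ValueError(f"unsupported language tag: {self.lang!r}")

# Hypothetical record: the year is a locale-neutral integer, the title is tagged text.
record = {
    "year": 2016,
    "title": [
        LangString("Annual report", "en"),
        LangString("Rapport annuel", "fr"),
    ],
}

def text_for(values, wanted):
    """Pick the value whose tag matches the wanted language (tags are case-insensitive)."""
    for item in values:
        if item.lang.lower() == wanted.lower():
            return item.value
    return None

print(text_for(record["title"], "fr"))
```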
>> (Sorry for not generating a pull request of my own)
>> Addison
>>> -----Original Message-----
>>> From: Phil Archer [mailto:phila@w3.org]
>>> Sent: Friday, August 19, 2016 8:37 AM
>>> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner
>>> <amgreiner@lbl.gov>
>>> Cc: Phillips, Addison <addison@lab126.com>; ishida@w3.org;
>>> public-dwbp-comments@w3.org; www International <www-international@w3.org>
>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral
>>> representation #187
>>> I took an action on today's call to try and address this in BP3. You
>>> can see the results at
>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>> This uses some of Addison's text directly and highlights the value of
>>> the xsd datatypes - but retains enough of the original BP for it to be
>>> an amendment rather than a whole new one - I hope.
>>> This addresses most of the resolution taken today [1] but I have not
>>> moved the BP to the formats section. I leave that to the editors, who
>>> may want to make further changes - or argue for it to be left where it
>>> is, or add references from the formats section or, or, or...
>>> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>>> Phil.
>>> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>>> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>>>> Dear Ishida,
>>>> This comment [1] is still under discussion [4] and we'd like to ask
>>>> your opinion about two of our proposals:
>>>> 1. to include locale-neutral representation ideas as part of BP3 [2], or
>>>> 2. to include a paragraph at the introduction of Section 8.8 Data
>>>> Formats [3] to discuss the relevance of having locale-neutral
>>>> representations.
>>>> We also discussed the proposal of having a new BP and we agreed that
>>>> we won't have a lot of time for a broader review of the new BP and to
>>>> collect feedback from the community.
>>>> Thanks a lot!
>>>> DWBP editors
>>>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>>>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>>>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>>>> Hi Addison,
>>>>> Thanks for your response, and it does make sense. I think what I am
>>>>> still missing is whether there is guidance we can point to as to how
>>>>> to represent the "locale-neutral" data so that it can most easily be
>>>>> made locale-specific by existing tools. You mention "pre-made
>>>>> standards for the basic data types". Is there a recommended list we
>>>>> could reference?
>>>>> Thanks for your help!
>>>>> -Annette
>>>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>>>> Hi Annette,
>>>>>> Thanks for the note. This is a personal reply, not on behalf of the
>>>>>> WG.
>>>>>> Locale-neutral formats are quite common on the Web and the Internet
>>>>>> in general. One familiar format referenced by your document, for
>>>>>> example, is XML Schema. While the representations of numbers, dates,
>>>>>> and the like in XML Schema would be "more appropriate" for some
>>>>>> languages/locales than others if given as plain text, what
>>>>>> distinguishes them is that they are all machine readable and
>>>>>> intended to be read by machines for later processing.
>>>>>> The display of values is a separate, local concern for the data's
>>>>>> consumer. Using a locale-neutral format necessarily means choosing
>>>>>> specific separators (such as the decimal separator) over other, more
>>>>>> localized ones. Save for "free text" (natural language) data, most
>>>>>> data formats are locale neutral, and these include things like
>>>>>> JSON-LD, XML Schema, CSV, and so forth.
>>>>>> Not every possible data structure or data value is, of course,
>>>>>> covered fully. For example, in my day job (I work at Amazon), we
>>>>>> have many different common measurement units defined internally. To
>>>>>> transmit these in a locale-neutral manner, we need to construct our
>>>>>> own data schemas and identifiers. There are profoundly many ways to
>>>>>> measure shoes, dresses, auto parts, hats, drone propellers, and so
>>>>>> forth. But it would be a nightmare to have to deal with localized
>>>>>> presentation formats on top of that.
>>>>>> But there are pre-made standards for the basic data types and these
>>>>>> are what are needed to build almost any data structure necessary for
>>>>>> global interchange of data.
>>>>>> Does that make sense?
>>>>>> Addison
>>>>>> Addison Phillips
>>>>>> Principal SDE, I18N Architect (Amazon)
>>>>>> Chair (W3C I18N WG)
>>>>>> Internationalization is not a feature.
>>>>>> It is an architecture.
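As a sketch of the point above about building domain-specific structures from pre-made basic types, assume a hypothetical product measurement in which the numeric value travels as a decimal string (period separator, no grouping) and the unit as a separate identifier. The `Measurement` class and the unit codes `cm` and `mm` are illustrative assumptions, not an Amazon or W3C schema.

```python
import json
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Measurement:
    value: Decimal   # exact decimal, never a localized string such as "26,5"
    unit: str        # hypothetical unit identifier, e.g. "cm"

    def to_json(self) -> str:
        # format(..., "f") yields plain fixed-point digits (no exponent,
        # no grouping), so the value is written in a locale-neutral form.
        return json.dumps({"value": format(self.value, "f"), "unit": self.unit})

    @staticmethod
    def from_json(text: str) -> "Measurement":
        data = json.loads(text)
        return Measurement(Decimal(data["value"]), data["unit"])

foot_length = Measurement(Decimal("26.5"), "cm")
wire = foot_length.to_json()
restored = Measurement.from_json(wire)
print(wire, restored == foot_length)
```

Keeping the unit as its own field means a consumer can convert or label it for display without ever parsing a localized string such as "26,5 cm".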
>>>>>> -----Original Message-----
>>>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>>>> Cc: www International <www-international@w3.org>
>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>> locale-neutral representation #187
>>>>>>> Hello on behalf of the DWBP WG,
>>>>>>> We're interested in pursuing this concept in our best practice
>>>>>>> document, but we would like some clarification of the practice of
>>>>>>> locale neutrality. You mention the variation across locales in
>>>>>>> decimal symbol, grouping
>>>>>>> symbol, number of grouping digits, digit shapes, etc., and you give
>>>>>>> an example of a locale-neutral data structure for monetary values.
>>>>>>> But this structure alone does not appear to address differences in
>>>>>>> decimal symbol, grouping symbol, number of grouping digits, or
>>>>>>> digit shapes. It does provide a mechanism to separately specify the
>>>>>>> units, and the example uses an ISO 4217 currency code, both of
>>>>>>> which we agree are good ideas. Is there a broad standard (beyond
>>>>>>> just monetary) for addressing the other symbol/representation
>>>>>>> issues you raised that we can address briefly in our best practice?
>>>>>>> Do you consider SI units consistent with a locale-neutral approach?
>>>>>>> Is there a locale-neutral standard for representing decimal numbers
>>>>>>> (perhaps using a period and no grouping, as in your example)?
>>>>>>> -Annette
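On the last question above: XML Schema's xsd:decimal is one such locale-neutral representation. Its lexical form is an optional sign, ASCII digits, and at most one period as the decimal separator, with no grouping symbol. A minimal Python sketch of that check follows; the function name `parse_xsd_decimal` is hypothetical, and only the lexical rule is taken from XML Schema.

```python
import re
from decimal import Decimal

# Lexical pattern for xsd:decimal: optional sign, ASCII digits,
# optional single period as decimal separator, no grouping symbols.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def parse_xsd_decimal(lexical: str) -> Decimal:
    """Accept only the locale-neutral xsd:decimal form; reject localized strings."""
    if not XSD_DECIMAL.fullmatch(lexical):
        raise ValueError(f"not an xsd:decimal lexical value: {lexical!r}")
    return Decimal(lexical)

for candidate in ["2000.00", "+100000.00", "2,000.00", "2.000,00", "2 000,00"]:
    try:
        print(candidate, "->", parse_xsd_decimal(candidate))
    except ValueError as err:
        print(err)
```

`[0-9]` is used instead of `\d` on purpose: in Python, `\d` also matches non-ASCII digits such as Arabic-Indic ones, which the xsd:decimal lexical form does not allow.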
>>>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>>>> [raised by aphillips]
>>>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>>> Best practice #3 introduces itself as:
>>>>>>>> Providing locale parameters helps humans and computer applications
>>>>>>>> to work accurately with things like dates, currencies and numbers
>>>>>>>> that may look similar but have different meanings in different
>>>>>>>> locales.
>>>>>>>> But the actual best practice is to use **locale-neutral**
>>>>>>>> representations that are interpreted/displayed to end-users in a
>>>>>>>> locale-appropriate manner. For example, instead of storing the
>>>>>>>> string "€2000.00", exchanging a data structure like the following
>>>>>>>> is strongly preferred:
>>>>>>>> ```
>>>>>>>> {
>>>>>>>>   "price": {
>>>>>>>>     "value": 2000.00,
>>>>>>>>     "currency": "EUR"
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> ```
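A consumer reading a price structure like this one should keep the amount exact rather than turning it into a binary float. A minimal Python sketch, assuming the object is wrapped in an enclosing JSON document and that the currency field must look like a three-letter code. The shape check is an assumption for illustration and does not consult the actual ISO 4217 list.

```python
import json
import re
from decimal import Decimal

# The price structure, wrapped in an enclosing object so that it is a
# complete JSON document.
PAYLOAD = '{"price": {"value": 2000.00, "currency": "EUR"}}'

def read_price(text: str) -> tuple[Decimal, str]:
    """Return the (amount, currency code) pair from a price structure."""
    # parse_float=Decimal keeps 2000.00 exact instead of the float 2000.0.
    price = json.loads(text, parse_float=Decimal)["price"]
    currency = price["currency"]
    # Shape check only (three upper-case ASCII letters); this is not a
    # lookup against the actual ISO 4217 code list.
    if not re.fullmatch(r"[A-Z]{3}", currency):
        raise ValueError(f"malformed currency code: {currency!r}")
    return price["value"], currency

amount, code = read_price(PAYLOAD)
print(amount, code)
```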
>>>>>>>> The date examples given are all in xsd:date format, which is an
>>>>>>>> excellent example of using a locale-neutral format.
>>>>>>>> Many things are dependent on locale: decimal symbol, grouping
>>>>>>>> symbol, number of grouping digits, digit shapes, etc. It's because
>>>>>>>> there can be wide variation (sometimes open to misinterpretation)
>>>>>>>> that sending a locale-neutral format is preferred for data values.
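To make that variation concrete, the sketch below renders one locale-neutral value under a few hypothetical, simplified conventions that differ in decimal symbol, grouping symbol, grouping sizes and digit shapes. The convention names and the `render` function are illustrative assumptions; real software should use CLDR data through an internationalization library rather than hand-written rules. The reverse direction shows the risk: a string such as "1.234" means one thousand two hundred thirty-four under the comma-decimal convention but just over one under the period-decimal one.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class NumberConvention:
    decimal: str                  # decimal symbol
    group: str                    # grouping symbol
    primary: int                  # size of the group nearest the decimal symbol
    secondary: int                # size of every further group
    digits: str = "0123456789"    # digit shapes used for 0..9

# Hypothetical, simplified conventions for illustration only.
CONVENTIONS = {
    "period-decimal": NumberConvention(".", ",", 3, 3),
    "comma-decimal": NumberConvention(",", ".", 3, 3),
    "indian-grouping": NumberConvention(".", ",", 3, 2),
    "arabic-indic": NumberConvention(
        "\u066B", "\u066C", 3, 3,
        "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"),
}

def render(value: Decimal, conv: NumberConvention, places: int = 2) -> str:
    """Format a Decimal for display under one convention."""
    sign = "-" if value < 0 else ""
    whole, _, frac = format(abs(value), f".{places}f").partition(".")
    # Split the integer part into the primary group, then secondary groups.
    groups = [whole[-conv.primary:]]
    rest = whole[:-conv.primary]
    while rest:
        groups.insert(0, rest[-conv.secondary:])
        rest = rest[:-conv.secondary]
    out = conv.group.join(groups)
    if frac:
        out += conv.decimal + frac
    # Replace ASCII digits with the convention's digit shapes.
    return sign + out.translate(str.maketrans("0123456789", conv.digits))

for name, conv in CONVENTIONS.items():
    print(name, render(Decimal("2000000"), conv))
```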
>>>>>>>> Note also, by the way, that the position of the currency symbol
>>>>>>>> depends on the locale. In France it would be normal to write
>>>>>>>> 2000.00 € rather than €2000.00. The same holds even for USD when
>>>>>>>> written with $, i.e. 2000.00 $.
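Because symbol placement is a presentation concern, it can be applied when displaying the locale-neutral price structure instead of being stored in the data. A minimal Python sketch, assuming hypothetical lookup tables that record only a currency symbol and whether it goes before or after the amount. Separators are deliberately left unlocalized here to keep the example focused on placement.

```python
from decimal import Decimal

# Hypothetical lookup tables for illustration; real software would take
# currency symbols and placement patterns from CLDR locale data.
SYMBOLS = {"EUR": "€", "USD": "$"}
SYMBOL_BEFORE = {"symbol-before": True, "symbol-after": False}

def render_price(value: Decimal, currency: str, style: str) -> str:
    """Display a locale-neutral price with the symbol placed according to style."""
    amount = format(value, ".2f")               # separators left unlocalized here
    symbol = SYMBOLS.get(currency, currency)    # fall back to the ISO code itself
    if SYMBOL_BEFORE[style]:
        return f"{symbol}{amount}"
    return f"{amount} {symbol}"

price = {"value": Decimal("2000.00"), "currency": "EUR"}
print(render_price(price["value"], price["currency"], "symbol-before"))  # €2000.00
print(render_price(price["value"], price["currency"], "symbol-after"))   # 2000.00 €
```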
>>>>>>>> --
>>>>>>> Annette Greiner
>>>>>>> NERSC Data and Analytics Services
>>>>>>> Lawrence Berkeley National Laboratory
>>>>> --
>>>>> Annette Greiner
>>>>> NERSC Data and Analytics Services
>>>>> Lawrence Berkeley National Laboratory
>>> --
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1


Phil Archer
W3C Data Activity Lead

+44 (0)7887 767755
Received on Monday, 22 August 2016 09:31:06 UTC
