Re: [i18n review comment] BP3 should recommend locale-neutral representation #187

Looks good, thanks Phil.


On 22/08/2016 10:33, Phil Archer wrote:
> Dear all,
>
> I have taken further steps on this. The result can be seen at
> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>
> 1. Addison's text used more or less verbatim;
> 1a. taken account of Annette's suggestion;
> 1b. replaced inline links to BCP47 and CLDR with references
> 2. title of the BP changed to Use locale-neutral data representations
> 3. moved to Data Formats section as resolved in WG meeting on Friday;
> 4. added R-FormatMachineRead to the list of evidence and thereby
> updated the UCR cross-matching;
> 5. updated the Challenges SVG diagram;
> 6. updated my Pull request.
>
> NB, I *retained* the old ID for the BP so that any links to 
> #LocaleParametersMetadata will still work. I know there are some of 
> these, for example, in the Share-PSI project.
>
> HTH
>
> Phil.
>
>
>
> On 22/08/2016 08:52, Deirdre Lee wrote:
>> Hi,
>>
>> Thank you for your comments, Addison. I think they make sense and should
>> be straightforward to incorporate.
>>
>> The title of the BP should probably also be updated to something like
>> 'Provide locale-neutral data'
>>
>> Phil and DWBP editors, in Friday's meeting we also agreed to move BP3 to
>> the Data Formats section from the Metadata section, which would make it
>> BP14, right?
>>
>> Kind regards,
>>
>> Deirdre
>>
>>
>>
>> On 19/08/2016 17:39, Phillips, Addison wrote:
>>> Hi Phil,
>>>
>>> Thanks for starting on this. I think the pull request is a good start.
>>> I have some comments on it.
>>>
>>> My main concern is that this BP is really backwards. It recommends
>>> providing "locale parameter metadata" and then says that the simplest
>>> way to do this is to use locale-neutral formats. The recommendation
>>> should be more like "use locale-neutral formats or provide
>>> locale/language information where that's not possible". The pull
>>> request captures the use of locale-neutral formats, but doesn't really
>>> explain when to provide locale and language information.
>>>
>>> I would change this:
>>>
>>> -- 
>>> <p class="practicedesc">Provide metadata about locale parameters
>>> (date, time, and number formats, language).</p>
>>> -- 
>>>
>>> To say:
>>>
>>> -- 
>>> <p class="practicedesc">Use locale-neutral data structures and values,
>>> or, where that is not possible, provide metadata about the locale used
>>> by data values.</p>
>>> -- 
>>>
>>> I would change:
>>>
>>> -- 
>>> <p>The simplest method is to use local-neutral representations of the
>>> actual data, and then add metadata to provide relevant locale
>>> information. For example, rather than storing "€2000.00" as a string,
>>> it's strongly preferred to exchange a data structure such as:</p>
>>> -- 
>>>
>>> To say:
>>>
>>> -- 
>>> <p>Most common data representations are locale neutral. For example,
>>> XML Schema types such as xsd:integer and xsd:date are intended for
>>> locale-neutral data interchange. Using locale-neutral representations
>>> allows the data values to be processed accurately without complex
>>> parsing or misinterpretation and also allows the data to be presented
>>> in the format most comfortable for the consumer of the data. For
>>> example, rather than storing "€2000,00" as a string, it's strongly
>>> preferred to exchange a data structure such as:</p>
>>> -- 
>>>
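To make the parsing point concrete, here is a small illustrative sketch in Python (not part of the proposed BP text). It assumes a made-up date value and shows that a locale-formatted string such as "01/02/2016" has two plausible readings, whereas the xsd:date lexical form (YYYY-MM-DD, shown here without the optional timezone) has exactly one.

```python
from datetime import date

# A locale-formatted date string: day-first in some locales (e.g. en-GB),
# month-first in others (e.g. en-US). The value itself is made up.
localized = "01/02/2016"
year = int(localized[6:])
first, second = int(localized[0:2]), int(localized[3:5])

day_first = date(year, second, first)    # read as 1 February 2016
month_first = date(year, first, second)  # read as 2 January 2016

# The xsd:date lexical form (no timezone here) has a single reading.
neutral = date.fromisoformat("2016-02-01")

print(day_first, month_first, neutral)
```

A consumer receiving only "01/02/2016" cannot tell which of the first two dates was meant without extra locale metadata; the neutral form needs none.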
>>> Also, note the misspelling of "locale-neutral" in the pull request.
>>>
>>> I would then go on to add some text about when locale parameters are
>>> needed. Something like:
>>>
>>> -- 
>>> Some datasets contain values that are not, or cannot be, rendered into
>>> a locale-neutral format. This is particularly true of any natural
>>> language text values. For each data field that can contain
>>> locale-affected or natural language text, there should be an associated
>>> language tag used to indicate the language and locale of the data.
>>> This locale information can be used in parsing the data or to ensure
>>> proper presentation and processing of the value by the consumer.
>>> -- 
>>>
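As a rough illustration of the per-field language tag idea, the Python sketch below pairs a natural-language value with its tag, using the `@value`/`@language` shape of a JSON-LD value object. The function name `text_value`, the sample record and the simplified tag check are all hypothetical; the regular expression only checks the general shape of a tag and is not a BCP 47 validator.

```python
import json
import re

# Deliberately simplified shape check for a language tag: a 2-3 letter
# primary subtag followed by optional subtags (script, region, ...).
# This is NOT a full BCP 47 validator; it accepts some invalid tags and
# rejects some valid ones (e.g. private-use "x-..." subtags).
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")

def text_value(text: str, lang: str) -> dict:
    """Pair a natural-language string with the tag for its language/locale."""
    if not SIMPLE_TAG.fullmatch(lang):
        raise ValueError(f"implausible language tag: {lang!r}")
    # Same shape as a JSON-LD value object.
    return {"@value": text, "@language": lang}

record = {
    "title": text_value("Qualité de l'eau potable", "fr-FR"),  # made-up data
    "samples": 42,  # a plain number is already locale-neutral
}
print(json.dumps(record, ensure_ascii=False))
```

Only the free-text field carries a tag; the numeric field needs none because its representation does not vary by locale.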
>>> (Sorry for not generating a pull request of my own)
>>>
>>> Addison
>>>
>>>> -----Original Message-----
>>>> From: Phil Archer [mailto:phila@w3.org]
>>>> Sent: Friday, August 19, 2016 8:37 AM
>>>> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner
>>>> <amgreiner@lbl.gov>
>>>> Cc: Phillips, Addison <addison@lab126.com>; ishida@w3.org;
>>>> public-dwbp-comments@w3.org; www International <www-international@w3.org>
>>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral
>>>> representation #187
>>>>
>>>> I took an action on today's call to try and address this in BP3. You
>>>> can see the results at
>>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>
>>>> This uses some of Addison's text directly and highlights the value of
>>>> the xsd datatypes - but retains enough of the original BP for it to be
>>>> an amendment rather than a whole new one - I hope.
>>>>
>>>> This addresses most of the resolution taken today [1] but I have not
>>>> moved the BP to the formats section. I leave that to the editors who
>>>> may want to make further changes - or argue for it to be left where it
>>>> is, or add references from the formats section or, or, or...
>>>>
>>>> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>>>>
>>>> Phil.
>>>>
>>>> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>>>>
>>>> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>>>>> Dear Ishida,
>>>>>
>>>>> This comment [1] is still under discussion [4] and we'd like to ask
>>>>> your opinion about two of our proposals:
>>>>>
>>>>> 1. to include locale-neutral representation ideas as part of BP3 [2], or
>>>>> 2. to include a paragraph in the introduction of Section 8.8 Data
>>>>>    Formats [3] to discuss the relevance of having locale-neutral
>>>>>    representations.
>>>>>
>>>>> We also discussed the proposal of having a new BP, and we agreed that
>>>>> we won't have enough time for a broader review of the new BP or to
>>>>> collect feedback from the community.
>>>>>
>>>>> Thanks a lot!
>>>>> DWBP editors
>>>>>
>>>>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>>>>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>>>>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>>>>
>>>>>
>>>>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>>>>
>>>>>> Hi Addison,
>>>>>>
>>>>>> Thanks for your response, and it does make sense. I think what I am
>>>>>> still missing is whether there is guidance we can point to as to how
>>>>>> to represent the "locale-neutral" data so that it can most easily be
>>>>>> made locale-specific by existing tools. You mention "pre-made
>>>>>> standards for the basic data types". Is there a recommended list we
>>>>>> could reference?
>>>>>>
>>>>>> Thanks for your help!
>>>>>> -Annette
>>>>>>
>>>>>>
>>>>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>>>>
>>>>>>> Hi Annette,
>>>>>>>
>>>>>>> Thanks for the note. This is a personal reply not on behalf of the
>>>>>>> WG.
>>>>>>>
>>>>>>> Locale-neutral formats are quite common on the Web and the Internet
>>>>>>> in general. One familiar format referenced by your document, for
>>>>>>> example, is XML Schema. While the representations of numbers, dates,
>>>>>>> and the like in XML Schema would be "more appropriate" for some
>>>>>>> languages/locales than others if given as plain text, what
>>>>>>> distinguishes them is that they are all machine readable and
>>>>>>> intended to be read by machines for later processing.
>>>>>>>
>>>>>>> The display of values is a separate, local concern for the data's
>>>>>>> consumer. This necessarily means choosing specific separators (such
>>>>>>> as decimal separators) over other, more localized values. Save for
>>>>>>> "free text" (natural language) data, most data formats are
>>>>>>> locale-neutral and these include things like JSON-LD, XML Schema,
>>>>>>> CSV, and so forth.
>>>>>>>
>>>>>>> Not every possible data structure or data value is, of course,
>>>>>>> covered fully. For example, in my day job (I work at Amazon), we
>>>>>>> have many different common measurement units defined internally. To
>>>>>>> transmit these in a locale-neutral manner, we need to construct our
>>>>>>> own data schemas and identifiers. There are profoundly many ways to
>>>>>>> measure shoes, dresses, auto parts, hats, drone propellers, and so
>>>>>>> forth. But it would be a nightmare to have to deal with localized
>>>>>>> presentation formats on top of that.
>>>>>>>
>>>>>>> But there are pre-made standards for the basic data types and these
>>>>>>> are what are needed to build almost any data structure necessary for
>>>>>>> global interchange of data.
>>>>>>>
>>>>>>> Does that make sense?
>>>>>>>
>>>>>>> Addison
>>>>>>>
>>>>>>> Addison Phillips
>>>>>>> Principal SDE, I18N Architect (Amazon)
>>>>>>> Chair (W3C I18N WG)
>>>>>>>
>>>>>>> Internationalization is not a feature.
>>>>>>> It is an architecture.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>>>>> Cc: www International <www-international@w3.org>
>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>>> locale-neutral representation #187
>>>>>>>>
>>>>>>>> Hello on behalf of the DWBP WG,
>>>>>>>>
>>>>>>>> We're interested in pursuing this concept in our best practice
>>>>>>>> document, but we would like some clarification of the practice of
>>>>>>>> locale neutrality. You mention the variation across locales in
>>>>>>>> decimal symbol, grouping symbol, number of grouping digits, digit
>>>>>>>> shapes, etc., and you give an example of a locale-neutral data
>>>>>>>> structure for monetary values. But this structure alone does not
>>>>>>>> appear to address differences in decimal symbol, grouping symbol,
>>>>>>>> number of grouping digits, or digit shapes. It does provide a
>>>>>>>> mechanism to separately specify the units, and the example uses an
>>>>>>>> ISO 4217 currency code, both of which we agree are good ideas. Is
>>>>>>>> there a broad standard (beyond just monetary) for addressing the
>>>>>>>> other symbol/representation issues you raised that we can address
>>>>>>>> briefly in our best practice? Do you consider SI units consistent
>>>>>>>> with a locale-neutral approach? Is there a locale-neutral standard
>>>>>>>> for representing decimal numbers (perhaps using a period and no
>>>>>>>> grouping, as in your example)?
>>>>>>>>
>>>>>>>> -Annette
>>>>>>>>
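One candidate answer to the decimal-number question, offered as an illustration rather than as the WG's position: the lexical form of XML Schema's xsd:decimal uses an optional sign, ASCII digits, "." as the only decimal separator, and no grouping separators or exponent. The Python sketch below checks strings against a regular expression written to follow those lexical rules; the function name `parse_xsd_decimal` is hypothetical.

```python
import re
from decimal import Decimal

# Regular expression following the xsd:decimal lexical rules:
# optional sign, ASCII digits only, "." as the sole decimal separator,
# no grouping separators, no exponent.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def parse_xsd_decimal(text: str) -> Decimal:
    """Return the value of an xsd:decimal literal, or raise ValueError."""
    if not XSD_DECIMAL.fullmatch(text):
        raise ValueError(f"not an xsd:decimal literal: {text!r}")
    return Decimal(text)

print(parse_xsd_decimal("2000.00"))
for localized in ("2000,00", "2.000,00", "2,000.00"):
    try:
        parse_xsd_decimal(localized)
    except ValueError as err:
        print(err)
```

Because `[0-9]` matches only ASCII digits, locale-specific digit shapes are rejected too, so a producer must normalize them before exchange.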
>>>>>>>>
>>>>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>>>>
>>>>>>>>> [raised by aphillips]
>>>>>>>>>
>>>>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>>>>
>>>>>>>>> Best practice #3 introduces itself as:
>>>>>>>>>
>>>>>>>>> Providing locale parameters helps humans and computer applications
>>>>>>>>> to work accurately with things like dates, currencies and numbers
>>>>>>>>> that may look similar but have different meanings in different
>>>>>>>>> locales.
>>>>>>>>>
>>>>>>>>> But the actual best practice is to use **locale-neutral**
>>>>>>>>> representations that are interpreted/displayed to end-users in a
>>>>>>>>> locale-appropriate manner. For example, instead of storing the
>>>>>>>>> string "€2000.00", exchanging a data structure like the following
>>>>>>>>> is strongly preferred:
>>>>>>>>>
>>>>>>>>> ```json
>>>>>>>>> {
>>>>>>>>>   "price": {
>>>>>>>>>     "value": 2000.00,
>>>>>>>>>     "currency": "EUR"
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> The date examples given are all in xsd:date format, which is an
>>>>>>>>> excellent example of using a locale-neutral format.
>>>>>>>>>
>>>>>>>>> Many things are dependent on locale: decimal symbol, grouping
>>>>>>>>> symbol, number of grouping digits, digit shapes, etc. It's because
>>>>>>>>> there can be wide variation (sometimes open to misinterpretation)
>>>>>>>>> that sending a locale-neutral format is preferred for data values.
>>>>>>>>> Note also btw that the position of the currency symbol is
>>>>>>>>> dependent on the locale. In France it would be normal to write
>>>>>>>>> 2000.00 € rather than €2000.00. The same applies even to USD
>>>>>>>>> amounts written with $, i.e. 2000.00 $.
>>>>>>>>>
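To show how one locale-neutral value can serve several presentations, here is a minimal Python sketch that parses the "price" structure from the issue and renders it for two locales. The `CONVENTIONS` and `SYMBOLS` tables and the `render` function are hypothetical, hand-written simplifications for illustration only; a real application would take separators, symbol placement and fraction digits from CLDR through an i18n library. The French-style entry also swaps the decimal and grouping separators, since those vary by locale as well.

```python
import json
from decimal import Decimal

# The exchanged, locale-neutral structure (the "price" example in the issue).
payload = '{"price": {"value": 2000.00, "currency": "EUR"}}'
price = json.loads(payload, parse_float=Decimal)["price"]

# Hypothetical, hand-written display conventions for illustration only.
# Real applications should take these from CLDR via an i18n library.
CONVENTIONS = {
    "en-IE": {"decimal": ".", "group": ",", "symbol_first": True},
    "fr-FR": {"decimal": ",", "group": "\u202f", "symbol_first": False},
}
SYMBOLS = {"EUR": "€"}  # assumption: only the currency used in this example

def render(amount: Decimal, currency: str, locale: str) -> str:
    """Format a locale-neutral amount for display in one locale."""
    conv = CONVENTIONS[locale]
    # Start from a fixed pattern such as "2,000.00", then swap in the
    # local separators. Two fraction digits is an assumption that fits EUR.
    integer, _, fraction = f"{amount:,.2f}".partition(".")
    number = integer.replace(",", conv["group"]) + conv["decimal"] + fraction
    symbol = SYMBOLS[currency]
    return f"{symbol}{number}" if conv["symbol_first"] else f"{number}\u00a0{symbol}"

for loc in CONVENTIONS:
    print(loc, render(price["value"], price["currency"], loc))
```

Parsing with `parse_float=Decimal` keeps the exchanged value exact; formatting is applied only at display time, on the consumer's side.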
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>> Annette Greiner
>>>>>>>> NERSC Data and Analytics Services
>>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>>>
>>>>>>>>
>>>>>> -- 
>>>>>> Annette Greiner
>>>>>> NERSC Data and Analytics Services
>>>>>> Lawrence Berkeley National Laboratory
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> -- 
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>
>

-- 
------------------------------------
Deirdre Lee, CEO & Founder
Derilinx - Linked & Open Data Solutions
  
Web:      www.derilinx.com
Email:    deirdre@derilinx.com
Address:  11/12 Baggot Court, Dublin 2, D02 F891
Tel:      +353 (0)1 254 4316
Mob:      +353 (0)87 417 2318
Linkedin: ie.linkedin.com/in/leedeirdre/
Twitter:  @deirdrelee

Received on Monday, 22 August 2016 12:48:41 UTC