Re: [i18n review comment] BP3 should recommend locale-neutral representation #187

Thanks Annette,

As time is tight - I want to put the CR doc in place - I've gone ahead 
and responded to this as indicated inline below:

On 23/08/2016 18:30, Annette Greiner wrote:
> Hi folks,
>
> Sorry I haven't been able to jump in before now. Since this has been
> changing a bunch, let me say that this comment is on the version at
> http://w3c.github.io/dwbp/bp.html#dataFormats as of 9:37am PDT August 23.
>
> The "Why" still devotes more text to the metadata approach than to the
> locale-neutral approach, though a little reshuffling would fix that.
> Here's a suggested rewrite:
>
> "Data values that are machine-readable and not specific to any
> particular language or culture are more durable and less open to
> misinterpretation than values that use one of the many different
> cultural representations. Things like dates, currencies and numbers may
> look similar but have different meanings in different locales. For
> example, the 'date' 4/7 can be read as 7th of April or the 4th of July
> depending on where the data was created. Similarly, €2,000 is either two
> thousand Euros or an over-precise representation of two Euros. By using
> a locale-neutral format, systems avoid the need to establish specific
> interchange rules that vary according to the language or location of the
> user. When the data is already in a locale-specific format, making the
> locale and language explicit by providing locale
> <http://w3c.github.io/dwbp/bp.html#locale_parameter> parameters allows
> users to determine how readily they can work with the data and may
> enable automated translation services."

No problem AFAICT - text changed to this. I very much doubt Addison will
object.
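The date and currency ambiguities in the suggested "Why" text can be sketched in Python (standard library only). The year 2016 and the two format strings are assumptions chosen for illustration; they stand in for a month-first and a day-first convention. The ISO 8601 form, which is the lexical form used by xsd:date, has only one reading.

```python
from datetime import date, datetime
from decimal import Decimal

# "4/7" read under two assumed conventions (year added for illustration).
us_style = datetime.strptime("4/7/2016", "%m/%d/%Y").date()  # month first
uk_style = datetime.strptime("4/7/2016", "%d/%m/%Y").date()  # day first
print(us_style.isoformat(), uk_style.isoformat())  # 2016-04-07 2016-07-04

# The locale-neutral ISO 8601 form (as used by xsd:date) has one reading.
neutral = date.fromisoformat("2016-07-04")
print(neutral)  # 2016-07-04

# "2,000": ',' as a grouping separator vs. ',' as a decimal separator.
as_grouping = Decimal("2,000".replace(",", ""))   # two thousand
as_decimal = Decimal("2,000".replace(",", "."))   # two, over-precise
print(as_grouping, as_decimal)  # 2000 2.000
```

The same characters yield two different values in each case, which is exactly the interchange problem a locale-neutral representation removes.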

>
> I also don't believe this is true: "Most common data representations are
> locale neutral." I would say most common data serialization formats are
> locale neutral, but it seems to me quite common to see them used in
> locale-specific ways.

OK, text changed, Pull request made and merged.

>
> Finally, the example marked prominently as Example 13 looks like the
> primary suggestion for implementing the BP, which it isn't anymore. I
> think the 2000 Euro example should be at least as prominently marked.

I sympathise but I'm going to have to leave that to the editors. It can 
be done by simply adding class="example" to the <pre> element. But, 
doing that then means that the example numbers will be out of step with 
the BP numbers from that point on, which I *think* editors have
been anxious to avoid?

Berna, Newton, Carol - can you look at this today?

Cheers

Phil

>
> -Annette
>
>
> On 8/23/16 7:11 AM, Phillips, Addison wrote:
>> Hi Phil,
>>
>> Thanks. This looks good to me.
>>
>> Addison
>>
>>> -----Original Message-----
>>> From: Phil Archer [mailto:phila@w3.org]
>>> Sent: Tuesday, August 23, 2016 3:29 AM
>>> To: Phillips, Addison <addison@lab126.com>; Deirdre Lee
>>> <deirdre@derilinx.com>; Bernadette Farias Lóscio <bfl@cin.ufpe.br>;
>>> Annette Greiner <amgreiner@lbl.gov>
>>> Cc: ishida@w3.org; public-dwbp-comments@w3.org; www International
>>> <www-international@w3.org>
>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral
>>> representation #187
>>>
>>> Thanks again Addison,
>>>
>>> Pls see below.
>>>
>>> On 22/08/2016 18:36, Phillips, Addison wrote:
>>>> Hi Phil,
>>>>
>>>> This looks good. A few comments.
>>>>
>>>> 1. Rather than providing your own definition for 'locale', you might
>>>> make use of the one we provide in LTLI [1].
>>>
>>> Done
>>> http://w3c.github.io/dwbp/bp.html#locale_parameter
>>>
>>>> 2. The "why" is still missing something. I would suggest adding a
>>>> new first paragraph explaining locale-neutral first. Something like:
>>>> --
>>>> Data values that are machine-readable and not specific to any
>>>> particular language or culture are more durable and less open to
>>>> misinterpretation than values that use one of the many different
>>>> cultural representations. By using a locale-neutral format, systems
>>>> avoid the need to establish specific interchange rules that vary
>>>> according to the language or location of the user.
>>>> When the data is already in a locale-specific format, providing locale
>>>> parameters... <rest of existing text>
>>>
>>> Done, exactly as you suggest
>>> http://w3c.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>
>>> With luck... the doc gets a green light from you?
>>>
>>> Thanks again
>>>
>>> Phil.
>>>
>>>> --
>>>>
>>>> Hope that helps,
>>>>
>>>> Addison
>>>>
>>>> [1] https://www.w3.org/TR/ltli/#locale
>>>>
>>>>> -----Original Message-----
>>>>> From: Phil Archer [mailto:phila@w3.org]
>>>>> Sent: Monday, August 22, 2016 2:34 AM
>>>>> To: Deirdre Lee <deirdre@derilinx.com>; Phillips, Addison
>>>>> <addison@lab126.com>; Bernadette Farias Lóscio <bfl@cin.ufpe.br>;
>>>>> Annette Greiner <amgreiner@lbl.gov>
>>>>> Cc: ishida@w3.org; public-dwbp-comments@w3.org; www International
>>>>> <www-international@w3.org>
>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>> locale-neutral representation #187
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I have taken further steps on this. The result can be seen at
>>>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>
>>>>> 1. Used Addison's text more or less verbatim;
>>>>> 1a. took account of Annette's suggestion;
>>>>> 1b. replaced inline links to BCP47 and CLDR with references;
>>>>> 2. changed the title of the BP to "Use locale-neutral data representations";
>>>>> 3. moved it to the Data Formats section, as resolved in the WG meeting on Friday;
>>>>> 4. added R-FormatMachineRead to the list of evidence and thereby updated
>>>>> the UCR cross matching;
>>>>> 5. updated the Challenges SVG diagram;
>>>>> 6. updated my Pull request.
>>>>>
>>>>> NB, I *retained* the old ID for the BP so that any links to
>>>>> #LocaleParametersMetadata will still work. I know there are some of
>>>>> these, for example, in the Share-PSI project.
>>>>>
>>>>> HTH
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> On 22/08/2016 08:52, Deirdre Lee wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for your comments, Addison. I think they make sense and
>>>>>> should be straightforward to incorporate.
>>>>>>
>>>>>> The title of the BP should probably also be updated to something
>>>>>> like 'Provide locale-neutral data'
>>>>>>
>>>>>> Phil and DWBP editors, in Friday's meeting we also agreed to move
>>>>>> BP3 to the Data Formats section from the Metadata section, which
>>>>>> would make it BP14, right?
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Deirdre
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19/08/2016 17:39, Phillips, Addison wrote:
>>>>>>> Hi Phil,
>>>>>>>
>>>>>>> Thanks for starting on this. I think the pull request is a good
>>>>>>> start. I have some comments on it.
>>>>>>>
>>>>>>> My main concern is that this BP is really backwards. It recommends
>>>>>>> providing "locale parameter metadata" and then says that the simplest way
>>>>>>> to do this is to use locale-neutral formats. The recommendation
>>>>>>> should be more like "use locale-neutral formats or provide
>>>>>>> locale/language information where that's not possible". The pull
>>>>>>> request captures the use of locale-neutral, but doesn't really
>>>>>>> explain about when to provide locale and language information.
>>>>>>>
>>>>>>> I would change this:
>>>>>>>
>>>>>>> --
>>>>>>> <p class="practicedesc">Provide metadata about locale parameters
>>>>>>> (date, time, and number formats, language).</p>
>>>>>>> --
>>>>>>>
>>>>>>> To say:
>>>>>>>
>>>>>>> --
>>>>>>> <p class="practicedesc">Use locale-neutral data structures and
>>>>>>> values, or, where that is not possible, provide metadata about the
>>>>>>> locale used by data values.</p>
>>>>>>> --
>>>>>>>
>>>>>>> I would change:
>>>>>>>
>>>>>>> --
>>>>>>> <p>The simplest method is to use local-neutral representations of
>>>>>>> the actual data, and then add metadata to provide relevant locale
>>>>>>> information. For example, rather than storing "€2000.00" as a
>>>>>>> string, it's strongly preferred to exchange a data structure such
>>>>>>> as:</p>
>>>>>>> --
>>>>>>>
>>>>>>> To say:
>>>>>>>
>>>>>>> --
>>>>>>> <p>Most common data representations are locale neutral. For
>>>>>>> example, XML Schema types such as xsd:integer and xsd:date are
>>>>>>> intended for locale-neutral data interchange. Using locale-neutral
>>>>>>> representations allows the data values to be processed accurately
>>>>>>> without complex parsing or misinterpretation and also allows the
>>>>>>> data to be presented in the format most comfortable for the
>>>>>>> consumer of the data. For example, rather than storing "€2000,00"
>>>>>>> as a string, it's strongly preferred to exchange a data structure
>>>>>>> such as:</p>
>>>>>>> --
>>>>>>>
>>>>>>> Also, note the misspelling of "locale-neutral" in the pull request.
>>>>>>>
>>>>>>> I would then go on to add some text about when locale parameters
>>>>>>> are needed. Something like:
>>>>>>>
>>>>>>> --
>>>>>>> Some datasets contain values that are not or cannot be rendered
>>>>>>> into a locale-neutral format. This is particularly true of any
>>>>>>> natural language text values. For each data field that can contain
>>>>>>> locale-affected or natural language text, there should be an
>>>>>>> associated language tag used to indicate the language and locale
>>>>>>> of the data. This locale information can be used in parsing the
>>>>>>> data or to ensure proper presentation and processing of the value
>>>>>>> by the consumer.
>>>>>>> --
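A minimal sketch of the per-field language tagging described above, in Python. The record layout, the field names and the `lookup` helper are hypothetical; `lookup` is a simplified form of the RFC 4647 "Lookup" scheme (it only truncates trailing subtags and ignores the special handling of single-character subtags).

```python
# Hypothetical record: locale-neutral values stay as they are, while each
# natural-language value carries its own BCP 47 language tag.
record = {
    "id": "survey-42",
    "issued": "2016-08-24",  # xsd:date lexical form, no tag needed
    "title": [
        {"value": "Colour survey", "lang": "en"},
        {"value": "Color survey", "lang": "en-US"},
        {"value": "Enquête sur les couleurs", "lang": "fr"},
    ],
}


def lookup(values, wanted, default="en"):
    """Pick the value whose tag best matches `wanted`.

    Tries the full tag, then drops trailing subtags one at a time
    (en-US -> en), then falls back to `default`. Tags are compared
    case-insensitively, since BCP 47 tags are case-insensitive.
    """
    by_tag = {v["lang"].lower(): v["value"] for v in values}
    tag = wanted.lower()
    while tag:
        if tag in by_tag:
            return by_tag[tag]
        tag = tag.rpartition("-")[0]
    return by_tag.get(default.lower())


print(lookup(record["title"], "en-US"))  # Color survey
print(lookup(record["title"], "en-AU"))  # Colour survey
print(lookup(record["title"], "de-DE"))  # Colour survey (default)
```

Only the free-text field needs a tag; the identifier and the date are already locale-neutral.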
>>>>>>>
>>>>>>> (Sorry for not generating a pull request of my own)
>>>>>>>
>>>>>>> Addison
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Phil Archer [mailto:phila@w3.org]
>>>>>>>> Sent: Friday, August 19, 2016 8:37 AM
>>>>>>>> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner
>>>>>>>> <amgreiner@lbl.gov>
>>>>>>>> Cc: Phillips, Addison <addison@lab126.com>; ishida@w3.org;
>>>>>>>> public-dwbp-comments@w3.org; www International
>>>>>>>> <www-international@w3.org>
>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>>> locale-neutral representation #187
>>>>>>>>
>>>>>>>> I took an action on today's call to try and address this in BP3.
>>>>>>>> You can see the results at
>>>>>>>>
>>>>>>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>>>> This uses some of Addison's text directly and highlights the value
>>>>>>>> of the xsd datatypes - but retains enough of the original BP for
>>>>>>>> it to be an amendment rather than a whole new one - I hope.
>>>>>>>>
>>>>>>>> This addresses most of the resolution taken today [1] but I have
>>>>>>>> not moved the BP to the formats section. I leave that to the
>>>>>>>> editors who may want to make further changes - or argue for it to
>>>>>>>> be left where it is, or add references from the formats section
>>>>>>>> or, or, or...
>>>>>>>> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>>>>>>>>
>>>>>>>> Phil.
>>>>>>>>
>>>>>>>> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>>>>>>>>
>>>>>>>> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>>>>>>>>> Dear Ishida,
>>>>>>>>>
>>>>>>>>> This comment [1] is still under discussion [4] and we'd like to
>>>>>>>>> ask your opinion about two of our proposals:
>>>>>>>>>
>>>>>>>>> 1. to include locale-neutral representation ideas as part of BP3
>>>>>>>>> [2], or
>>>>>>>>> 2. to include a paragraph at the introduction of Section 8.8 Data
>>>>>>>>> Formats [3] to discuss the relevance of having locale-neutral
>>>>>>>>> representations.
>>>>>>>>>
>>>>>>>>> We also discussed the proposal of having a new BP and we agreed
>>>>>>>>> that we won't have a lot of time for a broader review of the new
>>>>>>>>> BP and to collect feedback from the community.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>> DWBP editors
>>>>>>>>>
>>>>>>>>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>>>>>>>>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>>>>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>>>>>>>>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>>>>>>>>
>>>>>>>>>> Hi Addison,
>>>>>>>>>>
>>>>>>>>>> Thanks for your response, and it does make sense. I think what I
>>>>>>>>>> am still missing is whether there is guidance we can point to as
>>>>>>>>>> to how to represent the "locale-neutral" data so that it can
>>>>>>>>>> most easily be made locale specific by existing tools. You
>>>>>>>>>> mention "pre-made standards for the basic data types". Is there
>>>>>>>>>> a recommended list we could reference?
>>>>>>>>>>
>>>>>>>>>> Thanks for your help!
>>>>>>>>>> -Annette
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Annette,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the note. This is a personal reply not on behalf of
>>>>>>>>>>> the WG.
>>>>>>>>>>>
>>>>>>>>>>> Locale neutral formats are quite common on the Web and the
>>>>>>>>>>> Internet in general. One familiar format referenced by your
>>>>>>>>>>> document, for example, is XML Schema. While the representations
>>>>>>>>>>> of numbers, dates, and the like in XML Schema would be "more
>>>>>>>>>>> appropriate" for some languages/locales than others if given as
>>>>>>>>>>> plain text, what distinguishes them is that they are all
>>>>>>>>>>> machine readable and intended to be read by machines for later
>>>>>>>>>>> processing.
>>>>>>>>>>>
>>>>>>>>>>> The display of values is a separate, local, concern for the
>>>>>>>>>>> data's consumer. This necessarily means choosing specific
>>>>>>>>>>> separators (such as decimal separators) over other, more
>>>>>>>>>>> localized values. Save for "free text" (natural language) data,
>>>>>>>>>>> most data formats are locale neutral and these include things
>>>>>>>>>>> like JSON-LD, XML Schema, CSV, and so forth.
>>>>>>>>>>>
>>>>>>>>>>> Not every possible data structure or data value is, of course,
>>>>>>>>>>> covered fully. For example, in my day job (I work at Amazon),
>>>>>>>>>>> we have many different common measurement units defined
>>>>>>>>>>> internally. To transmit these in a locale-neutral manner, we
>>>>>>>>>>> need to construct our own data schemas and identifiers. There
>>>>>>>>>>> are profoundly many ways to measure shoes, dresses, auto parts,
>>>>>>>>>>> hats, drone propellers, and so forth. But it would be a
>>>>>>>>>>> nightmare to have to deal with localized presentation formats
>>>>>>>>>>> on top of that.
>>>>>>>>>>>
>>>>>>>>>>> But there are pre-made standards for the basic data types and
>>>>>>>>>>> these are what are needed to build almost any data structure
>>>>>>>>>>> necessary for global interchange of data.
>>>>>>>>>>>
>>>>>>>>>>> Does that make sense?
>>>>>>>>>>>
>>>>>>>>>>> Addison
>>>>>>>>>>>
>>>>>>>>>>> Addison Phillips
>>>>>>>>>>> Principal SDE, I18N Architect (Amazon) Chair (W3C I18N WG)
>>>>>>>>>>>
>>>>>>>>>>> Internationalization is not a feature.
>>>>>>>>>>> It is an architecture.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>>>>>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>>>>>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>>>>>>>>> Cc: www International <www-international@w3.org>
>>>>>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>>>>>>> locale-neutral representation #187
>>>>>>>>>>>>
>>>>>>>>>>>> Hello on behalf of the DWBP WG,
>>>>>>>>>>>>
>>>>>>>>>>>> We're interested in pursuing this concept in our best practice
>>>>>>>>>>>> document, but we would like some clarification of the practice
>>>>>>>>>>>> of locale neutrality. You mention the variation across locales
>>>>>>>>>>>> in decimal symbol, grouping symbol, number of grouping digits,
>>>>>>>>>>>> digit shapes, etc., and you give an example of a locale-neutral
>>>>>>>>>>>> data structure for monetary values.
>>>>>>>>>>>>
>>>>>>>>>>>> But this structure alone does not appear to address
>>>>>>>>>>>> differences in decimal symbol, grouping symbol, number of
>>>>>>>>>>>> grouping digits, or digit shapes. It does provide a mechanism
>>>>>>>>>>>> to separately specify the units, and the example uses an
>>>>>>>>>>>> ISO 4217 currency code, both of which we agree are good ideas.
>>>>>>>>>>>> Is there a broad standard (beyond just monetary) for
>>>>>>>>>>>> addressing the other symbol/representation issues you raised
>>>>>>>>>>>> that we can address briefly in our best practice?
>>>>>>>>>>>>
>>>>>>>>>>>> Do you consider SI units consistent with a locale-neutral
>>>>>>>>>>>> approach?
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a locale-neutral standard for representing decimal
>>>>>>>>>>>> numbers (perhaps using a period and no grouping, as in your
>>>>>>>>>>>> example)?
>>>>>>>>>>>> -Annette
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> [raised by aphillips]
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best practice #3 introduces itself as:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Providing locale parameters helps humans and computer
>>>>>>>>>>>>> applications to work accurately with things like dates,
>>>>>>>>>>>>> currencies and numbers that may look similar but have
>>>>>>>>>>>>> different meanings in different locales.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But the actual best practice is to use **locale-neutral**
>>>>>>>>>>>>> representations that are interpreted/displayed to end-users
>>>>>>>>>>>>> in a locale-appropriate manner. For example, instead of
>>>>>>>>>>>>> storing the string "€2000.00", exchanging a data structure
>>>>>>>>>>>>> like the following is strongly
>>>>>>>>>>>>> preferred:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> "price": {
>>>>>>>>>>>>>       "value": 2000.00,
>>>>>>>>>>>>>       "currency": "EUR"
>>>>>>>>>>>>> }
>>>>>>>>>>>>> ```
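The price structure above can be exchanged as-is and turned into a locale-specific string only at display time, as the issue describes. A minimal Python sketch follows; the `present` helper and the two-entry conventions table are hypothetical and deliberately simplified (real applications would draw these rules from CLDR data, which for French uses a narrow no-break space rather than a plain space for grouping). The snippet is wrapped in braces so it parses as a complete JSON document, and the amount is read as a `Decimal` to avoid binary floating-point rounding.

```python
import json
from decimal import Decimal

# Locale-neutral payload: plain decimal number plus an ISO 4217 code.
payload = '{"price": {"value": 2000.00, "currency": "EUR"}}'
price = json.loads(payload, parse_float=Decimal)["price"]

# Hypothetical, hand-written presentation rules (NOT real CLDR data).
CONVENTIONS = {
    "en-IE": {"group": ",", "decimal": ".", "symbol_first": True},
    "fr-FR": {"group": " ", "decimal": ",", "symbol_first": False},
}
SYMBOLS = {"EUR": "€"}


def present(amount: Decimal, currency: str, locale: str) -> str:
    """Render a locale-neutral amount for display in one locale."""
    conv = CONVENTIONS[locale]
    # Format with ',' grouping and '.' decimal first, then swap them.
    whole, frac = f"{amount:,.2f}".split(".")
    text = whole.replace(",", conv["group"]) + conv["decimal"] + frac
    symbol = SYMBOLS[currency]
    # Symbol position is itself a locale convention.
    return symbol + text if conv["symbol_first"] else text + " " + symbol


print(present(price["value"], price["currency"], "en-IE"))  # €2,000.00
print(present(price["value"], price["currency"], "fr-FR"))  # 2 000,00 €
```

Only the presentation step knows about separators and symbol position; the exchanged value stays 2000.00 with currency EUR.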
>>>>>>>>>>>>>
>>>>>>>>>>>>> The date examples given are all in xsd:date format, which is
>>>>>>>>>>>>> an excellent example of using a locale-neutral format.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many things are dependent on locale: decimal symbol, grouping
>>>>>>>>>>>>> symbol, number of grouping digits, digit shapes, etc. It's
>>>>>>>>>>>>> because there can be wide variation (sometimes open to
>>>>>>>>>>>>> misinterpretation) that sending a locale-neutral format is
>>>>>>>>>>>>> preferred for data values.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note also, by the way, that the position of the currency symbol
>>>>>>>>>>>>> is dependent on the locale. In France it would be normal to
>>>>>>>>>>>>> write 2000.00 € rather than €2000.00. The same applies even to
>>>>>>>>>>>>> USD written with $, i.e. 2000.00 $.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Annette Greiner
>>>>>>>>>>>> NERSC Data and Analytics Services Lawrence Berkeley National
>>>>>>>>>>>> Laboratory
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> Phil Archer
>>>>>>>> W3C Data Activity Lead
>>>>>>>> http://www.w3.org/2013/data/
>>>>>>>>
>>>>>>>> http://philarcher.org
>>>>>>>> +44 (0)7887 767755
>>>>>>>> @philarcher1
>


Received on Wednesday, 24 August 2016 10:41:36 UTC