- From: Phil Archer <phila@w3.org>
- Date: Wed, 24 Aug 2016 11:44:09 +0100
- To: Annette Greiner <amgreiner@lbl.gov>, "Phillips, Addison" <addison@lab126.com>, Deirdre Lee <deirdre@derilinx.com>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Cc: "ishida@w3.org" <ishida@w3.org>, "public-dwbp-comments@w3.org" <public-dwbp-comments@w3.org>, www International <www-international@w3.org>
Thanks Annette,

As time is tight - I want to put the CR doc in place - I've gone ahead
and responded to this as indicated inline below:

On 23/08/2016 18:30, Annette Greiner wrote:
> Hi folks,
>
> Sorry I haven't been able to jump in before now. Since this has been
> changing a bunch, let me say that this comment is on the version at
> http://w3c.github.io/dwbp/bp.html#dataFormats as of 9:37am PDT August 23.
>
> The "Why" still devotes more text to the metadata approach than to the
> locale-neutral approach, though a little reshuffling would fix that.
> Here's a suggested rewrite:
>
> "Data values that are machine-readable and not specific to any
> particular language or culture are more durable and less open to
> misinterpretation than values that use one of the many different
> cultural representations. Things like dates, currencies and numbers may
> look similar but have different meanings in different locales. For
> example, the 'date' 4/7 can be read as 7th of April or the 4th of July
> depending on where the data was created. Similarly, €2,000 is either two
> thousand Euros or an over-precise representation of two Euros. By using
> a locale-neutral format, systems avoid the need to establish specific
> interchange rules that vary according to the language or location of the
> user. When the data is already in a locale-specific format, making the
> locale and language explicit by providing locale
> <http://w3c.github.io/dwbp/bp.html#locale_parameter> parameters allows
> users to determine how readily they can work with the data and may
> enable automated translation services."

No problem AFAICT - text changed to this. I very much doubt Addison will
object.

>
> I also don't believe this is true: "Most common data representations are
> locale neutral." I would say most common data serialization formats are
> locale neutral, but it seems to me quite common to see them used in
> locale-specific ways.

OK, text changed, Pull request made and merged.
>
> Finally, the example marked prominently as Example 13 looks like the
> primary suggestion for implementing the BP, which it isn't anymore. I
> think the 2000 Euro example should be at least as prominently marked.

I sympathise but I'm going to have to leave that to the editors. It can
be done by simply adding class="example" to the <pre> element. But doing
that means that the example numbers will be out of step with the BP
numbers from that point on, which I *think* editors have been anxious to
avoid?

Berna, Newton, Carol - can you look at this today?

Cheers

Phil

>
> -Annette
>
>
> On 8/23/16 7:11 AM, Phillips, Addison wrote:
>> Hi Phil,
>>
>> Thanks. This looks good to me.
>>
>> Addison
>>
>>> -----Original Message-----
>>> From: Phil Archer [mailto:phila@w3.org]
>>> Sent: Tuesday, August 23, 2016 3:29 AM
>>> To: Phillips, Addison <addison@lab126.com>; Deirdre Lee
>>> <deirdre@derilinx.com>; Bernadette Farias Lóscio <bfl@cin.ufpe.br>;
>>> Annette Greiner <amgreiner@lbl.gov>
>>> Cc: ishida@w3.org; public-dwbp-comments@w3.org; www International
>>> <www-international@w3.org>
>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral
>>> representation #187
>>>
>>> Thanks again Addison,
>>>
>>> Pls see below.
>>>
>>> On 22/08/2016 18:36, Phillips, Addison wrote:
>>>> Hi Phil,
>>>>
>>>> This looks good. A few comments.
>>>>
>>>> 1. Rather than providing your own definition for 'locale', you might
>>>> make use of the one we provide in LTLI [1].
>>>
>>> Done
>>> http://w3c.github.io/dwbp/bp.html#locale_parameter
>>>
>>>> 2. The "why" is still missing something. I would suggest adding a
>>>> new first paragraph explaining locale-neutral first. Something like:
>>>> --
>>>> Data values that are machine-readable and not specific to any
>>>> particular language or culture are more durable and less open to
>>>> misinterpretation than values that use one of the many different
>>>> cultural representations.
>>>> By using a locale-neutral format, systems avoid the need to
>>>> establish specific interchange rules that vary according to the
>>>> language or location of the user.
>>>> When the data is already in a locale-specific format, providing locale
>>>> parameters... <rest of existing text>
>>>
>>> Done, exactly as you suggest
>>> http://w3c.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>
>>> With luck... the doc gets a green light from you?
>>>
>>> Thanks again
>>>
>>> Phil.
>>>
>>>> --
>>>>
>>>> Hope that helps,
>>>>
>>>> Addison
>>>>
>>>> [1] https://www.w3.org/TR/ltli/#locale
>>>>
>>>>> -----Original Message-----
>>>>> From: Phil Archer [mailto:phila@w3.org]
>>>>> Sent: Monday, August 22, 2016 2:34 AM
>>>>> To: Deirdre Lee <deirdre@derilinx.com>; Phillips, Addison
>>>>> <addison@lab126.com>; Bernadette Farias Lóscio <bfl@cin.ufpe.br>;
>>>>> Annette Greiner <amgreiner@lbl.gov>
>>>>> Cc: ishida@w3.org; public-dwbp-comments@w3.org; www International
>>>>> <www-international@w3.org>
>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>> locale-neutral representation #187
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I have taken further steps on this. The result can be seen at
>>>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>
>>>>> 1. Addison's text used more or less verbatim;
>>>>> 1a. taken account of Annette's suggestion;
>>>>> 1b. replaced inline links to BCP47 and CLDR with references
>>>>> 2. title of the BP changed to Use locale-neutral data representations
>>>>> 3. moved to Data Formats section as resolved in WG meeting on Friday;
>>>>> 4. added R-FormatMachineRead to list of evidence and thereby updated
>>>>> the UCR cross matching;
>>>>> 5. updated the Challenges SVG diagram;
>>>>> 6. updated my Pull request.
>>>>>
>>>>> NB, I *retained* the old ID for the BP so that any links to
>>>>> #LocaleParametersMetadata will still work. I know there are some of
>>>>> these, for example, in the Share-PSI project.
>>>>>
>>>>> HTH
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> On 22/08/2016 08:52, Deirdre Lee wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for your comments Addison. I think they make sense and
>>>>>> should be straight-forward to incorporate.
>>>>>>
>>>>>> The title of the BP should probably also be updated to something
>>>>>> like 'Provide locale-neutral data'
>>>>>>
>>>>>> Phil and DWBP editors, in Friday's meeting we also agreed to move
>>>>>> BP3 to the Data Formats section from the Metadata section, which
>>>>>> would make it BP14, right?
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Deirdre
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19/08/2016 17:39, Phillips, Addison wrote:
>>>>>>> Hi Phil,
>>>>>>>
>>>>>>> Thanks for starting on this. I think the pull request is a good
>>>>>>> start.
>>>>>>> I have some comments on it.
>>>>>>>
>>>>>>> My main concern is that this BP is really backwards. It recommends
>>>>>>> to "locale parameter metadata" and then says that the simplest way
>>>>>>> to do this is to use locale-neutral formats. The recommendation
>>>>>>> should be more like "use locale-neutral formats or provide
>>>>>>> locale/language information where that's not possible". The pull
>>>>>>> request captures the use of locale-neutral, but doesn't really
>>>>>>> explain about when to provide locale and language information.
>>>>>>>
>>>>>>> I would change this:
>>>>>>>
>>>>>>> --
>>>>>>> <p class="practicedesc">Provide metadata about locale parameters
>>>>>>> (date, time, and number formats, language).</p>
>>>>>>> --
>>>>>>>
>>>>>>> To say:
>>>>>>>
>>>>>>> --
>>>>>>> <p class="practicedesc">Use locale-neutral data structures and
>>>>>>> values, or, where that is not possible, provide metadata about the
>>>>>>> locale used by data values.</p>
>>>>>>> --
>>>>>>>
>>>>>>> I would change:
>>>>>>>
>>>>>>> --
>>>>>>> <p>The simplest method is to use local-neutral representations of
>>>>>>> the actual data, and then add metadata to provide relevant locale
>>>>>>> information.
>>>>>>> For example, rather than storing "€2000.00" as a
>>>>>>> string, it's strongly preferred to exchange a data structure such
>>>>>>> as:</p>
>>>>>>> --
>>>>>>>
>>>>>>> To say:
>>>>>>>
>>>>>>> --
>>>>>>> <p>Most common data representations are locale neutral. For
>>>>>>> example, XML Schema types such as xsd:integer and xsd:date are
>>>>>>> intended for locale-neutral data interchange. Using locale-neutral
>>>>>>> representations allows the data values to be processed accurately
>>>>>>> without complex parsing or misinterpretation and also allows the
>>>>>>> data to be presented in the format most comfortable for the
>>>>>>> consumer of the data. For example, rather than storing "€2000,00"
>>>>>>> as a string, it's strongly preferred to exchange a data structure
>>>>>>> such as:</p>
>>>>>>> --
>>>>>>>
>>>>>>> Also, note the misspelling of "locale-neutral" in the pull request.
>>>>>>>
>>>>>>> I would then go on to add some text about when locale parameters
>>>>>>> are needed. Something like:
>>>>>>>
>>>>>>> --
>>>>>>> Some datasets contain values that are not or cannot be rendered
>>>>>>> into a locale-neutral format. This is particularly true of any
>>>>>>> natural language text values. For each data field that can contain
>>>>>>> locale affected or natural language text, there should be an
>>>>>>> associated language tag used to indicate the language and locale
>>>>>>> of the data.
>>>>>>> This locale information can be used in parsing the data or to
>>>>>>> ensure proper presentation and processing of the value by the
>>>>>>> consumer.
>>>>>>> --
>>>>>>>
>>>>>>> (Sorry for not generating a pull request of my own)
>>>>>>>
>>>>>>> Addison
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Phil Archer [mailto:phila@w3.org]
>>>>>>>> Sent: Friday, August 19, 2016 8:37 AM
>>>>>>>> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner
>>>>>>>> <amgreiner@lbl.gov>
>>>>>>>> Cc: Phillips, Addison <addison@lab126.com>; ishida@w3.org;
>>>>>>>> public-dwbp-comments@w3.org; www International
>>>>>>>> <www-international@w3.org>
>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>>> locale-neutral representation #187
>>>>>>>>
>>>>>>>> I took an action on today's call to try and address this in BP3.
>>>>>>>> You can see the results at
>>>>>>>> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>>>> This uses some of Addison's text directly and highlights the value
>>>>>>>> of the xsd datatypes - but retains enough of the original BP for
>>>>>>>> it to be an amendment rather than a whole new one - I hope.
>>>>>>>>
>>>>>>>> This addresses most of the resolution taken today [1] but I have
>>>>>>>> not moved the BP to the formats section. I leave that to the
>>>>>>>> editors who may want to make further changes - or argue for it to
>>>>>>>> be left where it is, or add references from the formats section
>>>>>>>> or, or, or...
>>>>>>>> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>>>>>>>>
>>>>>>>> Phil.
>>>>>>>>
>>>>>>>> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>>>>>>>>
>>>>>>>> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>>>>>>>>> Dear Ishida,
>>>>>>>>>
>>>>>>>>> This comment [1] is still under discussion [4] and we'd like to
>>>>>>>>> ask your opinion about two of our proposals:
>>>>>>>>>
>>>>>>>>> 1. to include locale-neutral representation ideas as part of BP3
>>>>>>>>> [2], or
>>>>>>>>> 2. to include a paragraph at the introduction of Section
>>>>>>>>> 8.8 Data Formats [3] to discuss the relevance of having
>>>>>>>>> locale-neutral representations.
>>>>>>>>>
>>>>>>>>> We also discussed the proposal of having a new BP and we agreed
>>>>>>>>> that we won't have a lot of time for a broader review of the new
>>>>>>>>> BP and to collect feedback from the community.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>> DWBP editors
>>>>>>>>>
>>>>>>>>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>>>>>>>>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>>>>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>>>>>>>>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>>>>>>>>
>>>>>>>>>> Hi Addison,
>>>>>>>>>>
>>>>>>>>>> Thanks for your response, and it does make sense. I think what I
>>>>>>>>>> am still missing is whether there is guidance we can point to as
>>>>>>>>>> to how to represent the "locale-neutral" data so that it can
>>>>>>>>>> most easily be made locale specific by existing tools. You
>>>>>>>>>> mention "pre-made standards for the basic data types". Is there
>>>>>>>>>> a recommended list we could reference?
>>>>>>>>>> Thanks for your help!
>>>>>>>>>> -Annette
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Annette,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the note. This is a personal reply not on behalf of
>>>>>>>>>>> the WG.
>>>>>>>>>>>
>>>>>>>>>>> Locale neutral formats are quite common on the Web and the
>>>>>>>>>>> Internet in general. One familiar format referenced by your
>>>>>>>>>>> document, for example, is XML Schema.
>>>>>>>>>>> While the representations
>>>>>>>>>>> of numbers, dates, and the like in XML Schema would be "more
>>>>>>>>>>> appropriate" for some languages/locales than others if given as
>>>>>>>>>>> plain text, what distinguishes them is that they are all
>>>>>>>>>>> machine readable and intended to be read by machines for later
>>>>>>>>>>> processing.
>>>>>>>>>>> The display of values is a separate, local, concern for the
>>>>>>>>>>> data's consumer. This necessarily means choosing specific
>>>>>>>>>>> separators (such as decimal separators) over other, more
>>>>>>>>>>> localized values. Save for "free text"
>>>>>>>>>>> (natural language) data, most data formats are locale neutral
>>>>>>>>>>> and these include things like JSON-LD, XML Schema, CSV, and so
>>>>>>>>>>> forth.
>>>>>>>>>>> Not every possible data structure or data value is, of course,
>>>>>>>>>>> covered fully. For example, in my day job (I work at Amazon),
>>>>>>>>>>> we have many different common measurement units defined
>>>>>>>>>>> internally.
>>>>>>>>>>> To transmit these in a locale-neutral manner, we need to
>>>>>>>>>>> construct our own data schemas and identifiers. There are
>>>>>>>>>>> profoundly many ways to measure shoes, dresses, auto parts,
>>>>>>>>>>> hats, drone propellers, and so forth. But it would be a
>>>>>>>>>>> nightmare to have to deal with localized presentation formats
>>>>>>>>>>> on top of that.
>>>>>>>>>>> But there are pre-made standards for the basic data types and
>>>>>>>>>>> these are what are needed to build almost any data structure
>>>>>>>>>>> necessary for global interchange of data.
>>>>>>>>>>>
>>>>>>>>>>> Does that make sense?
>>>>>>>>>>>
>>>>>>>>>>> Addison
>>>>>>>>>>>
>>>>>>>>>>> Addison Phillips
>>>>>>>>>>> Principal SDE, I18N Architect (Amazon)
>>>>>>>>>>> Chair (W3C I18N WG)
>>>>>>>>>>>
>>>>>>>>>>> Internationalization is not a feature.
>>>>>>>>>>> It is an architecture.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>>>>>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>>>>>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>>>>>>>>> Cc: www International <www-international@w3.org>
>>>>>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend
>>>>>>>>>>>> locale-neutral representation #187
>>>>>>>>>>>>
>>>>>>>>>>>> Hello on behalf of the DWBP WG,
>>>>>>>>>>>>
>>>>>>>>>>>> We're interested in pursuing this concept in our best practice
>>>>>>>>>>>> document, but we would like some clarification of the practice
>>>>>>>>>>>> of locale neutrality.
>>>>>>>>>>>> You
>>>>>>>>>>>> mention the variation across locales in decimal symbol,
>>>>>>>>>>>> grouping symbol, number of grouping digits, digit shapes,
>>>>>>>>>>>> etc., and you give an example of a locale-neutral data
>>>>>>>>>>>> structure for monetary values.
>>>>>>>>>>>> But this structure alone does not appear to address
>>>>>>>>>>>> differences in decimal symbol, grouping symbol, number of
>>>>>>>>>>>> grouping digits, or digit shapes. It does provide a mechanism
>>>>>>>>>>>> to separately specify the units, and the example uses an
>>>>>>>>>>>> ISO-4217 currency code, both of which we agree are good ideas.
>>>>>>>>>>>> Is there a broad standard (beyond just monetary) for
>>>>>>>>>>>> addressing the other symbol/representation issues you raised
>>>>>>>>>>>> that we can address briefly in our best practice?
>>>>>>>>>>>> Do you consider SI units consistent with a locale-neutral
>>>>>>>>>>>> approach?
>>>>>>>>>>>> Is there a locale-neutral standard for representing decimal
>>>>>>>>>>>> numbers (perhaps using a period and no grouping, as in your
>>>>>>>>>>>> example)?
>>>>>>>>>>>> -Annette
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> [raised by aphillips]
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best practice #3 introduces itself as:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Providing locale parameters helps humans and computer
>>>>>>>>>>>>> applications to work accurately with things like dates,
>>>>>>>>>>>>> currencies and numbers that may look similar but have
>>>>>>>>>>>>> different meanings in different locales.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But the actual best practice is to use **locale-neutral**
>>>>>>>>>>>>> representations that are interpreted/displayed to end-users
>>>>>>>>>>>>> in a locale-appropriate manner. For example, instead of
>>>>>>>>>>>>> storing the string "€2000.00", exchanging a data structure
>>>>>>>>>>>>> like the following is strongly
>>>>>>>>>>>>> preferred:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> "price": {
>>>>>>>>>>>>>   "value": 2000.00,
>>>>>>>>>>>>>   "currency": "EUR"
>>>>>>>>>>>>> }
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> The date examples given are all in xsd:date format, which is
>>>>>>>>>>>>> an excellent example of using a locale-neutral format.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many things are dependent on locale: decimal symbol, grouping
>>>>>>>>>>>>> symbol, number of grouping digits, digit shapes, etc. It's
>>>>>>>>>>>>> because there can be wide variation (sometimes open to
>>>>>>>>>>>>> misinterpretation) that sending a locale neutral format is
>>>>>>>>>>>>> preferred for data values.
>>>>>>>>>>>>> Note also btw that the position of the currency symbol is
>>>>>>>>>>>>> dependent on the locale. In France it would be normal to
>>>>>>>>>>>>> write 2000.00 € rather than €2000.00.
>>>>>>>>>>>>> Same even when talking about USD when using $, ie. 2000.00 $.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Annette Greiner
>>>>>>>>>>>> NERSC Data and Analytics Services
>>>>>>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Annette Greiner
>>>>>>>>>> NERSC Data and Analytics Services
>>>>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> Phil Archer
>>>>>>>> W3C Data Activity Lead
>>>>>>>> http://www.w3.org/2013/data/
>>>>>>>>
>>>>>>>> http://philarcher.org
>>>>>>>> +44 (0)7887 767755
>>>>>>>> @philarcher1
>>>>> --
>>>>>
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>> --
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>

--

Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Wednesday, 24 August 2016 10:41:36 UTC
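
The locale-neutral approach discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not text from the BP document: the "date" field, the format_amount helper and the separator choices are assumptions made for the example. It reads a price structure like the one quoted above (a plain number plus an ISO 4217 currency code), leaves presentation to the consumer, and shows why a string such as 4/7 is ambiguous while an xsd:date style value is not.

```python
import json
from datetime import date, datetime
from decimal import Decimal

# Locale-neutral payload: a plain JSON number, an ISO 4217 currency code and an
# ISO 8601 (xsd:date style) date. The "date" field is an assumption for illustration.
payload = '{"price": {"value": 2000.00, "currency": "EUR"}, "date": "2016-07-04"}'
record = json.loads(payload, parse_float=Decimal)


def format_amount(value: Decimal, decimal_sep: str, group_sep: str) -> str:
    """Render a number with the separators a consumer prefers (hypothetical helper)."""
    text = f"{value:,.2f}"  # always "," for grouping and "." for decimals at this point
    # Swap separators via a placeholder so the two replacements cannot collide.
    return text.replace(",", "\x00").replace(".", decimal_sep).replace("\x00", group_sep)


amount = record["price"]["value"]
english_style = format_amount(amount, ".", ",")  # presentation chosen by the consumer
german_style = format_amount(amount, ",", ".")

# The same locale-specific string gives two different dates,
# depending on which convention the reader assumes.
day_first = datetime.strptime("4/7/2016", "%d/%m/%Y").date()
month_first = datetime.strptime("4/7/2016", "%m/%d/%Y").date()

# The ISO 8601 form has only one reading.
unambiguous = date.fromisoformat(record["date"])
```

The point of the sketch is that the exchanged value never carries separators or a currency symbol; those are added only at display time, which is the separation between data and presentation that Addison describes.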