- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Fri, 26 Aug 2016 10:31:30 -0700
- To: "Phillips, Addison" <addison@lab126.com>
- Cc: Phil Archer <phila@w3.org>, Deirdre Lee <deirdre@derilinx.com>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, "ishida@w3.org" <ishida@w3.org>, "public-dwbp-comments@w3.org" <public-dwbp-comments@w3.org>, www International <www-international@w3.org>
Addison, or anyone in i18n that knows,

I have a question about how to make data locale neutral when all the values are of the same format. It seems a hard sell to tell people to expand every datetime value into a value and a format when the format is always the same. For example, if I have taken a terabyte worth of data (not at all uncommon where I work) that consists of pairs of a datetime plus a sensor reading, and the time is always in UNIX format (seconds since January 1, 1970), is it still considered locale-neutral if I simply indicate in the column metadata that the column holds UNIX time values?

UNIX time, sensor reading (mV)
1471995721,4.7
1471995731,7.5
1471995721,6.2

If not, is there a practical way to make it locale neutral without repeating the format for each value? I’m concerned that, for scientific datasets especially, it makes little sense to inflate the size of the dataset with repeated information. For large data, that becomes impractical, which tempts me to suggest that we say in the BP that locale parameters should be used only when a locale-neutral representation is not *practical* (rather than not *possible*), unless my example above would qualify as locale neutral.

-Annette

> On Aug 22, 2016, at 10:36 AM, Phillips, Addison <addison@lab126.com> wrote:
>
> Hi Phil,
>
> This looks good. A few comments.
>
> 1. Rather than providing your own definition for 'locale', you might make use of the one we provide in LTLI [1].
>
> 2. The "why" is still missing something. I would suggest adding a new first paragraph explaining locale-neutral first. Something like:
>
> --
> Data values that are machine-readable and not specific to any particular language or culture are more durable and less open to misinterpretation than values that use one of the many different cultural representations. By using a locale-neutral format, systems avoid the need to establish specific interchange rules that vary according to the language or location of the user.
>
> When the data is already in a locale-specific format, providing locale parameters... <rest of existing text>
> --
>
> Hope that helps,
>
> Addison
>
> [1] https://www.w3.org/TR/ltli/#locale
>
>> -----Original Message-----
>> From: Phil Archer [mailto:phila@w3.org]
>> Sent: Monday, August 22, 2016 2:34 AM
>> To: Deirdre Lee <deirdre@derilinx.com>; Phillips, Addison <addison@lab126.com>; Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner <amgreiner@lbl.gov>
>> Cc: ishida@w3.org; public-dwbp-comments@w3.org; www International <www-international@w3.org>
>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral representation #187
>>
>> Dear all,
>>
>> I have taken further steps on this. The result can be seen at http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>
>> 1. Addison's text used more or less verbatim;
>> 1a. taken account of Annette's suggestion;
>> 1b. replaced inline links to BCP47 and CLDR with references;
>> 2. title of the BP changed to "Use locale-neutral data representations";
>> 3. moved to Data Formats section as resolved in WG meeting on Friday;
>> 4. added R-FormatMachineRead to list of evidence and thereby updated the UCR cross matching;
>> 5. updated the Challenges SVG diagram;
>> 6. updated my Pull request.
>>
>> NB, I *retained* the old ID for the BP so that any links to #LocaleParametersMetadata will still work. I know there are some of these, for example, in the Share-PSI project.
>>
>> HTH
>>
>> Phil.
>>
>> On 22/08/2016 08:52, Deirdre Lee wrote:
>>> Hi,
>>>
>>> Thank you for your comments, Addison. I think they make sense and should be straightforward to incorporate.
>>>
>>> The title of the BP should probably also be updated to something like 'Provide locale-neutral data'.
>>>
>>> Phil and DWBP editors, in Friday's meeting we also agreed to move BP3 to the Data Formats section from the Metadata section, which would make it BP14, right?
>>>
>>> Kind regards,
>>>
>>> Deirdre
>>>
>>> On 19/08/2016 17:39, Phillips, Addison wrote:
>>>> Hi Phil,
>>>>
>>>> Thanks for starting on this. I think the pull request is a good start. I have some comments on it.
>>>>
>>>> My main concern is that this BP is really backwards. It recommends "locale parameter metadata" and then says that the simplest way to do this is to use locale-neutral formats. The recommendation should be more like "use locale-neutral formats or provide locale/language information where that's not possible". The pull request captures the use of locale-neutral, but doesn't really explain about when to provide locale and language information.
>>>>
>>>> I would change this:
>>>>
>>>> --
>>>> <p class="practicedesc">Provide metadata about locale parameters (date, time, and number formats, language).</p>
>>>> --
>>>>
>>>> To say:
>>>>
>>>> --
>>>> <p class="practicedesc">Use locale-neutral data structures and values, or, where that is not possible, provide metadata about the locale used by data values.</p>
>>>> --
>>>>
>>>> I would change:
>>>>
>>>> --
>>>> <p>The simplest method is to use local-neutral representations of the actual data, and then add metadata to provide relevant locale information. For example, rather than storing "€2000.00" as a string, it's strongly preferred to exchange a data structure such as:</p>
>>>> --
>>>>
>>>> To say:
>>>>
>>>> --
>>>> <p>Most common data representations are locale neutral. For example, XML Schema types such as xsd:integer and xsd:date are intended for locale-neutral data interchange. Using locale-neutral representations allows the data values to be processed accurately without complex parsing or misinterpretation and also allows the data to be presented in the format most comfortable for the consumer of the data.
>>>> For example, rather than storing "€2000,00" as a string, it's strongly preferred to exchange a data structure such as:</p>
>>>> --
>>>>
>>>> Also, note the misspelling of "locale-neutral" in the pull request.
>>>>
>>>> I would then go on to add some text about when locale parameters are needed. Something like:
>>>>
>>>> --
>>>> Some datasets contain values that are not or cannot be rendered into a locale-neutral format. This is particularly true of any natural language text values. For each data field that can contain locale-affected or natural language text, there should be an associated language tag used to indicate the language and locale of the data. This locale information can be used in parsing the data or to ensure proper presentation and processing of the value by the consumer.
>>>> --
>>>>
>>>> (Sorry for not generating a pull request of my own)
>>>>
>>>> Addison
>>>>
>>>>> -----Original Message-----
>>>>> From: Phil Archer [mailto:phila@w3.org]
>>>>> Sent: Friday, August 19, 2016 8:37 AM
>>>>> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>; Annette Greiner <amgreiner@lbl.gov>
>>>>> Cc: Phillips, Addison <addison@lab126.com>; ishida@w3.org; public-dwbp-comments@w3.org; www International <www-international@w3.org>
>>>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral representation #187
>>>>>
>>>>> I took an action on today's call to try and address this in BP3. You can see the results at http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>
>>>>> This uses some of Addison's text directly and highlights the value of the xsd datatypes - but retains enough of the original BP for it to be an amendment rather than a whole new one - I hope.
>>>>>
>>>>> This addresses most of the resolution taken today [1] but I have not moved the BP to the formats section.
>>>>> I leave that to the editors who may want to make further changes - or argue for it to be left where it is, or add references from the formats section or, or, or...
>>>>>
>>>>> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>>>>>
>>>>> Phil.
>>>>>
>>>>> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>>>>>
>>>>> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>>>>>> Dear Ishida,
>>>>>>
>>>>>> This comment [1] is still under discussion [4] and we'd like to ask your opinion about two of our proposals:
>>>>>>
>>>>>> 1. to include locale-neutral representation ideas as part of BP3 [2], or
>>>>>> 2. to include a paragraph at the introduction of Section 8.8 Data Formats [3] to discuss the relevance of having locale-neutral representations.
>>>>>>
>>>>>> We also discussed the proposal of having a new BP and we agreed that we won't have a lot of time for a broader review of the new BP and to collect feedback from the community.
>>>>>>
>>>>>> Thanks a lot!
>>>>>> DWBP editors
>>>>>>
>>>>>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>>>>>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>>>>>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>>>>>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>>>>>
>>>>>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>>>>>
>>>>>>> Hi Addison,
>>>>>>>
>>>>>>> Thanks for your response, and it does make sense. I think what I am still missing is whether there is guidance we can point to as to how to represent the "locale-neutral" data so that it can most easily be made locale specific by existing tools. You mention "pre-made standards for the basic data types". Is there a recommended list we could reference?
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>> -Annette
>>>>>>>
>>>>>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>>>>>
>>>>>>>> Hi Annette,
>>>>>>>>
>>>>>>>> Thanks for the note. This is a personal reply, not on behalf of the WG.
>>>>>>>>
>>>>>>>> Locale-neutral formats are quite common on the Web and the Internet in general. One familiar format referenced by your document, for example, is XML Schema. While the representations of numbers, dates, and the like in XML Schema would be "more appropriate" for some languages/locales than others if given as plain text, what distinguishes them is that they are all machine readable and intended to be read by machines for later processing. The display of values is a separate, local concern for the data's consumer. This necessarily means choosing specific separators (such as decimal separators) over other, more localized values. Save for "free text" (natural language) data, most data formats are locale neutral, and these include things like JSON-LD, XML Schema, CSV, and so forth.
>>>>>>>>
>>>>>>>> Not every possible data structure or data value is, of course, covered fully. For example, in my day job (I work at Amazon), we have many different common measurement units defined internally. To transmit these in a locale-neutral manner, we need to construct our own data schemas and identifiers. There are profoundly many ways to measure shoes, dresses, auto parts, hats, drone propellers, and so forth. But it would be a nightmare to have to deal with localized presentation formats on top of that. But there are pre-made standards for the basic data types and these are what are needed to build almost any data structure necessary for global interchange of data.
>>>>>>>>
>>>>>>>> Does that make sense?
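Addison's point that a machine-readable value and its display are separate concerns can be sketched in a few lines. This assumes Python 3.7+ (for `date.fromisoformat`) and an xsd:date value without a timezone suffix; the `DISPLAY_PATTERNS` table and the `display` helper are hypothetical and hard-coded for illustration, whereas a real application would take its patterns from locale data such as CLDR.

```python
from datetime import date

# Hypothetical display patterns, hard-coded for illustration only.
DISPLAY_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
}


def display(xsd_date: str, locale_tag: str) -> str:
    """Parse a locale-neutral xsd:date value, then format it for one consumer."""
    # fromisoformat accepts the YYYY-MM-DD form (no timezone suffix assumed).
    parsed = date.fromisoformat(xsd_date)
    return parsed.strftime(DISPLAY_PATTERNS[locale_tag])


if __name__ == "__main__":
    for tag in DISPLAY_PATTERNS:
        print(tag, display("2016-08-04", tag))
```

The en-US and en-GB renderings of the same value, 08/04/2016 and 04/08/2016, are exactly the kind of strings that become ambiguous when exchanged directly; the interchange value 2016-08-04 is not.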
>>>>>>>>
>>>>>>>> Addison
>>>>>>>>
>>>>>>>> Addison Phillips
>>>>>>>> Principal SDE, I18N Architect (Amazon)
>>>>>>>> Chair (W3C I18N WG)
>>>>>>>>
>>>>>>>> Internationalization is not a feature.
>>>>>>>> It is an architecture.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>>>>>> Cc: www International <www-international@w3.org>
>>>>>>>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral representation #187
>>>>>>>>>
>>>>>>>>> Hello on behalf of the DWBP WG,
>>>>>>>>>
>>>>>>>>> We're interested in pursuing this concept in our best practice document, but we would like some clarification of the practice of locale neutrality. You mention the variation across locales in decimal symbol, grouping symbol, number of grouping digits, digit shapes, etc., and you give an example of a locale-neutral data structure for monetary values. But this structure alone does not appear to address differences in decimal symbol, grouping symbol, number of grouping digits, or digit shapes. It does provide a mechanism to separately specify the units, and the example uses an ISO 4217 currency code, both of which we agree are good ideas. Is there a broad standard (beyond just monetary) for addressing the other symbol/representation issues you raised that we can address briefly in our best practice?
>>>>>>>>> Do you consider SI units consistent with a locale-neutral approach?
>>>>>>>>> Is there a locale-neutral standard for representing decimal numbers (perhaps using a period and no grouping, as in your example)?
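On the last question above: XML Schema's xsd:decimal is one such locale-neutral representation, with a period as the only decimal separator and no grouping. A minimal check against the xsd:decimal lexical pattern as given in XML Schema 1.1 Part 2 might look like the following; `parse_xsd_decimal` is a hypothetical helper, and the sketch skips the whitespace-collapsing step that XML Schema applies before matching.

```python
import re
from decimal import Decimal

# xsd:decimal lexical pattern (XML Schema 1.1 Part 2): optional sign, digits,
# "." as the only decimal separator, and no grouping separators or exponent.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def parse_xsd_decimal(text: str) -> Decimal:
    """Return the value of an xsd:decimal literal, or raise ValueError."""
    # Note: XML Schema collapses surrounding whitespace first; this sketch does not.
    if not XSD_DECIMAL.fullmatch(text):
        raise ValueError(f"not an xsd:decimal literal: {text!r}")
    return Decimal(text)
```

Locale-specific strings such as "2.000,00", "2,000.00" or "2 000,00" are rejected rather than guessed at, while "2000.00" parses to the same value for every consumer.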
>>>>>>>>>
>>>>>>>>> -Annette
>>>>>>>>>
>>>>>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>>>>>
>>>>>>>>>> [raised by aphillips]
>>>>>>>>>>
>>>>>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>>>>>
>>>>>>>>>> Best practice #3 introduces itself as:
>>>>>>>>>>
>>>>>>>>>> Providing locale parameters helps humans and computer applications to work accurately with things like dates, currencies and numbers that may look similar but have different meanings in different locales.
>>>>>>>>>>
>>>>>>>>>> But the actual best practice is to use **locale-neutral** representations that are interpreted/displayed to end-users in a locale-appropriate manner. For example, instead of storing the string "€2000.00", exchanging a data structure like the following is strongly preferred:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> "price": {
>>>>>>>>>>   "value": 2000.00,
>>>>>>>>>>   "currency": "EUR"
>>>>>>>>>> }
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> The date examples given are all in xsd:date format, which is an excellent example of using a locale-neutral format.
>>>>>>>>>>
>>>>>>>>>> Many things are dependent on locale: decimal symbol, grouping symbol, number of grouping digits, digit shapes, etc. It's because there can be wide variation (sometimes open to misinterpretation) that sending a locale-neutral format is preferred for data values.
>>>>>>>>>> Note also btw that the position of the currency symbol is dependent on the locale. In France it would be normal to write 2000.00 € rather than €2000.00. The same holds even for USD when written with $, i.e. 2000.00 $.
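The `price` structure in the review comment can be exchanged as-is and turned into a locale-appropriate string only at display time. A sketch, assuming Python's `json` module with `parse_float=Decimal` so the amount stays exact; the member is wrapped in braces so it parses as a complete JSON object. The `CONVENTIONS` and `SYMBOLS` tables and `format_price` are hypothetical simplifications: real conventions (e.g. from CLDR) also cover grouping sizes, digit shapes and currency-specific fraction digits, and French typically uses a narrow no-break space for grouping.

```python
import json
from decimal import Decimal

# The exchanged value. parse_float=Decimal keeps 2000.00 exact (no binary float).
payload = '{"price": {"value": 2000.00, "currency": "EUR"}}'
price = json.loads(payload, parse_float=Decimal)["price"]

# Hypothetical, simplified display conventions for two locales.
CONVENTIONS = {
    "en-IE": {"decimal": ".", "group": ",", "symbol_first": True},
    "fr-FR": {"decimal": ",", "group": " ", "symbol_first": False},
}
SYMBOLS = {"EUR": "€", "USD": "$"}
PLACEHOLDER = "\x00"  # temporary marker so the two separator swaps don't collide


def format_price(value: Decimal, currency: str, locale_tag: str) -> str:
    """Render a locale-neutral amount for one display locale (assumes 2 fraction digits)."""
    conv = CONVENTIONS[locale_tag]
    # Start from a fixed form such as "2,000.00", then swap in the locale's separators.
    text = format(value, ",.2f")
    text = (text.replace(",", PLACEHOLDER)
                .replace(".", conv["decimal"])
                .replace(PLACEHOLDER, conv["group"]))
    symbol = SYMBOLS[currency]
    return f"{symbol}{text}" if conv["symbol_first"] else f"{text} {symbol}"
```

With these tables, the same exchanged value renders as "€2,000.00" for en-IE and "2 000,00 €" for fr-FR, and neither display string ever has to be parsed back.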
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Annette Greiner
>>>>>>>>> NERSC Data and Analytics Services
>>>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>>>>
>>>>>>> --
>>>>>>> Annette Greiner
>>>>>>> NERSC Data and Analytics Services
>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>>
>>>>> --
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>
>>
>> --
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
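One way to read Annette's proposal at the top of this message: keep the compact UNIX-time values in the file, declare their meaning once at the column level, and convert to an xsd:dateTime-style ISO 8601 string only when the data is read. The sketch uses the sample rows from her message. The `COLUMNS` metadata, its key names and the datatype labels are hypothetical (not taken from CSV on the Web or any other specification), and the conversion assumes UNIX time means seconds since 1970-01-01T00:00:00Z.

```python
import csv
import io
from datetime import datetime, timezone
from decimal import Decimal

# Sample data from the message, stored compactly: no per-value format.
SAMPLE = """UNIX time, sensor reading (mV)
1471995721,4.7
1471995731,7.5
1471995721,6.2
"""

# Hypothetical column-level metadata: each column's format is stated once.
COLUMNS = {
    "UNIX time": {"datatype": "unix-seconds"},
    "sensor reading (mV)": {"datatype": "decimal", "unit": "mV"},
}


def unix_to_xsd_datetime(raw: str) -> str:
    """Seconds since 1970-01-01T00:00:00Z -> ISO 8601 string in UTC."""
    return datetime.fromtimestamp(int(raw), tz=timezone.utc).isoformat()


# Map each hypothetical datatype label to a parser.
PARSERS = {"unix-seconds": unix_to_xsd_datetime, "decimal": Decimal}


def read_rows(text: str):
    """Yield one dict per row, interpreting each value via its column metadata."""
    # skipinitialspace drops the space after the comma in the header line.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        yield {name: PARSERS[COLUMNS[name]["datatype"]](raw) for name, raw in row.items()}
```

The first row reads as 2016-08-23T23:42:01+00:00 with 4.7 mV. Because UNIX time is defined against UTC, the converted values carry an explicit +00:00 offset instead of depending on the reader's local time zone, while the stored file keeps its original size.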
Received on Friday, 26 August 2016 17:32:05 UTC