- From: Jeni Tennison <jeni@theodi.org>
- Date: Tue, 27 May 2014 12:31:13 +0100
- To: public-csv-wg@w3.org
- Cc: "Tim Davies (Web Foundation)" <timdavies@webfoundation.org>, david.megginson@megginson.com
Some extra use cases re internationalisation of CSVs.

Jeni

------------------------------------------------------
From: Tim Davies <timdavies@webfoundation.org>
Reply: Tim Davies <timdavies@webfoundation.org>
Date: 20 May 2014 at 23:36:13
To: Jeni Tennison <jeni@theodi.org>, david.megginson@megginson.com <david.megginson@megginson.com>
Subject: Re: CSV use case

> Hello Jeni,
>
> Good to hear from you. Yes, so there are two main cases and two approaches
> here. One is based on the work David Megginson is doing on Humanitarian
> Exchange Language (I've copied David in so he can correct me when I
> misrepresent their work...;) - and one is based on the 360 Giving Data
> Standard I worked on.
>
>
> *Issue 1:*
> Tabular data needs to be created, read by and exchanged between
> people speaking different languages. Many of these are basic spreadsheet
> users who will find it far easier to use data with natural and clear
> language in the column headings. Having the column headings in their own
> language will make creating and interpreting the data a lot easier.
>
>
> *Issue 2:*
> Tabular data needs to be created that contains literal values in multiple
> languages. For example, the name of a town in English, French and Arabic.
> The total number of languages that the data might be expressed in cannot be
> easily determined in advance, and it should be possible for a user to
> introduce a new language variant of a column easily.
>
> *The HXL approach*
> See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
>
> - A data dictionary is created with numerical codes equating to field
>   definitions
> - Providing the column header contains the numerical code, all other
>   values in the column heading can be arbitrary (i.e. can be in plain
>   language of the template creator's choice)
> - A parser extracts just the code and uses this to interpret the meaning
>   of the column
> - Language codes can be attached onto the end of column codes to
>   indicate a language variant. E.g.
>   if 010 is 'Source description' then there
>   can be an '010/en' column with 'Doctors without Borders' and an '010/fr'
>   column containing 'Médecins Sans Frontières'
>
> This has the advantage of being robust to people messing around with column
> titles (extra spaces etc.) as long as they don't mess with the ID.
>
> *The 360 Giving Approach*
>
> See http://threesixtygiving.github.io/standard/
>
> As yet, no multilingual version of this is implemented, but the idea is
> that:
>
> - The CSV serialisation is based on an underlying Ontology (available at
>   https://github.com/ThreeSixtyGiving/prototype-tools) which means there
>   is a URI for each column (the final part of which provides a
>   machine-readable column ID), and labels, which can be expressed in various
>   languages.
> - When a version of the spreadsheet for humans is created, the column ID
>   is replaced with the English language label, or labels from some other
>   language.
> - A conversion tool is created to map between IDs and labels.
>
> As yet a way to address Issue 2 has not been proposed in this approach.
>
> I'm personally leaning more towards the HXL approach over the long run,
> though perhaps linked to an ontology with IDs for fields also rather than
> just a data dictionary to support more idiomatically friendly JSON and XML
> representations.
>
>
> Let me know if this covers what you needed, or if a write-up in some other
> style would be useful.
>
> Would also welcome any feedback on whether we're missing good ideas and
> approaches from the wider CSV standardisation work that we should be
> thinking about...
>
> All the best
>
> Tim
>
>
> On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
>
> > Tim,
> >
> > I hope you’re well?
> >
> > When we met up a little while ago, you talked about a CSV-based format
> > that you were putting together where you wanted the general format to be
> > the same across languages, but wanted the headers to be different so that
> > they were understandable to non-English-language-speakers.
> >
> > I wonder if you could write a little description of the issue and send me
> > a couple of example files that show how that works, so that I can include
> > them as a use case for the CSV WG?
> >
> > Thanks,
> >
> > Jeni
> > --
> > Jeni Tennison, Technical Director theODI.org
> > +44 (0) 7974 420 482 @JeniT
> >
>
>
> --
> Tim Davies
> Research Coordinator, Open Data Research Network
> +44 7834 856 303
> @timdavies | @odrnetwork | www.opendataresearch.org
>
> *World Wide Web Foundation | **1110 Vermont Ave NW, Suite 500, Washington
> DC 20005, USA** | www.webfoundation.org |
> Twitter: @webfoundation*

--
Jeni Tennison, Technical Director theODI.org
+44 (0) 7974 420 482 @JeniT
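
To make the HXL approach in Tim's message concrete, here is a minimal Python sketch of a parser that pulls the numerical code and optional language variant out of an otherwise free-text column heading. The exact header syntax is an assumption based only on the '010/en' example in the message (a three-digit code, optionally followed by a slash and a two- or three-letter language code); the names parse_header, ColumnKey and DATA_DICTIONARY are hypothetical and are not taken from the HXL work itself.

```python
import re
from typing import NamedTuple, Optional

# Assumed header shape (not from any HXL specification): arbitrary text that
# contains a three-digit code, optionally followed by "/<language code>",
# for example "Source description 010/en".
CODE_PATTERN = re.compile(r"(?<!\d)(\d{3})(?:/([A-Za-z]{2,3}))?(?![\w/])")


class ColumnKey(NamedTuple):
    code: str                 # numerical code from the data dictionary
    language: Optional[str]   # lower-cased language code, or None


def parse_header(header: str) -> Optional[ColumnKey]:
    """Extract the column code and language variant, ignoring all other text."""
    match = CODE_PATTERN.search(header)
    if match is None:
        return None
    code, language = match.groups()
    return ColumnKey(code, language.lower() if language else None)


# Hypothetical one-entry data dictionary, using the example from the message.
DATA_DICTIONARY = {"010": "Source description"}

if __name__ == "__main__":
    for heading in ["Source description 010/en",
                    "  Description de la source (010/fr) ",
                    "Town name"]:
        key = parse_header(heading)
        meaning = DATA_DICTIONARY.get(key.code) if key else None
        print(repr(heading), key, meaning)
```

Because only the code is matched, extra spaces or reworded titles do not change how a column is interpreted, which is the robustness property the message describes. A heading without a recognisable code yields None, so a caller can decide whether to skip or reject that column.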
Received on Tuesday, 27 May 2014 11:31:45 UTC