Fw: Re: CSV use case from Jeni Tennison on 2014-05-27 (public-csv-wg@w3.org from May 2014)

From: Jeni Tennison <jeni@theodi.org>
Date: Tue, 27 May 2014 12:31:13 +0100
To: public-csv-wg@w3.org
Cc: "Tim Davies (Web Foundation)" <timdavies@webfoundation.org>, david.megginson@megginson.com
Message-ID: <etPan.53847781.25a70bf7.154@jenit.local>
Some extra use cases re internationalisation of CSVs.

Jeni

------------------------------------------------------
From: Tim Davies timdavies@webfoundation.org
Reply: Tim Davies timdavies@webfoundation.org
Date: 20 May 2014 at 23:36:13
To: Jeni Tennison jeni@theodi.org, david.megginson@megginson.com david.megginson@megginson.com
Subject:  Re: CSV use case

> Hello Jeni,
>  
> Good to hear from you. Yes, so there are two main cases and two approaches
> here. One based on the work David Megginson is doing on Humanitarian
> Exchange Language (I've copied David in so he can correct me when I
> misrepresent their work...;) - and one based on the 360 Giving Data
> Standard I worked on.
>  
>  
> *Issue 1:* Tabular data needs to be created, read by and exchanged between
> people speaking different languages. Many of these are basic spreadsheet
> users who will find it far easier to use data with natural and clear
> language in the column headings. Having the column headings in their own
> language will make creating and interpreting the data a lot easier.
>  
>  
> *Issue 2:*
> Tabular data needs to be created that contains literal values in multiple
> languages. For example, the name of a town in English, French and Arabic.
> The total number of languages that the data might be expressed in cannot be
> easily determined in advance, and it should be possible for a user to
> introduce a new language variant of a column easily.
>  
> *The HXL approach*
> See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
>  
> - A data dictionary is created with numerical codes equating to field
> definitions
> - Provided the column header contains the numerical code, all other
> values in the column heading can be arbitrary (i.e. can be in the plain
> language of the template creator's choice)
> - A parser extracts just the code and uses this to interpret the meaning
> of the column
> - Language codes can be attached onto the end of column codes to
> indicate a language variant. E.g. if 010 is 'Source description' then there
> can be an '010/en' column with 'Doctors without Borders' and an '010/fr'
> column containing 'Médecins Sans Frontières'
>  
> This has the advantage of being robust to people messing around with column
> titles (extra spaces etc.) as long as they don't mess with the ID.
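
A minimal sketch of the header parsing described above, for illustration only. It assumes a column code is exactly three digits (as in the '010' example) and that a language variant is written as '/' plus a two- or three-letter tag; the actual HXL syntax may differ. The names CODE_PATTERN, DATA_DICTIONARY and parse_header are hypothetical.

```python
import re

# Assumption: a column code is exactly three digits, optionally followed by
# "/" and a two- or three-letter language tag, e.g. "010" or "010/fr".
CODE_PATTERN = re.compile(r"(?<!\d)(\d{3})(?!\d)(?:/([A-Za-z]{2,3}))?")

# Hypothetical data dictionary: numerical code -> field definition.
DATA_DICTIONARY = {"010": "Source description"}


def parse_header(header):
    """Return (code, language) found in a free-text header, or None.

    Everything in the header other than the code is ignored, so extra
    spaces or a translated title do not change the result.
    """
    match = CODE_PATTERN.search(header)
    if match is None:
        return None
    code, language = match.groups()
    return code, (language.lower() if language else None)


def describe(header):
    """Look up the field definition for a header, or None if unknown."""
    parsed = parse_header(header)
    if parsed is None:
        return None
    return DATA_DICTIONARY.get(parsed[0])
```

Under these assumptions, parse_header("  Source description (010/en) ") and parse_header("Description de la source 010/FR") give ("010", "en") and ("010", "fr") respectively, and both headers resolve to the same dictionary entry.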
>  
> *The 360 Giving Approach*
>  
> See http://threesixtygiving.github.io/standard/
>  
> As yet, no multilingual version of this is implemented, but the idea is
> that:
>  
> - The CSV serialisation is based on an underlying Ontology (available at
> https://github.com/ThreeSixtyGiving/prototype-tools) which means there
> is a URI for each column (the final part of which provides a
> machine-readable column ID), and labels, which can be expressed in various
> languages.
> - When a version of the spreadsheet for humans is created, the column ID
> is replaced with the English language label, or labels from some other
> language.
> - A conversion tool is created to map between IDs and labels.
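
A rough sketch of the kind of conversion tool meant here, mapping header rows between machine-readable column IDs and per-language labels. The IDs and labels in LABELS are invented for illustration and are not taken from the 360 Giving ontology; in practice they would be read from the ontology's labels.

```python
# Hypothetical column IDs and labels, for illustration only.
LABELS = {
    "recipientName": {"en": "Recipient name", "fr": "Nom du bénéficiaire"},
    "amountAwarded": {"en": "Amount awarded", "fr": "Montant accordé"},
}


def ids_to_labels(header, language):
    """Replace each column ID in a header row with its label in `language`."""
    return [LABELS[column_id][language] for column_id in header]


def labels_to_ids(header, language):
    """Replace each label in a header row with its column ID.

    Raises KeyError for a label that is not known in `language`.
    """
    reverse = {labels[language]: column_id for column_id, labels in LABELS.items()}
    return [reverse[label.strip()] for label in header]
```

Only the header row changes; the data rows stay as they are, so a file published with French labels can be converted back to IDs and then to English labels without touching the values.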
>  
> As yet, no way to address Issue 2 has been proposed in this approach.
>  
> I'm personally leaning more towards the HXL approach over the long run,
> though perhaps linked to an ontology with IDs for fields, rather than
> just a data dictionary, to support more idiomatic JSON and XML
> representations.
>  
>  
> Let me know if this covers what you needed, or if a write-up in some other
> style would be useful.
>  
> Would also welcome any feedback on whether we're missing good ideas and
> approaches from the wider CSV standardisation work that we should be
> thinking about...
>  
> All the best
>  
> Tim
>  
>  
> On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
>  
> > Tim,
> >
> > I hope you’re well?
> >
> > When we met up a little while ago, you talked about a CSV-based format
> > that you were putting together where you wanted the general format to be
> > the same across languages, but wanted the headers to be different so that
> > they were understandable to non-English-language-speakers.
> >
> > I wonder if you could write a little description of the issue and send me
> > a couple of example files that show how that works, so that I can include
> > them as a use case for the CSV WG?
> >
> > Thanks,
> >
> > Jeni
> > --
> > Jeni Tennison, Technical Director theODI.org
> > +44 (0) 7974 420 482 @JeniT
> >
> >
>  
>  
> --
> Tim Davies
> Research Coordinator, Open Data Research Network
> +44 7834 856 303
> @timdavies | @odrnetwork | www.opendataresearch.org
>  
> *World Wide Web Foundation | **1110 Vermont Ave NW, Suite 500, Washington
> DC 20005, USA** | www.webfoundation.org |
> Twitter: @webfoundation*
>  

--  
Jeni Tennison, Technical Director theODI.org  
+44 (0) 7974 420 482 @JeniT
Received on Tuesday, 27 May 2014 11:31:45 UTC