RE: Re: CSV use case from Tandy, Jeremy on 2014-05-30 (public-csv-wg@w3.org from May 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 30 May 2014 10:31:22 +0000
To: Jeni Tennison <jeni@theodi.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE2088432C7@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Hi Jeni - would you like these incorporated into the UCR doc? I think that there is a new requirement embedded in this regarding provision of labels for multiple locales within the metadata document.

Jeremy

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@theodi.org]
> Sent: 27 May 2014 12:31
> To: public-csv-wg@w3.org
> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
> Subject: Fw: Re: CSV use case
> 
> Some extra use cases re internationalisation of CSVs.
> 
> Jeni
> 
> ------------------------------------------------------
> From: Tim Davies timdavies@webfoundation.org
> Reply: Tim Davies timdavies@webfoundation.org
> Date: 20 May 2014 at 23:36:13
> To: Jeni Tennison jeni@theodi.org, david.megginson@megginson.com
> Subject:  Re: CSV use case
> 
> > Hello Jeni,
> >
> > Good to hear from you. Yes, so there are two main cases and two
> > approaches here. One based on the work David Megginson is doing on
> > Humanitarian Exchange Language (I've copied David in so he can
> > correct me when I misrepresent their work...;) - and one based on
> > the 360 Giving Data Standard I worked on.
> >
> >
> > *Issue 1:* Tabular data needs to be created, read and exchanged by
> > people speaking different languages. Many of these are basic
> > spreadsheet users who will find it far easier to use data with
> > natural and clear language in the column headings. Having the column
> > headings in their own language will make creating and interpreting
> > the data a lot easier.
> >
> >
> > *Issue 2:* Tabular data needs to be created that contains literal
> > values in multiple languages. For example, the name of a town in
> > English, French and Arabic. The total number of languages that the
> > data might be expressed in cannot be easily determined in advance,
> > and it should be possible for a user to introduce a new language
> > variant of a column easily.
> >
> > *The HXL approach*
> > See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
> >
> > - A data dictionary is created with numerical codes equating to field
> > definitions
> > - Providing the column header contains the numerical code, all other
> > values in the column heading can be arbitrary (i.e. can be in the
> > plain language of the template creator's choice)
> > - A parser extracts just the code and uses this to interpret the
> > meaning of the column
> > - Language codes can be attached onto the end of column codes to
> > indicate a language variant. E.g. if 010 is 'Source description' then
> > there can be an '010/en' column with 'Doctors without Borders' and an
> > '010/fr' column containing 'Médecins Sans Frontières'
> >
> > This has the advantage of being robust to people messing around with
> > column titles (extra spaces etc.) as long as they don't mess with the
> > ID.
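
The HXL-style header handling described above can be sketched roughly as follows. This is only an illustration, not the HXL project's actual parser: the function name `parse_header`, the `ColumnKey` type and the regular expression are assumptions, and the sketch assumes that the code is the only run of digits in a header and that language tags are two or three letters. The single dictionary entry comes from the '010' example in the message.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a column code is a run of digits, optionally followed by
# "/" and a two- or three-letter language tag (e.g. "010/fr").
CODE_PATTERN = re.compile(r"(?<!\d)(\d+)(?:/([A-Za-z]{2,3})\b)?")

# Data dictionary mapping codes to field definitions (entry taken from
# the example in the message; a real dictionary would hold many more).
DATA_DICTIONARY = {"010": "Source description"}


class ColumnKey(NamedTuple):
    code: str            # numerical code found in the header
    lang: Optional[str]  # lower-cased language tag, or None if absent


def parse_header(header: str) -> Optional[ColumnKey]:
    """Extract the code and optional language tag, ignoring all other text."""
    match = CODE_PATTERN.search(header)
    if match is None:
        return None  # no code present, so the column cannot be interpreted
    code, lang = match.group(1), match.group(2)
    return ColumnKey(code, lang.lower() if lang else None)


def field_definition(header: str) -> Optional[str]:
    """Look up the meaning of a column from its code, if the code is known."""
    key = parse_header(header)
    return DATA_DICTIONARY.get(key.code) if key else None
```

Under these assumptions, headers such as `"  Source description  010/en "` and `"Description de la source (010/FR)"` both resolve to code `010`, with language tags `en` and `fr` respectively, however the surrounding label text is written.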
> >
> > *The 360 Giving Approach*
> >
> > See http://threesixtygiving.github.io/standard/
> >
> > As yet, no multilingual version of this is implemented, but the
> > idea is that:
> >
> > - The CSV serialisation is based on an underlying Ontology (available
> > at https://github.com/ThreeSixtyGiving/prototype-tools) which means
> > there is a URI for each column (the final part of which provides a
> > machine-readable column ID), and labels, which can be expressed in
> > various languages.
> > - When a version of the spreadsheet for humans is created, the column
> > ID is replaced with the English language label, or labels from some
> > other language.
> > - A conversion tool is created to map between IDs and labels.
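
A conversion tool of the kind listed above might look roughly like this. It is a minimal sketch, not the 360 Giving tooling: the column IDs, the labels and the function names are all hypothetical, and in practice the label table would be generated from the ontology rather than written by hand.

```python
# Hypothetical column IDs and labels, for illustration only.
LABELS = {
    "grantAmount": {"en": "Grant amount", "fr": "Montant de la subvention"},
    "recipientName": {"en": "Recipient name", "fr": "Nom du bénéficiaire"},
}


def ids_to_labels(column_ids, lang):
    """Replace machine-readable column IDs with labels in one language."""
    return [LABELS[column_id][lang] for column_id in column_ids]


def labels_to_ids(labels, lang):
    """Map human-readable labels in one language back to column IDs."""
    reverse = {names[lang]: column_id for column_id, names in LABELS.items()}
    # Surrounding whitespace is stripped; any other change to a label
    # raises KeyError, because the label text is the only identifier.
    return [reverse[label.strip()] for label in labels]
```

Unlike the code-in-header approach, this mapping depends on the label text surviving unchanged, which is one reason a round trip through a spreadsheet edited by hand may be more fragile here.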
> >
> > As yet, a way to address Issue 2 has not been proposed in this
> > approach.
> >
> > I'm personally leaning more towards the HXL approach over the
> > long run, though perhaps linked to an ontology with IDs for fields,
> > rather than just a data dictionary, to support more idiomatic JSON
> > and XML representations.
> >
> >
> > Let me know if this covers what you needed, or if a write-up in some
> > other style would be useful.
> >
> > Would also welcome any feedback on whether we're missing good ideas
> > and approaches from the wider CSV standardisation work that we should
> > be thinking about...
> >
> > All the best
> >
> > Tim
> >
> >
> > On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
> >
> > > Tim,
> > >
> > > I hope you’re well?
> > >
> > > When we met up a little while ago, you talked about a CSV-based
> > > format that you were putting together where you wanted the general
> > > format to be the same across languages, but wanted the headers to
> > > be different so that they were understandable to
> > > non-English-language-speakers.
> > >
> > > I wonder if you could write a little description of the issue and
> > > send me a couple of example files that show how that works, so that
> > > I can include them as a use case for the CSV WG?
> > >
> > > Thanks,
> > >
> > > Jeni
> > > --
> > > Jeni Tennison, Technical Director theODI.org
> > > +44 (0) 7974 420 482 @JeniT
> > >
> > >
> >
> >
> > --
> > Tim Davies
> > Research Coordinator, Open Data Research Network
> > +44 7834 856 303
> > @timdavies | @odrnetwork | www.opendataresearch.org
> >
> > *World Wide Web Foundation | **1110 Vermont Ave NW, Suite 500,
> > Washington DC 20005, USA** | www.webfoundation.org |
> > Twitter: @webfoundation*
> >
> 
> --
> Jeni Tennison, Technical Director theODI.org
> +44 (0) 7974 420 482 @JeniT
>
Received on Friday, 30 May 2014 10:31:55 UTC