RE: New i18n use case [WAS: CSV use case] from Tandy, Jeremy on 2014-06-02 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Mon, 2 Jun 2014 09:56:32 +0000
To: David Megginson <david.megginson@megginson.com>
CC: Jeni Tennison <jeni@theodi.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>, "Tim Davies (Web Foundation)" <timdavies@webfoundation.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208843AAE@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>

Hi David – thanks for the feedback. I’ve amended the use case to remove the strong dependency on RDF as _the_ data format & just noted that RDF could be an export format.

Many thanks, Jeremy

From: dpm@megginson.com [mailto:dpm@megginson.com] On Behalf Of David Megginson
Sent: 30 May 2014 22:19
To: Tandy, Jeremy
Cc: Jeni Tennison; public-csv-wg@w3.org; Tim Davies (Web Foundation)
Subject: Re: New i18n use case [WAS: CSV use case]

Looks good — it's nice to see the progress with the CSV WG.

I can confirm that HXL is not currently based on an RDF model, though we're open to defining an RDF export format at some point in the future. At present, our model is entirely tabular.


Cheers, David

On Fri, May 30, 2014 at 1:04 PM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
Hi - following Jeni's earlier message, I have now added another use case to the document to describe the concerns raised: "Use Case #23 - Collating humanitarian information for crisis response" <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-CollatingHumanitarianResponseInformation> ...

You'll see this has introduced two new requirements:

- <http://w3c.github.io/csvw/use-cases-and-requirements/#R-MultilingualContent>
- <http://w3c.github.io/csvw/use-cases-and-requirements/#R-ListsAsRepeatedFields>

Comments welcome ... especially from Tim Davies and David Megginson :-)
One issue I have raised is whether HXL is still predicated on RDF, and therefore whether describing a conversion from tabular HXL into an RDF format is an accurate portrayal.

Jeremy

PS: you'll also notice that I've removed the references to "DDR" (as pointed out by AndyS recently, this was unhelpful additional terminology) and removed the empty "Terminology" section.

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@theodi.org]
> Sent: 27 May 2014 12:31
> To: public-csv-wg@w3.org
> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
> Subject: Fw: Re: CSV use case
>
> Some extra use cases re internationalisation of CSVs.
>
> Jeni
>
> ------------------------------------------------------
> From: Tim Davies timdavies@webfoundation.org
> Reply: Tim Davies timdavies@webfoundation.org
> Date: 20 May 2014 at 23:36:13
> To: Jeni Tennison jeni@theodi.org, david.megginson@megginson.com
> Subject: Re: CSV use case
>
> > Hello Jeni,
> >
> > Good to hear from you. Yes, so there are two main cases and two
> > approaches here. One based on the work David Megginson is doing on
> > Humanitarian Exchange Language (I've copied David in so he can correct
> > me when I misrepresent their work...;) - and one based on the 360
> > Giving Data Standard I worked on.
> >
> >
> > *Issue 1:* Tabular data needs to be created by, read by and exchanged
> > between people speaking different languages. Many of these are basic
> > spreadsheet users who will find it far easier to use data with natural
> > and clear language in the column headings. Having the column headings
> > in their own language will make creating and interpreting the data a
> > lot easier.
> >
> >
> > *Issue 2:*
> > Tabular data needs to be created that contains literal values in
> > multiple languages. For example, the name of a town in English,
> > French and Arabic.
> > The total number of languages that the data might be expressed in
> > cannot be easily determined in advance, and it should be possible for
> > a user to introduce a new language variant of a column easily.
> >
> > *The HXL approach*
> > See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
> >
> > - A data dictionary is created with numerical codes equating to field
> > definitions
> > - Provided the column header contains the numerical code, all other
> > values in the column heading can be arbitrary (i.e. can be in the
> > plain language of the template creator's choice)
> > - A parser extracts just the code and uses this to interpret the
> > meaning of the column
> > - Language codes can be attached onto the end of column codes to
> > indicate a language variant. E.g. if 010 is 'Source description' then
> > there can be an '010/en' column with 'Doctors without Borders' and an
> > '010/fr' column containing 'Médecins Sans Frontières'
> >
> > This has the advantage of being robust to people messing around with
> > column titles (extra spaces etc.) as long as they don't mess with the
> > ID.
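
To make the parsing step in the HXL approach concrete, here is a minimal Python sketch. It is an illustration only and not HXL's actual parser: the header layout (free text containing a numeric code, optionally followed by "/" and a language tag), the regular expression, and the names parse_header, ColumnKey and group_columns are all assumptions made for this example.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Assumed header shape: arbitrary text containing a numeric code, optionally
# followed by "/<language tag>", for example "Source description 010/en".
CODE_PATTERN = re.compile(r"(?<!\d)(\d+)(?:/([A-Za-z]{2,3})\b)?")


class ColumnKey(NamedTuple):
    code: str            # field code from the data dictionary, e.g. "010"
    lang: Optional[str]  # language tag, or None when no variant is given


def parse_header(header: str) -> Optional[ColumnKey]:
    """Extract the first numeric code (and language tag, if any) from a header."""
    match = CODE_PATTERN.search(header)
    if match is None:
        return None  # no code found, so the column cannot be interpreted
    code, lang = match.groups()
    return ColumnKey(code, lang.lower() if lang else None)


def group_columns(headers: List[str]) -> Dict[str, Dict[Optional[str], int]]:
    """Map each field code to {language tag: column index}."""
    grouped: Dict[str, Dict[Optional[str], int]] = {}
    for index, header in enumerate(headers):
        key = parse_header(header)
        if key is not None:
            grouped.setdefault(key.code, {})[key.lang] = index
    return grouped


if __name__ == "__main__":
    headers = ["  Source description 010/en ", "Description de la source 010/fr", "Notes"]
    print(group_columns(headers))
```

Because only the code is matched, extra spaces or reworded titles do not affect the result. One limitation of this sketch is that it takes the first run of digits in the header, so a title that itself contains a number would need a stricter convention.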
> >
> > *The 360 Giving Approach*
> >
> > See http://threesixtygiving.github.io/standard/
> >
> > As yet, no multilingual version of this is implemented, but the
> > idea is that:
> >
> > - The CSV serialisation is based on an underlying ontology (available
> > at https://github.com/ThreeSixtyGiving/prototype-tools), which means
> > there is a URI for each column (the final part of which provides a
> > machine-readable column ID), and labels, which can be expressed in
> > various languages.
> > - When a version of the spreadsheet for humans is created, the column
> > ID is replaced with the English language label, or labels from some
> > other language.
> > - A conversion tool is created to map between IDs and labels.
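
The conversion tool in the last bullet could look something like the following Python sketch. The column IDs and the English and French labels in LABELS are invented for illustration and are not taken from the 360 Giving standard or its ontology; a real tool would presumably load the labels from the ontology instead of hard-coding them.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical column IDs and labels, for illustration only.
LABELS: Dict[str, Dict[str, str]] = {
    "en": {"grantTitle": "Grant title", "amount": "Amount"},
    "fr": {"grantTitle": "Titre de la subvention", "amount": "Montant"},
}


def ids_to_labels(header_ids: List[str], lang: str) -> List[str]:
    """Replace machine-readable column IDs with labels in the given language."""
    labels = LABELS[lang]
    return [labels.get(column_id, column_id) for column_id in header_ids]


def labels_to_ids(header_labels: List[str], lang: str) -> List[str]:
    """Map human-readable labels back to IDs, ignoring case and outer spaces."""
    reverse = {label.casefold(): column_id for column_id, label in LABELS[lang].items()}
    return [reverse.get(label.strip().casefold(), label) for label in header_labels]


def relabel_csv(text: str, convert: Callable[[List[str]], List[str]]) -> str:
    """Apply a header conversion to the first row of a CSV document."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    rows[0] = convert(rows[0])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    machine = "grantTitle,amount\nBooks,100\n"
    print(relabel_csv(machine, lambda header: ids_to_labels(header, "fr")))
```

Unknown IDs or labels are passed through unchanged in this sketch, so a partially translated header still round-trips; whether that is the right behaviour for a real tool is an open design choice.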
> >
> > As yet, no way to address Issue 2 has been proposed in this
> > approach.
> >
> > I'm personally leaning more towards the HXL approach over the
> > long run, though perhaps also linked to an ontology with IDs for
> > fields, rather than just a data dictionary, to support more
> > idiomatic JSON and XML representations.
> >
> >
> > Let me know if this covers what you needed, or if a write-up in some
> > other style would be useful.
> >
> > Would also welcome any feedback on whether we're missing good ideas
> > and approaches from the wider CSV standardisation work that we should
> > be thinking about...
> >
> > All the best
> >
> > Tim
> >
> >
> > On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
> >
> > > Tim,
> > >
> > > I hope you’re well?
> > >
> > > When we met up a little while ago, you talked about a CSV-based
> > > format that you were putting together where you wanted the general
> > > format to be the same across languages, but wanted the headers to be
> > > different so that they were understandable to
> > > non-English-language speakers.
> > >
> > > I wonder if you could write a little description of the issue and
> > > send me a couple of example files that show how that works, so that
> > > I can include them as a use case for the CSV WG?
> > >
> > > Thanks,
> > >
> > > Jeni
> > > --
> > > Jeni Tennison, Technical Director theODI.org
> > > +44 (0) 7974 420 482 @JeniT
> > >
> > >
> >
> >
> > --
> > Tim Davies
> > Research Coordinator, Open Data Research Network
> > +44 7834 856 303
> > @timdavies | @odrnetwork | www.opendataresearch.org
> >
> > World Wide Web Foundation | 1110 Vermont Ave NW, Suite 500,
> > Washington DC 20005, USA | www.webfoundation.org |
> > Twitter: @webfoundation
> >
>
> --
> Jeni Tennison, Technical Director theODI.org
> +44 (0) 7974 420 482 @JeniT
>
Received on Monday, 2 June 2014 09:57:04 UTC