Re: New i18n use case [WAS: CSV use case] from Andy Seaborne on 2014-05-31 (public-csv-wg@w3.org from May 2014)

From: Andy Seaborne <andy@apache.org>
Date: Sat, 31 May 2014 15:07:00 +0100
To: public-csv-wg@w3.org
Message-ID: <5389E204.7020804@apache.org>
On 30/05/14 20:36, Tandy, Jeremy wrote:
> Oh - and I should say that I focused on the HXL example rather than the "360 giving" one because it touched on both the issues raised in the email from Tim Davies.
>
> Jeremy
>
>> -----Original Message-----
>> From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
>> Sent: 30 May 2014 18:04
>> To: Jeni Tennison; public-csv-wg@w3.org
>> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
>> Subject: New i18n use case [WAS: CSV use case]
>>
>> Hi - following Jeni's earlier message, I have now added another use
>> case to the document to describe the concerns raised: "Use Case #23 -
>> Collating humanitarian information for crisis response"
>> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-CollatingHumanitarianResponseInformation> ...
>>
>> You'll see this has introduced two new requirements:
>>
>> - <http://w3c.github.io/csvw/use-cases-and-requirements/#R-MultilingualContent>

"specify the language / locale relevant to each field"

Minor terminology point (Rufus has mentioned something similar): does
"field" here refer to all the cells in a column? (From the general
context I read it as not being a particular (x,y) cell, though that
isn't unimaginable.)

>> - <http://w3c.github.io/csvw/use-cases-and-requirements/#R-ListsAsRepeatedFields>

It could be either a list or repeated objects (in RDF speak)?

The other case of repeated fields is a repeated row where blanks mean
"same as above".  This relates to hierarchies:

concept subconcept
         subconcept
concept subconcept
         subconcept
         subconcept

of which org charts are an example.
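
A minimal sketch of the "blank means same as above" reading, assuming
the rows have already been parsed into lists of strings. The function
name and the "concept A" style labels are made up for illustration;
nothing here is taken from the use case document.

```python
def fill_down(rows):
    """Replace each blank cell with the value from the same column in the row above."""
    filled = []
    previous = []
    for row in rows:
        current = []
        for index, cell in enumerate(row):
            # A blank cell inherits from the row above, if that row has this column.
            if cell.strip() == "" and index < len(previous):
                current.append(previous[index])
            else:
                current.append(cell)
        filled.append(current)
        previous = current
    return filled


# Hypothetical data in the shape of the hierarchy above.
rows = [
    ["concept A", "subconcept 1"],
    ["", "subconcept 2"],
    ["concept B", "subconcept 3"],
    ["", "subconcept 4"],
    ["", "subconcept 5"],
]

for row in fill_down(rows):
    print(row)
```

A blank in the very first row has nothing above it, so this sketch
leaves it blank rather than guessing.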

 Andy

>>
>> Comments welcome ... especially from Tim Davies and David Megginson :-)
>> One issue I have raised is whether HXL is still predicated on RDF;
>> whether the conversion from tabular HXL into an RDF format is an
>> accurate portrayal.
>>
>> Jeremy
>>
>> PS: you'll also notice that I've removed the references to "DDR" (as
>> pointed out by AndyS recently, this was unhelpful additional
>> terminology) and removed the empty "Terminology" section.
>>
>>> -----Original Message-----
>>> From: Jeni Tennison [mailto:jeni@theodi.org]
>>> Sent: 27 May 2014 12:31
>>> To: public-csv-wg@w3.org
>>> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
>>> Subject: Fw: Re: CSV use case
>>>
>>> Some extra use cases re internationalisation of CSVs.
>>>
>>> Jeni
>>>
>>> ------------------------------------------------------
>>> From: Tim Davies timdavies@webfoundation.org
>>> Reply: Tim Davies timdavies@webfoundation.org
>>> Date: 20 May 2014 at 23:36:13
>>> To: Jeni Tennison jeni@theodi.org, david.megginson@megginson.com
>>> david.megginson@megginson.com
>>> Subject:  Re: CSV use case
>>>
>>>> Hello Jeni,
>>>>
>>>> Good to hear from you. Yes, so there are two main cases and two
>>>> approaches here. One based on the work David Megginson is doing on
>>>> Humanitarian Exchange Language (I've copied David in so he can correct
>>>> me when I misrepresent their work...;) - and one based on the 360
>>>> Giving Data Standard I worked on.
>>>>
>>>>
>>>> *Issue 1:* Tabular data needs to be created, read by and exchanged
>>>> between people speaking different languages. Many of these are basic
>>>> spreadsheet users who will find it far easier to use data with natural
>>>> and clear language in the column headings. Having the column
>>>> headings in their own language will make creating and interpreting
>>>> the data a lot easier.
>>>>
>>>>
>>>> *Issue 2:*
>>>> Tabular data needs to be created that contains literal values in
>>>> multiple languages. For example, the name of a town in English,
>>>> French and Arabic.
>>>> The total number of languages that the data might be expressed in
>>>> cannot be easily determined in advance, and it should be possible
>>>> for a user to introduce a new language variant of a column easily.
>>>>
>>>> *The HXL approach*
>>>> See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
>>>>
>>>> - A data dictionary is created with numerical codes equating to
>>>> field definitions
>>>> - Providing the column header contains the numerical code, all other
>>>> values in the column heading can be arbitrary (i.e. can be in plain
>>>> language of the template creator's choice)
>>>> - A parser extracts just the code and uses this to interpret the
>>>> meaning of the column
>>>> - Language codes can be attached onto the end of column codes to
>>>> indicate a language variant. E.g. if 010 is 'Source description'
>>>> then there can be an '010/en' column with 'Doctors without Borders'
>>>> and an '010/fr' column containing 'Médecins Sans Frontières'
>>>>
>>>> This has the advantage of being robust to people messing around with
>>>> column titles (extra spaces etc.) as long as they don't mess with
>>>> the ID.
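
A rough sketch of the header parsing described above, assuming the code
is the first run of digits in the heading and the language variant is a
two or three letter tag after a slash. The pattern, the function name
and the sample headings are my assumptions for illustration, not
something taken from the HXL material.

```python
import re

# Assumed shape: digits, optionally followed by "/" and a short language tag.
CODE_PATTERN = re.compile(r"(\d+)(?:/([A-Za-z]{2,3}))?")


def parse_header(header):
    """Return (code, language) from a column heading, or None if no code is present."""
    match = CODE_PATTERN.search(header)
    if match is None:
        return None
    # group(2) is None when the heading carries no language variant.
    return match.group(1), match.group(2)


# Hypothetical headings: free text around the code is ignored.
print(parse_header("  Source description 010/en "))
print(parse_header("Description de la source (010/fr)"))
print(parse_header("010"))
```

A heading whose free text itself contains digits would confuse this
sketch, so a real parser would need a firmer rule for where the code
sits in the heading.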
>>>>
>>>> *The 360 Giving Approach*
>>>>
>>>> See http://threesixtygiving.github.io/standard/
>>>>
>>>> As yet, no multilingual version of this is implemented, but the
>>>> idea is that:
>>>>
>>>> - The CSV serialisation is based on an underlying Ontology
>>>> (available at
>>>> https://github.com/ThreeSixtyGiving/prototype-tools) which means there
>>>> is a URI for each column (the final part of which provides a
>>>> machine-readable column ID), and labels, which can be expressed in
>>>> various languages.
>>>> - When a version of the spreadsheet for humans is created, the
>>>> column ID is replaced with the English language label, or labels
>>>> from some other language.
>>>> - A conversion tool is created to map between IDs and labels.
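
A minimal sketch of such an ID/label conversion, assuming the labels
have already been pulled out of the ontology into a plain dictionary.
The column IDs and labels below are invented for illustration and are
not taken from the 360 Giving standard.

```python
# Hypothetical label table: column ID -> language -> human-readable label.
LABELS = {
    "grantAmount": {"en": "Grant amount", "fr": "Montant de la subvention"},
    "recipientName": {"en": "Recipient name", "fr": "Nom du bénéficiaire"},
}


def ids_to_labels(header, lang):
    """Turn a machine-readable header row into labels in the given language."""
    return [LABELS[column_id][lang] for column_id in header]


def labels_to_ids(header, lang):
    """Turn a human-readable header row back into column IDs."""
    reverse = {labels[lang]: column_id for column_id, labels in LABELS.items()}
    # Strip stray whitespace that spreadsheet users may have added.
    return [reverse[label.strip()] for label in header]


french = ids_to_labels(["recipientName", "grantAmount"], "fr")
print(french)
print(labels_to_ids(french, "fr"))
```

Unlike a code embedded in the heading, this round trip only works if
the label text survives unchanged apart from surrounding whitespace.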
>>>>
>>>> As yet a way to address Issue 2 has not been proposed in this
>>>> approach.
>>>>
>>>> I'm personally leaning more towards the HXL approach over the
>>>> long-run, though perhaps linked to an ontology with IDs for fields
>>>> also rather than just a data dictionary to support more
>>>> idiomatically friendly JSON and XML representations.
>>>>
>>>>
>>>> Let me know if this covers what you needed, or if a write-up in some
>>>> other style would be useful.
>>>>
>>>> Would also welcome any feedback on whether we're missing good ideas
>>>> and approaches from the wider CSV standardisation work that we
>>>> should be thinking about...
>>>>
>>>> All the best
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
>>>>
>>>>> Tim,
>>>>>
>>>>> I hope you’re well?
>>>>>
>>>>> When we met up a little while ago, you talked about a CSV-based
>>>>> format that you were putting together where you wanted the general
>>>>> format to be the same across languages, but wanted the headers to be
>>>>> different so that they were understandable to
>>>>> non-English-language speakers.
>>>>>
>>>>> I wonder if you could write a little description of the issue and
>>>>> send me a couple of example files that show how that works, so
>>>>> that I can include them as a use case for the CSV WG?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jeni
>>>>> --
>>>>> Jeni Tennison, Technical Director theODI.org
>>>>> +44 (0) 7974 420 482 @JeniT
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Tim Davies
>>>> Research Coordinator, Open Data Research Network
>>>> +44 7834 856 303
>>>> @timdavies | @odrnetwork | www.opendataresearch.org
>>>>
>>>> *World Wide Web Foundation | **1110 Vermont Ave NW, Suite 500,
>>>> Washington DC 20005, USA** | www.webfoundation.org |
>>>> Twitter: @webfoundation*
>>>>
>>>
>>> --
>>> Jeni Tennison, Technical Director theODI.org
>>> +44 (0) 7974 420 482 @JeniT
>>>
>
Received on Saturday, 31 May 2014 14:07:30 UTC