RE: New i18n use case [WAS: CSV use case] from Tandy, Jeremy on 2014-06-02 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Mon, 2 Jun 2014 09:19:29 +0000
To: Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208843A51@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Hi Andy - thanks for the comments. Some further clarification required; see below ...

Jeremy

> -----Original Message-----
> From: Andy Seaborne [mailto:andy@apache.org]
> Sent: 31 May 2014 15:07
> To: public-csv-wg@w3.org
> Subject: Re: New i18n use case [WAS: CSV use case]
> 
> On 30/05/14 20:36, Tandy, Jeremy wrote:
> > Oh - and I should say that I focused on the HXL example rather than
> the "360 giving" one because it touched on both the issues raised in
> the email from Tim Davies.
> >
> > Jeremy
> >
> >> -----Original Message-----
> >> From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
> >> Sent: 30 May 2014 18:04
> >> To: Jeni Tennison; public-csv-wg@w3.org
> >> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
> >> Subject: New i18n use case [WAS: CSV use case]
> >>
> >> Hi - following Jeni's earlier message, I have now added another use
> >> case to the document to describe the concerns raised: " Use Case #23
> >> - Collating humanitarian information for crisis response"
> >> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-CollatingHumanitarianResponseInformation> ...
> >>
> >> You'll see this has introduced two new requirements:
> >>
> >> - <http://w3c.github.io/csvw/use-cases-and-requirements/#R-MultilingualContent>
> 
> "specify the language / locale relevant to each field"
> 
> Minor terminology point (Rufus has mentioned something similar),
> "field"
> here is referring to all the cells in a column? (I'm reading from the
> general context it isn't a particular (x,y) cell though that isn't
> unimaginable).

Fixed

> 
> >> - <http://w3c.github.io/csvw/use-cases-and-requirements/#R-ListsAsRepeatedFields>
> 
> It could be either list or repeated objects (in RDF speak)?

When thinking about this I wasn't projecting any ideas about the target 
RDF implementation, and I hadn't considered the use of RDF Collections 
<http://www.w3.org/TR/rdf-schema/#ch_collectionvocab>. I suppose I was 
thinking that the RDF would be simple repeated properties. So, assuming 
all the 'geocode' columns map to, say, 
<http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> and the geocodes
themselves are somehow mapped to a URI (not really part of this example, but 
it makes for a more "real" transformation), then the example ...

geocode #1,geocode #2,geocode #3
    530012,    530013,    530015

... becomes ...

ex:resource
  <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ex:530012, ex:530013, ex:530015 .


Do you have any recommendations about modifying the text of the requirement?
Certainly, I can include this trivial mapping. But I guess choice of target 
RDF is down to how the template is implemented?
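
For illustration only, the trivial mapping above could be sketched in Python 
as below. This is a rough sketch under stated assumptions, not a proposal for 
how the template mechanism should work: the function name `row_to_turtle`, 
the `ex:` prefix and the subject `ex:resource` are placeholders, and every 
non-empty cell in the row is assumed to be a geocode.

```python
import csv
import io

# Predicate taken from the example above; everything else is hypothetical.
GSS_CODE = "http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode"


def row_to_turtle(subject, row, predicate=GSS_CODE, prefix="ex:"):
    """Emit one Turtle statement with a repeated property:
    one object per non-empty cell in the row."""
    objects = [prefix + cell.strip() for cell in row if cell.strip()]
    if not objects:
        return ""
    return "{}\n  <{}> {} .".format(subject, predicate, ", ".join(objects))


data = "geocode #1,geocode #2,geocode #3\n    530012,    530013,    530015\n"
reader = csv.reader(io.StringIO(data))
header = next(reader)  # the repeated 'geocode #n' column titles
turtle = [row_to_turtle("ex:resource", row) for row in reader]
print(turtle[0])
```

Swapping the body of `row_to_turtle` for something that builds an RDF 
Collection would change the target RDF without touching the CSV, which is 
why I suspect the choice belongs to the template rather than the requirement.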

> 
> The other case of repeated fields is a repeated row with blanks means
> "same as above".  This relates to hierarchies:
> 
> concept subconcept
>          subconcept
> concept subconcept
>          subconcept
>          subconcept
> 
> of which org charts are an example.

It seems that this requirement could be broadened to include all "repeated 
property" behaviours, including that which is displayed by the org chart.

In order to do this, it would be great to have a simple "org chart" example
to refer to. At present, we have _no_ examples of "blank field x means that
the subject of the row is the same as the previous one". 

Do you have an example to hand we could use (e.g. from Dave Reynolds's 
work as editor on the Org vocab <http://www.w3.org/TR/vocab-org/>)?

Thinking a bit further, this affects the "row by row" parsing of the data, 
meaning that the parser would need to retain state from a previous row.
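
To make the "retain state" point concrete, here is a minimal Python sketch of 
a row-by-row reader that carries the last non-blank value forward. The 
function name `fill_down` and the sample data are made up for illustration 
(we have no real org chart example yet), and it assumes that a blank cell in 
the chosen column always means "same as the row above".

```python
import csv
import io


def fill_down(rows, column=0):
    """Yield rows, replacing a blank cell in `column` with the last
    non-blank value seen in that column (state kept between rows)."""
    previous = None
    for row in rows:
        if not row:  # skip completely empty lines
            continue
        row = list(row)
        if row[column].strip():
            previous = row[column]
        elif previous is not None:
            row[column] = previous
        yield row


# Hypothetical data shaped like Andy's concept/subconcept hierarchy.
data = (
    "concept A,subconcept 1\n"
    ",subconcept 2\n"
    "concept B,subconcept 3\n"
    ",subconcept 4\n"
    ",subconcept 5\n"
)
filled = list(fill_down(csv.reader(io.StringIO(data))))
for row in filled:
    print(row)
```

The only state carried between rows is the single `previous` value, so the 
parse can still be streamed; it just stops being possible to interpret a row 
in isolation.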

Thoughts?

> 
>  Andy
> 
> >>
> >> Comments welcome ... especially from Tim Davies and David Megginson
> >> :-) One issue I have raised is whether HXL is still predicated on
> >> RDF; whether the conversion from tabular HXL into an RDF format is
> an
> >> accurate portrayal.
> >>
> >> Jeremy
> >>
> >> PS: you'll also notice that I've removed the references to "DDR" (as
> >> pointed out by AndyS recently, this was unhelpful additional
> >> terminology) and removed the empty "Terminology" section.
> >>
> >>> -----Original Message-----
> >>> From: Jeni Tennison [mailto:jeni@theodi.org]
> >>> Sent: 27 May 2014 12:31
> >>> To: public-csv-wg@w3.org
> >>> Cc: Tim Davies (Web Foundation); david.megginson@megginson.com
> >>> Subject: Fw: Re: CSV use case
> >>>
> >>> Some extra use cases re internationalisation of CSVs.
> >>>
> >>> Jeni
> >>>
> >>> ------------------------------------------------------
> >>> From: Tim Davies timdavies@webfoundation.org
> >>> Reply: Tim Davies timdavies@webfoundation.org
> >>> Date: 20 May 2014 at 23:36:13
> >>> To: Jeni Tennison jeni@theodi.org, david.megginson@megginson.com
> >>> david.megginson@megginson.com
> >>> Subject:  Re: CSV use case
> >>>
> >>>> Hello Jeni,
> >>>>
> >>>> Good to hear from you. Yes, so there are two main cases and two
> >>>> approaches here. One based on the work David Megginson is doing on
> >>>> Humanitarian Exchange Language (I've copied David in so he can
> >>> correct
> >>>> me when I misrepresent their work...;) - and one based on the 360
> >>>> Giving Data Standard I worked on.
> >>>>
> >>>>
> >>>> *Issue 1:*Tabular data needs to be created, read by and exchanged
> >>>> between people speaking different languages. Many of these are
> >> basic
> >>>> spreadsheet users who will find it far easier to use data with
> >>> natural
> >>>> and clear language in the column headings. Having the column
> >>>> headings in their own language will make creating and interpreting
> >>>> the data a
> >>> lot easier.
> >>>>
> >>>>
> >>>> *Issue 2:*
> >>>> Tabular data needs to be created that contains literal values in
> >>>> multiple languages. For example, the name of a town in English,
> >>> French and Arabic.
> >>>> The total number of languages that the data might be expressed in
> >>>> cannot be easily determined in advance, and it should be possible
> >>>> for a user to introduce a new language variant of a column easily.
> >>>>
> >>>> *The HXL approach*
> >>>> See https://groups.google.com/forum/#!topic/hxlproject/8cLoE5cqV1Y
> >>>>
> >>>> - A data dictionary is created with numerical codes equating to
> >>>> field definitions
> >>>> - Providing the column header contains the numerical code, all other
> >>>> values in the column heading can be arbitrary (i.e. can be in plain
> >>>> language of the template creator's choice)
> >>>> - A parser extracts just the code and uses this to interpret the
> >>>> meaning of the column
> >>>> - Language codes can be attached onto the end of column codes to
> >>>> indicate a language variant. E.g. if 010 is 'Source description'
> >>>> then there can be an '010/en' column with 'Doctors without Borders'
> >>>> and an '010/fr' column containing 'Médecins Sans Frontières'
> >>>>
> >>>> This has the advantage of being robust to people messing around with
> >>>> column titles (extra spaces etc.) as long as they don't mess with
> >>>> the ID.
> >>>>
> >>>> *The 360 Giving Approach*
> >>>>
> >>>> See http://threesixtygiving.github.io/standard/
> >>>>
> >>>> As yet, no multilingual version of this is implemented, but the
> >>>> idea is
> >>>> that:
> >>>>
> >>>> - The CSV serialisation is based on an underlying Ontology
> >>>> (available at
> >>>> https://github.com/ThreeSixtyGiving/prototype-tools) which means
> >>> there
> >>>> is a URI for each column (the final part of which provides a
> >>>> machine-readable column ID), and labels, which can be expressed in
> >>>> various languages.
> >>>> - When a version of the spreadsheet for humans is created, the
> >>>> column ID is replaced with the English language label, or labels
> >>>> from some other language.
> >>>> - A conversion tool is created to map between IDs and labels.
> >>>>
> >>>> As yet, a way to address Issue 2 has not been proposed in this
> >>>> approach.
> >>>>
> >>>> I'm personally leaning more towards the HXL approach over the
> >>>> long-run, though perhaps linked to an ontology with IDs for fields
> >>>> also rather than just a data dictionary to support more
> >>>> idiomatically friendly JSON and XML representations.
> >>>>
> >>>>
> >>>> Let me know if this covers what you needed, or if a write-up in some
> >>>> other style would be useful.
> >>>>
> >>>> Would also welcome any feedback on whether we're missing good
> ideas
> >>>> and approaches from the wider CSV standardisation work that we
> >>>> should be thinking about...
> >>>>
> >>>> All the best
> >>>>
> >>>> Tim
> >>>>
> >>>>
> >>>> On Sun, May 18, 2014 at 5:28 PM, Jeni Tennison wrote:
> >>>>
> >>>>> Tim,
> >>>>>
> >>>>> I hope you’re well?
> >>>>>
> >>>>> When we met up a little while ago, you talked about a CSV-based
> >>>>> format that you were putting together where you wanted the
> >> general
> >>>>> format to be the same across languages, but wanted the headers to
> >>> be
> >>>>> different so that they were understandable to
> >>>>> non-English-language-
> >>> speakers.
> >>>>>
> >>>>> I wonder if you could write a little description of the issue and
> >>>>> send me a couple of example files that show how that works, so
> >>>>> that I can include them as a use case for the CSV WG?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jeni
> >>>>> --
> >>>>> Jeni Tennison, Technical Director theODI.org
> >>>>> +44 (0) 7974 420 482 @JeniT
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> --
> >>>> Tim Davies
> >>>> Research Coordinator, Open Data Research Network
> >>>> +44 7834 856 303
> >>>> @timdavies | @odrnetwork | www.opendataresearch.org
> >>>>
> >>>> *World Wide Web Foundation | **1110 Vermont Ave NW, Suite 500,
> >>>> Washington DC 20005, USA** | www.webfoundation.org |
> >>>> Twitter: @webfoundation*
> >>>>
> >>>
> >>> --
> >>> Jeni Tennison, Technical Director theODI.org
> >>> +44 (0) 7974 420 482 @JeniT
> >>>
> >
>
Received on Monday, 2 June 2014 09:20:00 UTC