Re: CSV2RDF redraft from Jeni Tennison on 2014-03-26 (public-csv-wg@w3.org from March 2014)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 26 Mar 2014 18:34:15 +0000
To: CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
Message-ID: <etPan.53331da7.684a481a.a6b@jenit.local>
It’s a design choice. The options as I see them are:

  1. define a mapping that uses blank nodes for properties
  2. define a mapping that constructs a name for the properties using the column number
  3. say that the “label” annotation (which might come from embedded headers or from a metadata document) must be present for all columns for there to be any conversion to RDF

My feeling is that it’s friendlier to do #1 or #2, because then the RDF user can at least import the CSV as RDF into something on which they could use SPARQL CONSTRUCT queries, or similar, to create something more meaningful.
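
To make option #2 concrete, here is a rough sketch of what a column-number mapping could look like for a headerless file. It is only an illustration, not anything the group has agreed: the namespace `http://example.org/csv#`, the `col_N` naming (borrowed from the "col_1", "col_2" suggestion quoted below), the `_:rowN` blank node labels and the function name are all assumptions. Every cell is emitted as a plain string literal, with no type guessing.

```python
import csv
import io

# Hypothetical namespace, chosen for illustration only (not from the thread).
BASE = "http://example.org/csv#"


def headerless_csv_to_ntriples(text, base=BASE):
    """Return N-Triples lines, one per cell, with predicates named by column number."""
    lines = []
    for row_num, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        subject = f"_:row{row_num}"  # one blank node per row
        for col_num, value in enumerate(row, start=1):
            predicate = f"<{base}col_{col_num}>"
            # Escape the characters that N-Triples string literals require.
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            lines.append(f'{subject} {predicate} "{escaped}" .')
    return lines


triples = headerless_csv_to_ntriples("Southton,123000\nNorthville,654000\n")
print("\n".join(triples))
```

The output is deliberately dumb: the predicates carry no meaning beyond position, which is exactly why a later CONSTRUCT step would be needed to turn them into something useful.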

Certainly I don’t think that there’s necessarily the same constraint in conversions to e.g. JSON or XML, but there might be a similar consideration in a mapping into tabular data frameworks (e.g. import into an SQL database).

Jeni

------------------------------------------------------
From: Andy Seaborne andy@apache.org
Reply: Andy Seaborne andy@apache.org
Date: 26 March 2014 at 18:09:47
To: Jeni Tennison jeni@jenitennison.com, CSV on the Web Working Group public-csv-wg@w3.org
Subject:  Re: CSV2RDF redraft

> On 26/03/14 17:47, Jeni Tennison wrote:
> > Yes, it’s (unfortunately) common for CSV files to be published without headers. And
> > in some cases the best thing, because it helps people aggregate them together. See (from
> > our use cases document):
> >
> > http://w3c.github.io/csvw/use-cases-and-requirements/uganda_000000000005_monthly_stage2  
> > http://publicdata.landregistry.gov.uk/market-trend-data/price-paid-data/b/pp-monthly-update.txt  
> >
> > That second one is even from your use case!
>  
> Yes but! there are annotations albeit in descriptive text currently.
>  
> http://www.landregistry.gov.uk/market-trend-data/public-data/price-paid-faq#m18  
>  
> >
> > Not that it’s something that we should encourage publishers to do, but something the
> > conversions need to deal with.
> >
> > The decision to not include headers in the core data model was made in:
> >
> > http://www.w3.org/2014/02/26-csvw-minutes.html
>  
> ----
> for "headerless" csvs we could always define default
> properties, :column1 ,... , :columnn , right?
>  
> That sounds good
>  
> jeni: columns only have numbers, and row 1 titles are simple annotations
> ----
>  
> headerless and annotationless can be incorporated (CSV2RDF or CSV-LD).
>  
> Or is that a roadblock for you?
>  
> Andy
>  
>  
> >
> > Cheers,
> >
> > Jeni
> >
> > ------------------------------------------------------
> > From: Andy Seaborne andy@apache.org
> > Reply: Andy Seaborne andy@apache.org
> > Date: 26 March 2014 at 17:34:29
> > To: Jeni Tennison jeni@jenitennison.com, CSV on the Web Working Group public-csv-wg@w3.org  
> > Subject: Re: CSV2RDF redraft
> >
> >> On 26/03/14 15:55, Jeni Tennison wrote:
> >>> Andy,
> >>>
> >>> What about in the absence of headers (which aren’t in the core data model)?
> >>
> >> Do we have examples of that?
> >>
> >> I don't think that CSV files without headers nor annotation information
> >> are much use on the web. To use the information, you need to know
> >> something.
> >>
> >> Otherwise it's not "publishing", it's "data exchange" between agreeing
> >> parties.
> >>
> >> The best is "col_1", "col_2", ... c.f.
> >> http://shancarter.github.io/mr-data-converter/ then you have to add your
> >> own interpretation.
> >>
> >> Should we include a header requirement, or at least a preference, in CDM?
> >>
> >> Andy
> >>
> >>>
> >>> Jeni
> >>>
> >>> ------------------------------------------------------
> >>> From: Andy Seaborne andy@apache.org
> >>> Reply: Andy Seaborne andy@apache.org
> >>> Date: 26 March 2014 at 14:58:40
> >>> To: CSV on the Web Working Group public-csv-wg@w3.org
> >>> Subject: CSV2RDF redraft
> >>>
> >>>> https://www.w3.org/2013/csvw/wiki/CSV2RDF
> >>>>
> >>>> This is a conversion based on defining the triples produced, not the
> >>>> syntax used as output.
> >>>>
> >>>> ------------
> >>>> Town,Population
> >>>> Southton,123000
> >>>> Northville,654000
> >>>> ------------
> >>>>
> >>>> in the absence of any annotations (i.e. Core Data Model):
> >>>>
> >>>> generates (if Turtle used - N-triples example in the wiki):
> >>>>
> >>>> ------------
> >>>> @prefix : .
> >>>> @prefix csv: .
> >>>>
> >>>> # Column information
> >>>>
> >>>> csv:column [ csv:colName "Town" ;
> >>>> csv:colPredicate :Town ;
> >>>> csv:colIndex 1 ] ;
> >>>> csv:column [ csv:colName "Population" ;
> >>>> csv:colPredicate :Population ;
> >>>> csv:colIndex 2 ] ;
> >>>> .
> >>>>
> >>>> # Data rows
> >>>> [ csv:row 1 ; :Town "Southton" ; :Population 123000 ] .
> >>>> [ csv:row 2 ; :Town "Northville" ; :Population 654000 ] .
> >>>> ------------
> >>>>
> >>>> Population becomes a number by guessing from the data.
> >>>>
> >>>> In that it uses one predicate per column, it is similar to CSV-LD in the
> >>>> absence of any @context.
> >>>>
> >>>> If we can make the creation of the CSV-LD @context align to the minimal
> >>>> structure CSV2RDF uses, we will at least have a common baseline.
> >>>>
> >>>> Gregg and I will discuss that as per the telecon.
> >>>>
> >>>> Andy
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Jeni Tennison
> >>> http://www.jenitennison.com/
> >>>
> >>
> >>
> >>
> >>
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
> >
>  
>  
>  
>  

--  
Jeni Tennison
http://www.jenitennison.com/
Received on Wednesday, 26 March 2014 18:35:03 UTC