Re: CSV2RDF redraft from Gregg Kellogg on 2014-03-26 (public-csv-wg@w3.org from March 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Wed, 26 Mar 2014 11:26:15 -0700
To: Juan Sequeda <juanfederico@gmail.com>
Cc: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
Message-Id: <96DFAE70-C22A-416E-AFFE-4EF0E208BA2F@greggkellogg.net>
If a CSV had no header, then that fact would either need to be described in the metadata file, or passed as a processing option for a direct map.

I think there is a way to handle both of these. Within CSV-LD I envision using header names within templates, but this could be extended to use other field identifiers. For example:

"{:rowno}" might expand to the current row number
"{:colno=1}" might reference the contents of the first column (1-based)
"{foo:colno}" might expand to the column number of the field with header "foo"
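
To make the placeholder forms above concrete, here is a minimal Python sketch of an expander. The function name `expand`, its arguments, and the exact semantics are assumptions for illustration only; none of this is defined by CSV-LD.

```python
import re

# Hypothetical expander for the placeholder forms listed above. The
# syntax is only a proposal, so the semantics here are assumptions.
PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def expand(template, row, rowno, headers=None):
    """Expand placeholders in `template` against one CSV row (a list of strings)."""
    headers = headers or []

    def substitute(match):
        token = match.group(1)
        if token == ":rowno":
            # Current row number
            return str(rowno)
        if token.startswith(":colno="):
            # Contents of the Nth column (1-based)
            return row[int(token[len(":colno="):]) - 1]
        if token.endswith(":colno"):
            # Column number (1-based) of the field with the given header
            return str(headers.index(token[:-len(":colno")]) + 1)
        # Plain header name: contents of that field
        return row[headers.index(token)]

    return PLACEHOLDER.sub(substitute, template)
```

With the row ["Southton", "123000"], `expand("{:colno=1}", row, 1)` returns "Southton" without needing any headers, while `expand("{Population:colno}", row, 1, ["Town", "Population"])` returns "2".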

A direct mapping for a CSV without a header row could then be automatically created using these patterns to generate something similar to what Andy provided:

{
  "@context": {
    "@vocab": "http://w3c/future-csv-vocab/",
    "@base": "http://host/data.csv"
  },
  "row": {"@value": "{:rowno}", "@type": "xsd:integer"},
  "col1": "{:colno=1}",
  "col2": "{:colno=2}",
  ...
}

Applied to Andy's example without the header row, this would create the following Turtle:

@prefix : <http://w3c/future-csv-vocab/> .
[ :row 1; :col1 "Southton"; :col2 "123000" ] .
[ :row 2; :col1 "Northville"; :col2 "654000" ] .

Note that we lose the fact that the second column is an integer, as it's emitted as a string literal. With metadata, we can assert that it is an integer (or whatever).
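
As a rough illustration of the headerless direct mapping and of what metadata could add, here is a small Python sketch. The function `direct_map` and its `integer_columns` argument are hypothetical stand-ins for whatever the metadata file or processing options would eventually supply; the vocabulary URL is the placeholder used above.

```python
import csv
import io


def escape(value):
    """Escape a cell for use inside a double-quoted Turtle string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def direct_map(csv_text, integer_columns=(), vocab="http://w3c/future-csv-vocab/"):
    """Map a headerless CSV to Turtle: one blank node per row, one :colN per cell.

    `integer_columns` holds 1-based column numbers that metadata has
    declared to be integers (an assumption of this sketch).
    """
    lines = [f"@prefix : <{vocab}> ."]
    # Skip completely empty lines so they do not consume row numbers.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    for rowno, row in enumerate(rows, start=1):
        cells = []
        for colno, value in enumerate(row, start=1):
            if colno in integer_columns:
                literal = str(int(value))        # bare integer literal
            else:
                literal = f'"{escape(value)}"'   # plain string literal
            cells.append(f":col{colno} {literal}")
        lines.append(f"[ :row {rowno}; {'; '.join(cells)} ] .")
    return "\n".join(lines)
```

Called without `integer_columns`, every cell comes out as a string; passing `integer_columns={2}` emits the population values as integers instead.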

If we did use the column headers, an automated context would presumably just make use of those with something like the following:

{
  "@context": {
    "@vocab": "http://host/data.csv#",
    "@base": "http://host/data.csv",
    "csv": "http://w3c/future-csv-vocab/"
  },
  "csv:row": {"@value": "{:rowno}", "@type": "xsd:integer"},
  "Town": "{Town}",
  "Population": "{Population}",
  ...
}

Note that I've used the document base as @vocab. I also don't include the column information Andy showed. Providing for something like this would require some redefinition of the CSV-LD mapping frame, to include a boilerplate portion in addition to that repeated for each record.
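
For what it's worth, generating the header-based frame automatically could look something like the following Python sketch. The function name `frame_from_headers` and the default `base` are assumptions for illustration, and header names are used verbatim, so headers containing spaces or other awkward characters would need extra handling that is not shown here.

```python
import json


def frame_from_headers(headers, base="http://host/data.csv"):
    """Build a mapping frame (as a dict) from a CSV header row.

    The document base doubles as @vocab, and each header maps to a
    template that references the field of the same name.
    """
    frame = {
        "@context": {
            "@vocab": base + "#",
            "@base": base,
            "csv": "http://w3c/future-csv-vocab/",
        },
        "csv:row": {"@value": "{:rowno}", "@type": "xsd:integer"},
    }
    for name in headers:
        frame[name] = "{" + name + "}"
    return frame


if __name__ == "__main__":
    print(json.dumps(frame_from_headers(["Town", "Population"]), indent=2))
```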

Gregg Kellogg
gregg@greggkellogg.net

On Mar 26, 2014, at 10:54 AM, Juan Sequeda <juanfederico@gmail.com> wrote:

> I've probably missed this, but is there a wiki draft on how to specify CSV metadata?
> 
> I would assume that if a CSV file doesn't have a header, then the header information could be specified in a metadata file. Then the mapping file would make use of that metadata file to know the column name/position. 
> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Wed, Mar 26, 2014 at 12:47 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
> Yes, it’s (unfortunately) common for CSV files to be published without headers, and in some cases that is the best thing to do, because it helps people aggregate them together. See (from our use cases document):
> 
>   http://w3c.github.io/csvw/use-cases-and-requirements/uganda_000000000005_monthly_stage2
>   http://publicdata.landregistry.gov.uk/market-trend-data/price-paid-data/b/pp-monthly-update.txt
> 
> That second one is even from your use case!
> 
> Not that it’s something that we should encourage publishers to do, but something the conversions need to deal with.
> 
> The decision to not include headers in the core data model was made in:
> 
>   http://www.w3.org/2014/02/26-csvw-minutes.html
> 
> Cheers,
> 
> Jeni
> 
> ------------------------------------------------------
> From: Andy Seaborne andy@apache.org
> Reply: Andy Seaborne andy@apache.org
> Date: 26 March 2014 at 17:34:29
> To: Jeni Tennison jeni@jenitennison.com, CSV on the Web Working Group public-csv-wg@w3.org
> Subject:  Re: CSV2RDF redraft
> 
> > On 26/03/14 15:55, Jeni Tennison wrote:
> > > Andy,
> > >
> > > What about in the absence of headers (which aren’t in the core data model)?
> >
> > Do we have examples of that?
> >
> > I don't think that CSV files without headers nor annotation information
> > are much use on the web. To use the information, you need to know
> > something.
> >
> > Otherwise it's not "publishing", it's "data exchange" between agreeing
> > parties.
> >
> > The best is "col_1", "col_2", ... c.f.
> > http://shancarter.github.io/mr-data-converter/ then you have to add your
> > own interpretation.
> >
> > Should we include a header requirement, or at least a preference, in CDM?
> >
> > Andy
> >
> > >
> > > Jeni
> > >
> > > ------------------------------------------------------
> > > From: Andy Seaborne andy@apache.org
> > > Reply: Andy Seaborne andy@apache.org
> > > Date: 26 March 2014 at 14:58:40
> > > To: CSV on the Web Working Group public-csv-wg@w3.org
> > > Subject: CSV2RDF redraft
> > >
> > >> https://www.w3.org/2013/csvw/wiki/CSV2RDF
> > >>
> > >> This is a conversion based on defining the triples produced, not the
> > >> syntax used as output.
> > >>
> > >> ------------
> > >> Town,Population
> > >> Southton,123000
> > >> Northville,654000
> > >> ------------
> > >>
> > >> in the absence of any annotations (i.e. Core Data Model):
> > >>
> > >> generates (if Turtle used - N-triples example in the wiki):
> > >>
> > >> ------------
> > >> @prefix : .
> > >> @prefix csv: .
> > >>
> > >> # Column information
> > >>
> > >> csv:column [ csv:colName "Town" ;
> > >> csv:colPredicate :Town ;
> > >> csv:colIndex 1 ] ;
> > >> csv:column [ csv:colName "Population" ;
> > >> csv:colPredicate :Population ;
> > >> csv:colIndex 2 ] ;
> > >> .
> > >>
> > >> # Data rows
> > >> [ csv:row 1 ; :Town "Southton" ; :Population 123000 ] .
> > >> [ csv:row 2 ; :Town "Northville" ; :Population 654000 ] .
> > >> ------------
> > >>
> > >> population becomes number by guessing from the data.
> > >>
> > > >> In that it uses one predicate per column, it is similar to CSV-LD in the
> > >> absence of any @context.
> > >>
> > >> If we can make the creation of the CSV-LD @context align to the minimal
> > > >> structure CSV2RDF uses, we will at least have a common base line.
> > >>
> > >> Gregg and I will discuss that as per the telecon.
> > >>
> > >> Andy
> > >>
> > >>
> > >>
> > >>
> > >
> > > --
> > > Jeni Tennison
> > > http://www.jenitennison.com/
> > >
> >
> >
> >
> >
> 
> --
> Jeni Tennison
> http://www.jenitennison.com/
> 
>
Received on Wednesday, 26 March 2014 18:26:45 UTC