Re: Suggestion on ISSUE-7 in "Model for Tabular Data and Metadata on the Web": Allow metadata in any RDF-enabled format

On Apr 6, 2014, at 5:09 PM, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi David,
> 
> Thanks for raising this. I think that there’s a distinction between the metadata about the dataset (eg its author, when it was published) being mappable to RDF and the standardised format of the metadata document that provides annotations about CSV files being in an RDF format.
> 
> Being able to map CSVs and metadata about CSVs into RDF is very much part of what the Working Group needs to do, and Andy and Gregg are taking that work forward at the moment. That is what the charter is referring to when it says "It should also be possible to encode this metadata in RDF".
> 
> Regarding the format of the metadata document: being able to use CSV for that document is exactly where my thinking is going as well. As you say, it means that it can be mapped to other formats if required. I think it also makes the creation of the metadata/schema accessible to non-developers, which is a good thing to achieve if we can manage it.
> 
> A generic format would be something like:
> 
> about,property,value,type or language
> ,name,Example,
> ,author,Jeni Tennison,
> ,created,2014-04-06,date
> col=1,name,Name,en
> col=1,name,Nom,fr
> col=1,required,true,boolean
> ...
> 
> A more specific format would need to look something like:
> 
> row,col,name@en,name@fr,description@en,description@fr,required,type,lang
> ,,Example,,,,,,
> ,1,Name,Nom,Someone’s name.,,true,string,en
> ,2,Email,,Someone’s email.,,true,URL,
> 
> which is messy for handling things like multiple languages, which are moderately likely within schemas.
> 
> Curious what other people think of this approach.

I definitely see value in being able to do this, but I don't see how it will let us do any kind of structured mapping. For example, in Jeremy's recent example (mapping weather data)[1], he's looking to go directly to something like the following:

@base               <http://data.example.org/wow/data/weather-observations/> .
@prefix ssn:        <http://purl.oclc.org/NET/ssnx/ssn#> .
@prefix time:       <http://www.w3.org/2006/time#> .
@prefix xsd:        <http://www.w3.org/2001/XMLSchema#> . 
@prefix qudt:       <http://qudt.org/1.1/schema/qudt#> .
@prefix def-op:     <http://data.example.org/wow/def/observed-property#> .

<site/22580943/date-time/20131213T0800Z>
    a ssn:Observation ;
    ssn:observationSamplingTime [ time:inXSDDateTime "2013-12-13T08:00:00Z"^^xsd:dateTime ] ;
    ssn:observationResult [
        a ssn:SensorOutput ;
        def-op:airTemperature_C [ qudt:numericValue "11.2"^^xsd:double ] ;
        def-op:dewPointTemperature_C [ qudt:numericValue "10.2"^^xsd:double ] ] .

<site/22580943/date-time/20131213T0900Z>
    a ssn:Observation ;
    ssn:observationSamplingTime [ time:inXSDDateTime "2013-12-13T09:00:00Z"^^xsd:dateTime ] ;
    ssn:observationResult [
        a ssn:SensorOutput ;
        def-op:airTemperature_C [ qudt:numericValue "12.0"^^xsd:double ] ;
        def-op:dewPointTemperature_C [ qudt:numericValue "10.2"^^xsd:double ] ] .

Simply mapping columns to property values of some row node, even if it retains datatype/IRI information, does not allow such a result to be reproduced. The CSV-LD mapping frame does; however, while this is an RDF serialization, much of the information is contained in the JSON-LD context. The chained portions of the template do help in creating this type of chained result, though.
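
To make the gap concrete, here is a minimal Python sketch. It is not the CSV-LD algorithm; the CSV layout, the column names (site, datetime, air_temp, dew_point), the ex: prefix, and both helper functions are assumptions made up for illustration. It contrasts a flat column-to-property mapping with a template that places values inside chained blank nodes:

```python
import csv
import io

# Hypothetical CSV layout; the real weather-observation file may differ.
SAMPLE = """site,datetime,air_temp,dew_point
22580943,2013-12-13T08:00:00Z,11.2,10.2
22580943,2013-12-13T09:00:00Z,12.0,10.2
"""

# Illustrative template mirroring the Turtle above; prefixes are assumed
# to be declared elsewhere.
TEMPLATE = """<site/{site}/date-time/{stamp}>
    a ssn:Observation ;
    ssn:observationSamplingTime [ time:inXSDDateTime "{datetime}"^^xsd:dateTime ] ;
    ssn:observationResult [
        a ssn:SensorOutput ;
        def-op:airTemperature_C [ qudt:numericValue "{air_temp}"^^xsd:double ] ;
        def-op:dewPointTemperature_C [ qudt:numericValue "{dew_point}"^^xsd:double ] ] .
"""

def flat_mapping(row):
    """Every column becomes a property of a single row node (no nesting)."""
    return [("_:row", "ex:" + column, value) for column, value in row.items()]

def chained_mapping(row):
    """Fill the template so values land inside nested blank nodes."""
    # Compact the timestamp to the form used in the subject IRI,
    # e.g. 2013-12-13T08:00:00Z becomes 20131213T0800Z.
    stamp = row["datetime"].replace("-", "").replace(":", "")[:13] + "Z"
    return TEMPLATE.format(stamp=stamp, **row)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(chained_mapping(row))
```

The flat mapping yields one triple per column hanging off the row node, so the ssn:observationResult nesting can only come from something template-like.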

I do think that at a base level, beyond the "direct mapping", simply providing datatype information is useful, but it only goes so far. Another "simple" way to do this is to associate an RDF predicate with each column, and use the rdfs:range definition for that property to infer a transformation, but this won't handle sub-fields which are comma-delimited, for example. Chaining structure could be layered on top of this.
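
As a rough Python sketch of the predicate-per-column idea: the column names, the ex: predicates, and the hard-coded ranges below are all hypothetical. In practice the range would be looked up from the vocabulary's rdfs:range statement rather than written into a table.

```python
from datetime import datetime

# Hypothetical column -> (predicate, rdfs:range) table.
COLUMN_PREDICATES = {
    "datetime": ("ex:observedAt", "xsd:dateTime"),
    "air_temp": ("ex:airTemperature", "xsd:double"),
}

# One validating parser per range datatype.
PARSERS = {
    "xsd:double": float,
    "xsd:dateTime": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"),
}

def typed_literal(column, raw):
    """Return (predicate, typed literal) for one cell, using the range as the datatype."""
    predicate, range_type = COLUMN_PREDICATES[column]
    PARSERS[range_type](raw)  # raises ValueError if raw does not fit the range
    return predicate, f'"{raw}"^^{range_type}'

print(typed_literal("air_temp", "11.2"))

# A comma-delimited sub-field does not fit a single range, which is the
# limitation noted above.
try:
    typed_literal("air_temp", "11.2,10.2")
except ValueError as err:
    print("cannot map sub-fields:", err)
```

This covers a single value per cell only; splitting sub-fields or building nested nodes would need the extra structure mentioned above.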

Placing type information in a JSON-LD context, or using the recently referenced JSON Table Schema [2], would then seem to be beyond the WG charter, which seems unfortunate. To stick with the letter of the charter would seem to require that we repeat work (e.g., from [2]) just so that it can be represented as RDF.

Jeremy described more about his mapping example here [3].

Gregg

[1] https://github.com/w3c/csvw/blob/gh-pages/examples/simple-weather-observation.md#rdf-encoding
[2] http://dataprotocols.org/json-table-schema/
[3] https://github.com/w3c/csvw/blob/gh-pages/csv-ld/mapping-frame-within-tabular-data-package.md

> Jeni
> 
> ------------------------------------------------------
> From: David Booth david@dbooth.org
> Reply: David Booth david@dbooth.org
> Date: 6 April 2014 at 14:58:19
> To: public-csv-wg@w3.org public-csv-wg@w3.org, Jeni Tennison jeni@jenitennison.com, 'Gregg Kellogg' gregg@greggkellogg.com
> Subject:  Suggestion on ISSUE-7 in "Model for Tabular Data and Metadata on the Web": Allow metadata in any RDF-enabled format
> 
>> http://w3c.github.io/csvw/syntax/#h_issue_7
>> says:
>> [[
>> Issue 7
>> 
>> Used a suffix on filenames to find metadata about them, though we
>> haven't decided what format metadata documents should be in, or even if
>> they should be conneg'd.
>> ]]
>> 
>> The WG charter says that the metadata "should be defined, or should have
>> an encoding, in standard RDF". One possibility would be to allow a
>> related metadata document to be in any RDF-enabled format -- including
>> CSV+. If a metadata document were supplied in CSV+ format, then it
>> could be converted to RDF according to the same mapping rules as any
>> other CSV+ document. This of course is recursive, but in practice the
>> recursion would likely involve only one or two steps.
>> 
>> David
>> 
>> 
>> 
> 
> --  
> Jeni Tennison
> http://www.jenitennison.com/

Received on Monday, 7 April 2014 00:40:29 UTC