
RE: Attempted example CSV metadata document and template

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Thu, 19 Jun 2014 12:36:10 +0000
To: Dan Brickley <danbri@google.com>
CC: CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20884900A@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
> -----Original Message-----
> From: Dan Brickley [mailto:danbri@google.com]
> Sent: 18 June 2014 12:46
> To: Tandy, Jeremy
> Cc: CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
> 
> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> wrote:
> > All -
> >
> > I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> Observation" example. I've tried to create a CSV metadata document
> following the rules in the [Metadata Vocabulary for Tabular Data][2]
> and [Generating RDF from Tabular Data on the Web][3] documents.
> >
> > I would be particularly interested in:
> >
> > - corrections to errors!
> > - comments on additional proposed properties in the metadata document
> > ("short-name", "template", "microsyntax")
> > - use of "hasFormat" to specify the Content-Type associated with a
> > Template
> > - use of a REGEXP within a URI Template to convert ISO 8601 syntax to
> > a simplified form
> 
> I don't completely understand this mechanism yet, but do you think it
> could be stretched to address the SKOS/codes issue in
> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> where we'd want to explode strings like "15-1199.00", "15-1199.01" and
> emit triples like 'broader' when certain patterns matched?
> 
> Dan
> 

OK ... let's have a go.

Here's the header and a line of data:

---
O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
---

Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:

---
{
   "name": "2010_Occupations",
   "title": "O*NET-SOC Occupational listing for 2010",
   "publisher": [{
       "name": "O*Net Resource Center",
       "web": "http://www.onetcenter.org/"
   }],
   "resources": [{
       "name": "2010_Occupations-csv",
       "path": "2010_Occupations.csv",
       "schema": {"columns": [
           {
               "name": "onet-soc-2010-code",
               "title": "O*NET-SOC 2010 Code",
               "description": "O*NET Standard Occupational Classification Code (2010).",
               "type": "string",
               "required": true,
               "unique": true, 
               "microsyntax": [{
                       "name": "soc-major-group",
                       "regexp": "^(\\d{2})-\\d{4}\\.\\d{2}$"
                   },{
                       "name": "soc-minor-group",
                       "regexp": "^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$"
                   },{
                       "name": "soc-broad-group",
                       "regexp": "^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$"
                   },{
                       "name": "soc-detailed-occupation",
                       "regexp": "^\\d{2}-\\d{3}(\\d)\\.\\d{2}$"
                   },{
                       "name": "onetsoc-occupation",
                       "regexp": "^\\d{2}-\\d{4}\\.(\\d{2})$"
                   }

               ]
           },
           {
               "name": "title",
               "title": "O*NET-SOC 2010 Title",
               "description": "Title of occupational classification.",
               "type": "string",
               "required": true
           },
           {
               "name": "description",
               "title": "O*NET-SOC 2010 Description",
               "description": "Description of occupational classification.",
               "type": "string",
               "required": true
           }
       ]},
       "template": {
           "name": "2010_Occupations-csv-to-ttl",
           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
           "type": "template",
           "path": "2010_Occupations-csv-to-ttl.ttl",
           "hasFormat": "text/turtle"
       }
   }]
}
---

You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
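To check that the five patterns pull out the intended pieces, here is a minimal Python sketch that applies each one independently to the sample code. `extract_microsyntax` is a hypothetical helper and is not part of any CSVW draft. It assumes the literal dot in the code is escaped as `\.` and that each pattern captures exactly one group.

```python
import re

# The five microsyntax elements from the metadata above (dot escaped).
MICROSYNTAX = [
    ("soc-major-group", r"^(\d{2})-\d{4}\.\d{2}$"),
    ("soc-minor-group", r"^\d{2}-(\d{2})\d{2}\.\d{2}$"),
    ("soc-broad-group", r"^\d{2}-\d{2}(\d)\d\.\d{2}$"),
    ("soc-detailed-occupation", r"^\d{2}-\d{3}(\d)\.\d{2}$"),
    ("onetsoc-occupation", r"^\d{2}-\d{4}\.(\d{2})$"),
]

def extract_microsyntax(value):
    """Apply each regexp on its own; each contributes one named value."""
    elements = {}
    for name, pattern in MICROSYNTAX:
        match = re.match(pattern, value)
        if match is None:
            raise ValueError(f"{value!r} does not match the {name} pattern")
        elements[name] = match.group(1)
    return elements

print(extract_microsyntax("15-1199.03"))
# {'soc-major-group': '15', 'soc-minor-group': '11', 'soc-broad-group': '9',
#  'soc-detailed-occupation': '9', 'onetsoc-occupation': '03'}
```

One cost of the "one regexp per value" pattern is visible here: the code is matched five times, and any malformed value is reported once per pattern rather than once per cell.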

The template (prefixes etc. intentionally left out) might then be:

---
ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
    skos:notation "{onet-soc-2010-code}" ;
    skos:prefLabel "{title}" ;
    dct:description "{description}" ;
    skos:broader ex:{soc-major-group}-0000, 
                 ex:{soc-major-group}-{soc-minor-group}00, 
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
---
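Here is a minimal Python sketch of how a processor might fill that template for the sample row. It assumes that `{name}` refers either to a column `name` or to a microsyntax element name, and that both share one namespace. `expand` is a hypothetical helper. It does no Turtle escaping, so a `"` inside a value would break the output.

```python
import re

# The ONETSOC-Occupation template above (prefixes omitted, as in the original).
TEMPLATE = """\
ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
    skos:notation "{onet-soc-2010-code}" ;
    skos:prefLabel "{title}" ;
    dct:description "{description}" ;
    skos:broader ex:{soc-major-group}-0000,
                 ex:{soc-major-group}-{soc-minor-group}00,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} ."""

# Names contain hyphens, so match {name} explicitly rather than use str.format.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_-]+)\}")

def expand(template, values):
    """Substitute every {name}; raises KeyError if a value is missing."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)

# Column values plus the microsyntax elements of "15-1199.03".
values = {
    "onet-soc-2010-code": "15-1199.03",
    "title": "Web Administrators",
    "description": "Manage web environment design, deployment, development and maintenance activities. [...]",
    "soc-major-group": "15",
    "soc-minor-group": "11",
    "soc-broad-group": "9",
    "soc-detailed-occupation": "9",
    "onetsoc-occupation": "03",
}

print(expand(TEMPLATE, values))
```

For this row the `skos:broader` targets come out as `ex:15-0000`, `ex:15-1100`, `ex:15-1190` and `ex:15-1199`, which is the hierarchy Dan was asking about. The zero padding is written directly into the template.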

However, this does not help when we look at the required _conditional behaviour_: when the value of "onetsoc-occupation" is "00", the O*NET-SOC occupation is identical to the corresponding term in the SOC taxonomy, and the template should be more like:

---
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
    skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
    skos:prefLabel "{title}" ;
    dct:description "{description}" ;
    skos:broader ex:{soc-major-group}-0000, 
                 ex:{soc-major-group}-{soc-minor-group}00, 
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
---

It occurs to me that we may wish to trigger different templates based on a condition - or even to decide whether to trigger a template at all for a given line!

Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire, expressed in terms of the values of columns or microsyntax elements? e.g.

---
       "template": {
           "name": "2010_Occupations-csv-to-ttl",
           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
           "type": "template",
           "path": "2010_Occupations-csv-to-ttl.ttl",
           "hasFormat": "text/turtle",
           "condition": "if {onetsoc-occupation} != '00'"
       }
---

Default behaviour (if no "condition" statement included) would be _always_ to trigger the template for each row.
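To gauge how much machinery even a tiny condition language needs, here is a minimal Python sketch. It supports only `if {name} == 'literal'` and `if {name} != 'literal'` and keys on "onetsoc-occupation", which is the element the "00" case depends on. The grammar and every template name other than `2010_Occupations-csv-to-ttl` are hypothetical.

```python
import re

# Hypothetical grammar, just enough for the example:
#   if {name} == 'literal'   or   if {name} != 'literal'
CONDITION = re.compile(r"^if \{([A-Za-z0-9_-]+)\} (==|!=) '([^']*)'$")

def condition_holds(condition, values):
    """True if a template with this condition should fire for the row."""
    if condition is None:
        return True  # default: no "condition" means always fire
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, operator, literal = match.groups()
    equal = values[name] == literal
    return equal if operator == "==" else not equal

TEMPLATES = [
    {"name": "2010_Occupations-csv-to-ttl",
     "condition": "if {onetsoc-occupation} != '00'"},
    {"name": "soc-detailed-occupation-to-ttl",   # hypothetical
     "condition": "if {onetsoc-occupation} == '00'"},
    {"name": "row-provenance-to-ttl"},           # hypothetical; no condition
]

def templates_to_fire(templates, values):
    return [t["name"] for t in templates
            if condition_holds(t.get("condition"), values)]

print(templates_to_fire(TEMPLATES, {"onetsoc-occupation": "03"}))
# ['2010_Occupations-csv-to-ttl', 'row-provenance-to-ttl']
print(templates_to_fire(TEMPLATES, {"onetsoc-occupation": "00"}))
# ['soc-detailed-occupation-to-ttl', 'row-provenance-to-ttl']
```

Even this two-operator subset needs its own parser and error reporting. Adding `and`/`or`, numeric comparison or missing-value handling would grow it quickly.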

However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
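The "bug out" option could look roughly like the following minimal Python sketch. In this version the metadata only *names* a predicate, and the application registers the actual function. The `conditionCallback` property and the callback names are hypothetical, and only the synchronous call-back case is shown, not promises.

```python
from typing import Callable, Dict, List, Mapping

Row = Mapping[str, str]
Predicate = Callable[[Row], bool]

# Supplied by the application, not by the metadata document (hypothetical names).
CALLBACKS: Dict[str, Predicate] = {
    "is-onet-refinement": lambda row: row["onetsoc-occupation"] != "00",
    "is-soc-detailed-occupation": lambda row: row["onetsoc-occupation"] == "00",
}

# Template blocks referencing callbacks via a hypothetical "conditionCallback".
TEMPLATES = [
    {"name": "2010_Occupations-csv-to-ttl",
     "conditionCallback": "is-onet-refinement"},
    {"name": "soc-detailed-occupation-to-ttl",
     "conditionCallback": "is-soc-detailed-occupation"},
]

def select_templates(templates: List[dict], row: Row,
                     callbacks: Dict[str, Predicate]) -> List[str]:
    """Names of templates to fire; no callback means always fire.
    An unregistered callback name raises KeyError."""
    selected = []
    for template in templates:
        callback_name = template.get("conditionCallback")
        if callback_name is None or callbacks[callback_name](row):
            selected.append(template["name"])
    return selected

print(select_templates(TEMPLATES, {"onetsoc-occupation": "00"}, CALLBACKS))
```

This keeps comparison operators out of the metadata vocabulary. The trade-off is that the metadata alone no longer fully describes the conversion, because a processor without the registered callbacks cannot run it.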

Jeremy

[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value


> 
> > - thoughts about a way to describe that microsyntax format within the
> metadata document (see [CellMicrosyntax requirement][4]), e.g. to define
> the sub-elements within the microsyntax that may be extracted for use
> later - see [Parsing cell microsyntax][5].
> >
> > Comments welcome.
> >
> > Jeremy
> >
> >
> > [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> > [2]: http://w3c.github.io/csvw/metadata/index.html
> > [3]: http://w3c.github.io/csvw/csv2rdf/
> > [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> > [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >
Received on Thursday, 19 June 2014 12:36:41 UTC
