- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Thu, 19 Jun 2014 12:36:10 +0000
- To: Dan Brickley <danbri@google.com>
- CC: CSV on the Web Working Group <public-csv-wg@w3.org>
> -----Original Message-----
> From: Dan Brickley [mailto:danbri@google.com]
> Sent: 18 June 2014 12:46
> To: Tandy, Jeremy
> Cc: CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
>
> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> wrote:
> > All -
> >
> > I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> > Observation" example. I've tried to create a CSV metadata document
> > following the rules in the [Metadata Vocabulary for Tabular Data][2]
> > and [Generating RDF from Tabular Data on the Web][3] documents.
> >
> > I would be particularly interested in:
> >
> > - corrections to errors!
> > - comments on additional proposed properties in the metadata document
> > ("short-name", "template", "microsyntax")
> > - use of "hasFormat" to specify the Content-Type associated with a
> > Template
> > - use of a REGEXP within a URI Template to convert ISO 8601 syntax to
> > a simplified form
>
> I don't completely understand this mechanism yet, but do you think it
> could be stretched to address the SKOS/codes issue in
> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> where we'd want to explode strings like "15-1199.00", "15-1199.01" and
> emit triples like 'broader' when certain patterns matched?
>
> Dan
>
OK ... let's have a go.
Here's the header and a line of data:
---
O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
---
Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:
---
{
"name": "2010_Occupations",
"title": "O*NET-SOC Occupational listing for 2010",
"publisher": [{
"name": "O*Net Resource Center",
"web": "http://www.onetcenter.org/"
}],
"resources": [{
"name": "2010_Occupations-csv",
"path": "2010_Occupations.csv",
"schema": {"columns": [
{
"name": "onet-soc-2010-code",
"title": "O*NET-SOC 2010 Code",
"description": "O*NET Standard Occupational Classification Code (2010).",
"type": "string",
"required": true,
"unique": true,
"microsyntax": [{
"name": "soc-major-group",
"regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
},{
"name": "soc-minor-group",
"regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
},{
"name": "soc-broad-group",
"regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
},{
"name": "soc-detailed-occupation",
"regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
},{
"name": "onetsoc-occupation",
"regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
}
]
},
{
"name": "title",
"title": "O*NET-SOC 2010 Title",
"description": "Title of occupational classification.",
"type": "string",
"required": true
},
{
"name": "description",
"title": "O*NET-SOC 2010 Description",
"description": "Description of occupational classification.",
"type": "string",
"required": true
}
]},
"template": {
"name": "2010_Occupations-csv-to-ttl",
"description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
"type": "template",
"path": "2010_Occupations-csv-to-ttl.ttl",
"hasFormat": "text/turtle"
}
}]
}
---
You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
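To make the "multiple regexp each extracting a single value" idea concrete, here is a rough Python sketch of how a processor might apply the five patterns to one cell. It is only an illustration under stated assumptions: the function name `extract_microsyntax` is hypothetical, the surrounding `/.../` delimiters are dropped, and the `.` is escaped so that it matches a literal full stop.

```python
import re

# Patterns taken from the "microsyntax" block, minus the /.../ delimiters.
# Assumption: the "." is escaped so it only matches a literal full stop.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}


def extract_microsyntax(cell):
    """Return {element name: captured value} for one cell value.

    Hypothetical helper: elements whose pattern does not match are omitted.
    """
    values = {}
    for name, pattern in MICROSYNTAX.items():
        match = re.match(pattern, cell)
        if match:
            values[name] = match.group(1)
    return values


# The code from the sample data line.
parts = extract_microsyntax("15-1199.03")
print(parts)
```

Each pattern contributes exactly one captured group, so the result is a flat set of named values that a template could refer to alongside the column names.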
The template (prefixes etc. intentionally left out) might then be:
---
ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
skos:notation "{onet-soc-2010-code}" ;
skos:prefLabel "{title}" ;
dct:description "{description}" ;
skos:broader ex:{soc-major-group}-0000,
ex:{soc-major-group}-{soc-minor-group}00,
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
---
However, this does not help when we look at the required _conditional behaviour_: when the value of "onetsoc-occupation" = "00" this is identical to the term from the SOC taxonomy, and the template should be more like
---
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
skos:prefLabel "{title}" ;
dct:description "{description}" ;
skos:broader ex:{soc-major-group}-0000,
ex:{soc-major-group}-{soc-minor-group}00,
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
---
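For illustration, the `{name}` placeholders in either template could be filled from a dictionary that merges the column values with the extracted microsyntax elements. The Python sketch below is an assumption about how a processor might do this, not a defined behaviour: `fill_template` is a hypothetical helper, the template is shortened, and no escaping of Turtle string literals is attempted.

```python
import re

# Shortened version of the first template; prefixes intentionally left out.
TEMPLATE = """ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
  skos:notation "{onet-soc-2010-code}" ;
  skos:prefLabel "{title}" ;
  skos:broader ex:{soc-major-group}-0000,
    ex:{soc-major-group}-{soc-minor-group}00 ."""


def fill_template(template, values):
    """Replace every {name} with values[name] (hypothetical helper).

    Raises KeyError if the template refers to an unknown name.
    """
    def lookup(match):
        return values[match.group(1)]

    return re.sub(r"\{([^{}]+)\}", lookup, template)


# Column values plus microsyntax elements for the sample data line.
row = {
    "onet-soc-2010-code": "15-1199.03",
    "title": "Web Administrators",
    "soc-major-group": "15",
    "soc-minor-group": "11",
}

output = fill_template(TEMPLATE, row)
print(output)
```

A plain regular-expression substitution is used rather than `str.format` because the names contain hyphens and the sketch should not depend on format-string parsing rules.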
It occurs to me that we may wish to trigger different templates based on a conditional response - or even decide whether to trigger a template at all for a given line!
Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire - using values of column names or microsyntax element names? e.g.
---
"template": {
"name": "2010_Occupations-csv-to-ttl",
"description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
"type": "template",
"path": "2010_Occupations-csv-to-ttl.ttl",
"hasFormat": "text/turtle",
"condition": "if {onetsoc-occupation} != '00'"
}
---
Default behaviour (if no "condition" statement included) would be _always_ to trigger the template for each row.
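As a rough sketch of what evaluating such a "condition" might involve, the Python below supports only a single comparison of the form `{name} == 'value'` or `{name} != 'value'`, with an optional leading `if`. That tiny grammar and the function name `should_fire` are assumptions made for illustration; nothing of the kind is defined in the metadata vocabulary.

```python
import re

# Assumed mini-grammar: optional "if", one {name}, == or !=, one quoted literal.
CONDITION_RE = re.compile(r"^(?:if\s+)?\{([^{}]+)\}\s*(==|!=)\s*'([^']*)'$")


def should_fire(template_meta, values):
    """Decide whether a template fires for one row (hypothetical helper)."""
    condition = template_meta.get("condition")
    if condition is None:
        # Default behaviour: no "condition" means always fire.
        return True
    match = CONDITION_RE.match(condition.strip())
    if match is None:
        raise ValueError("unsupported condition: %r" % condition)
    name, operator, literal = match.groups()
    actual = values.get(name)
    if operator == "==":
        return actual == literal
    return actual != literal


template_meta = {
    "name": "2010_Occupations-csv-to-ttl",
    "condition": "if {onetsoc-occupation} != '00'",
}

fires_for_extension = should_fire(template_meta, {"onetsoc-occupation": "03"})
fires_for_soc_term = should_fire(template_meta, {"onetsoc-occupation": "00"})
fires_by_default = should_fire({"name": "no-condition"}, {"onetsoc-occupation": "00"})
print(fires_for_extension, fires_for_soc_term, fires_by_default)
```

Even this single-comparison form needs a parser, an error case and a decision about missing values, which hints at how quickly the scope grows once boolean operators or else-branches are allowed.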
However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
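One way to picture the call-back option: the metadata only names the templates, and the decision about which one fires (if any) is delegated to ordinary functions supplied by the application. The Python sketch below is purely illustrative; `choose_template`, the predicate names and the second template path are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def is_onet_extension(values: Row) -> bool:
    """True when the code adds an O*NET suffix other than "00"."""
    return values.get("onetsoc-occupation") != "00"


def is_soc_term(values: Row) -> bool:
    """True when the code is identical to a term from the SOC taxonomy."""
    return values.get("onetsoc-occupation") == "00"


def choose_template(values: Row, rules: List[Tuple[Predicate, str]]) -> Optional[str]:
    """Return the path of the first template whose call-back accepts the row.

    Returns None when no call-back accepts it, i.e. no template fires.
    """
    for predicate, template_path in rules:
        if predicate(values):
            return template_path
    return None


RULES = [
    (is_onet_extension, "2010_Occupations-csv-to-ttl.ttl"),
    # Hypothetical path for the SOC detailed-occupation template.
    (is_soc_term, "SOC-DetailedOccupation-csv-to-ttl.ttl"),
]

chosen_for_extension = choose_template({"onetsoc-occupation": "03"}, RULES)
chosen_for_soc_term = choose_template({"onetsoc-occupation": "00"}, RULES)
chosen_for_none = choose_template({"onetsoc-occupation": "00"}, RULES[:1])
print(chosen_for_extension, chosen_for_soc_term, chosen_for_none)
```

This keeps comparison operators and branching out of the metadata format, at the cost of making the conversion depend on code that travels separately from the CSV and its description.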
Jeremy
[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>
> > - thoughts about a way to describe that microsyntax format within the
> > metadata document (see [CellMicrosyntax requirement][4]), e.g. to define
> > the sub-elements within the microsyntax that may be extracted for use
> > later - see [Parsing cell microsyntax][5].
> >
> > Comments welcome.
> >
> > Jeremy
> >
> >
> > [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> > [2]: http://w3c.github.io/csvw/metadata/index.html
> > [3]: http://w3c.github.io/csvw/csv2rdf/
> > [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> > [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >
Received on Thursday, 19 June 2014 12:36:41 UTC