- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Thu, 19 Jun 2014 12:36:10 +0000
- To: Dan Brickley <danbri@google.com>
- CC: CSV on the Web Working Group <public-csv-wg@w3.org>
> -----Original Message-----
> From: Dan Brickley [mailto:danbri@google.com]
> Sent: 18 June 2014 12:46
> To: Tandy, Jeremy
> Cc: CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
>
> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> > All -
> >
> > I've just uploaded to [GitHub][1] a rework of the "Simple Weather Observation" example. I've tried to create a CSV metadata document following the rules in the [Metadata Vocabulary for Tabular Data][2] and [Generating RDF from Tabular Data on the Web][3] documents.
> >
> > I would be particularly interested in:
> >
> > - corrections to errors!
> > - comments on additional proposed properties in the metadata document ("short-name", "template", "microsyntax")
> > - use of "hasFormat" to specify the Content-Type associated with a Template
> > - use of a REGEXP within a URI Template to convert ISO 8601 syntax to a simplified form
>
> I don't completely understand this mechanism yet, but do you think it could be stretched to address the SKOS/codes issue in
> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> where we'd want to explode strings like "15-1199.00", "15-1199.01" and emit triples like 'broader' when certain patterns matched?
>
> Dan
>

OK ... let's have a go. Here's the header and a line of data:

---
O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities.
[...]"
---

Here's a guess at the CSV metadata description, in which I am using the ["multiple regexp each extracting a single value" pattern][1]:

---
{
  "name": "2010_Occupations",
  "title": "O*NET-SOC Occupational listing for 2010",
  "publisher": [{
    "name": "O*Net Resource Center",
    "web": "http://www.onetcenter.org/"
  }],
  "resources": [{
    "name": "2010_Occupations-csv",
    "path": "2010_Occupations.csv",
    "schema": {"columns": [
      {
        "name": "onet-soc-2010-code",
        "title": "O*NET-SOC 2010 Code",
        "description": "O*NET Standard Occupational Classification Code (2010).",
        "type": "string",
        "required": true,
        "unique": true,
        "microsyntax": [{
          "name": "soc-major-group",
          "regexp": "/^(\d{2})-\d{4}.\d{2}$/"
        },{
          "name": "soc-minor-group",
          "regexp": "/^\d{2}-(\d{2})\d{2}.\d{2}$/"
        },{
          "name": "soc-broad-group",
          "regexp": "/^\d{2}-\d{2}(\d)\d.\d{2}$/"
        },{
          "name": "soc-detailed-occupation",
          "regexp": "/^\d{2}-\d{3}(\d).\d{2}$/"
        },{
          "name": "onetsoc-occupation",
          "regexp": "/^\d{2}-\d{4}.(\d{2})$/"
        }]
      },
      {
        "name": "title",
        "title": "O*NET-SOC 2010 Title",
        "description": "Title of occupational classification.",
        "type": "string",
        "required": true
      },
      {
        "name": "description",
        "title": "O*NET-SOC 2010 Description",
        "description": "Description of occupational classification.",
        "type": "string",
        "required": true
      }
    ]},
    "template": {
      "name": "2010_Occupations-csv-to-ttl",
      "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
      "type": "template",
      "path": "2010_Occupations-csv-to-ttl.ttl",
      "hasFormat": "text/turtle"
    }
  }]
}
---

You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!

The template (prefixes etc.
intentionally left out) might then be:

---
ex:{onet-soc-2010-code}
    a ex:ONETSOC-Occupation ;
    skos:notation "{onet-soc-2010-code}" ;
    skos:prefLabel "{title}" ;
    dct:description "{description}" ;
    skos:broader ex:{soc-major-group}-0000,
                 ex:{soc-major-group}-{soc-minor-group}00,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
---

However, this does not help when we look at the required _conditional behaviour_: when the value of "onetsoc-occupation" is "00", the row is identical to the term from the SOC taxonomy, and the template should be more like:

---
ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}
    a ex:SOC-DetailedOccupation ;
    skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
    skos:prefLabel "{title}" ;
    dct:description "{description}" ;
    skos:broader ex:{soc-major-group}-0000,
                 ex:{soc-major-group}-{soc-minor-group}00,
                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
---

It occurs to me that we may wish to trigger different templates based on a condition, or even decide whether to trigger a template at all for a given line!

Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire, using the values of column names or microsyntax element names, e.g.

---
"template": {
  "name": "2010_Occupations-csv-to-ttl",
  "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
  "type": "template",
  "path": "2010_Occupations-csv-to-ttl.ttl",
  "hasFormat": "text/turtle",
  "condition": "if {onetsoc-occupation} != '00'"
}
---

Default behaviour (if no "condition" statement is included) would be _always_ to trigger the template for each row.
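
To make the idea concrete, here is a minimal Python sketch of how a processor might split the code column with the microsyntax regexps and then pick a template by condition. None of this is defined by the metadata documents: the function names (`extract`, `fill`, `render`), the "first matching condition wins" rule and the shortened one-line templates are all assumptions made for illustration. The dot in each regexp is also escaped here so that it matches only a literal full stop.

```python
import re

# Microsyntax element name -> regexp with one capturing group.
# Assumption: the "." is escaped, unlike in the metadata sketch.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

SOC = "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}"

# Hypothetical (condition, template) pairs, checked in order.
# The templates are cut down to a single triple to keep the sketch short.
TEMPLATES = [
    (lambda v: v["onetsoc-occupation"] == "00",
     "ex:" + SOC + " a ex:SOC-DetailedOccupation ."),
    (lambda v: True,  # no condition: always fires
     "ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ; skos:broader ex:" + SOC + " ."),
]


def extract(code):
    """Apply every microsyntax regexp to the cell value and collect the captures."""
    values = {"onet-soc-2010-code": code}
    for name, pattern in MICROSYNTAX.items():
        match = re.match(pattern, code)
        if match is None:
            raise ValueError("%r does not match microsyntax %r" % (code, name))
        values[name] = match.group(1)
    return values


def fill(template, values):
    """Replace each {name} placeholder with the corresponding value."""
    return re.sub(r"\{([^{}]+)\}", lambda m: values[m.group(1)], template)


def render(code):
    """Return the output of the first template whose condition holds, else None."""
    values = extract(code)
    for condition, template in TEMPLATES:
        if condition(values):
            return fill(template, values)
    return None


print(render("15-1199.03"))
# ex:15-1199.03 a ex:ONETSOC-Occupation ; skos:broader ex:15-1199 .
print(render("15-1199.00"))
# ex:15-1199 a ex:SOC-DetailedOccupation .
```

Writing each condition as an ordinary callable avoids inventing a comparison syntax inside the metadata, at the cost of the metadata no longer being purely declarative.
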
However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).

Jeremy

[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value

> > - thoughts about a way to describe that microsyntax format within the metadata document (see [CellMicrosyntax requirement][4]), e.g. to define the sub-elements within the microsyntax that may be extracted for use later - see [Parsing cell microsyntax][5].
> >
> > Comments welcome.
> >
> > Jeremy
> >
> >
> > [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> > [2]: http://w3c.github.io/csvw/metadata/index.html
> > [3]: http://w3c.github.io/csvw/csv2rdf/
> > [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> > [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
Received on Thursday, 19 June 2014 12:36:41 UTC