- From: Ivan Herman <ivan@w3.org>
- Date: Sat, 21 Jun 2014 09:38:12 +0200
- To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
- Cc: Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <0EDBA5D3-6E79-48C4-9EA0-D62621915087@w3.org>
Jeremy,

one thing that I was wondering about was that the simple naming mechanism for the various microsyntaxes may not work out. Consider

  "columns" : [
    {
      "name" : "datetime",
      ...
      "microsyntax" : [
        { "name" : N1, "regexp" : "...." },
        .....
      ]
    },
    {
      "name" : "anothercolumn",
      ...
      "microsyntax" : [
        { "name" : N1, "regexp" : "...." },
        .....
      ]
    }
  ]

When working through the cells in a row, what would 'N1' refer to? Unless we want to require the uniqueness of the microsyntax names, we may hit an issue. And I do not think requiring a unique name is a good idea; if the metadata becomes big, this may become a nuisance.

What this means is that the syntax becomes more complicated. Something like

  {datetime:N1}

or something similar (which raises the issue of escape characters, too :-( ).

As for the conditionals: mustache has some syntax for this, which is a bit different:

  {{#bla}} .. any template here {{/bla}}

although the mustache semantics is a bit different (afaik it relies on the existence or not of a key in an object). We could use the mustache semantics but we probably need something more, too, like "if 'bla' is a microsyntax name, then it is true if the value of the cell matches the regexp".

But I agree that the conditional complicates the templates a lot. Here is where our use cases may have to kick in: do our use cases justify the need for conditionals? (Remember that, though we are discussing Turtle here, I do not see any difference between generating Turtle and generating XML or JSON through the same mechanism.)

My 2 cents...

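To make the name clash concrete, here is a minimal Python sketch of how column-scoped references such as {datetime:N1} could be resolved. Everything in it is an assumption made for illustration: the sample regexps, the sample row, and the helper names build_bindings and expand are hypothetical and not part of any draft. It also sidesteps the escape-character question entirely.

```python
import re

# Hypothetical metadata: two columns each define a microsyntax named "N1".
COLUMNS = [
    {"name": "datetime",
     "microsyntax": [{"name": "N1", "regexp": r"^(\d{4})-\d{2}-\d{2}"}]},
    {"name": "anothercolumn",
     "microsyntax": [{"name": "N1", "regexp": r"^([A-Z]+)"}]},
]

def build_bindings(columns, row):
    """Map 'column' and 'column:microsyntax' keys to values for one row."""
    bindings = {}
    for column, cell in zip(columns, row):
        bindings[column["name"]] = cell
        for ms in column.get("microsyntax", []):
            match = re.search(ms["regexp"], cell)
            if match:
                # Qualifying the key with the column name removes the ambiguity.
                bindings[f'{column["name"]}:{ms["name"]}'] = match.group(1)
    return bindings

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

def expand(template, bindings):
    """Replace each {key} with its bound value; unknown keys are left untouched."""
    return PLACEHOLDER.sub(lambda m: bindings.get(m.group(1), m.group(0)), template)

row = ["2014-06-21T09:38:12", "ABC123"]
bindings = build_bindings(COLUMNS, row)
result = expand("year={datetime:N1} code={anothercolumn:N1}", bindings)
print(result)
```

With an unqualified {N1} the lookup would have to pick one of the two columns arbitrarily; the qualified key makes the choice explicit at the cost of a longer placeholder.
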
Ivan

On 19 Jun 2014, at 14:36 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

>> -----Original Message-----
>> From: Dan Brickley [mailto:danbri@google.com]
>> Sent: 18 June 2014 12:46
>> To: Tandy, Jeremy
>> Cc: CSV on the Web Working Group
>> Subject: Re: Attempted example CSV metadata document and template
>>
>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>>> All -
>>>
>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather Observation" example. I've tried to create a CSV metadata document following the rules in the [Metadata Vocabulary for Tabular Data][2] and [Generating RDF from Tabular Data on the Web][3] documents.
>>>
>>> I would be particularly interested in:
>>>
>>> - corrections to errors!
>>> - comments on additional proposed properties in the metadata document ("short-name", "template", "microsyntax")
>>> - use of "hasFormat" to specify the Content-Type associated with a Template
>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax to a simplified form
>>
>> I don't completely understand this mechanism yet, but do you think it could be stretched to address the SKOS/codes issue in
>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>> where we'd want to explode strings like "15-1199.00", "15-1199.01" and emit triples like 'broader' when certain patterns matched?
>>
>> Dan
>>
>
> OK ... let's have a go.
>
> Here's the header and a line of data:
>
> ---
> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> ---
>
> Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:
>
> ---
> {
>   "name": "2010_Occupations",
>   "title": "O*NET-SEC Occupational listing for 2010",
>   "publisher": [{
>     "name": "O*Net Resource Center",
>     "web": "http://www.onetcenter.org/"
>   }],
>   "resources": [{
>     "name": "2010_Occupations-csv",
>     "path": "2010_Occupations.csv",
>     "schema": {"columns": [
>       {
>         "name": "onet-soc-2010-code",
>         "title": "O*NET-SOC 2010 Code",
>         "description": "O*NET Standard Occupational Classification Code (2010).",
>         "type": "string",
>         "required": true,
>         "unique": true,
>         "microsyntax": [{
>           "name": "soc-major-group",
>           "regexp": "/^(\d{2})-\d{4}.\d{2}$/"
>         },{
>           "name": "soc-minor-group",
>           "regexp": "/^\d{2}-(\d{2})\d{2}.\d{2}$/"
>         },{
>           "name": "soc-broad-group",
>           "regexp": "/^\d{2}-\d{2}(\d)\d.\d{2}$/"
>         },{
>           "name": "soc-detailed-occupation",
>           "regexp": "/^\d{2}-\d{3}(\d).\d{2}$/"
>         },{
>           "name": "onetsoc-occupation",
>           "regexp": "/^\d{2}-\d{4}.(\d{2})$/"
>         }]
>       },
>       {
>         "name": "title",
>         "title": "O*NET-SOC 2010 Title",
>         "description": "Title of occupational classification.",
>         "type": "string",
>         "required": true
>       },
>       {
>         "name": "description",
>         "title": "O*NET-SOC 2010 Description",
>         "description": "Description of occupational classification.",
>         "type": "string",
>         "required": true
>       }
>     ]},
>     "template": {
>       "name": "2010_Occupations-csv-to-ttl",
>       "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>       "type": "template",
>       "path": "2010_Occupations-csv-to-ttl.ttl",
>       "hasFormat": "text/turtle"
>     }
>   }]
> }
> ---
>
> You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
>
> The template (prefixes etc. intentionally left out) might then be:
>
> ---
> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>     skos:notation "{onet-soc-2010-code}" ;
>     skos:prefLabel "{title}" ;
>     dct:description "{description}" ;
>     skos:broader ex:{soc-major-group}-0000,
>         ex:{soc-major-group}-{soc-minor-group}00,
>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> ---
>
> However, this does not help when we look at the required _conditional behaviour_: when the value of "onetsoc-occupation" = "00" this is identical to the term from the SOC taxonomy, and the template should be more like
>
> ---
> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>     skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>     skos:prefLabel "{title}" ;
>     dct:description "{description}" ;
>     skos:broader ex:{soc-major-group}-0000,
>         ex:{soc-major-group}-{soc-minor-group}00,
>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> ---
>
> It occurs to me that we may wish to trigger different templates based on a conditional response - or even whether we wish to trigger a template at all for a given line!
>
> Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire - using values of column names or microsyntax element names? e.g.
>
> ---
> "template": {
>   "name": "2010_Occupations-csv-to-ttl",
>   "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>   "type": "template",
>   "path": "2010_Occupations-csv-to-ttl.ttl",
>   "hasFormat": "text/turtle",
>   "condition": "if {onetsoc-occupation} != '00'"
> }
> ---
>
> Default behaviour (if no "condition" statement included) would be _always_ to trigger the template for each row.
>
> However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
>
> Jeremy
>
> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>
>>
>>> - thoughts about a way to describe that microsyntax format within the metadata document (see [CellMicrosyntax requirement][4]), e.g. to define the sub-elements within the microsyntax that may be extracted for use later - see [Parsing cell microsyntax][5].
>>>
>>> Comments welcome.
>>>
>>> Jeremy
>>>
>>>
>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax

----
Ivan Herman, W3C Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Saturday, 21 June 2014 07:38:44 UTC
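
As an illustration of the "multiple regexp each extracting a single value" pattern and the conditional discussed in the thread, the following Python sketch splits an O*NET-SOC code into its five parts and picks a template accordingly. It is only a sketch under stated assumptions: the regexps follow the quoted metadata but escape the "." so that it matches a literal dot, and the function names and template labels are hypothetical, not part of any proposal.

```python
import re

# Regexps adapted from the quoted metadata; the "." is escaped here (an assumption
# about the intent) so that it only matches a literal dot.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

def parse_code(code):
    """Apply every regexp to the cell value and collect the single captured group."""
    parts = {}
    for name, regexp in MICROSYNTAX.items():
        match = re.match(regexp, code)
        if match is None:
            raise ValueError(f"{code!r} does not match microsyntax {name!r}")
        parts[name] = match.group(1)
    return parts

def choose_template(parts):
    """Hypothetical condition: a '00' suffix means the row is a plain SOC term."""
    if parts["onetsoc-occupation"] == "00":
        return "soc-detailed-occupation"
    return "onetsoc-occupation"

def broader_terms(parts):
    """Build the skos:broader targets used in the two templates from the thread."""
    major = parts["soc-major-group"]
    minor = parts["soc-minor-group"]
    broad = parts["soc-broad-group"]
    detailed = parts["soc-detailed-occupation"]
    terms = [
        f"ex:{major}-0000",
        f"ex:{major}-{minor}00",
        f"ex:{major}-{minor}{broad}0",
    ]
    if parts["onetsoc-occupation"] != "00":
        # O*NET-specific occupations also point at their SOC detailed occupation.
        terms.append(f"ex:{major}-{minor}{broad}{detailed}")
    return terms

parts = parse_code("15-1199.03")
print(choose_template(parts), broader_terms(parts))
```

Keeping the condition in ordinary code like this is close to the "bug out to an external agent" option raised in the thread: the metadata would only name the microsyntax parts, and a call-back would decide which template fires.
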