- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Mon, 23 Jun 2014 16:03:51 +0000
- To: Ivan Herman <ivan@w3.org>
- CC: Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 21 June 2014 08:38
> To: Tandy, Jeremy
> Cc: Dan Brickley; W3C CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
>
> Jeremy,
>
> one thing that I was wondering about was that the simple naming
> mechanism for the various microsyntaxes may not work out. Consider
>
> "columns" : [
>   { "name" : "datetime",
>     ...
>     "microsyntax": [
>       { "name" : N1,
>         "regexp" : "...."
>       },
>       .....
>     ]
>   },
>   { "name" : "anothercolumn",
>     ...
>     "microsyntax": [
>       { "name" : N1,
>         "regexp" : "...."
>       },
>       .....
>     ]
>   }
> ]
>
> When working through the cells in a row, what would 'N1' refer to?
> Unless we want to require the unicity of the microsyntax names, we may
> hit an issue. And I do not think requiring a unique name is a good
> idea; if the metadata becomes big, this may become a nuisance.

Agreed. I made the assumption that all instances of "name" within a
given metadata document would need to be unique. I had not considered
any mechanisms to make this easy for users; e.g. using the "name" from
an enclosing object to automatically _namespace_ sub-names. We could
leave it to the user to ensure uniqueness (easy for us; adds load to the
end user which is less good); in which case the example above would fail
to validate. Alternatively, we could apply a form of name-spacing; e.g.
"datetime/N1" and "anothercolumn/N1" within your example above.

> What this means is that the syntax becomes more complicated. Something
> like {datetime:N1} or something similar (which raises the issue of
> escape characters, too:-(

Agreed! I chose a different separator character to you, but the same
issue applies.

> As for the conditionals: mustache has some syntax for this which is a
> bit different
>
> {{#bla}}
>   .. any template here
> {{/bla}}
>
> although the mustache semantics is a bit different (afaik it relies on
> the existence or not of a key in an object).
> We could use the mustache semantics but we probably need something
> more, too, like "if 'bla' is a microsyntax name and is true if the
> value of the cell matches the regexp then it is true".

Syntax-wise, we want our metadata document to be valid JSON, so we would
need something different to mustache. However, I agree that our use
cases call for similar semantics. Perhaps the syntax might be something
like:

"condition": {
  "operator": "if ({bla})",
  "template": {
    "name": "2010_Occupations-csv-to-ttl",
    "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
    "type": "template",
    "path": "2010_Occupations-csv-to-ttl.ttl",
    "hasFormat": "text/turtle"
  }
}

In this case, I'm trying to say that the template will be triggered if
the value of {bla} is true / not null etc. ... the value of {bla} is
taken by evaluating the column (or microsyntax element) with "name" =
"bla" for the row being processed. Like you say: """it relies on the
existence or not of a key in an object"""

(I don't really like the syntax; I guess that others can come up with
better.)

> But I agree that the conditional complicates the templates a lot. Here
> is where our use cases may have to switch in: do our use cases justify
> the need for conditionals (remembering that, though we are discussing
> turtle here, I do not see any difference between generating turtle and
> generating XML or JSON through the same mechanism).

The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
motivated by the [ExpressingHierarchyWithinOccupationalListings][2] use
case. This use case gives us two requirements:

i) triggering a template if a value of a cell is not null; e.g. to
generate the SKOS concept scheme from the SOC structure ...

15-0000,,,,Computer and Mathematical Occupations,,,,,
,15-1100,,,Computer Occupations,,,,,
,,15-1110,,Computer and Information Research Scientists,,,,,
,,,15-1111,Computer and Information Research Scientists,,,,,

Here we can see that I only want an ex:SOC-MajorGroup entity created on
the first row shown above (where col 1 is populated).

ii) triggering a template if a value of a cell equates to a particular
string (or the opposite); e.g. when the value of "onetsoc-occupation" =
"00" as shown in the example [earlier in this email thread][3].

... "operator": "if ({onetsoc-occupation} == '00')"

Perhaps there are cases for more complex operations? I don't know.
Perhaps this is where call-back functions or promises could be used to
parse a row and provide a Boolean response as to whether the template
should be triggered? Again, I don't know ... and some considerable
thought would be required to work out the details of such.

Jeremy

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
[2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
[3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html

> My 2 cents...
>
> Ivan
>
> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
>
> >> -----Original Message-----
> >> From: Dan Brickley [mailto:danbri@google.com]
> >> Sent: 18 June 2014 12:46
> >> To: Tandy, Jeremy
> >> Cc: CSV on the Web Working Group
> >> Subject: Re: Attempted example CSV metadata document and template
> >>
> >> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> >> wrote:
> >>> All -
> >>>
> >>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >>> Observation" example.
> >>> I've tried to create a CSV metadata document following the rules
> >>> in the [Metadata Vocabulary for Tabular Data][2] and [Generating
> >>> RDF from Tabular Data on the Web][3] documents.
> >>>
> >>> I would be particularly interested in:
> >>>
> >>> - corrections to errors!
> >>> - comments on additional proposed properties in the metadata
> >>>   document ("short-name", "template", "microsyntax")
> >>> - use of "hasFormat" to specify the Content-Type associated with a
> >>>   Template
> >>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
> >>>   to a simplified form
> >>
> >> I don't completely understand this mechanism yet, but do you think it
> >> could be stretched to address the SKOS/codes issue in
> >> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >> and emit triples like 'broader' when certain patterns matched?
> >>
> >> Dan
> >>
> >
> > OK ... let's have a go.
> >
> > Here's the header and a line of data:
> >
> > ---
> > O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> > 15-1199.03,Web Administrators,"Manage web environment design,
> > deployment, development and maintenance activities.
> > [...]"
> > ---
> >
> > Here's a guess at the CSV metadata description in which I am using
> > the ["multiple regexp each extracting a single value" pattern][1]:
> >
> > ---
> > {
> >   "name": "2010_Occupations",
> >   "title": "O*NET-SOC Occupational listing for 2010",
> >   "publisher": [{
> >     "name": "O*Net Resource Center",
> >     "web": "http://www.onetcenter.org/"
> >   }],
> >   "resources": [{
> >     "name": "2010_Occupations-csv",
> >     "path": "2010_Occupations.csv",
> >     "schema": {"columns": [
> >       {
> >         "name": "onet-soc-2010-code",
> >         "title": "O*NET-SOC 2010 Code",
> >         "description": "O*NET Standard Occupational Classification Code (2010).",
> >         "type": "string",
> >         "required": true,
> >         "unique": true,
> >         "microsyntax": [{
> >           "name": "soc-major-group",
> >           "regexp": "/^(\d{2})-\d{4}.\d{2}$/"
> >         },{
> >           "name": "soc-minor-group",
> >           "regexp": "/^\d{2}-(\d{2})\d{2}.\d{2}$/"
> >         },{
> >           "name": "soc-broad-group",
> >           "regexp": "/^\d{2}-\d{2}(\d)\d.\d{2}$/"
> >         },{
> >           "name": "soc-detailed-occupation",
> >           "regexp": "/^\d{2}-\d{3}(\d).\d{2}$/"
> >         },{
> >           "name": "onetsoc-occupation",
> >           "regexp": "/^\d{2}-\d{4}.(\d{2})$/"
> >         }]
> >       },
> >       {
> >         "name": "title",
> >         "title": "O*NET-SOC 2010 Title",
> >         "description": "Title of occupational classification.",
> >         "type": "string",
> >         "required": true
> >       },
> >       {
> >         "name": "description",
> >         "title": "O*NET-SOC 2010 Description",
> >         "description": "Description of occupational classification.",
> >         "type": "string",
> >         "required": true
> >       }
> >     ]},
> >     "template": {
> >       "name": "2010_Occupations-csv-to-ttl",
> >       "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >       "type": "template",
> >       "path": "2010_Occupations-csv-to-ttl.ttl",
> >       "hasFormat": "text/turtle"
> >     }
> >   }]
> > }
> > ---
> >
> > You can see that I've used the `microsyntax` object to capture the 5
> > independent elements of the O*NET-SOC code each with its own regexp:
> > "soc-major-group", "soc-minor-group", "soc-broad-group",
> > "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
> > the _best_ way to do it, I don't know ... it's just an idea to get us
> > talking about possibilities and options!
> >
> > The template (prefixes etc. intentionally left out) might then be:
> >
> > ---
> > ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> >     skos:notation "{onet-soc-2010-code}" ;
> >     skos:prefLabel "{title}" ;
> >     dct:description "{description}" ;
> >     skos:broader ex:{soc-major-group}-0000,
> >         ex:{soc-major-group}-{soc-minor-group}00,
> >         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> >         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> > ---
> >
> > However, this does not help when we look at the required _conditional
> > behaviour_: when the value of "onetsoc-occupation" = "00" this is
> > identical to the term from the SOC taxonomy, and the template should
> > be more like
> >
> > ---
> > ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> >     skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> >     skos:prefLabel "{title}" ;
> >     dct:description "{description}" ;
> >     skos:broader ex:{soc-major-group}-0000,
> >         ex:{soc-major-group}-{soc-minor-group}00,
> >         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> > ---
> >
> > It occurs to me that we may wish to trigger different templates based
> > on a conditional response - or even whether we wish to trigger a
> > template at all for a given line!
> >
> > Thinking out of the box (is that a euphemism for "making it up as I
> > go along"?), it would seem that each "template" block in the CSV
> > metadata might have a "condition" statement that tells it when to fire
> > - using values of column names or microsyntax element names? e.g.
> >
> > ---
> > "template": {
> >   "name": "2010_Occupations-csv-to-ttl",
> >   "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >   "type": "template",
> >   "path": "2010_Occupations-csv-to-ttl.ttl",
> >   "hasFormat": "text/turtle",
> >   "condition": "if {soc-detailed-occupation} != '00'"
> > }
> > ---
> >
> > Default behaviour (if no "condition" statement included) would be
> > _always_ to trigger the template for each row.
> >
> > However, looking at this, I am immediately concerned that including
> > if-then-else blocks and comparison operators hugely increases the
> > complexity of our work. Perhaps this is a good point to "bug out" to
> > some external agent (e.g. call-back function or promise).
> >
> > Jeremy
> >
> > [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> >
> >>
> >>> - thoughts about a way to describe that microsyntax format within
> >>>   the metadata document (see [CellMicrosyntax requirement][4]),
> >>>   e.g. to define the sub-elements within the microsyntax that may
> >>>   be extracted for use later - see [Parsing cell microsyntax][5].
> >>>
> >>> Comments welcome.
> >>>
> >>> Jeremy
> >>>
> >>>
> >>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> >>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> >>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
Received on Monday, 23 June 2014 16:04:24 UTC
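
The ideas discussed in this thread (one regexp per microsyntax element, `{name}` placeholders in a template, and a "condition" that decides whether a template fires for a row) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything the Working Group specified: the function names `extract`, `should_fire` and `fill` are hypothetical, the "." in each regexp is escaped here (the patterns in the thread leave it unescaped), and the condition parser accepts only the three forms that appear in the messages above: `if ({bla})`, `if ({name} == 'value')` and `if {name} != 'value'`.

```python
import re

# Microsyntax patterns from the metadata example in the thread.
# Assumption: the literal "." is escaped, which the original patterns omit.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

# Hypothetical condition grammar: "if ({name})" or "if ({name} ==|!= 'literal')",
# with the parentheses optional.
CONDITION = re.compile(
    r"^if\s*\(?\s*\{([^{}]+)\}\s*(?:(==|!=)\s*'([^']*)')?\s*\)?$"
)


def extract(cell):
    """Return {element name: captured value, or None if the regexp fails}."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        match = re.match(pattern, cell)
        values[name] = match.group(1) if match else None
    return values


def should_fire(condition, values):
    """Decide whether a template is triggered for the current row."""
    if condition is None:
        return True  # default behaviour: always trigger the template
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError("unsupported condition: " + condition)
    name, operator, literal = match.groups()
    value = values.get(name)
    if operator is None:
        return value not in (None, "")  # "true / not null" test
    return value == literal if operator == "==" else value != literal


def fill(template, values):
    """Substitute {name} placeholders; unknown or null names are left as-is."""
    def substitute(match):
        value = values.get(match.group(1))
        return match.group(0) if value is None else value
    return re.sub(r"\{([^{}]+)\}", substitute, template)


values = extract("15-1199.03")
print(fill("ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0", values))  # ex:15-1190
print(should_fire("if ({onetsoc-occupation} == '00')", values))  # False
```

Keeping the condition grammar this small avoids the if-then-else complexity raised in the thread; anything richer would be a candidate for the call-back approach mentioned above.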