- From: Andy Seaborne <andy@apache.org>
- Date: Tue, 24 Jun 2014 11:39:20 +0100
- To: public-csv-wg@w3.org
(general observation)

There are ways to get conditional effects without an explicit "if-then-else":

1/ Apply different templates: that is, multiple passes with different matching conditions.

2/ A template is valid if and only if all its associated templates are defined (the template may not actually be used), so that a (non-)matching regex controls whether the template is applied.

These might be applicable separately or together.

Andy

On 23/06/14 17:35, Ivan Herman wrote:
>
> On 23 Jun 2014, at 18:03 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>
>>> -----Original Message-----
>>> From: Ivan Herman [mailto:ivan@w3.org]
>>> Sent: 21 June 2014 08:38
>>> To: Tandy, Jeremy
>>> Cc: Dan Brickley; W3C CSV on the Web Working Group
>>> Subject: Re: Attempted example CSV metadata document and template
>>>
>>> Jeremy,
>>>
>>> one thing that I was wondering about was that the simple naming
>>> mechanism for the various microsyntaxes may not work out. Consider
>>>
>>> "columns" : [
>>>   { "name" : "datetime",
>>>     ...
>>>     "microsyntax": [
>>>       { "name" : N1,
>>>         "regexp" : "...."
>>>       },
>>>       .....
>>>     ]
>>>   },
>>>   { "name" : "anothercolumn",
>>>     ...
>>>     "microsyntax": [
>>>       { "name" : N1,
>>>         "regexp" : "...."
>>>       },
>>>       .....
>>>     ]
>>>   }
>>>
>>> ]
>>>
>>> When working through the cells in a row, what would 'N1' refer to?
>>> Unless we want to require the uniqueness of the microsyntax names, we may
>>> hit an issue. And I do not think requiring a unique name is a good
>>> idea; if the metadata becomes big, this may become a nuisance.
>>
>> Agreed. I made the assumption that all instances of "name" within a given metadata document would need to be unique. I had not considered any mechanisms to make this easy for users; e.g. using the "name" from an enclosing object to automatically _namespace_ sub-names.
>>
>> We could leave it to the user to ensure uniqueness (easy for us; adds load to the end user which is less good); in which case the example above would fail to validate.
>>
>> Alternatively, we could apply a form of name-spacing; e.g. "datetime/N1" and "anothercolumn/N1" within your example above.
>>
>>>
>>> What this means is that the syntax becomes more complicated. Something
>>> like {datetime:N1} or something similar (which raises the issue of
>>> escape characters, too :-( ).
>>
>> Agreed! I chose a different separator character to you, but the same issue applies.
>>
>>>
>>> As for the conditionals: mustache has some syntax for this which is a
>>> bit different
>>>
>>> {{#bla}}
>>> .. any template here
>>> {{/bla}}
>>>
>>> although the mustache semantics is a bit different (afaik it relies on
>>> the existence or not of a key in an object). We could use the mustache
>>> semantics but we probably need something more, too, like "if 'bla' is a
>>> microsyntax name, then it is true if the value of the cell matches the
>>> regexp".
>>
>> Syntax-wise, we want our metadata document to be valid JSON, so we would need something different to mustache. However, I agree that our use cases call for similar semantics. Perhaps the syntax might be something like:
>>
>> "condition": {
>>   "operator": "if ({bla})",
>>   "template": {
>>     "name": "2010_Occupations-csv-to-ttl",
>>     "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>     "type": "template",
>>     "path": "2010_Occupations-csv-to-ttl.ttl",
>>     "hasFormat": "text/turtle"
>>   }
>> }
>>
>> In this case, I'm trying to say that the template will be triggered if the value of {bla} is true / not null etc. ... the value of {bla} is taken by evaluating the column (or microsyntax element) with "name" = "bla" for the row being processed. Like you say: """it relies on the existence or not of a key in an object"""
>>
>> (I don't really like the syntax; I guess that others can come up with better.)
>
> Ouch, you are right, I forgot about the fact that we want templates for conditionals :-(
>
> But before getting into the boring issue of syntax we have to decide whether we need them...
>
>>
>>>
>>> But I agree that the conditional complicates the templates a lot. Here
>>> is where our use cases may have to come in: do our use cases justify
>>> the need for conditionals (remembering that, though we are discussing
>>> turtle here, I do not see any difference between generating turtle and
>>> generating XML or JSON through the same mechanism)?
>>
>> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1], motivated by the [ExpressingHierarchyWithinOccupationalListings][2] use case. This use case gives us two requirements:
>>
>> i) triggering a template if a value of a cell is not null; e.g. to generate the SKOS concept scheme from the SOC structure ...
>>
>> 15-0000,,,,Computer and Mathematical Occupations,,,,,
>> ,15-1100,,,Computer Occupations,,,,,
>> ,,15-1110,,Computer and Information Research Scientists,,,,,
>> ,,,15-1111,Computer and Information Research Scientists,,,,,
>>
>> Here we can see that I only want an ex:SOC-MajorGroup entity created on the first row shown above (where col 1 is populated).
>>
>> ii) triggering a template if a value of a cell equates to a particular string (or the opposite); e.g. when the value of "onetsoc-occupation" = "00" as in the example shown [earlier in this email thread][3]. ...
>>
>> "operator": "if ({onetsoc-occupation} == '00')"
>>
>> Perhaps there are cases for more complex operations? I don't know. Perhaps this is where call-back functions or promises could be used to parse a row and provide a Boolean response as to whether the template should be triggered? Again, I don't know ... and some considerable thought would be required to work out the details of such.
>
> To me these seem convincing enough that we need something. My preference would be, though, to avoid all the issues about defining 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back on regular expressions ('match'-'not match') simply because regular expressions are used elsewhere already. Would that be enough?
>
> Ivan
>
>>
>> Jeremy
>>
>> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
>> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
>> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
>>
>>>
>>> My 2 cents...
>>>
>>> Ivan
>>>
>>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
>>> <jeremy.tandy@metoffice.gov.uk> wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dan Brickley [mailto:danbri@google.com]
>>>>> Sent: 18 June 2014 12:46
>>>>> To: Tandy, Jeremy
>>>>> Cc: CSV on the Web Working Group
>>>>> Subject: Re: Attempted example CSV metadata document and template
>>>>>
>>>>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
>>>>> wrote:
>>>>>> All -
>>>>>>
>>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather Observation" example. I've tried to create a CSV metadata document following the rules in the [Metadata Vocabulary for Tabular Data][2] and [Generating RDF from Tabular Data on the Web][3] documents.
>>>>>>
>>>>>> I would be particularly interested in:
>>>>>>
>>>>>> - corrections to errors!
>>>>>> - comments on additional proposed properties in the metadata
>>>>>>   document ("short-name", "template", "microsyntax")
>>>>>> - use of "hasFormat" to specify the Content-Type associated with a
>>>>>>   Template
>>>>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
>>>>>>   to a simplified form
>>>>>
>>>>> I don't completely understand this mechanism yet, but do you think it
>>>>> could be stretched to address the SKOS/codes issue in
>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
>>>>> and emit triples like 'broader' when certain patterns matched?
>>>>>
>>>>> Dan
>>>>>
>>>>
>>>> OK ... let's have a go.
>>>>
>>>> Here's the header and a line of data:
>>>>
>>>> ---
>>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
>>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
>>>> ---
>>>>
>>>> Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:
>>>>
>>>> ---
>>>> {
>>>>   "name": "2010_Occupations",
>>>>   "title": "O*NET-SOC Occupational listing for 2010",
>>>>   "publisher": [{
>>>>     "name": "O*Net Resource Center",
>>>>     "web": "http://www.onetcenter.org/"
>>>>   }],
>>>>   "resources": [{
>>>>     "name": "2010_Occupations-csv",
>>>>     "path": "2010_Occupations.csv",
>>>>     "schema": {"columns": [
>>>>       {
>>>>         "name": "onet-soc-2010-code",
>>>>         "title": "O*NET-SOC 2010 Code",
>>>>         "description": "O*NET Standard Occupational Classification Code (2010).",
>>>>         "type": "string",
>>>>         "required": true,
>>>>         "unique": true,
>>>>         "microsyntax": [{
>>>>           "name": "soc-major-group",
>>>>           "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
>>>>         },{
>>>>           "name": "soc-minor-group",
>>>>           "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
>>>>         },{
>>>>           "name": "soc-broad-group",
>>>>           "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
>>>>         },{
>>>>           "name": "soc-detailed-occupation",
>>>>           "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
>>>>         },{
>>>>           "name": "onetsoc-occupation",
>>>>           "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
>>>>         }
>>>>
>>>>         ]
>>>>       },
>>>>       {
>>>>         "name": "title",
>>>>         "title": "O*NET-SOC 2010 Title",
>>>>         "description": "Title of occupational classification.",
>>>>         "type": "string",
>>>>         "required": true
>>>>       },
>>>>       {
>>>>         "name": "description",
>>>>         "title": "O*NET-SOC 2010 Description",
>>>>         "description": "Description of occupational classification.",
>>>>         "type": "string",
>>>>         "required": true
>>>>       }
>>>>     ]},
>>>>     "template": {
>>>>       "name": "2010_Occupations-csv-to-ttl",
>>>>       "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>       "type": "template",
>>>>       "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>       "hasFormat": "text/turtle"
>>>>     }
>>>>   }]
>>>> }
>>>> ---
>>>>
>>>> You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
>>>>
>>>> The template (prefixes etc. intentionally left out) might then be:
>>>>
>>>> ---
>>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>>>>   skos:notation "{onet-soc-2010-code}" ;
>>>>   skos:prefLabel "{title}" ;
>>>>   dct:description "{description}" ;
>>>>   skos:broader ex:{soc-major-group}-0000,
>>>>     ex:{soc-major-group}-{soc-minor-group}00,
>>>>     ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>>>>     ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
>>>> ---
>>>>
>>>> However, this does not help when we look at the required _conditional
>>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
>>>> identical to the term from the SOC taxonomy, and the template should
>>>> be more like
>>>>
>>>> ---
>>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>>>>   skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>>>>   skos:prefLabel "{title}" ;
>>>>   dct:description "{description}" ;
>>>>   skos:broader ex:{soc-major-group}-0000,
>>>>     ex:{soc-major-group}-{soc-minor-group}00,
>>>>     ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
>>>> ---
>>>>
>>>> It occurs to me that we may wish to trigger different templates based on a conditional response - or even whether we wish to trigger a template at all for a given line!
>>>>
>>>> Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire - using values of column names or microsyntax element names? e.g.
>>>>
>>>> ---
>>>> "template": {
>>>>   "name": "2010_Occupations-csv-to-ttl",
>>>>   "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>   "type": "template",
>>>>   "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>   "hasFormat": "text/turtle",
>>>>   "condition": "if {onetsoc-occupation} != '00'"
>>>> }
>>>> ---
>>>>
>>>> Default behaviour (if no "condition" statement included) would be _always_ to trigger the template for each row.
>>>>
>>>> However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
>>>>
>>>> Jeremy
>>>>
>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>>>>
>>>>>
>>>>>> - thoughts about a way to describe that microsyntax format within the metadata document (see [CellMicrosyntax requirement][4]), e.g. to define the sub-elements within the microsyntax that may be extracted for use later - see [Parsing cell microsyntax][5].
>>>>>>
>>>>>> Comments welcome.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
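
As a rough illustration of the regex-only approach discussed in this thread ('match' / 'not match' in place of if-then-else, with one pass per template), here is a minimal Python sketch. It is not taken from any CSVW draft: the `TEMPLATES` structure, the `fire_on_match` key and the function names are assumptions made up for the example, the template bodies are shortened versions of the Turtle quoted above, and the microsyntax regexps are the ones from the quoted metadata with the '.' treated as a literal dot.

```python
import re

# Microsyntax regexps from the quoted metadata document.
# Assumption: the "." in the O*NET-SOC code is meant literally, so it is escaped.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

# Hypothetical template descriptions: each one carries a regexp plus a flag
# saying whether it fires on 'match' or on 'not match'. No if/else syntax.
TEMPLATES = [
    {
        "name": "soc-detailed-occupation",
        "regexp": r"\.00$",
        "fire_on_match": True,
        "body": "ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}"
                "{soc-detailed-occupation} a ex:SOC-DetailedOccupation .",
    },
    {
        "name": "onetsoc-occupation",
        "regexp": r"\.00$",
        "fire_on_match": False,
        "body": "ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation .",
    },
]


def extract(code):
    """Return the cell value plus every microsyntax element that matches it."""
    values = {"onet-soc-2010-code": code}
    for name, pattern in MICROSYNTAX.items():
        found = re.match(pattern, code)
        if found:
            values[name] = found.group(1)
    return values


def render(code):
    """Apply every template whose match / not-match condition holds for the cell."""
    values = extract(code)
    output = []
    for template in TEMPLATES:
        matched = re.search(template["regexp"], code) is not None
        if matched == template["fire_on_match"]:
            # Substitute {name} placeholders; a KeyError here would mean the
            # cell did not match the microsyntax the template relies on.
            filled = re.sub(r"\{([^{}]+)\}",
                            lambda m: values[m.group(1)],
                            template["body"])
            output.append(filled)
    return output


for cell in ("15-1199.00", "15-1199.03"):
    print(cell, "->", render(cell))
```

Each template is tried on every row, so the "conditional" is nothing more than whether its regexp matches; namespacing of element names across columns (e.g. "datetime/N1") is left out of the sketch.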
Received on Tuesday, 24 June 2014 10:39:54 UTC