- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Tue, 24 Jun 2014 11:11:50 +0000
- To: Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Hi Andy -

Hopefully the [worked example][1] that I've just created illustrates your point.

Please feel free to fix / amend / re-write as necessary!!!

In both examples, I've created multiple templates, which are configured to be triggered on a matching condition.

Jeremy

[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md

> -----Original Message-----
> From: Andy Seaborne [mailto:andy@apache.org]
> Sent: 24 June 2014 11:39
> To: public-csv-wg@w3.org
> Subject: Re: Attempted example CSV metadata document and template
>
> (general observation)
>
> There are ways to get conditional effects without explicit "if-then-else":
>
> 1/ Apply different templates: that is, multiple passes with different
> matching conditions.
>
> 2/ A template is valid if and only if all its associated templates are
> defined (the template may not actually be used), so that a (non-)matching
> regex is controlling whether the template is applied.
>
> These might be applicable separately or together.
>
> Andy
>
> On 23/06/14 17:35, Ivan Herman wrote:
> >
> > On 23 Jun 2014, at 18:03, Tandy, Jeremy
> > <jeremy.tandy@metoffice.gov.uk> wrote:
> >
> >>> -----Original Message-----
> >>> From: Ivan Herman [mailto:ivan@w3.org]
> >>> Sent: 21 June 2014 08:38
> >>> To: Tandy, Jeremy
> >>> Cc: Dan Brickley; W3C CSV on the Web Working Group
> >>> Subject: Re: Attempted example CSV metadata document and template
> >>>
> >>> Jeremy,
> >>>
> >>> one thing that I was wondering about was that the simple naming
> >>> mechanism for the various microsyntaxes may not work out. Consider
> >>>
> >>> "columns" : [
> >>>     { "name" : "datetime",
> >>>       ...
> >>>       "microsyntax": [
> >>>           { "name" : "N1",
> >>>             "regexp" : "...."
> >>>           },
> >>>           .....
> >>>       ]
> >>>     },
> >>>     { "name" : "anothercolumn",
> >>>       ...
> >>>       "microsyntax": [
> >>>           { "name" : "N1",
> >>>             "regexp" : "...."
> >>>           },
> >>>           .....
> >>>       ]
> >>>     }
> >>>
> >>> ]
> >>>
> >>>
> >>> When working through the cells in a row, what would 'N1' refer to?
> >>> Unless we want to require the unicity of the microsyntax names, we
> >>> may hit an issue. And I do not think requiring a unique name is a
> >>> good idea; if the metadata becomes big, this may become a nuisance.
> >>
> >> Agreed. I made the assumption that all instances of "name" within a
> >> given metadata document would need to be unique. I had not considered
> >> any mechanisms to make this easy for users; e.g. using the "name" from
> >> an enclosing object to automatically _namespace_ sub-names.
> >>
> >> We could leave it to the user to ensure uniqueness (easy for us;
> >> adds load to the end user, which is less good); in which case the
> >> example above would fail to validate.
> >>
> >> Alternatively, we could apply a form of name-spacing; e.g.
> >> "datetime/N1" and "anothercolumn/N1" within your example above.
> >>
> >>>
> >>> What this means is that the syntax becomes more complicated.
> >>> Something like {datetime:N1} or something similar (which raises the
> >>> issue of escape characters, too :-(
> >>
> >> Agreed! I chose a different separator character to you, but the same
> >> issue applies.
> >>
> >>>
> >>> As for the conditionals: mustache has some syntax for this which is
> >>> a bit different
> >>>
> >>> {{#bla}}
> >>> .. any template here
> >>> {{/bla}}
> >>>
> >>> although the mustache semantics is a bit different (afaik it relies
> >>> on the existence or not of a key in an object). We could use the
> >>> mustache semantics but we probably need something more, too, like
> >>> "if 'bla' is a microsyntax name and is true if the value of the cell
> >>> matches the regexp then it is true".
> >>
> >> Syntax-wise, we want our metadata document to be valid JSON, so we
> >> would need something different to mustache. However, I agree that our
> >> use cases call for similar semantics.
> >> Perhaps the syntax might be something like:
> >>
> >> "condition": {
> >>     "operator": "if ({bla})",
> >>     "template": {
> >>         "name": "2010_Occupations-csv-to-ttl",
> >>         "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>         "type": "template",
> >>         "path": "2010_Occupations-csv-to-ttl.ttl",
> >>         "hasFormat": "text/turtle"
> >>     }
> >> }
> >>
> >> In this case, I'm trying to say that the template will be triggered
> >> if the value of {bla} is true / not null etc. ... the value of {bla} is
> >> taken by evaluating the column (or microsyntax element) with "name" =
> >> "bla" for the row being processed. Like you say: """it relies on the
> >> existence or not of a key in an object"""
> >>
> >> (I don't really like the syntax; I guess that others can come up with
> >> better.)
> >
> > Ouch, you are right, I forgot about the fact that we want templates
> > for conditionals :-(
> >
> > But before getting into the boring issue of syntax we have to decide
> > whether we need them...
> >
> >>
> >>>
> >>> But I agree that the conditional complicates the templates a lot.
> >>> Here is where our use cases may have to step in: do our use cases
> >>> justify the need for conditionals (remembering that, though we are
> >>> discussing turtle here, I do not see any difference between
> >>> generating turtle and generating XML or JSON through the same
> >>> mechanism)?
> >>
> >> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
> >> motivated by the [ExpressingHierarchyWithinOccupationalListings][2] use
> >> case. This use case gives us two requirements:
> >>
> >> i) triggering a template if a value of a cell is not null; e.g. to
> >> generate the SKOS concept scheme from the SOC structure ...
> >>
> >> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> >> ,15-1100,,,Computer Occupations,,,,,
> >> ,,15-1110,,Computer and Information Research Scientists,,,,,
> >> ,,,15-1111,Computer and Information Research Scientists,,,,,
> >>
> >> Here we can see that I only want an ex:SOC-MajorGroup entity created
> >> on the first row shown above (where col 1 is populated).
> >>
> >> ii) triggering a template if a value of a cell equates to a
> >> particular string (or the opposite); e.g. when the value of
> >> "onetsoc-occupation" = "00", as shown in the example [earlier in this
> >> email thread][3]. ...
> >>
> >> "operator": "if ({onetsoc-occupation} == '00')"
> >>
> >> Perhaps there are cases for more complex operations? I don't know.
> >> Perhaps this is where call-back functions or promises could be used to
> >> parse a row and provide a Boolean response as to whether the template
> >> should be triggered? Again, I don't know ... and some considerable
> >> thought would be required to work out the details of such.
> >
> > For me these seem to be convincing that we need something. My
> > preference would be, though, to avoid all the issues about defining
> > 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
> > on regular expressions ('match'-'not match') simply because regular
> > expressions are used elsewhere already. Would that be enough?
> >
> > Ivan
> >
> >>
> >> Jeremy
> >>
> >>
> >>
> >> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
> >> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
> >> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
> >>
> >>>
> >>> My 2 cents...
> >>>
> >>> Ivan
> >>>
> >>>
> >>> On 19 Jun 2014, at 14:36, Tandy, Jeremy
> >>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Dan Brickley [mailto:danbri@google.com]
> >>>>> Sent: 18 June 2014 12:46
> >>>>> To: Tandy, Jeremy
> >>>>> Cc: CSV on the Web Working Group
> >>>>> Subject: Re: Attempted example CSV metadata document and template
> >>>>>
> >>>>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> >>>>> wrote:
> >>>>>> All -
> >>>>>>
> >>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >>>>>> Observation" example. I've tried to create a CSV metadata document
> >>>>>> following the rules in the [Metadata Vocabulary for Tabular
> >>>>>> Data][2] and [Generating RDF from Tabular Data on the Web][3]
> >>>>>> documents.
> >>>>>>
> >>>>>> I would be particularly interested in:
> >>>>>>
> >>>>>> - corrections to errors!
> >>>>>> - comments on additional proposed properties in the metadata
> >>>>>>   document ("short-name", "template", "microsyntax")
> >>>>>> - use of "hasFormat" to specify the Content-Type associated with
> >>>>>>   a Template
> >>>>>> - use of a REGEXP within a URI Template to convert ISO 8601
> >>>>>>   syntax to a simplified form
> >>>>>
> >>>>> I don't completely understand this mechanism yet, but do you think
> >>>>> it could be stretched to address the SKOS/codes issue in
> >>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >>>>> and emit triples like 'broader' when certain patterns matched?
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>
> >>>> OK ... let's have a go.
> >>>>
> >>>> Here's the header and a line of data:
> >>>>
> >>>> ---
> >>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> >>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> >>>> ---
> >>>>
> >>>> Here's a guess at the CSV metadata description in which I am using
> >>>> the ["multiple regexp each extracting a single value" pattern][1]:
> >>>>
> >>>> ---
> >>>> {
> >>>>     "name": "2010_Occupations",
> >>>>     "title": "O*NET-SOC Occupational listing for 2010",
> >>>>     "publisher": [{
> >>>>         "name": "O*Net Resource Center",
> >>>>         "web": "http://www.onetcenter.org/"
> >>>>     }],
> >>>>     "resources": [{
> >>>>         "name": "2010_Occupations-csv",
> >>>>         "path": "2010_Occupations.csv",
> >>>>         "schema": {"columns": [
> >>>>             {
> >>>>                 "name": "onet-soc-2010-code",
> >>>>                 "title": "O*NET-SOC 2010 Code",
> >>>>                 "description": "O*NET Standard Occupational Classification Code (2010).",
> >>>>                 "type": "string",
> >>>>                 "required": true,
> >>>>                 "unique": true,
> >>>>                 "microsyntax": [{
> >>>>                     "name": "soc-major-group",
> >>>>                     "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
> >>>>                 },{
> >>>>                     "name": "soc-minor-group",
> >>>>                     "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
> >>>>                 },{
> >>>>                     "name": "soc-broad-group",
> >>>>                     "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
> >>>>                 },{
> >>>>                     "name": "soc-detailed-occupation",
> >>>>                     "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
> >>>>                 },{
> >>>>                     "name": "onetsoc-occupation",
> >>>>                     "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
> >>>>                 }
> >>>>                 ]
> >>>>             },
> >>>>             {
> >>>>                 "name": "title",
> >>>>                 "title": "O*NET-SOC 2010 Title",
> >>>>                 "description": "Title of occupational classification.",
> >>>>                 "type": "string",
> >>>>                 "required": true
> >>>>             },
> >>>>             {
> >>>>                 "name": "description",
> >>>>                 "title": "O*NET-SOC 2010 Description",
> >>>>                 "description": "Description of occupational classification.",
> >>>>                 "type": "string",
> >>>>                 "required": true
> >>>>             }
> >>>>         ]},
> >>>>         "template": {
> >>>>             "name": "2010_Occupations-csv-to-ttl",
> >>>>             "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>             "type": "template",
> >>>>             "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>             "hasFormat": "text/turtle"
> >>>>         }
> >>>>     }]
> >>>> }
> >>>> ---
> >>>>
> >>>> You can see that I've used the `microsyntax` object to capture the 5
> >>>> independent elements of the O*NET-SOC code, each with its own regexp:
> >>>> "soc-major-group", "soc-minor-group", "soc-broad-group",
> >>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
> >>>> the _best_ way to do it, I don't know ... it's just an idea to get us
> >>>> talking about possibilities and options!
> >>>>
> >>>> The template (prefixes etc. intentionally left out) might then be:
> >>>>
> >>>> ---
> >>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> >>>>     skos:notation "{onet-soc-2010-code}" ;
> >>>>     skos:prefLabel "{title}" ;
> >>>>     dct:description "{description}" ;
> >>>>     skos:broader ex:{soc-major-group}-0000,
> >>>>         ex:{soc-major-group}-{soc-minor-group}00,
> >>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> >>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> >>>> ---
> >>>>
> >>>> However, this does not help when we look at the required
> >>>> _conditional behaviour_: when the value of "onetsoc-occupation" = "00"
> >>>> this is identical to the term from the SOC taxonomy, and the template
> >>>> should be more like
> >>>>
> >>>> ---
> >>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> >>>>     skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> >>>>     skos:prefLabel "{title}" ;
> >>>>     dct:description "{description}" ;
> >>>>     skos:broader ex:{soc-major-group}-0000,
> >>>>         ex:{soc-major-group}-{soc-minor-group}00,
> >>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> >>>> ---
> >>>>
> >>>> It occurs to me that we may wish to trigger different templates based
> >>>> on a conditional response - or even whether we wish to trigger a
> >>>> template at all for a given line!
> >>>>
> >>>> Thinking out of the box (is that a euphemism for "making it up as I
> >>>> go along"?), it would seem that each "template" block in the CSV
> >>>> metadata might have a "condition" statement that tells it when to
> >>>> fire - using values of column names or microsyntax element names? e.g.
> >>>>
> >>>> ---
> >>>> "template": {
> >>>>     "name": "2010_Occupations-csv-to-ttl",
> >>>>     "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>     "type": "template",
> >>>>     "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>     "hasFormat": "text/turtle",
> >>>>     "condition": "if {onetsoc-occupation} != '00'"
> >>>> }
> >>>> ---
> >>>>
> >>>> Default behaviour (if no "condition" statement is included) would be
> >>>> _always_ to trigger the template for each row.
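The row-level behaviour proposed above (five regexps extracting the parts of an O*NET-SOC code, plus a condition choosing between the ex:ONETSOC-Occupation and ex:SOC-DetailedOccupation templates) can be sketched in a few lines of Python. This is a minimal illustration, not anything the group has specified: the helper names `extract`, `broader_codes`, `select_template` and `render` are hypothetical, the regexps are the ones from the metadata above with the "." escaped, and the condition is assumed to test "onetsoc-occupation", the two-digit element that is "00" for SOC terms.

```python
import re

# Microsyntax regexps for the "onet-soc-2010-code" column, taken from the
# metadata above but with the "." escaped so it only matches a literal dot.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}


def extract(code):
    """Map each microsyntax element name to its captured value, if it matches."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        m = re.match(pattern, code)
        if m:
            values[name] = m.group(1)
    return values


def broader_codes(v):
    """SOC codes at the major, minor, broad and detailed levels."""
    major, minor = v["soc-major-group"], v["soc-minor-group"]
    broad, detailed = v["soc-broad-group"], v["soc-detailed-occupation"]
    return [
        f"{major}-0000",
        f"{major}-{minor}00",
        f"{major}-{minor}{broad}0",
        f"{major}-{minor}{broad}{detailed}",
    ]


def select_template(row):
    """Return (rdf:type, subject code, broader codes) for one row.

    Assumed condition: "onetsoc-occupation" == "00" means the row is the SOC
    detailed occupation itself; anything else is an O*NET-SOC occupation.
    """
    code = row["onet-soc-2010-code"]
    v = extract(code)
    ancestors = broader_codes(v)
    if v["onetsoc-occupation"] == "00":
        # The SOC term is the subject, so it is not listed as its own broader.
        return "ex:SOC-DetailedOccupation", ancestors[-1], ancestors[:-1]
    return "ex:ONETSOC-Occupation", code, ancestors


def render(row):
    """Fill a cut-down Turtle template (no escaping of quotes in titles)."""
    rdf_type, subject, broader = select_template(row)
    return "\n".join([
        f"ex:{subject} a {rdf_type} ;",
        f'    skos:notation "{subject}" ;',
        f'    skos:prefLabel "{row["title"]}" ;',
        "    skos:broader " + ", ".join(f"ex:{c}" for c in broader) + " .",
    ])
```

For the quoted data row, `render({"onet-soc-2010-code": "15-1199.03", "title": "Web Administrators"})` types the subject as ex:ONETSOC-Occupation with four broader terms (15-0000, 15-1100, 15-1190, 15-1199), while a code ending in ".00" such as 15-1199.00 yields ex:SOC-DetailedOccupation with subject 15-1199 and only three broader terms. A code that does not fit the pattern produces no microsyntax values and `select_template` raises `KeyError`; a real processor would need a defined behaviour for that case.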
> >>>>
> >>>> However, looking at this, I am immediately concerned that including
> >>>> if-then-else blocks and comparison operators hugely increases the
> >>>> complexity of our work. Perhaps this is a good point to "bug out" to
> >>>> some external agent (e.g. call-back function or promise).
> >>>>
> >>>> Jeremy
> >>>>
> >>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> >>>>
> >>>>>
> >>>>>> - thoughts about a way to describe that microsyntax format within
> >>>>>>   the metadata document (see [CellMicrosyntax requirement][4]), e.g.
> >>>>>>   to define the sub-elements within the microsyntax that may be
> >>>>>>   extracted for use later - see [Parsing cell microsyntax][5].
> >>>>>>
> >>>>>> Comments welcome.
> >>>>>>
> >>>>>> Jeremy
> >>>>>>
> >>>>>>
> >>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> >>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> >>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >>>
> >>>
> >>> ----
> >>> Ivan Herman, W3C
> >>> Digital Publishing Activity Lead
> >>> Home: http://www.w3.org/People/Ivan/
> >>> mobile: +31-641044153
> >>> GPG: 0x343F1A3D
> >>> WebID: http://www.ivan-herman.net/foaf#me
> >
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > GPG: 0x343F1A3D
> > WebID: http://www.ivan-herman.net/foaf#me
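Andy's second option, together with Ivan's preference for regular-expression 'match'/'not match' over comparison operators and Jeremy's "datetime/N1" namespacing idea, can be combined in a small sketch. In it, a template is applied to a row only when every placeholder is bound. Empty cells and non-matching regexps leave names unbound, and microsyntax elements are namespaced by column so that two "N1" elements do not collide. The Python below is illustrative only: the `{column/element}` placeholder syntax, the column names `major`, `minor`, `title` and `code`, the helpers `bindings` and `fill`, and the Turtle output are assumptions drawn from the examples in the thread, not anything the group agreed.

```python
import re

# A placeholder is "{name}"; microsyntax elements are referenced as
# "{column/element}", following the "datetime/N1" idea (assumed syntax).
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def bindings(row, microsyntax):
    """Values available to templates for one row.

    row:         column name -> cell text
    microsyntax: column name -> {element name: regexp with one capture group}
    Empty cells and non-matching regexps produce no binding at all.
    """
    values = {col: cell for col, cell in row.items() if cell}
    for col, elements in microsyntax.items():
        for name, pattern in elements.items():
            m = re.search(pattern, row.get(col, ""))
            if m:
                values[f"{col}/{name}"] = m.group(1)
    return values


def fill(template, values):
    """Fill the template, or return None if any placeholder is unbound."""
    if any(name not in values for name in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Hypothetical column names for the SOC structure rows quoted in the thread.
MAJOR_GROUP = 'ex:{major} a ex:SOC-MajorGroup ; skos:prefLabel "{title}" .'
rows = [
    {"major": "15-0000", "minor": "", "title": "Computer and Mathematical Occupations"},
    {"major": "", "minor": "15-1100", "title": "Computer Occupations"},
]
major_out = [fill(MAJOR_GROUP, bindings(r, {})) for r in rows]

# Regex-controlled template: it can only fire for codes ending in ".00".
DETAILED = "ex:{code/soc} a ex:SOC-DetailedOccupation ."
SOC_ONLY = {"code": {"soc": r"^(\d{2}-\d{4})\.00$"}}
detailed_out = [fill(DETAILED, bindings({"code": c}, SOC_ONLY))
                for c in ("15-1199.00", "15-1199.03")]
```

With these definitions the major-group template produces Turtle only for the first row (where the major-group column is populated) and `None` for the second, and the detailed-occupation template fires for 15-1199.00 but not for 15-1199.03. The appeal of this design is that no expression language is needed beyond the regexps already used for microsyntax. The cost is that a "not equal" condition has to be phrased as a regexp (for example with a negative lookahead) rather than as an operator.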
Received on Tuesday, 24 June 2014 11:12:19 UTC