RE: Attempted example CSV metadata document and template

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Tue, 24 Jun 2014 11:11:50 +0000
To: Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20884C839@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Hi Andy -

Hopefully the [worked example][1] that I've just created illustrates your point. Please feel free to fix / amend / re-write as necessary!!!

In both examples, I've created multiple templates, which are configured to be triggered on a matching condition.
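
As a rough sketch of the idea (in Python, not in the metadata format itself): each template carries a regular expression, and it is only triggered for a row when that expression matches the cell value. The `Template` class, the `select_templates` / `render` helpers and the one-line template bodies are illustrative assumptions and do not appear in the worked example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Template:
    """Hypothetical pairing of a template with its matching condition."""
    name: str
    condition: str  # regular expression tested against the O*NET-SOC code
    body: str       # str.format-style text, standing in for a real template


TEMPLATES = [
    # Triggered when the O*NET suffix is "00": the row is a plain SOC term.
    Template("soc-detailed-occupation",
             r"^\d{2}-\d{4}\.00$",
             "ex:{soc} a ex:SOC-DetailedOccupation ."),
    # Triggered for any other suffix: the row is an O*NET-specific term.
    Template("onetsoc-occupation",
             r"^\d{2}-\d{4}\.(?!00)\d{2}$",
             "ex:{code} a ex:ONETSOC-Occupation ; skos:broader ex:{soc} ."),
]


def select_templates(code: str) -> List[Template]:
    """Return every template whose condition matches the cell value."""
    return [t for t in TEMPLATES if re.search(t.condition, code)]


def render(code: str) -> List[str]:
    """Render all triggered templates for one O*NET-SOC code."""
    soc = code.split(".")[0]  # the part before the O*NET suffix
    return [t.body.format(code=code, soc=soc) for t in select_templates(code)]
```

With this arrangement a row that matches no condition produces no output at all, which covers the "do not trigger a template for this line" case without any explicit if-then-else.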

Jeremy

[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md 
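
For anyone wanting to try the five microsyntax regexps from the quoted thread below, here is a minimal Python sketch that applies each one to a code and collects the single captured value. The `MICROSYNTAX` dictionary and the `extract` / `broader` helpers are illustrative names only, and the literal dot is escaped in the patterns used here.

```python
import re

# One pattern per microsyntax element; each captures exactly one group.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}


def extract(code):
    """Map each microsyntax name to its captured value (None if no match)."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        match = re.match(pattern, code)
        values[name] = match.group(1) if match else None
    return values


def broader(code):
    """Build the SOC codes that the template lists under skos:broader."""
    v = extract(code)
    major, minor = v["soc-major-group"], v["soc-minor-group"]
    broad, detailed = v["soc-broad-group"], v["soc-detailed-occupation"]
    return [
        f"{major}-0000",
        f"{major}-{minor}00",
        f"{major}-{minor}{broad}0",
        f"{major}-{minor}{broad}{detailed}",
    ]
```

For the sample row `15-1199.03`, `broader` yields the major group, minor group, broad group and detailed occupation codes in that order.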

> -----Original Message-----
> From: Andy Seaborne [mailto:andy@apache.org]
> Sent: 24 June 2014 11:39
> To: public-csv-wg@w3.org
> Subject: Re: Attempted example CSV metadata document and template
> 
> (general observation)
> 
> There are ways to get conditional effects without explicit "if-then-else"
> 
> 1/ Apply different templates : that is multiple passes with different
> matching conditions.
> 
> 2/ A template is valid if and only if all its associated templates are
> defined (the template may not actually be used) so that a (non-)matching
> regex is controlling whether the template is applied.
> 
> These might be applicable separately or together.
> 
> 	Andy
> 
> On 23/06/14 17:35, Ivan Herman wrote:
> >
> > On 23 Jun 2014, at 18:03 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
> >
> >>> -----Original Message-----
> >>> From: Ivan Herman [mailto:ivan@w3.org]
> >>> Sent: 21 June 2014 08:38
> >>> To: Tandy, Jeremy
> >>> Cc: Dan Brickley; W3C CSV on the Web Working Group
> >>> Subject: Re: Attempted example CSV metadata document and template
> >>>
> >>> Jeremy,
> >>>
> >>> one thing that I was wondering about was that the simple naming
> >>> mechanism for the various microsyntaxes may not work out. Consider
> >>>
> >>> 	"columns" : [
> >>> 		{ "name" : "datetime",
> >>> 		  ...
> >>>                   "microsyntax": [
> >>> 			{ "name" : N1,
> >>> 			  "regexp" : "...."
> >>> 			},
> >>> 			.....
> >>>                   ]
> >>> 		},
> >>> 		{ "name" : "anothercolumn",
> >>> 		  ...
> >>> 		  "microsyntax": [
> >>> 			{ "name" : N1,
> >>> 			  "regexp" : "...."
> >>> 			},
> >>> 			.....
> >>> 		  ]
> >>> 		}
> >>>
> >>> 	]
> >>>
> >>>
> >>> When working through the cells in a row, what would 'N1' refer to?
> >>> Unless we want to require the unicity of the microsyntax names, we
> >>> may hit an issue. And I do not think requiring a unique name is a
> >>> good idea; if the metadata becomes big, this may become a nuisance.
> >>
> >> Agreed. I made the assumption that all instances of "name" within a
> given metadata document would need to be unique. I had not considered
> any mechanisms to make this easy for users; e.g. using the "name" from
> an enclosing object to automatically _namespace_ sub-names.
> >>
> >> We could leave it to the user to ensure uniqueness (easy for us;
> adds load to the end user which is less good); in which case the
> example above would fail to validate.
> >>
> >> Alternatively, we could apply a form of name-spacing; e.g.
> "datetime/N1" and "anothercolumn/N1" within your example above.
> >>
> >>>
> >>> What this means is that the syntax becomes more complicated.
> >>> Something like {datetime:N1} or something similar (which raises the
> >>> issue of escape characters, too:-(
> >>
> >> Agreed! I chose a different separator character to you, but the same
> issue applies.
> >>
> >>>
> >>> As for the conditionals: mustache has some syntax for this which is
> >>> a bit different
> >>>
> >>> {{#bla}}
> >>>    .. any template here
> >>> {{/bla}}
> >>>
> >>> although the mustache semantics is a bit different (afaik it relies
> >>> on the existence or not of a key in an object). We could use the
> >>> mustache semantics but we probably need something more, too, like
> >>> "if 'bla' is a microsyntax name and is true if the value of the cell
> >>> matches the regexp then it is true".
> >>
> >> Syntax-wise, we want our metadata document to be valid JSON, so we
> would need something different to mustache. However, I agree that our
> use cases call for similar semantics. Perhaps the syntax might be
> something like:
> >>
> >> "condition": {
> >>     "operator": "if ({bla})",
> >>     "template": {
> >>         "name": "2010_Occupations-csv-to-ttl",
> >>         "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>         "type": "template",
> >>         "path": "2010_Occupations-csv-to-ttl.ttl",
> >>         "hasFormat": "text/turtle"
> >>     }
> >> }
> >>
> >> In this case, I'm trying to say that the template will be triggered
> if the value of {bla} is true / not null etc. ... the value of {bla} is
> taken by evaluating the column (or microsyntax element) with "name" =
> "bla" for the row being processed. Like you say: """it relies on the
> existence or not of a key in an object"""
> >>
> >> (I don't really like the syntax; I guess that others can come up
> with
> >> better.)
> >
> > Ouch, you are right, I forgot about the fact that we want templates
> > for conditionals:-(
> >
> > But before getting into the boring issue of syntax we have to decide
> whether we need them...
> >
> >>
> >>>
> >>> But I agree that the conditional complicates the templates a lot.
> >>> Here is where our use cases may have to switch in: do our use cases
> >>> justify the need for conditionals (remembering that, though we are
> >>> discussing turtle here, I do not see any difference between
> >>> generating turtle and generating XML or JSON through the same
> mechanism).
> >>
> >> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
> motivated by the ExpressingHierarchyWithinOccupationalListings use
> case. This use case gives us two requirements:
> >>
> >> i) triggering a template if a value of a cell is not null; e.g. to
> generate the SKOS concept scheme from the SOC structure ...
> >>
> >> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> >> ,15-1100,,,Computer Occupations,,,,,
> >> ,,15-1110,,Computer and Information Research Scientists,,,,,
> >> ,,,15-1111,Computer and Information Research Scientists,,,,,
> >>
> >> Here we can see that I only want a ex:SOC-MajorGroup entity created
> on the first row shown above (where col 1 is populated).
> >>
> >> ii) triggering a template if a value of a cell equates to a
> particular string (or the opposite); e.g. when the value of "onetsoc-
> occupation" = "00" as shown in the example shown [earlier in this email
> thread][3]. ...
> >>
> >> "operator": "if ({onetsoc-occupation} == '00')"
> >>
> >> Perhaps there are cases for more complex operations? I don't know.
> Perhaps this is where call-back functions or promises could be used to
> parse a row and provide a Boolean response as to whether the template
> should be triggered? Again, I don't know ... and some considerable
> thought would be required to work out the details of such.
> >
> > For me these seem to be convincing that we need something. My
> preference would be, though, to avoid all the issues about defining
> 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
> on regular expressions ('match'-'not match') simply because regular
> expressions are used elsewhere already. Would that be enough?
> >
> > Ivan
> >
> >>
> >> Jeremy
> >>
> >>
> >>
> >> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
> >> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
> >> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
> >>
> >>>
> >>> My 2 cents...
> >>>
> >>> Ivan
> >>>
> >>>
> >>>
> >>>
> >>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> >>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Dan Brickley [mailto:danbri@google.com]
> >>>>> Sent: 18 June 2014 12:46
> >>>>> To: Tandy, Jeremy
> >>>>> Cc: CSV on the Web Working Group
> >>>>> Subject: Re: Attempted example CSV metadata document and template
> >>>>>
> >>>>> On 12 June 2014 12:57, Tandy, Jeremy
> >>>>> <jeremy.tandy@metoffice.gov.uk>
> >>>>> wrote:
> >>>>>> All -
> >>>>>>
> >>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >>>>>> Observation" example. I've tried to create a CSV metadata document
> >>>>>> following the rules in the [Metadata Vocabulary for Tabular
> >>>>>> Data][2] and [Generating RDF from Tabular Data on the Web][3] documents.
> >>>>>>
> >>>>>> I would be particularly interested in:
> >>>>>>
> >>>>>> - corrections to errors!
> >>>>>> - comments on additional proposed properties in the metadata
> >>>>>> document ("short-name", "template", "microsyntax")
> >>>>>> - use of "hasFormat" to specify the Content-Type associated with
> >>>>>> a Template
> >>>>>> - use of a REGEXP within a URI Template to convert ISO 8601
> >>>>>> syntax to a simplified form
> >>>>>
> >>>>> I don't completely understand this mechanism yet, but do you think it
> >>>>> could be stretched to address the SKOS/codes issue in
> >>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >>>>> and emit triples like 'broader' when certain patterns matched?
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>
> >>>> OK ... let's have a go.
> >>>>
> >>>> Here's the header and a line of data:
> >>>>
> >>>> ---
> >>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> >>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> >>>> ---
> >>>>
> >>>> Here's a guess at the CSV metadata description in which I am using
> >>>> the ["multiple regexp each extracting a single value" pattern][1]:
> >>>>
> >>>> ---
> >>>> {
> >>>>   "name": "2010_Occupations",
> >>>>   "title": "O*NET-SOC Occupational listing for 2010",
> >>>>   "publisher": [{
> >>>>       "name": "O*Net Resource Center",
> >>>>       "web": "http://www.onetcenter.org/"
> >>>>   }],
> >>>>   "resources": [{
> >>>>       "name": "2010_Occupations-csv",
> >>>>       "path": "2010_Occupations.csv",
> >>>>       "schema": {"columns": [
> >>>>           {
> >>>>               "name": "onet-soc-2010-code",
> >>>>               "title": "O*NET-SOC 2010 Code",
> >>>>               "description": "O*NET Standard Occupational Classification Code (2010).",
> >>>>               "type": "string",
> >>>>               "required": true,
> >>>>               "unique": true,
> >>>>               "microsyntax": [{
> >>>>                       "name": "soc-major-group",
> >>>>                       "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
> >>>>                   },{
> >>>>                       "name": "soc-minor-group",
> >>>>                       "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
> >>>>                   },{
> >>>>                       "name": "soc-broad-group",
> >>>>                       "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
> >>>>                   },{
> >>>>                       "name": "soc-detailed-occupation",
> >>>>                       "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
> >>>>                   },{
> >>>>                       "name": "onetsoc-occupation",
> >>>>                       "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
> >>>>                   }
> >>>>
> >>>>               ]
> >>>>           },
> >>>>           {
> >>>>               "name": "title",
> >>>>               "title": "O*NET-SOC 2010 Title",
> >>>>               "description": "Title of occupational classification.",
> >>>>               "type": "string",
> >>>>               "required": true
> >>>>           },
> >>>>           {
> >>>>               "name": "description",
> >>>>               "title": "O*NET-SOC 2010 Description",
> >>>>               "description": "Description of occupational classification.",
> >>>>               "type": "string",
> >>>>               "required": true
> >>>>           }
> >>>>       ]},
> >>>>       "template": {
> >>>>           "name": "2010_Occupations-csv-to-ttl",
> >>>>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>           "type": "template",
> >>>>           "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>           "hasFormat": "text/turtle"
> >>>>       }
> >>>>   }]
> >>>> }
> >>>> ---
> >>>>
> >>>> You can see that I've used the `microsyntax` object to capture the 5
> >>>> independent elements of the O*NET-SOC code, each with its own regexp:
> >>>> "soc-major-group", "soc-minor-group", "soc-broad-group",
> >>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
> >>>> the _best_ way to do it, I don't know ... it's just an idea to get us
> >>>> talking about possibilities and options!
> >>>>
> >>>> The template (prefixes etc. intentionally left out) might then be:
> >>>>
> >>>> ---
> >>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> >>>>    skos:notation "{onet-soc-2010-code}" ;
> >>>>    skos:prefLabel "{title}" ;
> >>>>    dct:description "{description}" ;
> >>>>    skos:broader ex:{soc-major-group}-0000,
> >>>>                 ex:{soc-major-group}-{soc-minor-group}00,
> >>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> >>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> >>>> ---
> >>>>
> >>>> However, this does not help when we look at the required
> >>>> _conditional
> >>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
> >>>> identical to the term from the SOC taxonomy, and the template
> >>>> should be more like
> >>>>
> >>>> ---
> >>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> >>>>    skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> >>>>    skos:prefLabel "{title}" ;
> >>>>    dct:description "{description}" ;
> >>>>    skos:broader ex:{soc-major-group}-0000,
> >>>>                 ex:{soc-major-group}-{soc-minor-group}00,
> >>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> >>>> ---
> >>>>
> >>>> It occurs to me that we may wish to trigger different templates based
> >>>> on a conditional response - or even decide whether we wish to trigger a
> >>>> template at all for a given line!
> >>>>
> >>>> Thinking out of the box (is that a euphemism for "making it up as I
> >>>> go along"?), it would seem that each "template" block in the CSV
> >>>> metadata might have a "condition" statement that tells it when to
> >>>> fire - using values of column names or microsyntax element names? e.g.
> >>>>
> >>>> ---
> >>>>       "template": {
> >>>>           "name": "2010_Occupations-csv-to-ttl",
> >>>>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>           "type": "template",
> >>>>           "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>           "hasFormat": "text/turtle",
> >>>>           "condition": "if {onetsoc-occupation} != '00'"
> >>>>       }
> >>>> ---
> >>>>
> >>>> Default behaviour (if no "condition" statement included) would be
> >>> _always_ to trigger the template for each row.
> >>>>
> >>>> However, looking at this, I am immediately concerned that including
> >>>> if-then-else blocks and comparison operators hugely increases the
> >>>> complexity of our work. Perhaps this is a good point to "bug out" to
> >>>> some external agent (e.g. call-back function or promise).
> >>>>
> >>>> Jeremy
> >>>>
> >>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> >>>>
> >>>>>
> >>>>>> - thoughts about a way to describe that microsyntax format within
> >>>>>> the metadata document (see [CellMicrosyntax requirement][4]), e.g. to
> >>>>>> define the sub-elements within the microsyntax that may be
> >>>>>> extracted for use later - see [Parsing cell microsyntax][5].
> >>>>>>
> >>>>>> Comments welcome.
> >>>>>>
> >>>>>> Jeremy
> >>>>>>
> >>>>>>
> >>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> >>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> >>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >>>
> >>>
> >>> ----
> >>> Ivan Herman, W3C
> >>> Digital Publishing Activity Lead
> >>> Home: http://www.w3.org/People/Ivan/
> >>> mobile: +31-641044153
> >>> GPG: 0x343F1A3D
> >>> WebID: http://www.ivan-herman.net/foaf#me
> >
> >
> 
Received on Tuesday, 24 June 2014 11:12:19 UTC
