Re: Attempted example CSV metadata document and template from Ivan Herman on 2014-06-21 (public-csv-wg@w3.org from June 2014)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 21 Jun 2014 09:38:12 +0200
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <0EDBA5D3-6E79-48C4-9EA0-D62621915087@w3.org>
Jeremy,

one thing I was wondering about is whether the simple naming mechanism for the various microsyntaxes will work out. Consider

	"columns" : [
		{ "name" : "datetime",
		  ...
		  "microsyntax" : [
			{ "name" : N1,
			  "regexp" : "...."
			},
			.....
		  ]
		},
		{ "name" : "anothercolumn",
		  ...
		  "microsyntax" : [
			{ "name" : N1,
			  "regexp" : "...."
			},
			.....
		  ]
		}
	]


When working through the cells in a row, what would 'N1' refer to? Unless we require microsyntax names to be unique across the whole metadata document, we hit an ambiguity. And I do not think requiring unique names is a good idea; if the metadata becomes big, this becomes a nuisance.

What this means is that the template syntax becomes more complicated: something like {datetime:N1}, which qualifies the microsyntax name with the column name (and which raises the issue of escape characters, too :-( ).
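
To make the qualified-name idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the colon separator, the two regexps (the ones in the example above are elided, so these are invented), and the helper name `expand`. It only shows that {column:name} removes the ambiguity when two columns both define 'N1'.

```python
import re

# Hypothetical metadata: two columns that both define a microsyntax named "N1".
# The regexps are invented for this sketch; each has one capturing group.
columns = {
    "datetime": {"N1": r"^(\d{4})-\d{2}-\d{2}"},
    "anothercolumn": {"N1": r"^([A-Z]+)"},
}

# Matches {column:name}. The colon separator is an assumption, not agreed syntax.
PLACEHOLDER = re.compile(r"\{([^{}:]+):([^{}:]+)\}")


def expand(template, row):
    """Replace each {column:name} with group 1 of that column's regexp."""
    def substitute(match):
        column, name = match.group(1), match.group(2)
        found = re.search(columns[column][name], row[column])
        # An unmatched cell yields an empty string in this sketch.
        return found.group(1) if found else ""
    return PLACEHOLDER.sub(substitute, template)


row = {"datetime": "2014-06-21T09:38:12", "anothercolumn": "ABC123"}
result = expand("{datetime:N1}/{anothercolumn:N1}", row)
```

Here the two 'N1' references resolve to different values because each is looked up under its own column. The sketch does not address escaping of literal braces or colons, which is exactly the complication mentioned above.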

As for the conditionals: mustache has some syntax for this which is a bit different

{{#bla}}
   .. any template here
{{/bla}}

although the mustache semantics is a bit different (afaik it relies on the existence or not of a key in an object). We could use the mustache semantics, but we probably need something more, too: for example, if 'bla' is a microsyntax name, the section is rendered only when the value of the cell matches the corresponding regexp.
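
A rough Python sketch of that extended semantics, under stated assumptions: this is not the mustache library, only a tiny imitation of its section syntax, and the function name `render_sections` and the sample regexp are made up for illustration. A section {{#bla}}...{{/bla}} is kept only if 'bla' names a microsyntax whose regexp matches the cell.

```python
import re

# Matches {{#name}} ... {{/name}} with the same name at both ends.
SECTION = re.compile(r"\{\{#([\w-]+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)


def render_sections(template, cell, microsyntax):
    """Keep a section body only if its microsyntax regexp matches the cell."""
    def substitute(match):
        name, body = match.group(1), match.group(2)
        pattern = microsyntax.get(name)
        if pattern is not None and re.search(pattern, cell):
            return body
        # Unknown name or no match: drop the section entirely.
        return ""
    return SECTION.sub(substitute, template)


# Hypothetical microsyntax: 'bla' is true when the cell ends in ".00".
microsyntax = {"bla": r"\.00$"}
kept = render_sections("a{{#bla}}b{{/bla}}c", "15-1199.00", microsyntax)
dropped = render_sections("a{{#bla}}b{{/bla}}c", "15-1199.03", microsyntax)
```

Nested sections and inverted sections are left out; the point is only that "truth" comes from a regexp match rather than from the presence of a key.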

But I agree that conditionals complicate the templates a lot. Here is where our use cases may have to come in: do they justify the need for conditionals? (Remember that, though we are discussing Turtle here, I do not see any difference between generating Turtle and generating XML or JSON through the same mechanism.)

My 2 cents...

Ivan




On 19 Jun 2014, at 14:36 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

>> -----Original Message-----
>> From: Dan Brickley [mailto:danbri@google.com]
>> Sent: 18 June 2014 12:46
>> To: Tandy, Jeremy
>> Cc: CSV on the Web Working Group
>> Subject: Re: Attempted example CSV metadata document and template
>> 
>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
>> wrote:
>>> All -
>>> 
>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
>>> Observation" example. I've tried to create a CSV metadata document
>>> following the rules in the [Metadata Vocabulary for Tabular Data][2]
>>> and [Generating RDF from Tabular Data on the Web][3] documents.
>>> 
>>> I would be particularly interested in:
>>> 
>>> - corrections to errors!
>>> - comments on additional proposed properties in the metadata document
>>> ("short-name", "template", "microsyntax")
>>> - use of "hasFormat" to specify the Content-Type associated with a
>>> Template
>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax to
>>> a simplified form
>> 
>> I don't completely understand this mechanism yet, but do you think it
>> could be stretched to address the SKOS/codes issue in
>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>> where we'd want to explode strings like "15-1199.00", "15-1199.01" and
>> emit triples like 'broader' when certain patterns matched?
>> 
>> Dan
>> 
> 
> OK ... let's have a go.
> 
> Here's the header and a line of data:
> 
> ---
> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> ---
> 
> Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:
> 
> ---
> {
>   "name": "2010_Occupations",
>   "title": "O*NET-SOC Occupational listing for 2010",
>   "publisher": [{
>       "name": "O*Net Resource Center",
>       "web": "http://www.onetcenter.org/"
>   }],
>   "resources": [{
>       "name": "2010_Occupations-csv",
>       "path": "2010_Occupations.csv",
>       "schema": {"columns": [
>           {
>               "name": "onet-soc-2010-code",
>               "title": "O*NET-SOC 2010 Code",
>               "description": "O*NET Standard Occupational Classification Code (2010).",
>               "type": "string",
>               "required": true,
>               "unique": true,
>               "microsyntax": [{
>                       "name": "soc-major-group",
>                       "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
>                   },{
>                       "name": "soc-minor-group",
>                       "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
>                   },{
>                       "name": "soc-broad-group",
>                       "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
>                   },{
>                       "name": "soc-detailed-occupation",
>                       "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
>                   },{
>                       "name": "onetsoc-occupation",
>                       "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
>                   }
>               ]
>           },
>           {
>               "name": "title",
>               "title": "O*NET-SOC 2010 Title",
>               "description": "Title of occupational classification.",
>               "type": "string",
>               "required": true
>           },
>           {
>               "name": "description",
>               "title": "O*NET-SOC 2010 Description",
>               "description": "Description of occupational classification.",
>               "type": "string",
>               "required": true
>           }
>       ]},
>       "template": {
>           "name": "2010_Occupations-csv-to-ttl",
>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>           "type": "template",
>           "path": "2010_Occupations-csv-to-ttl.ttl",
>           "hasFormat": "text/turtle"
>       }
>   }]
> }
> ---
> 
> You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
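
For what it is worth, the extraction described here can be tried out with a small Python sketch. The patterns are the five from the metadata, without the /.../ delimiters and with the dot escaped (an assumption about the intended meaning); the helper names `extract` and `fill` are hypothetical, and the {name} substitution is a simplification of whatever template syntax we end up with.

```python
import re

# The five patterns from the metadata, each with one capturing group.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

PLACEHOLDER = re.compile(r"\{([\w-]+)\}")


def extract(code):
    """Apply every regexp to the cell value and keep group 1 of each match."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        found = re.match(pattern, code)
        if found:
            values[name] = found.group(1)
    return values


def fill(template, values):
    """Substitute {name} placeholders with the extracted values."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


parts = extract("15-1199.03")
broader = fill("ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0", parts)
```

For the sample row, `parts` holds "15", "11", "9", "9" and "03" for the five names in order, and `broader` is "ex:15-1190".
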
> 
> The template (prefixes etc. intentionally left out) might then be:
> 
> ---
> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>    skos:notation "{onet-soc-2010-code}" ;
>    skos:prefLabel "{title}" ;
>    dct:description "{description}" ;
>    skos:broader ex:{soc-major-group}-0000, 
>                 ex:{soc-major-group}-{soc-minor-group}00, 
>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> ---
> 
> However, this does not help when we look at the required _conditional behaviour_: when the value of "onetsoc-occupation" is "00", the entry is identical to the term from the SOC taxonomy, and the template should be more like
> 
> ---
> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>    skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>    skos:prefLabel "{title}" ;
>    dct:description "{description}" ;
>    skos:broader ex:{soc-major-group}-0000, 
>                 ex:{soc-major-group}-{soc-minor-group}00, 
>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> ---
> 
> It occurs to me that we may wish to trigger different templates based on a condition - or even to decide whether to trigger a template at all for a given line!
> 
> Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire - using values of column names or microsyntax element names? e.g.
> 
> ---
>       "template": {
>           "name": "2010_Occupations-csv-to-ttl",
>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>           "type": "template",
>           "path": "2010_Occupations-csv-to-ttl.ttl",
>           "hasFormat": "text/turtle",
>           "condition": "if {onetsoc-occupation} != '00'"
>       }
> ---
> 
> Default behaviour (if no "condition" statement included) would be _always_ to trigger the template for each row.
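
A minimal Python sketch of that firing rule, with loud assumptions: the condition is modelled as a Python callable rather than a parsed string (no condition syntax has been agreed), the function name `templates_to_fire` is hypothetical, and so are the second and third template names. The test on "onetsoc-occupation" follows the prose above, where "00" marks a plain SOC term.

```python
def templates_to_fire(templates, values):
    """Return the names of templates whose condition holds for this row.

    A template without a "condition" always fires (the default behaviour).
    """
    fired = []
    for template in templates:
        condition = template.get("condition")  # hypothetical: a callable
        if condition is None or condition(values):
            fired.append(template["name"])
    return fired


templates = [
    {"name": "2010_Occupations-csv-to-ttl",
     "condition": lambda v: v["onetsoc-occupation"] != "00"},
    {"name": "soc-term-to-ttl",  # hypothetical second template
     "condition": lambda v: v["onetsoc-occupation"] == "00"},
    {"name": "always-to-ttl"},  # hypothetical, no condition given
]

for_onet_row = templates_to_fire(templates, {"onetsoc-occupation": "03"})
for_soc_row = templates_to_fire(templates, {"onetsoc-occupation": "00"})
```

Using a callable sidesteps the question of a condition language entirely, which is close in spirit to handing the decision to an external agent.
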
> 
> However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
> 
> Jeremy
> 
> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> 
>> 
>>> - thoughts about a way to describe that microsyntax format within the
>>> metadata document (see [CellMicrosyntax requirement][4]), e.g. to define
>>> the sub-elements within the microsyntax that may be extracted for use
>>> later - see [Parsing cell microsyntax][5].
>>> 
>>> Comments welcome.
>>> 
>>> Jeremy
>>> 
>>> 
>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Saturday, 21 June 2014 07:38:44 UTC