
Re: Attempted example CSV metadata document and template

From: Andy Seaborne <andy@apache.org>
Date: Tue, 24 Jun 2014 11:39:20 +0100
Message-ID: <53A95558.7070805@apache.org>
To: public-csv-wg@w3.org
(general observation)

There are ways to get conditional effects without an explicit "if-then-else":

1/ Apply different templates: that is, multiple passes with different
matching conditions.

2/ A template is valid if and only if all its associated templates are
defined (the template may not actually be used), so that a (non-)matching
regex controls whether the template is applied.

These might be applicable separately or together.
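The two approaches can be sketched together in Python. This is a minimal illustration only, assuming `str.format`-style `{name}` placeholders and microsyntax regexes with named groups; the `PASSES` table, its patterns and the `ex:SOC-Code` class are hypothetical. Each template gets its own pass, and a regex that does not match defines no variables, so a template that needs them is skipped.

```python
import re
import string

# Hypothetical passes: (label, regex with named groups, template).
# The second regex rejects "-0000" codes, so the two passes never overlap.
PASSES = [
    ("major-group", r"^(?P<major>\d{2})-0000$", "ex:{major}-0000 a ex:SOC-MajorGroup ."),
    ("other", r"^(?P<code>\d{2}-(?!0000)\d{4})$", "ex:{code} a ex:SOC-Code ."),
]


def placeholders(template: str) -> set:
    """Field names used by a str.format-style template."""
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def apply_passes(value: str) -> list:
    """One pass per template; a template fires only if all its variables are defined."""
    output = []
    for _label, pattern, template in PASSES:
        match = re.match(pattern, value)
        # A non-matching regex defines no variables at all.
        variables = match.groupdict() if match else {}
        variables = {k: v for k, v in variables.items() if v is not None}
        if placeholders(template) <= set(variables):
            output.append(template.format(**variables))
    return output
```

Under these assumptions `apply_passes("15-0000")` yields only the major-group triple and `apply_passes("15-1100")` only the second one; no explicit if-then-else is needed.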

	Andy

On 23/06/14 17:35, Ivan Herman wrote:
>
> On 23 Jun 2014, at 18:03 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>
>>> -----Original Message-----
>>> From: Ivan Herman [mailto:ivan@w3.org]
>>> Sent: 21 June 2014 08:38
>>> To: Tandy, Jeremy
>>> Cc: Dan Brickley; W3C CSV on the Web Working Group
>>> Subject: Re: Attempted example CSV metadata document and template
>>>
>>> Jeremy,
>>>
>>> one thing that I was wondering about was that the simple naming
>>> mechanism for the various microsyntaxes may not work out. Consider
>>>
>>> 	"columns" : [
>>> 		{ "name" : "datetime",
>>> 		  ...
>>>                   "microsyntax": [
>>> 			{ "name" : "N1",
>>> 			  "regexp" : "...."
>>> 			},
>>> 			.....
>>>                   ]
>>> 		},
>>> 		{ "name" : "anothercolumn",
>>> 		  ...
>>> 		  "microsyntax": [
>>> 			{ "name" : "N1",
>>> 			  "regexp" : "...."
>>> 			},
>>> 			.....
>>> 		  ]
>>> 		}
>>>
>>> 	]
>>>
>>>
>>> When working through the cells in a row, what would 'N1' refer to?
>>> Unless we want to require the unicity of the microsyntax names, we may
>>> hit an issue. And I do not think requiring a unique name is a good
>>> idea; if the metadata becomes big, this may become a nuisance.
>>
>> Agreed. I made the assumption that all instances of "name" within a given metadata document would need to be unique. I had not considered any mechanisms to make this easy for users; e.g. using the "name" from an enclosing object to automatically _namespace_ sub-names.
>>
>> We could leave it to the user to ensure uniqueness (easy for us; adds load to the end user which is less good); in which case the example above would fail to validate.
>>
>> Alternatively, we could apply a form of name-spacing; e.g. "datetime/N1" and "anothercolumn/N1" within your example above.
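A small Python sketch of the name-spacing option, assuming the metadata has been parsed into dicts shaped like the example above (`COLUMNS` and the `/` separator are illustrative, and the `"...."` regexps are left as the placeholders they are in the thread). It reports which names would clash under a global-uniqueness rule and builds a lookup keyed by `column/element`.

```python
from collections import Counter

# Hypothetical parsed metadata: two columns that each declare an element "N1".
COLUMNS = [
    {"name": "datetime", "microsyntax": [{"name": "N1", "regexp": "...."}]},
    {"name": "anothercolumn", "microsyntax": [{"name": "N1", "regexp": "...."}]},
]


def duplicate_names(columns: list) -> list:
    """Names that would be ambiguous if every "name" had to be globally unique."""
    counts = Counter(col["name"] for col in columns)
    counts.update(el["name"] for col in columns for el in col.get("microsyntax", []))
    return sorted(name for name, n in counts.items() if n > 1)


def namespaced_index(columns: list, sep: str = "/") -> dict:
    """Map "column/element" keys to microsyntax definitions, so "N1" is never ambiguous."""
    index = {}
    for col in columns:
        for element in col.get("microsyntax", []):
            index[f"{col['name']}{sep}{element['name']}"] = element
    return index
```

With the enclosing column name as prefix, users can reuse short element names freely; the cost is the longer reference syntax (and the escaping question) raised in the next reply.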
>>
>>>
>>> What this means is that the syntax becomes more complicated: something
>>> like {datetime:N1} or similar (which raises the issue of escape
>>> characters, too :-( ).
>>
>> Agreed! I chose a different separator character to you, but the same issue applies.
>>
>>>
>>> As for the conditionals: mustache has some syntax for this which is a
>>> bit different
>>>
>>> {{#bla}}
>>>    .. any template here
>>> {{/bla}}
>>>
>>> although the Mustache semantics are a bit different (afaik they rely on
>>> the existence or not of a key in an object). We could use the Mustache
>>> semantics, but we probably need something more, too, like: if 'bla' is a
>>> microsyntax name, then it is true if the value of the cell matches the
>>> regexp.
>>
>> Syntax-wise, we want our metadata document to be valid JSON, so we would need something different to mustache. However, I agree that our use cases call for similar semantics. Perhaps the syntax might be something like:
>>
>> "condition": {
>>     "operator": "if ({bla})",
>>     "template": {
>>         "name": "2010_Occupations-csv-to-ttl",
>>         "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>         "type": "template",
>>         "path": "2010_Occupations-csv-to-ttl.ttl",
>>         "hasFormat": "text/turtle"
>>     }
>> }
>>
>> In this case, I'm trying to say that the template will be triggered if the value of {bla} is true / not null etc. ... the value of {bla} is taken by evaluating the column (or microsyntax element) with "name" = "bla" for the row being processed. Like you say: """it relies on the existence or not of a key in an object"""
>>
>> (I don't really like the syntax; I guess that others can come up with better.)
>
> Ouch, you are right, I forgot about the fact that we want templates for conditionals:-(
>
> But before getting into the boring issue of syntax we have to decide whether we need them...
>
>>
>>>
>>> But I agree that the conditional complicates the templates a lot. This
>>> is where our use cases have to come in: do they justify the need for
>>> conditionals? (Remember that, though we are discussing Turtle here, I do
>>> not see any difference between generating Turtle and generating XML or
>>> JSON through the same mechanism.)
>>
>> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1], motivated by the [ExpressingHierarchyWithinOccupationalListings][2] use case. This use case gives us two requirements:
>>
>> i) triggering a template if a value of a cell is not null; e.g. to generate the SKOS concept scheme from the SOC structure ...
>>
>> 15-0000,,,,Computer and Mathematical Occupations,,,,,
>> ,15-1100,,,Computer Occupations,,,,,
>> ,,15-1110,,Computer and Information Research Scientists,,,,,
>> ,,,15-1111,Computer and Information Research Scientists,,,,,
>>
>> Here we can see that I only want an ex:SOC-MajorGroup entity created for the first row shown above (where column 1 is populated).
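The "not null" trigger on these rows can be sketched in Python with the standard `csv` module. Only `ex:SOC-MajorGroup` and `ex:SOC-DetailedOccupation` come from this thread; the minor- and broad-group class names, and the assumption that the first non-empty code column determines the level, are illustrative.

```python
import csv
import io

# The four SOC rows quoted above.
SOC_ROWS = """15-0000,,,,Computer and Mathematical Occupations,,,,,
,15-1100,,,Computer Occupations,,,,,
,,15-1110,,Computer and Information Research Scientists,,,,,
,,,15-1111,Computer and Information Research Scientists,,,,,
"""

# Entity type per code column; the middle two class names are hypothetical.
LEVELS = ["ex:SOC-MajorGroup", "ex:SOC-MinorGroup", "ex:SOC-BroadGroup", "ex:SOC-DetailedOccupation"]


def classify(text: str):
    """Yield (code, class) per row, using the first populated code column."""
    for row in csv.reader(io.StringIO(text)):
        for level, cell in zip(LEVELS, row[:4]):
            if cell:  # the "cell is not null" condition that triggers a template
                yield cell, level
                break
```

Only the first row yields `ex:SOC-MajorGroup`, which is the behaviour described for column 1.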
>>
>> ii) triggering a template if a value of a cell equates to a particular string (or the opposite); e.g. when the value of "onetsoc-occupation" = "00", as in the example [earlier in this email thread][3]. ...
>>
>> "operator": "if ({onetsoc-occupation} == '00')"
>>
>> Perhaps there are cases for more complex operations? I don't know. Perhaps this is where call-back functions or promises could be used to parse a row and provide a Boolean response as to whether the template should be triggered? Again, I don't know ... and some considerable thought would be required to work out the details of such.
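To make the two proposed operator forms concrete, here is a Python sketch of an evaluator. The grammar is an assumption covering only `if ({name})` and `if ({name} == 'literal')` (plus `!=`); it is not an agreed syntax, and anything richer would need the external call-back route mentioned above.

```python
import re

# Assumed grammar, limited to the forms floated in this thread:
#   if ({name})                -> true when the value is present and non-empty
#   if ({name} == 'literal')   -> true when the value equals the literal (!= negates)
OPERATOR = re.compile(
    r"^if \(\{(?P<name>[\w-]+)\}(?:\s*(?P<op>==|!=)\s*'(?P<literal>[^']*)')?\)$"
)


def should_fire(operator: str, values: dict) -> bool:
    """Decide whether a template fires, given column / microsyntax values for a row."""
    m = OPERATOR.match(operator.strip())
    if m is None:
        raise ValueError(f"unsupported operator: {operator!r}")
    value = values.get(m.group("name"))
    if m.group("op") is None:
        return bool(value)  # existence / not-null test
    equal = value == m.group("literal")
    return equal if m.group("op") == "==" else not equal
```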
>
> For me these are convincing: we need something. My preference, though, would be to avoid all the issues of defining 'if'-s and 'else'-s and comparison operators, etc., and fall back on regular expressions ('match'/'not match'), simply because regular expressions are already used elsewhere. Would that be enough?
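Ivan's regex-only alternative can be sketched in Python: both requirements above reduce to a regexp plus a match/not-match flag. The condition layout and the `major-group-code` column name are hypothetical.

```python
import re

# Hypothetical condition objects: the column (or microsyntax element) to test,
# a regexp, and whether the template fires on a match or on a non-match.
NOT_NULL = {"name": "major-group-code", "regexp": r".", "match": True}  # requirement i)
IS_00 = {"name": "onetsoc-occupation", "regexp": r"^00$", "match": True}  # requirement ii)
NOT_00 = {"name": "onetsoc-occupation", "regexp": r"^00$", "match": False}  # "the opposite"


def holds(condition: dict, values: dict) -> bool:
    """True if the template guarded by this condition should fire for the row."""
    value = values.get(condition["name"], "")  # a missing cell is treated as empty
    matched = re.search(condition["regexp"], value) is not None
    return matched == condition["match"]
```

No comparison operators are needed: "not null" is "matches at least one character", and equality is an anchored literal pattern.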
>
> Ivan
>
>>
>> Jeremy
>>
>>
>>
>> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
>> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
>> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
>>
>>>
>>> My 2 cents...
>>>
>>> Ivan
>>>
>>>
>>>
>>>
>>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
>>> <jeremy.tandy@metoffice.gov.uk> wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dan Brickley [mailto:danbri@google.com]
>>>>> Sent: 18 June 2014 12:46
>>>>> To: Tandy, Jeremy
>>>>> Cc: CSV on the Web Working Group
>>>>> Subject: Re: Attempted example CSV metadata document and template
>>>>>
>>>>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
>>>>> wrote:
>>>>>> All -
>>>>>>
>>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
>>>>>> Observation" example. I've tried to create a CSV metadata document
>>>>>> following the rules in the [Metadata Vocabulary for Tabular Data][2]
>>>>>> and [Generating RDF from Tabular Data on the Web][3] documents.
>>>>>>
>>>>>> I would be particularly interested in:
>>>>>>
>>>>>> - corrections to errors!
>>>>>> - comments on additional proposed properties in the metadata
>>>>>> document ("short-name", "template", "microsyntax")
>>>>>> - use of "hasFormat" to specify the Content-Type associated with a
>>>>>> Template
>>>>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
>>>>>> to a simplified form
>>>>>
>>>>> I don't completely understand this mechanism yet, but do you think it
>>>>> could be stretched to address the SKOS/codes issue in
>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
>>>>> and emit triples like 'broader' when certain patterns matched?
>>>>>
>>>>> Dan
>>>>>
>>>>
>>>> OK ... let's have a go.
>>>>
>>>> Here's the header and a line of data:
>>>>
>>>> ---
>>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
>>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
>>>> ---
>>>>
>>>> Here's a guess at the CSV metadata description in which I am using the ["multiple regexp each extracting a single value" pattern][1]:
>>>>
>>>> ---
>>>> {
>>>>   "name": "2010_Occupations",
>>>>   "title": "O*NET-SOC Occupational listing for 2010",
>>>>   "publisher": [{
>>>>       "name": "O*Net Resource Center",
>>>>       "web": "http://www.onetcenter.org/"
>>>>   }],
>>>>   "resources": [{
>>>>       "name": "2010_Occupations-csv",
>>>>       "path": "2010_Occupations.csv",
>>>>       "schema": {"columns": [
>>>>           {
>>>>               "name": "onet-soc-2010-code",
>>>>               "title": "O*NET-SOC 2010 Code",
>>>>               "description": "O*NET Standard Occupational Classification Code (2010).",
>>>>               "type": "string",
>>>>               "required": true,
>>>>               "unique": true,
>>>>               "microsyntax": [{
>>>>                       "name": "soc-major-group",
>>>>                       "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
>>>>                   },{
>>>>                       "name": "soc-minor-group",
>>>>                       "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
>>>>                   },{
>>>>                       "name": "soc-broad-group",
>>>>                       "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
>>>>                   },{
>>>>                       "name": "soc-detailed-occupation",
>>>>                       "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
>>>>                   },{
>>>>                       "name": "onetsoc-occupation",
>>>>                       "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
>>>>                   }
>>>>
>>>>               ]
>>>>           },
>>>>           {
>>>>               "name": "title",
>>>>               "title": "O*NET-SOC 2010 Title",
>>>>               "description": "Title of occupational classification.",
>>>>               "type": "string",
>>>>               "required": true
>>>>           },
>>>>           {
>>>>               "name": "description",
>>>>               "title": "O*NET-SOC 2010 Description",
>>>>               "description": "Description of occupational classification.",
>>>>               "type": "string",
>>>>               "required": true
>>>>           }
>>>>       ]},
>>>>       "template": {
>>>>           "name": "2010_Occupations-csv-to-ttl",
>>>>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>           "type": "template",
>>>>           "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>           "hasFormat": "text/turtle"
>>>>       }
>>>>   }]
>>>> }
>>>> ---
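As a check on the "multiple regexp each extracting a single value" pattern, a Python sketch that loads the five definitions and applies them to `15-1199.03`. Two assumptions may differ from the metadata as first written: backslashes are doubled, since `\d` is not a valid escape in a JSON string, and the `.` separator is escaped so it only matches a full stop. Stripping the `/.../` delimiters is likewise an assumption about how a processor would read them.

```python
import json
import re

# The five microsyntax definitions as JSON text (backslashes doubled for JSON).
METADATA = r'''
{"microsyntax": [
  {"name": "soc-major-group",         "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"},
  {"name": "soc-minor-group",         "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"},
  {"name": "soc-broad-group",         "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"},
  {"name": "soc-detailed-occupation", "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"},
  {"name": "onetsoc-occupation",      "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"}
]}
'''


def extract(code: str) -> dict:
    """Apply each single-group regexp to the cell; elements that fail to match are omitted."""
    values = {}
    for element in json.loads(METADATA)["microsyntax"]:
        pattern = element["regexp"].strip("/")  # drop the /.../ delimiters
        match = re.match(pattern, code)
        if match:
            values[element["name"]] = match.group(1)
    return values
```

For `15-1199.03` this gives major group `15`, minor group `11`, broad group `9`, detailed occupation `9` and O*NET-SOC occupation `03`.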
>>>>
>>>> You can see that I've used the `microsyntax` object to capture the 5 independent elements of the O*NET-SOC code, each with its own regexp: "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the _best_ way to do it, I don't know ... it's just an idea to get us talking about possibilities and options!
>>>>
>>>> The template (prefixes etc. intentionally left out) might then be:
>>>>
>>>> ---
>>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>>>>    skos:notation "{onet-soc-2010-code}" ;
>>>>    skos:prefLabel "{title}" ;
>>>>    dct:description "{description}" ;
>>>>    skos:broader ex:{soc-major-group}-0000,
>>>>                 ex:{soc-major-group}-{soc-minor-group}00,
>>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
>>>> ---
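A minimal Python sketch of filling that template for the example row, assuming plain `{name}` substitution with the microsyntax values already extracted (the `ROW` dict is derived by hand from `15-1199.03` and the data line above). Values are not escaped for Turtle, so a `"` in a title would break the literal.

```python
import re

# The Turtle template above (prefixes omitted, as in the original).
TEMPLATE = """ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
   skos:notation "{onet-soc-2010-code}" ;
   skos:prefLabel "{title}" ;
   dct:description "{description}" ;
   skos:broader ex:{soc-major-group}-0000,
                ex:{soc-major-group}-{soc-minor-group}00,
                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
"""

# Column values plus microsyntax values for the 15-1199.03 row.
ROW = {
    "onet-soc-2010-code": "15-1199.03",
    "title": "Web Administrators",
    "description": "Manage web environment design, deployment, development and maintenance activities. [...]",
    "soc-major-group": "15",
    "soc-minor-group": "11",
    "soc-broad-group": "9",
    "soc-detailed-occupation": "9",
}

PLACEHOLDER = re.compile(r"\{([\w-]+)\}")


def render(template: str, values: dict) -> str:
    """Replace each {name} with the row's value; an unknown name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

The `skos:broader` list then resolves to `ex:15-0000`, `ex:15-1100`, `ex:15-1190` and `ex:15-1199`.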
>>>>
>>>> However, this does not help when we look at the required _conditional
>>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
>>>> identical to the term from the SOC taxonomy, and the template should
>>>> be more like
>>>>
>>>> ---
>>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>>>>    skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>>>>    skos:prefLabel "{title}" ;
>>>>    dct:description "{description}" ;
>>>>    skos:broader ex:{soc-major-group}-0000,
>>>>                 ex:{soc-major-group}-{soc-minor-group}00,
>>>>                 ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
>>>> ---
>>>>
>>>> It occurs to me that we may wish to trigger different templates based on a conditional response, or even decide whether to trigger a template at all for a given line!
>>>>
>>>> Thinking out of the box (is that a euphemism for "making it up as I go along"?), it would seem that each "template" block in the CSV metadata might have a "condition" statement that tells it when to fire, using the values of columns or microsyntax elements referenced by name; e.g.
>>>>
>>>> ---
>>>>       "template": {
>>>>           "name": "2010_Occupations-csv-to-ttl",
>>>>           "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>           "type": "template",
>>>>           "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>           "hasFormat": "text/turtle",
>>>>           "condition": "if {onetsoc-occupation} != '00'"
>>>>       }
>>>> ---
>>>>
>>>> Default behaviour (if no "condition" statement is included) would be _always_ to trigger the template for each row.
>>>>
>>>> However, looking at this, I am immediately concerned that including if-then-else blocks and comparison operators hugely increases the complexity of our work. Perhaps this is a good point to "bug out" to some external agent (e.g. call-back function or promise).
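The call-back idea could look roughly like this in Python, with the default of firing when no condition is given. The registry layout and template names are hypothetical; the two conditions mirror the `onetsoc-occupation` = "00" case discussed above.

```python
from typing import Callable, Optional

Row = dict  # column or microsyntax element name -> cell value

# Hypothetical registry: each template may carry an optional "condition"
# call-back that receives the row's values and returns a Boolean.
TEMPLATES = [
    {"name": "onetsoc-occupation-template",
     "condition": lambda v: v.get("onetsoc-occupation") != "00"},
    {"name": "soc-detailed-occupation-template",
     "condition": lambda v: v.get("onetsoc-occupation") == "00"},
    {"name": "unconditional-template"},  # no condition: fires for every row
]


def templates_to_fire(values: Row) -> list:
    """Names of the templates whose condition holds, or that have no condition."""
    fired = []
    for template in TEMPLATES:
        condition: Optional[Callable[[Row], bool]] = template.get("condition")
        if condition is None or condition(values):
            fired.append(template["name"])
    return fired
```

Keeping the predicate outside the metadata avoids defining comparison operators in the vocabulary, at the cost of tying the metadata to executable code.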
>>>>
>>>> Jeremy
>>>>
>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>>>>
>>>>>
>>>>>> - thoughts about a way to describe that microsyntax format within the
>>>>>> metadata document (see [CellMicrosyntax requirement][4]), e.g. to
>>>>>> define the sub-elements within the microsyntax that may be extracted
>>>>>> for use later; see [Parsing cell microsyntax][5].
>>>>>>
>>>>>> Comments welcome.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>>
>>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>>>
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
>
Received on Tuesday, 24 June 2014 10:39:54 UTC
