Re: Attempted example CSV metadata document and template

From: Ivan Herman <ivan@w3.org>
Date: Mon, 23 Jun 2014 18:35:28 +0200
Cc: Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <F9813ACB-3F46-42B7-BBDE-83CD0C1C0D18@w3.org>
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>

On 23 Jun 2014, at 18:03 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: 21 June 2014 08:38
>> To: Tandy, Jeremy
>> Cc: Dan Brickley; W3C CSV on the Web Working Group
>> Subject: Re: Attempted example CSV metadata document and template
>> 
>> Jeremy,
>> 
>> one thing that I was wondering about was that the simple naming
>> mechanism for the various microsyntaxes may not work out. Consider
>> 
>> 	"columns" : [
>> 		{ "name" : "datetime",
>> 		  ...
>>                  "microsyntax": [
>> 			{ "name" : N1,
>> 			  "regexp" : "...."
>> 			},
>> 			.....
>>                  ]
>> 		},
>> 		{ "name" : "anothercolumn",
>> 		  ...
>> 		  "microsyntax": [
>> 			{ "name" : N1,
>> 			  "regexp" : "...."
>> 			},
>> 			.....
>> 		  ]
>> 		}
>> 
>> 	]
>> 
>> 
>> When working through the cells in a row, what would 'N1' refer to?
>> Unless we want to require the uniqueness of the microsyntax names, we may
>> hit an issue. And I do not think requiring a unique name is a good
>> idea; if the metadata becomes big, this may become a nuisance.
> 
> Agreed. I made the assumption that all instances of "name" within a given metadata document would need to be unique. I had not considered any mechanisms to make this easy for users; e.g. using the "name" from an enclosing object to automatically _namespace_ sub-names.
> 
> We could leave it to the user to ensure uniqueness (easy for us; adds load to the end user which is less good); in which case the example above would fail to validate.
> 
> Alternatively, we could apply a form of name-spacing; e.g. "datetime/N1" and "anothercolumn/N1" within your example above.
> 
>> 
>> What this means is that the syntax becomes more complicated. Something
>> like {datetime:N1} or something similar, which raises the issue of
>> escape characters, too :-(
> 
> Agreed! I chose a different separator character to you, but the same issue applies.
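
A minimal sketch of the name-spacing idea discussed above, assuming microsyntax names are qualified with the "name" of the enclosing column and a "/" separator. The function name `qualified_microsyntax_names` and the sample regexps are hypothetical and only there for illustration; nothing here is part of any agreed metadata vocabulary.

```python
def qualified_microsyntax_names(columns, separator="/"):
    """Map 'column/microsyntax' names to their regexp strings.

    Names only need to be unique within one column; the column name
    acts as the namespace (an assumption made for this sketch).
    """
    table = {}
    for column in columns:
        for entry in column.get("microsyntax", []):
            key = f"{column['name']}{separator}{entry['name']}"
            if key in table:
                raise ValueError(f"duplicate microsyntax name: {key}")
            table[key] = entry["regexp"]
    return table


# Two columns that both use the microsyntax name "N1" (hypothetical regexps).
columns = [
    {"name": "datetime", "microsyntax": [{"name": "N1", "regexp": r"^(\d{4})-"}]},
    {"name": "anothercolumn", "microsyntax": [{"name": "N1", "regexp": r"^(\w+)"}]},
]
names = qualified_microsyntax_names(columns)
```

With this scheme a template would refer to "datetime/N1" or "anothercolumn/N1", so the separator character must then be forbidden (or escaped) inside names.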
> 
>> 
>> As for the conditionals: mustache has some syntax for this which is a
>> bit different
>> 
>> {{#bla}}
>>   .. any template here
>> {{/bla}}
>> 
>> although the mustache semantics is a bit different (afaik it relies on
>> the existence or not of a key in an object). We could use the mustache
>> semantics but we probably need something more, too, like "if 'bla' is a
>> microsyntax name, then it is true when the value of the cell matches
>> the regexp".
> 
> Syntax-wise, we want our metadata document to be valid JSON, so we would need something different to mustache. However, I agree that our use cases call for similar semantics. Perhaps the syntax might be something like:
> 
> "condition": {
>    "operator": "if ({bla})",
>    "template": {
>        "name": "2010_Occupations-csv-to-ttl",
>        "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>        "type": "template",
>        "path": "2010_Occupations-csv-to-ttl.ttl",
>        "hasFormat": "text/turtle"
>    }
> }
> 
> In this case, I'm trying to say that the template will be triggered if the value of {bla} is true / not null etc. ... the value of {bla} is taken by evaluating the column (or microsyntax element) with "name" = "bla" for the row being processed. Like you say: """it relies on the existence or not of a key in an object"""
> 
> (I don't really like the syntax; I guess that others can come up with better.)

Ouch, you are right, I forgot about the fact that we want templates for conditionals:-( 

But before getting into the boring issue of syntax we have to decide whether we need them...

> 
>> 
>> But I agree that the conditional complicates the templates a lot. Here
>> is where our use cases may have to come in: do our use cases justify
>> the need for conditionals? (Remember that, though we are discussing
>> turtle here, I do not see any difference between generating turtle and
>> generating XML or JSON through the same mechanism.)
> 
> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1], motivated by the [ExpressingHierarchyWithinOccupationalListings][2] use case. This use case gives us two requirements:
> 
> i) triggering a template if a value of a cell is not null; e.g. to generate the SKOS concept scheme from the SOC structure ...
> 
> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> ,15-1100,,,Computer Occupations,,,,,
> ,,15-1110,,Computer and Information Research Scientists,,,,,
> ,,,15-1111,Computer and Information Research Scientists,,,,,
> 
> Here we can see that I only want a ex:SOC-MajorGroup entity created on the first row shown above (where col 1 is populated).
> 
> ii) triggering a template if the value of a cell equals a particular string (or does not); e.g. when the value of "onetsoc-occupation" = "00", as in the example [earlier in this email thread][3]. ...
> 
> "operator": "if ({onetsoc-occupation} == '00')"
> 
> Perhaps there are cases for more complex operations? I don't know. Perhaps this is where call-back functions or promises could be used to parse a row and provide a Boolean response as to whether the template should be triggered? Again, I don't know ... and some considerable thought would be required to work out the details of such.
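
A minimal sketch of the call-back idea, assuming the processor hands each parsed row to a user-supplied function that returns a Boolean. The names `is_major_group` and `rows_to_process` are hypothetical; the rows are the SOC structure lines quoted above, and the predicate implements requirement (i).

```python
import csv
import io

# The four SOC structure rows quoted in the message above.
SOC_ROWS = """\
15-0000,,,,Computer and Mathematical Occupations,,,,,
,15-1100,,,Computer Occupations,,,,,
,,15-1110,,Computer and Information Research Scientists,,,,,
,,,15-1111,Computer and Information Research Scientists,,,,,
"""


def is_major_group(row):
    """Call-back for requirement (i): fire only when column 1 is populated."""
    return bool(row[0].strip())


def rows_to_process(text, predicate):
    """Return the parsed rows for which the call-back says the template fires."""
    return [row for row in csv.reader(io.StringIO(text)) if row and predicate(row)]


selected = rows_to_process(SOC_ROWS, is_major_group)
triples = [f"ex:{row[0]} a ex:SOC-MajorGroup ." for row in selected]
```

Only the first row yields an ex:SOC-MajorGroup entity. The cost of this approach is that the condition lives in code rather than in the metadata document.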

To me, these seem convincing enough that we need something. My preference, though, would be to avoid all the issues around defining 'if's and 'else's, comparison operators, etc., and fall back on regular expressions ('match' / 'not match'), simply because regular expressions are already used elsewhere. Would that be enough?
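
A minimal sketch of what a regexp-only condition could look like, assuming a condition is an object naming a column or microsyntax element plus either a "match" or a "not-match" regexp. Those key names and the function `should_fire` are assumptions made for illustration, not proposed syntax.

```python
import re


def should_fire(condition, values):
    """Decide whether a template fires for one row.

    `condition` is None (always fire) or a dict with a "name" and either
    a "match" or a "not-match" regexp (hypothetical keys).
    `values` maps column / microsyntax names to the strings for this row.
    """
    if condition is None:
        return True
    value = values.get(condition["name"]) or ""
    if "match" in condition:
        return re.search(condition["match"], value) is not None
    return re.search(condition["not-match"], value) is None


# Requirement (i): the cell is not empty.
not_null = {"name": "soc-major-group", "match": r".+"}
# Requirement (ii): the cell equals (or does not equal) the string "00".
is_soc_term = {"name": "onetsoc-occupation", "match": r"^00$"}
is_onet_term = {"name": "onetsoc-occupation", "not-match": r"^00$"}
```

Both requirements from the use case reduce to 'match' / 'not match' here, which suggests regexps may be enough for these two cases; anything involving two cells at once would not fit.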

Ivan

> 
> Jeremy
> 
> 
> 
> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues 
> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings 
> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html 
> 
>> 
>> My 2 cents...
>> 
>> Ivan
>> 
>> 
>> 
>> 
>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
>> <jeremy.tandy@metoffice.gov.uk> wrote:
>> 
>>>> -----Original Message-----
>>>> From: Dan Brickley [mailto:danbri@google.com]
>>>> Sent: 18 June 2014 12:46
>>>> To: Tandy, Jeremy
>>>> Cc: CSV on the Web Working Group
>>>> Subject: Re: Attempted example CSV metadata document and template
>>>> 
>>>> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
>>>> wrote:
>>>>> All -
>>>>> 
>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
>>>>> Observation" example. I've tried to create a CSV metadata document
>>>>> following the rules in the [Metadata Vocabulary for Tabular Data][2]
>>>>> and [Generating RDF from Tabular Data on the Web][3] documents.
>>>>> 
>>>>> I would be particularly interested in:
>>>>> 
>>>>> - corrections to errors!
>>>>> - comments on additional proposed properties in the metadata
>>>>> document ("short-name", "template", "microsyntax")
>>>>> - use of "hasFormat" to specify the Content-Type associated with a
>>>>> Template
>>>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
>>>>> to a simplified form
>>>> 
>>>> I don't completely understand this mechanism yet, but do you think it
>>>> could be stretched to address the SKOS/codes issue in
>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
>>>> and emit triples like 'broader' when certain patterns matched?
>>>> 
>>>> Dan
>>>> 
>>> 
>>> OK ... let's have a go.
>>> 
>>> Here's the header and a line of data:
>>> 
>>> ---
>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
>>> ---
>>> 
>>> Here's a guess at the CSV metadata description in which I am using
>>> the ["multiple regexp each extracting a single value" pattern][1]:
>>> 
>>> ---
>>> {
>>>  "name": "2010_Occupations",
>>>  "title": "O*NET-SOC Occupational listing for 2010",
>>>  "publisher": [{
>>>      "name": "O*Net Resource Center",
>>>      "web": "http://www.onetcenter.org/"
>>>  }],
>>>  "resources": [{
>>>      "name": "2010_Occupations-csv",
>>>      "path": "2010_Occupations.csv",
>>>      "schema": {"columns": [
>>>          {
>>>              "name": "onet-soc-2010-code",
>>>              "title": "O*NET-SOC 2010 Code",
>>>              "description": "O*NET Standard Occupational Classification Code (2010).",
>>>              "type": "string",
>>>              "required": true,
>>>              "unique": true,
>>>              "microsyntax": [{
>>>                      "name": "soc-major-group",
>>>                      "regexp": "/^(\d{2})-\d{4}.\d{2}$/"
>>>                  },{
>>>                      "name": "soc-minor-group",
>>>                      "regexp": "/^\d{2}-(\d{2})\d{2}.\d{2}$/"
>>>                  },{
>>>                      "name": "soc-broad-group",
>>>                      "regexp": "/^\d{2}-\d{2}(\d)\d.\d{2}$/"
>>>                  },{
>>>                      "name": "soc-detailed-occupation",
>>>                      "regexp": "/^\d{2}-\d{3}(\d).\d{2}$/"
>>>                  },{
>>>                      "name": "onetsoc-occupation",
>>>                      "regexp": "/^\d{2}-\d{4}.(\d{2})$/"
>>>                  }
>>> 
>>>              ]
>>>          },
>>>          {
>>>              "name": "title",
>>>              "title": "O*NET-SOC 2010 Title",
>>>              "description": "Title of occupational classification.",
>>>              "type": "string",
>>>              "required": true
>>>          },
>>>          {
>>>              "name": "description",
>>>              "title": "O*NET-SOC 2010 Description",
>>>              "description": "Description of occupational classification.",
>>>              "type": "string",
>>>              "required": true
>>>          }
>>>      ]},
>>>      "template": {
>>>          "name": "2010_Occupations-csv-to-ttl",
>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>          "type": "template",
>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
>>>          "hasFormat": "text/turtle"
>>>      }
>>>  }]
>>> }
>>> ---
>>> 
>>> You can see that I've used the `microsyntax` object to capture the 5
>>> independent elements of the O*NET-SOC code, each with its own regexp:
>>> "soc-major-group", "soc-minor-group", "soc-broad-group",
>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is the
>>> _best_ way to do it, I don't know ... it's just an idea to get us
>>> talking about possibilities and options!
>>> 
>>> The template (prefixes etc. intentionally left out) might then be:
>>> 
>>> ---
>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>>>   skos:notation "{onet-soc-2010-code}" ;
>>>   skos:prefLabel "{title}" ;
>>>   dct:description "{description}" ;
>>>   skos:broader ex:{soc-major-group}-0000,
>>>                ex:{soc-major-group}-{soc-minor-group}00,
>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
>>> ---
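
A minimal sketch of how the microsyntax regexps and the `{name}` placeholders might work together, assuming each regexp has exactly one capture group and a placeholder is replaced by the captured value. The helpers `extract` and `fill` are hypothetical. The regexps are written as Python raw strings without the surrounding slashes, and the "." is escaped here so that it only matches a literal dot; those are choices made for this sketch.

```python
import re

# One regexp per microsyntax element, each capturing a single value.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}


def extract(code):
    """Return the captured value for every microsyntax regexp that matches."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        found = re.match(pattern, code)
        if found:
            values[name] = found.group(1)
    return values


def fill(template, values):
    """Replace each {name} with its value; unknown names are left untouched."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


parts = extract("15-1199.03")
broader = [
    fill("ex:{soc-major-group}-0000", parts),
    fill("ex:{soc-major-group}-{soc-minor-group}00", parts),
    fill("ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0", parts),
    fill("ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}", parts),
]
```

For the sample row this gives the four skos:broader targets ex:15-0000, ex:15-1100, ex:15-1190 and ex:15-1199.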
>>> 
>>> However, this does not help when we look at the required _conditional
>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
>>> identical to the term from the SOC taxonomy, and the template should
>>> be more like
>>> 
>>> ---
>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>>>   skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>>>   skos:prefLabel "{title}" ;
>>>   dct:description "{description}" ;
>>>   skos:broader ex:{soc-major-group}-0000,
>>>                ex:{soc-major-group}-{soc-minor-group}00,
>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
>>> ---
>>> 
>>> It occurs to me that we may wish to trigger different templates based
>>> on a conditional response - or even decide whether we wish to trigger a
>>> template at all for a given line!
>>> 
>>> Thinking out of the box (is that a euphemism for "making it up as I
>>> go along"?), it would seem that each "template" block in the CSV
>>> metadata might have a "condition" statement that tells it when to fire
>>> - using values of column names or microsyntax element names? e.g.
>>> 
>>> ---
>>>      "template": {
>>>          "name": "2010_Occupations-csv-to-ttl",
>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>          "type": "template",
>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
>>>          "hasFormat": "text/turtle",
>>>          "condition": "if {onetsoc-occupation} != '00'"
>>>      }
>>> ---
>>> 
>>> Default behaviour (if no "condition" statement is included) would be
>>> _always_ to trigger the template for each row.
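
A minimal sketch of the "condition" statement with its default behaviour, assuming the condition string is limited to the single form `if {name} == 'literal'` or `if {name} != 'literal'`. That tiny grammar and the function `template_fires` are assumptions made for this sketch only.

```python
import re

# Accepts only: if {name} == 'literal'   or   if {name} != 'literal'
CONDITION = re.compile(r"^if \{([^{}]+)\} (==|!=) '([^']*)'$")


def template_fires(template, values):
    """Return True when the template should be triggered for this row."""
    condition = template.get("condition")
    if condition is None:
        return True  # default: always trigger the template
    parsed = CONDITION.match(condition)
    if parsed is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, operator, literal = parsed.groups()
    equal = values.get(name) == literal
    return equal if operator == "==" else not equal


onet_template = {
    "name": "2010_Occupations-csv-to-ttl",
    "condition": "if {onetsoc-occupation} != '00'",
}
unconditional_template = {"name": "2010_Occupations-csv-to-ttl"}
```

Even this restricted form needs a parser and rules for quoting and escaping, which illustrates the complexity concern raised in the next paragraph.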
>>> 
>>> However, looking at this, I am immediately concerned that including
>>> if-then-else blocks and comparison operators hugely increases the
>>> complexity of our work. Perhaps this is a good point to "bug out" to
>>> some external agent (e.g. call-back function or promise).
>>> 
>>> Jeremy
>>> 
>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>>> 
>>>> 
>>>>> - thoughts about a way to describe that microsyntax format within
>>>>> the metadata document (see [CellMicrosyntax requirement][4]), e.g. to
>>>>> define the sub-elements within the microsyntax that may be extracted
>>>>> for use later - see [Parsing cell microsyntax][5].
>>>>> 
>>>>> Comments welcome.
>>>>> 
>>>>> Jeremy
>>>>> 
>>>>> 
>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>> 
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me








Received on Monday, 23 June 2014 16:36:04 UTC
