Re: Attempted example CSV metadata document and template

Hi Jeremy,

I think I get it:-)

But... I see two problems with this approach.

- (This is the lesser one): do we really want to require the references to the templates to be part of the metadata? I would guess that, typically (although not exclusively), the metadata is provided by the data publisher. But why would that be the case for the templates? The same data set may be converted to different outputs (say, different XML) depending on the application; i.e., the templates may very well be end-user specific. But if users define their own templates, adding those metadata entries might be an extra load...

- If my understanding is correct, the model you have is {Condition via regexp} -> {one particular template} (Andy, is this also what you referred to?). Although you do not have an example like that, I also presume one can extend that to an array of conditions to express a conjunction of conditions. However, that means I would have to provide a set of templates for the different cases, which means I would have to repeat the common parts over many templates. This looks fairly error prone to me:-(
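
To make the second concern concrete, here is a rough Python sketch of how I read the {condition via regexp} -> {template} model. Everything in it is an assumption for illustration: the `RULES` list, the `render_row` function, and the two heavily shortened template strings are made up and are not taken from the worked example. Note how the `skos:prefLabel` line has to be repeated in both templates.

```python
import re

# Hypothetical rules: each pairs a regexp on the O*NET-SOC code with one template.
# The template strings are simplified stand-ins, not the real templates.
RULES = [
    (re.compile(r"^\d{2}-\d{4}\.00$"),
     'ex:{code} a ex:SOC-DetailedOccupation ;\n  skos:prefLabel "{title}" .'),
    (re.compile(r"^\d{2}-\d{4}\.\d{2}$"),
     'ex:{code} a ex:ONETSOC-Occupation ;\n  skos:prefLabel "{title}" .'),
]


def render_row(code, title):
    """Apply the first template whose condition matches; None if no rule fires."""
    for condition, template in RULES:
        if condition.match(code):
            return template.format(code=code, title=title)
    return None


print(render_row("15-1199.03", "Web Administrators"))
```

Every additional case means another entry in `RULES` with its own copy of the shared lines, which is where I expect the errors to creep in.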

Moving the conditions from the metadata into the templates themselves seems to be less error prone (although we end up with essentially if-then-else structures, which may be a bit more complicated to implement). (Of course, we have the syntax issue of how to define the templates so that they also work well with XML, Turtle, and JSON as the targeted output; lots of escape characters ahead...)
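
As an illustration of what conditions inside the template could look like, here is a minimal Python sketch that borrows the mustache-style section markers mentioned earlier in this thread. The marker syntax, the `render` and `row_values` functions, and the `is-soc-term` / `is-onet-extension` flags are assumptions of mine, not a proposal for the actual syntax; nested sections and escaping are deliberately ignored.

```python
import re

# Assumed syntax: {{#key}} ... {{/key}} keeps its body only when `key` is truthy;
# {name} is replaced by the value of the column or microsyntax element `name`.
SECTION = re.compile(r"\{\{#([\w-]+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{([\w-]+)\}")

# One template; the common parts appear exactly once.
TEMPLATE = (
    'ex:{code} skos:prefLabel "{title}" ;\n'
    "{{#is-onet-extension}}  a ex:ONETSOC-Occupation ;\n{{/is-onet-extension}}"
    "{{#is-soc-term}}  a ex:SOC-DetailedOccupation ;\n{{/is-soc-term}}"
    '  dct:description "{description}" .\n'
)


def row_values(code, title, description):
    """Build the values for one row; the two flags come from a regexp match."""
    is_soc = re.match(r"^\d{2}-\d{4}\.00$", code) is not None
    return {
        "code": code,
        "title": title,
        "description": description,
        "is-soc-term": is_soc,
        "is-onet-extension": not is_soc,
    }


def render(template, values):
    """Resolve the conditional sections first, then substitute the variables."""
    def keep_or_drop(match):
        return match.group(2) if values.get(match.group(1)) else ""

    text = SECTION.sub(keep_or_drop, template)
    return VARIABLE.sub(lambda m: str(values.get(m.group(1), "")), text)


print(render(TEMPLATE, row_values("15-1199.03", "Web Administrators", "Manage web.")))
```

The price is visible in the regular expressions: the template language now needs its own parser, and the braces would collide with JSON output.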

Another possibility may be to have some sort of an include facility. Much like #include in cpp...
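
A minimal sketch of the include idea, again in Python and again under assumptions: the `#include "name"` directive, the `expand` function, and the fragment names are hypothetical. (In Turtle a line starting with `#` is a comment, so a real directive would probably need a different marker.)

```python
import re

# Assumed directive: a line consisting only of  #include "fragment-name"
INCLUDE = re.compile(r'^[ \t]*#include "([^"]+)"[ \t]*$', re.MULTILINE)

# Hypothetical fragments: the shared tail lives in one place only.
FRAGMENTS = {
    "common": '  skos:prefLabel "{title}" ;\n  dct:description "{description}" .',
    "onet": 'ex:{code} a ex:ONETSOC-Occupation ;\n#include "common"',
    "soc": 'ex:{code} a ex:SOC-DetailedOccupation ;\n#include "common"',
}


def expand(name, fragments, _stack=()):
    """Return fragments[name] with each #include line replaced by that fragment."""
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join(_stack + (name,)))

    def replace(match):
        return expand(match.group(1), fragments, _stack + (name,))

    return INCLUDE.sub(replace, fragments[name])


print(expand("onet", FRAGMENTS))
```

This keeps the one-template-per-condition model but lets the common parts be written once.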

Ivan




On 24 Jun 2014, at 13:11 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

> Hi Andy -
> 
> Hopefully the [worked example][1] that I've just created illustrates your point. Please feel free to fix / amend / re-write as necessary!!!
> 
> In both examples, I've created multiple templates, which are configured to be triggered on a matching condition.
> 
> Jeremy
> 
> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md 
> 
>> -----Original Message-----
>> From: Andy Seaborne [mailto:andy@apache.org]
>> Sent: 24 June 2014 11:39
>> To: public-csv-wg@w3.org
>> Subject: Re: Attempted example CSV metadata document and template
>> 
>> (general observation)
>> 
>> There are ways to get conditional effects without explicit
>> "if-then-else"
>> 
>> 1/ Apply different templates : that is multiple passes with different
>> matching conditions.
>> 
>> 2/ A template is valid if and only if all its associated templates are
>> defined (the template may not actually be used) so that a
>> (non-)matching regex is controlling whether the template is applied.
>> 
>> These might be applicable separately or together.
>> 
>> 	Andy
>> 
>> On 23/06/14 17:35, Ivan Herman wrote:
>>> 
>>> On 23 Jun 2014, at 18:03 , Tandy, Jeremy
>> <jeremy.tandy@metoffice.gov.uk> wrote:
>>> 
>>>>> -----Original Message-----
>>>>> From: Ivan Herman [mailto:ivan@w3.org]
>>>>> Sent: 21 June 2014 08:38
>>>>> To: Tandy, Jeremy
>>>>> Cc: Dan Brickley; W3C CSV on the Web Working Group
>>>>> Subject: Re: Attempted example CSV metadata document and template
>>>>> 
>>>>> Jeremy,
>>>>> 
>>>>> one thing that I was wondering about was that the simple naming
>>>>> mechanism for the various microsyntaxes may not work out. Consider
>>>>> 
>>>>> 	"columns" : [
>>>>> 		{ "name" : "datetime",
>>>>> 		  ...
>>>>>                  "microsyntax": [
>>>>> 			{ "name" : N1,
>>>>> 			  "regexp" : "...."
>>>>> 			},
>>>>> 			.....
>>>>>                  ]
>>>>> 		},
>>>>> 		{ "name" : "anothercolumn",
>>>>> 		  ...
>>>>> 		  "microsyntax": [
>>>>> 			{ "name" : N1,
>>>>> 			  "regexp" : "...."
>>>>> 			},
>>>>> 			.....
>>>>> 		  ]
>>>>> 		}
>>>>> 
>>>>> 	]
>>>>> 
>>>>> 
>>>>> When working through the cells in a row, what would 'N1' refer to?
>>>>> Unless we want to require the unicity of the microsyntax names, we
>>>>> may hit an issue. And I do not think requiring a unique name is a
>>>>> good idea; if the metadata becomes big, this may become a nuisance.
>>>> 
>>>> Agreed. I made the assumption that all instances of "name" within a
>> given metadata document would need to be unique. I had not considered
>> any mechanisms to make this easy for users; e.g. using the "name" from
>> an enclosing object to automatically _namespace_ sub-names.
>>>> 
>>>> We could leave it to the user to ensure uniqueness (easy for us;
>> adds load to the end user which is less good); in which case the
>> example above would fail to validate.
>>>> 
>>>> Alternatively, we could apply a form of name-spacing; e.g.
>> "datetime/N1" and "anothercolumn/N1" within your example above.
>>>> 
>>>>> 
>>>>> What this means is that the syntax becomes more complicated.
>>>>> Something like {datetime:N1} or something similar (which raises the
>>>>> issue of escape characters, too:-(
>>>> 
>>>> Agreed! I chose a different separator character to you, but the same
>> issue applies.
>>>> 
>>>>> 
>>>>> As for the conditionals: mustache has some syntax for this which is
>>>>> a bit different
>>>>> 
>>>>> {{#bla}}
>>>>>   .. any template here
>>>>> {{/bla}}
>>>>> 
>>>>> although the mustache semantics is a bit different (afaik it relies
>>>>> on the existence or not of a key in an object). We could use the
>>>>> mustache semantics but we probably need something more, too, like
>>>>> "if 'bla' is a microsyntax name, then it is true when the value of
>>>>> the cell matches the regexp".
>>>> 
>>>> Syntax-wise, we want our metadata document to be valid JSON, so we
>> would need something different to mustache. However, I agree that our
>> use cases call for similar semantics. Perhaps the syntax might be
>> something like:
>>>> 
>>>> "condition": {
>>>>    "operator": "if ({bla})",
>>>>    "template": {
>>>>        "name": "2010_Occupations-csv-to-ttl",
>>>>        "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>        "type": "template",
>>>>        "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>        "hasFormat": "text/turtle"
>>>>    }
>>>> }
>>>> 
>>>> In this case, I'm trying to say that the template will be triggered
>> if the value of {bla} is true / not null etc. ... the value of {bla} is
>> taken by evaluating the column (or microsyntax element) with "name" =
>> "bla" for the row being processed. Like you say: """it relies on the
>> existence or not of a key in an object"""
>>>> 
>>>> (I don't really like the syntax; I guess that others can come up
>> with
>>>> better.)
>>> 
>>> Ouch, you are right, I forgot about the fact that we want templates
>>> for conditionals:-(
>>> 
>>> But before getting into the boring issue of syntax we have to decide
>> whether we need them...
>>> 
>>>> 
>>>>> 
>>>>> But I agree that the conditional complicates the templates a lot.
>>>>> Here is where our use cases may have to switch in: do our use cases
>>>>> justify the need for conditionals (remembering that, though we are
>>>>> discussing turtle here, I do not see any difference between
>>>>> generating turtle and generating XML or JSON through the same
>> mechanism).
>>>> 
>>>> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
>> motivated by the ExpressingHierarchyWithinOccupationalListings use
>> case. This use case gives us two requirements:
>>>> 
>>>> i) triggering a template if a value of a cell is not null; e.g. to
>> generate the SKOS concept scheme from the SOC structure ...
>>>> 
>>>> 15-0000,,,,Computer and Mathematical Occupations,,,,,
>>>> ,15-1100,,,Computer Occupations,,,,,
>>>> ,,15-1110,,Computer and Information Research Scientists,,,,,
>>>> ,,,15-1111,Computer and Information Research Scientists,,,,,
>>>> 
>>>> Here we can see that I only want a ex:SOC-MajorGroup entity created
>> on the first row shown above (where col 1 is populated).
>>>> 
>>>> ii) triggering a template if a value of a cell equates to a
>> particular string (or the opposite); e.g. when the value of "onetsoc-
>> occupation" = "00" as shown in the example shown [earlier in this email
>> thread][3]. ...
>>>> 
>>>> "operator": "if ({onetsoc-occupation} == '00')"
>>>> 
>>>> Perhaps there are cases for more complex operations? I don't know.
>> Perhaps this is where call-back functions or promises could be used to
>> parse a row and provide a Boolean response as to whether the template
>> should be triggered? Again, I don't know ... and some considerable
>> thought would be required to work out the details of such.
>>> 
>>> For me these seem to be convincing that we need something. My
>> preference would be, though, to avoid all the issues about defining
>> 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
>> on regular expressions ('match'-'not match') simply because regular
>> expressions are used elsewhere already. Would that be enough?
>>> 
>>> Ivan
>>> 
>>>> 
>>>> Jeremy
>>>> 
>>>> 
>>>> 
>>>> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
>>>> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
>>>> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
>>>> 
>>>>> 
>>>>> My 2 cents...
>>>>> 
>>>>> Ivan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
>>>>> <jeremy.tandy@metoffice.gov.uk> wrote:
>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Dan Brickley [mailto:danbri@google.com]
>>>>>>> Sent: 18 June 2014 12:46
>>>>>>> To: Tandy, Jeremy
>>>>>>> Cc: CSV on the Web Working Group
>>>>>>> Subject: Re: Attempted example CSV metadata document and template
>>>>>>> 
>>>>>>> On 12 June 2014 12:57, Tandy, Jeremy
>>>>>>> <jeremy.tandy@metoffice.gov.uk>
>>>>>>> wrote:
>>>>>>>> All -
>>>>>>>> 
>>>>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
>>>>>>>> Observation" example. I've tried to create a CSV metadata document
>>>>>>>> following the rules in the [Metadata Vocabulary for Tabular
>>>>>>>> Data][2] and [Generating RDF from Tabular Data on the Web][3]
>>>>>>>> documents.
>>>>>>>> 
>>>>>>>> I would be particularly interested in:
>>>>>>>> 
>>>>>>>> - corrections to errors!
>>>>>>>> - comments on additional proposed properties in the metadata
>>>>>>>> document ("short-name", "template", "microsyntax")
>>>>>>>> - use of "hasFormat" to specify the Content-Type associated with
>>>>>>>> a Template
>>>>>>>> - use of a REGEXP within a URI Template to convert ISO 8601
>>>>>>>> syntax to a simplified form
>>>>>>> 
>>>>>>> I don't completely understand this mechanism yet, but do you think it
>>>>>>> could be stretched to address the SKOS/codes issue in
>>>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>>>>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
>>>>>>> and emit triples like 'broader' when certain patterns matched?
>>>>>>> 
>>>>>>> Dan
>>>>>>> 
>>>>>> 
>>>>>> OK ... let's have a go.
>>>>>> 
>>>>>> Here's the header and a line of data:
>>>>>> 
>>>>>> ---
>>>>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
>>>>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
>>>>>> ---
>>>>>> 
>>>>>> Here's a guess at the CSV metadata description in which I am using
>>>>> the ["multiple regexp each extracting a single value" pattern][1]:
>>>>>> 
>>>>>> ---
>>>>>> {
>>>>>>  "name": "2010_Occupations",
>>>>>>  "title": "O*NET-SOC Occupational listing for 2010",
>>>>>>  "publisher": [{
>>>>>>      "name": "O*Net Resource Center",
>>>>>>      "web": "http://www.onetcenter.org/"
>>>>>>  }],
>>>>>>  "resources": [{
>>>>>>      "name": "2010_Occupations-csv",
>>>>>>      "path": "2010_Occupations.csv",
>>>>>>      "schema": {"columns": [
>>>>>>          {
>>>>>>              "name": "onet-soc-2010-code",
>>>>>>              "title": "O*NET-SOC 2010 Code",
>>>>>>              "description": "O*NET Standard Occupational Classification Code (2010).",
>>>>>>              "type": "string",
>>>>>>              "required": true,
>>>>>>              "unique": true,
>>>>>>              "microsyntax": [{
>>>>>>                      "name": "soc-major-group",
>>>>>>                      "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
>>>>>>                  },{
>>>>>>                      "name": "soc-minor-group",
>>>>>>                      "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
>>>>>>                  },{
>>>>>>                      "name": "soc-broad-group",
>>>>>>                      "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
>>>>>>                  },{
>>>>>>                      "name": "soc-detailed-occupation",
>>>>>>                      "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
>>>>>>                  },{
>>>>>>                      "name": "onetsoc-occupation",
>>>>>>                      "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
>>>>>>                  }
>>>>>> 
>>>>>>              ]
>>>>>>          },
>>>>>>          {
>>>>>>              "name": "title",
>>>>>>              "title": "O*NET-SOC 2010 Title",
>>>>>>              "description": "Title of occupational classification.",
>>>>>>              "type": "string",
>>>>>>              "required": true
>>>>>>          },
>>>>>>          {
>>>>>>              "name": "description",
>>>>>>              "title": "O*NET-SOC 2010 Description",
>>>>>>              "description": "Description of occupational classification.",
>>>>>>              "type": "string",
>>>>>>              "required": true
>>>>>>          }
>>>>>>      ]},
>>>>>>      "template": {
>>>>>>          "name": "2010_Occupations-csv-to-ttl",
>>>>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>>>          "type": "template",
>>>>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>>>          "hasFormat": "text/turtle"
>>>>>>      }
>>>>>>  }]
>>>>>> }
>>>>>> ---
>>>>>> 
>>>>>> You can see that I've used the `microsyntax` object to capture the 5
>>>>>> independent elements of the O*NET-SOC code, each with its own regexp:
>>>>>> "soc-major-group", "soc-minor-group", "soc-broad-group",
>>>>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
>>>>>> the _best_ way to do it, I don't know ... it's just an idea to get us
>>>>>> talking about possibilities and options!
>>>>>> 
>>>>>> The template (prefixes etc. intentionally left out) might then be:
>>>>>> 
>>>>>> ---
>>>>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>>>>>>   skos:notation "{onet-soc-2010-code}" ;
>>>>>>   skos:prefLabel "{title}" ;
>>>>>>   dct:description "{description}" ;
>>>>>>   skos:broader ex:{soc-major-group}-0000,
>>>>>>                ex:{soc-major-group}-{soc-minor-group}00,
>>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
>>>>>> ---
>>>>>> 
>>>>>> However, this does not help when we look at the required
>>>>>> _conditional
>>>>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
>>>>>> identical to the term from the SOC taxonomy, and the template
>>>>>> should be more like
>>>>>> 
>>>>>> ---
>>>>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
>>>>>>   skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
>>>>>>   skos:prefLabel "{title}" ;
>>>>>>   dct:description "{description}" ;
>>>>>>   skos:broader ex:{soc-major-group}-0000,
>>>>>>                ex:{soc-major-group}-{soc-minor-group}00,
>>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
>>>>>> ---
>>>>>> 
>>>>>> It occurs to me that we may wish to trigger different templates based
>>>>>> on a conditional response - or even whether we wish to trigger a
>>>>>> template at all for a given line!
>>>>>> 
>>>>>> Thinking out of the box (is that a euphemism for "making it up as
>> I
>>>>> go along"?), it would seem that each "template" block in the CSV
>>>>> metadata might have a "condition" statement that tells it when to
>>>>> fire
>>>>> - using values of column names or microsyntax element names? e.g.
>>>>>> 
>>>>>> ---
>>>>>>      "template": {
>>>>>>          "name": "2010_Occupations-csv-to-ttl",
>>>>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
>>>>>>          "type": "template",
>>>>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
>>>>>>          "hasFormat": "text/turtle",
>>>>>>          "condition": "if {onetsoc-occupation} != '00'"
>>>>>>      }
>>>>>> ---
>>>>>> 
>>>>>> Default behaviour (if no "condition" statement included) would be
>>>>> _always_ to trigger the template for each row.
>>>>>> 
>>>>>> However, looking at this, I am immediately concerned that
>> including
>>>>> if-then-else blocks and comparison operators hugely increases the
>>>>> complexity of our work. Perhaps this is a good point to "bug out"
>> to
>>>>> some external agent (e.g. call-back function or promise).
>>>>>> 
>>>>>> Jeremy
>>>>>> 
>>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
>>>>>> 
>>>>>>> 
>>>>>>>> - thoughts about a way to describe that microsyntax format within
>>>>>>>> the metadata document (see [CellMicrosyntax requirement][4]), e.g. to
>>>>>>>> define the sub-elements within the microsyntax that may be
>>>>>>>> extracted for use later - see [Parsing cell microsyntax][5].
>>>>>>>> 
>>>>>>>> Comments welcome.
>>>>>>>> 
>>>>>>>> Jeremy
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
>>>>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>>>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
>>>>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
>>>>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>>>>> 
>>>>> 
>>>>> ----
>>>>> Ivan Herman, W3C
>>>>> Digital Publishing Activity Lead
>>>>> Home: http://www.w3.org/People/Ivan/
>>>>> mobile: +31-641044153
>>>>> GPG: 0x343F1A3D
>>>>> WebID: http://www.ivan-herman.net/foaf#me
>>> 
>>> 
>> 
> 
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me

Received on Tuesday, 24 June 2014 12:40:45 UTC