RE: Attempted example CSV metadata document and template

> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 24 June 2014 13:40
> To: Tandy, Jeremy
> Cc: Andy Seaborne; W3C CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
> 
> Hi Jeremy,
> 
> I think I get it:-)
> 
> But... I see two problems with this approach.
> 
> - (This is the lesser one): do we really want to require the references
> to the templates to be part of the Metadata? 

I never saw a problem with this - but I didn't think about it very hard!

> I would guess that,
> typically (although not exclusively) the metadata is provided by the
> data publisher. 

Anyone can provide metadata ... see [R-IndependentMetadataPublication][1]. So even if the original data publisher did not provide templates, a third party could publish their own metadata description including references to the templates.

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-IndependentMetadataPublication 

> But why would that be the case for the templates? The same
> data set may be converted to different, say, XML depending on the
> application; ie, the templates may very well be end-user specific. 

Also true; that's why my template blocks include the `hasFormat` key to say what the intended target format is. This enables conversion software to offer the user a choice of conversions based on the formats expressed in the templates.

I agree that this will create an increasingly large number of template blocks for a small number of "power-user" cases.
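
A minimal sketch of how conversion software might build that list of choices. It assumes (my assumption, not something settled in this thread) that a resource may carry a list of template blocks rather than a single one; the `-csv-to-xml` template name and its `application/xml` value are made up for illustration.

```python
from collections import defaultdict


def templates_by_format(resource):
    """Group the template names of one resource by their `hasFormat` value."""
    templates = resource.get("template", [])
    # Accept a single template block as well as a list (list form is an assumption).
    if isinstance(templates, dict):
        templates = [templates]
    grouped = defaultdict(list)
    for template in templates:
        grouped[template.get("hasFormat", "unknown")].append(template["name"])
    return dict(grouped)


resource = {
    "name": "2010_Occupations-csv",
    "template": [
        {"name": "2010_Occupations-csv-to-ttl", "hasFormat": "text/turtle"},
        # Hypothetical second template, for illustration only.
        {"name": "2010_Occupations-csv-to-xml", "hasFormat": "application/xml"},
    ],
}

choices = templates_by_format(resource)
for media_type in sorted(choices):
    print(media_type, "->", choices[media_type])
```

The user would then pick a media type and the software would apply the matching template(s).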

> But
> if the user defines his/her template, adding those metadata entries
> might be an extra load...
> 
> - If my understanding is correct, the model you have is {Condition via
> regexp} -> {one particular template} (Andy, is this also what you
> referred to?). 

I think so ...

> Although you do not have an example like that, 

Could you provide an example?

> but I
> also presume one can extend that to an array of conditions to provide
> conjunction of conditions. However, that means I would have to provide
> a set of templates for the different cases, which means I would have to
> repeat the common parts over many templates. This looks fairly error
> prone to me:-(

This is true. My goal so far was to articulate the problem with worked examples ... in the hope that we can iterate toward a solution that is elegant. I still anticipate a few more steps along that road!
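
To make the pattern under discussion concrete, here is a rough sketch of {condition via regexp} -> {one particular template}. The rule list, the function name and both template names are hypothetical; only the shape of the O*NET-SOC code comes from the examples in this thread.

```python
import re

# Each rule pairs a regexp over the O*NET-SOC code with a template name.
# Rules are tried in order and the first match wins (an assumed policy).
RULES = [
    (re.compile(r"^\d{2}-\d{4}\.00$"), "soc-detailed-occupation-template"),
    (re.compile(r"^\d{2}-\d{4}\.\d{2}$"), "onetsoc-occupation-template"),
]


def select_template(code):
    """Return the name of the first template whose regexp matches, else None."""
    for pattern, template_name in RULES:
        if pattern.match(code):
            return template_name
    return None


for code in ("15-1199.00", "15-1199.03", "not-a-code"):
    print(code, "->", select_template(code))
```

Each template named here would still have to repeat the parts it shares with the others, which is exactly the duplication problem raised above.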

> 
> Moving the conditions from the metadata into the templates themselves
> seems to be less error prone (although ending up with essentially
> if-then-else structures which may be a bit more complicated to implement).
> (Of course, we have the syntax issue on how to define the templates so
> that it would also work well with XML, Turtle, and JSON as a targeted
> output; lots of escape characters ahead...)

It would be good to see these ideas encapsulated in examples; I think it makes them easier to discuss!

> 
> Another possibility may be to have some sort of an include facility.
> Much like #include in cpp...
> 

Ah, the possibilities ... the trick is, as Einstein is reported to have said, to ensure that "everything [is] made as simple as possible, but not simpler." :-)

Jeremy

> Ivan
> 
> 
> 
> 
> On 24 Jun 2014, at 13:11 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
> 
> > Hi Andy -
> >
> > Hopefully the [worked example][1] that I've just created illustrates
> > your point. Please feel free to fix / amend / re-write as necessary!!!
> >
> > In both examples, I've created multiple templates, which are
> > configured to be triggered on a matching condition.
> >
> > Jeremy
> >
> > [1]:
> > https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md
> >
> >> -----Original Message-----
> >> From: Andy Seaborne [mailto:andy@apache.org]
> >> Sent: 24 June 2014 11:39
> >> To: public-csv-wg@w3.org
> >> Subject: Re: Attempted example CSV metadata document and template
> >>
> >> (general observation)
> >>
> >> There are ways to get conditional effects without explicit
> >> "if-then-else"
> >>
> >> 1/ Apply different templates : that is multiple passes with different
> >> matching conditions.
> >>
> >> 2/ A template is valid if and only if all its associated templates
> >> are defined (the template may not actually be used) so that a
> >> (non-)matching regex is controlling whether the template is applied.
> >>
> >> These might be applicable separately or together.
> >>
> >> 	Andy
> >>
> >> On 23/06/14 17:35, Ivan Herman wrote:
> >>>
> >>> On 23 Jun 2014, at 18:03 , Tandy, Jeremy
> >>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Ivan Herman [mailto:ivan@w3.org]
> >>>>> Sent: 21 June 2014 08:38
> >>>>> To: Tandy, Jeremy
> >>>>> Cc: Dan Brickley; W3C CSV on the Web Working Group
> >>>>> Subject: Re: Attempted example CSV metadata document and template
> >>>>>
> >>>>> Jeremy,
> >>>>>
> >>>>> one thing that I was wondering about was that the simple naming
> >>>>> mechanism for the various microsyntaxes may not work out. Consider
> >>>>>
> >>>>> 	"columns" : [
> >>>>> 		{ "name" : "datetime",
> >>>>> 		  ...
> >>>>> 		  "microsyntax": [
> >>>>> 			{ "name" : N1,
> >>>>> 			  "regexp" : "...."
> >>>>> 			},
> >>>>> 			.....
> >>>>> 		  ]
> >>>>> 		},
> >>>>> 		{ "name" : "anothercolumn",
> >>>>> 		  ...
> >>>>> 		  "microsyntax": [
> >>>>> 			{ "name" : N1,
> >>>>> 			  "regexp" : "...."
> >>>>> 			},
> >>>>> 			.....
> >>>>> 		  ]
> >>>>> 		}
> >>>>>
> >>>>> 	]
> >>>>>
> >>>>>
> >>>>> When working through the cells in a row, what would 'N1' refer to?
> >>>>> Unless we want to require the uniqueness of the microsyntax names, we
> >>>>> may hit an issue. And I do not think requiring a unique name is a
> >>>>> good idea; if the metadata becomes big, this may become a nuisance.
> >>>>
> >>>> Agreed. I made the assumption that all instances of "name" within a
> >>>> given metadata document would need to be unique. I had not considered
> >>>> any mechanisms to make this easy for users; e.g. using the "name"
> >>>> from an enclosing object to automatically _namespace_ sub-names.
> >>>>
> >>>> We could leave it to the user to ensure uniqueness (easy for us;
> >>>> adds load to the end user which is less good); in which case the
> >>>> example above would fail to validate.
> >>>>
> >>>> Alternatively, we could apply a form of name-spacing; e.g.
> >>>> "datetime/N1" and "anothercolumn/N1" within your example above.
> >>>>
> >>>>>
> >>>>> What this means is that the syntax becomes more complicated.
> >>>>> Something like {datetime:N1} or something similar (which raises
> >>>>> the issue of escape characters, too:-(
> >>>>
> >>>> Agreed! I chose a different separator character to you, but the
> >>>> same issue applies.
> >>>>
> >>>>>
> >>>>> As for the conditionals: mustache has some syntax for this which
> >>>>> is a bit different
> >>>>>
> >>>>> {{#bla}}
> >>>>>   .. any template here
> >>>>> {{/bla}}
> >>>>>
> >>>>> although the mustache semantics is a bit different (afaik it
> >>>>> relies on the existence or not of a key in an object). We could
> >>>>> use the mustache semantics but we probably need something more,
> >>>>> too, like "if 'bla' is a microsyntax name, then it is true if the
> >>>>> value of the cell matches the regexp".
> >>>>
> >>>> Syntax-wise, we want our metadata document to be valid JSON, so we
> >>>> would need something different to mustache. However, I agree that our
> >>>> use cases call for similar semantics. Perhaps the syntax might be
> >>>> something like:
> >>>>
> >>>> "condition": {
> >>>>    "operator": "if ({bla})",
> >>>>    "template": {
> >>>>        "name": "2010_Occupations-csv-to-ttl",
> >>>>        "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>        "type": "template",
> >>>>        "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>        "hasFormat": "text/turtle"
> >>>>    }
> >>>> }
> >>>>
> >>>> In this case, I'm trying to say that the template will be triggered
> >>>> if the value of {bla} is true / not null etc. ... the value of {bla}
> >>>> is taken by evaluating the column (or microsyntax element) with
> >>>> "name" = "bla" for the row being processed. Like you say: """it
> >>>> relies on the existence or not of a key in an object"""
> >>>>
> >>>> (I don't really like the syntax; I guess that others can come up with
> >>>> better.)
> >>>
> >>> Ouch, you are right, I forgot about the fact that we want templates
> >>> for conditionals:-(
> >>>
> >>> But before getting into the boring issue of syntax we have to decide
> >>> whether we need them...
> >>>
> >>>>
> >>>>>
> >>>>> But I agree that the conditional complicates the templates a lot.
> >>>>> Here is where our use cases may have to switch in: do our use
> >>>>> cases justify the need for conditionals (remembering that, though
> >>>>> we are discussing turtle here, I do not see any difference between
> >>>>> generating turtle and generating XML or JSON through the same
> >>>>> mechanism).
> >>>>
> >>>> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
> >>>> motivated by the ExpressingHierarchyWithinOccupationalListings use
> >>>> case. This use case gives us two requirements:
> >>>>
> >>>> i) triggering a template if a value of a cell is not null; e.g. to
> >>>> generate the SKOS concept scheme from the SOC structure ...
> >>>>
> >>>> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> >>>> ,15-1100,,,Computer Occupations,,,,,
> >>>> ,,15-1110,,Computer and Information Research Scientists,,,,,
> >>>> ,,,15-1111,Computer and Information Research Scientists,,,,,
> >>>>
> >>>> Here we can see that I only want an ex:SOC-MajorGroup entity created
> >>>> on the first row shown above (where col 1 is populated).
> >>>>
> >>>> ii) triggering a template if a value of a cell equates to a
> >>>> particular string (or the opposite); e.g. when the value of
> >>>> "onetsoc-occupation" = "00" as shown in the example [earlier in this
> >>>> email thread][3]. ...
> >>>>
> >>>> "operator": "if ({onetsoc-occupation} == '00')"
> >>>>
> >>>> Perhaps there are cases for more complex operations? I don't know.
> >>>> Perhaps this is where call-back functions or promises could be used
> >>>> to parse a row and provide a Boolean response as to whether the
> >>>> template should be triggered? Again, I don't know ... and some
> >>>> considerable thought would be required to work out the details of
> >>>> such.
> >>>
> >>> For me these seem to be convincing that we need something. My
> >>> preference would be, though, to avoid all the issues about defining
> >>> 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
> >>> on regular expressions ('match'-'not match') simply because regular
> >>> expressions are used elsewhere already. Would that be enough?
> >>>
> >>> Ivan
> >>>
> >>>>
> >>>> Jeremy
> >>>>
> >>>>
> >>>>
> >>>> [1]:
> >>>> http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
> >>>> [2]:
> >>>> http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
> >>>> [3]:
> >>>> http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
> >>>>
> >>>>>
> >>>>> My 2 cents...
> >>>>>
> >>>>> Ivan
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> >>>>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Dan Brickley [mailto:danbri@google.com]
> >>>>>>> Sent: 18 June 2014 12:46
> >>>>>>> To: Tandy, Jeremy
> >>>>>>> Cc: CSV on the Web Working Group
> >>>>>>> Subject: Re: Attempted example CSV metadata document and
> >>>>>>> template
> >>>>>>>
> >>>>>>> On 12 June 2014 12:57, Tandy, Jeremy
> >>>>>>> <jeremy.tandy@metoffice.gov.uk>
> >>>>>>> wrote:
> >>>>>>>> All -
> >>>>>>>>
> >>>>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >>>>>>>> Observation" example. I've tried to create a CSV metadata document
> >>>>>>>> following the rules in the [Metadata Vocabulary for Tabular
> >>>>>>>> Data][2] and [Generating RDF from Tabular Data on the Web][3]
> >>>>>>>> documents.
> >>>>>>>>
> >>>>>>>> I would be particularly interested in:
> >>>>>>>>
> >>>>>>>> - corrections to errors!
> >>>>>>>> - comments on additional proposed properties in the metadata
> >>>>>>>> document ("short-name", "template", "microsyntax")
> >>>>>>>> - use of "hasFormat" to specify the Content-Type associated
> >>>>>>>> with a Template
> >>>>>>>> - use of a REGEXP within a URI Template to convert ISO 8601
> >>>>>>>> syntax to a simplified form
> >>>>>>>
> >>>>>>> I don't completely understand this mechanism yet, but do you think it
> >>>>>>> could be stretched to address the SKOS/codes issue in
> >>>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >>>>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >>>>>>> and emit triples like 'broader' when certain patterns matched?
> >>>>>>>
> >>>>>>> Dan
> >>>>>>>
> >>>>>>
> >>>>>> OK ... let's have a go.
> >>>>>>
> >>>>>> Here's the header and a line of data:
> >>>>>>
> >>>>>> ---
> >>>>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> >>>>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> >>>>>> ---
> >>>>>>
> >>>>>> Here's a guess at the CSV metadata description in which I am using
> >>>>>> the ["multiple regexp each extracting a single value" pattern][1]:
> >>>>>>
> >>>>>> ---
> >>>>>> {
> >>>>>>  "name": "2010_Occupations",
> >>>>>>  "title": "O*NET-SOC Occupational listing for 2010",
> >>>>>>  "publisher": [{
> >>>>>>      "name": "O*Net Resource Center",
> >>>>>>      "web": "http://www.onetcenter.org/"
> >>>>>>  }],
> >>>>>>  "resources": [{
> >>>>>>      "name": "2010_Occupations-csv",
> >>>>>>      "path": "2010_Occupations.csv",
> >>>>>>      "schema": {"columns": [
> >>>>>>          {
> >>>>>>              "name": "onet-soc-2010-code",
> >>>>>>              "title": "O*NET-SOC 2010 Code",
> >>>>>>              "description": "O*NET Standard Occupational Classification Code (2010).",
> >>>>>>              "type": "string",
> >>>>>>              "required": true,
> >>>>>>              "unique": true,
> >>>>>>              "microsyntax": [{
> >>>>>>                      "name": "soc-major-group",
> >>>>>>                      "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
> >>>>>>                  },{
> >>>>>>                      "name": "soc-minor-group",
> >>>>>>                      "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
> >>>>>>                  },{
> >>>>>>                      "name": "soc-broad-group",
> >>>>>>                      "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
> >>>>>>                  },{
> >>>>>>                      "name": "soc-detailed-occupation",
> >>>>>>                      "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
> >>>>>>                  },{
> >>>>>>                      "name": "onetsoc-occupation",
> >>>>>>                      "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
> >>>>>>                  }
> >>>>>>
> >>>>>>              ]
> >>>>>>          },
> >>>>>>          {
> >>>>>>              "name": "title",
> >>>>>>              "title": "O*NET-SOC 2010 Title",
> >>>>>>              "description": "Title of occupational classification.",
> >>>>>>              "type": "string",
> >>>>>>              "required": true
> >>>>>>          },
> >>>>>>          {
> >>>>>>              "name": "description",
> >>>>>>              "title": "O*NET-SOC 2010 Description",
> >>>>>>              "description": "Description of occupational classification.",
> >>>>>>              "type": "string",
> >>>>>>              "required": true
> >>>>>>          }
> >>>>>>      ]},
> >>>>>>      "template": {
> >>>>>>          "name": "2010_Occupations-csv-to-ttl",
> >>>>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>>>          "type": "template",
> >>>>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>>>          "hasFormat": "text/turtle"
> >>>>>>      }
> >>>>>>  }]
> >>>>>> }
> >>>>>> ---
> >>>>>>
> >>>>>> You can see that I've used the `microsyntax` object to capture the
> >>>>>> 5 independent elements of the O*NET-SOC code each with its own regexp:
> >>>>>> "soc-major-group", "soc-minor-group", "soc-broad-group",
> >>>>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
> >>>>>> the _best_ way to do it, I don't know ... it's just an idea to get us
> >>>>>> talking about possibilities and options!
> >>>>>>
> >>>>>> The template (prefixes etc. intentionally left out) might then be:
> >>>>>>
> >>>>>> ---
> >>>>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> >>>>>>   skos:notation "{onet-soc-2010-code}" ;
> >>>>>>   skos:prefLabel "{title}" ;
> >>>>>>   dct:description "{description}" ;
> >>>>>>   skos:broader ex:{soc-major-group}-0000,
> >>>>>>                ex:{soc-major-group}-{soc-minor-group}00,
> >>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> >>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> >>>>>> ---
> >>>>>>
> >>>>>> However, this does not help when we look at the required _conditional
> >>>>>> behaviour_: when the value of "onetsoc-occupation" = "00" this is
> >>>>>> identical to the term from the SOC taxonomy, and the template
> >>>>>> should be more like
> >>>>>>
> >>>>>> ---
> >>>>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> >>>>>>   skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> >>>>>>   skos:prefLabel "{title}" ;
> >>>>>>   dct:description "{description}" ;
> >>>>>>   skos:broader ex:{soc-major-group}-0000,
> >>>>>>                ex:{soc-major-group}-{soc-minor-group}00,
> >>>>>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> >>>>>> ---
> >>>>>>
> >>>>>> It occurs to me that we may wish to trigger different templates based
> >>>>>> on a conditional response - or even whether we wish to trigger a
> >>>>>> template at all for a given line!
> >>>>>>
> >>>>>> Thinking out of the box (is that a euphemism for "making it up as I
> >>>>>> go along"?), it would seem that each "template" block in the CSV
> >>>>>> metadata might have a "condition" statement that tells it when to fire
> >>>>>> - using values of column names or microsyntax element names? e.g.
> >>>>>>
> >>>>>> ---
> >>>>>>      "template": {
> >>>>>>          "name": "2010_Occupations-csv-to-ttl",
> >>>>>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>>>          "type": "template",
> >>>>>>          "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>>>          "hasFormat": "text/turtle",
> >>>>>>          "condition": "if {onetsoc-occupation} != '00'"
> >>>>>>      }
> >>>>>> ---
> >>>>>>
> >>>>>> Default behaviour (if no "condition" statement included) would be
> >>>>>> _always_ to trigger the template for each row.
> >>>>>>
> >>>>>> However, looking at this, I am immediately concerned that including
> >>>>>> if-then-else blocks and comparison operators hugely increases the
> >>>>>> complexity of our work. Perhaps this is a good point to "bug out" to
> >>>>>> some external agent (e.g. call-back function or promise).
> >>>>>>
> >>>>>> Jeremy
> >>>>>>
> >>>>>> [1]:
> >>>>>> https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> >>>>>>
> >>>>>>>
> >>>>>>>> - thoughts about a way to describe that microsyntax format within
> >>>>>>>> the metadata document (see [CellMicrosyntax requirement][4]), e.g. to
> >>>>>>>> define the sub-elements within the microsyntax that may be
> >>>>>>>> extracted for use later - see [Parsing cell microsyntax][5].
> >>>>>>>>
> >>>>>>>> Comments welcome.
> >>>>>>>>
> >>>>>>>> Jeremy
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]:
> >>>>>>>> https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> >>>>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>>>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>>>>>>> [4]:
> >>>>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> >>>>>>>> [5]:
> >>>>>>>> https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >> microsyntax
> >>>>>
> >>>>>
> >>>>> ----
> >>>>> Ivan Herman, W3C
> >>>>> Digital Publishing Activity Lead
> >>>>> Home: http://www.w3.org/People/Ivan/
> >>>>> mobile: +31-641044153
> >>>>> GPG: 0x343F1A3D
> >>>>> WebID: http://www.ivan-herman.net/foaf#me
> >>>
> >>>
> >>
> >
> >
> 
> 

Received on Tuesday, 24 June 2014 14:25:08 UTC