- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Mon, 23 Jun 2014 16:03:51 +0000
- To: Ivan Herman <ivan@w3.org>
- CC: Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 21 June 2014 08:38
> To: Tandy, Jeremy
> Cc: Dan Brickley; W3C CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
>
> Jeremy,
>
> one thing that I was wondering about was that the simple naming
> mechanism for the various microsyntaxes may not work out. Consider
>
> "columns" : [
> { "name" : "datetime",
> ...
> "microsyntax": [
> { "name" : N1,
> "regexp" : "...."
> },
> .....
> ]
> },
> { "name" : "anothercolumn",
> ...
> "microsyntax": [
> { "name" : N1,
> "regexp" : "...."
> },
> .....
> ]
> }
>
> ]
>
>
> When working through the cells in a row, what would 'N1' refer to?
> Unless we want to require the unicity of the microsyntax names, we may
> hit an issue. And I do not think requiring a unique name is a good
> idea; if the metadata becomes big, this may become a nuisance.
Agreed. I made the assumption that all instances of "name" within a given metadata document would need to be unique. I had not considered any mechanisms to make this easy for users; e.g. using the "name" from an enclosing object to automatically _namespace_ sub-names.
We could leave it to the user to ensure uniqueness (easy for us; adds load to the end user which is less good); in which case the example above would fail to validate.
Alternatively, we could apply a form of name-spacing; e.g. "datetime/N1" and "anothercolumn/N1" within your example above.
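To make the name-spacing option concrete, here is a minimal sketch, assuming the metadata has already been parsed into Python dictionaries. The function name `namespaced_names` and the `separator` parameter are hypothetical and not part of any draft; the "...." regexps are left elided exactly as in Ivan's example.

```python
# Hypothetical sketch: qualify each microsyntax name with the "name" of its
# enclosing column, so that "N1" may be reused in different columns.
columns = [
    {"name": "datetime", "microsyntax": [{"name": "N1", "regexp": "...."}]},
    {"name": "anothercolumn", "microsyntax": [{"name": "N1", "regexp": "...."}]},
]


def namespaced_names(columns, separator="/"):
    """Map 'column<separator>sub-name' to the microsyntax entry it identifies."""
    index = {}
    for column in columns:
        for entry in column.get("microsyntax", []):
            key = f"{column['name']}{separator}{entry['name']}"
            if key in index:
                # A clash inside one column is still an error.
                raise ValueError(f"duplicate name: {key}")
            index[key] = entry
    return index


index = namespaced_names(columns)
print(sorted(index))
```

With this approach uniqueness only has to hold within a single column, at the price of a separator character that would need escaping if it can also occur inside a name.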
>
> What this means is that the syntax becomes more complicated. Something
> like {datetime:N1} or something similar (which raises the issue of
> escape characters, too:-(
Agreed! I chose a different separator character to you, but the same issue applies.
>
> As for the conditionals: mustache has some syntax for this which is a
> bit different
>
> {{#bla}}
> .. any template here
> {{/bla}}
>
> although the mustache semantics is a bit different (afaik it relies on
> the existence or not of a key in an object). We could use the mustache
> semantics but we probably need something more, too, like "if 'bla' is a
> microsyntax name and is true if the value of the cell matches the
> regexp then it is true".
Syntax-wise, we want our metadata document to be valid JSON, so we would need something different to mustache. However, I agree that our use cases call for similar semantics. Perhaps the syntax might be something like:
"condition": {
"operator": "if ({bla})",
"template": {
"name": "2010_Occupations-csv-to-ttl",
"description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
"type": "template",
"path": "2010_Occupations-csv-to-ttl.ttl",
"hasFormat": "text/turtle"
}
}
In this case, I'm trying to say that the template will be triggered if the value of {bla} is true / not null etc. ... the value of {bla} is taken by evaluating the column (or microsyntax element) with "name" = "bla" for the row being processed. Like you say: """it relies on the existence or not of a key in an object"""
(I don't really like the syntax; I guess that others can come up with better.)
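As a rough illustration of how a processor might evaluate such an "operator" string, here is a minimal sketch. It assumes a tiny grammar that covers only the forms used in this thread (a bare `{name}`, and `{name}` compared with a quoted literal using `==` or `!=`); the `evaluate` function and the `values` dictionary (name to value for the row being processed) are hypothetical.

```python
import re

# Assumed grammar, limited to the operator forms that appear in this thread:
#   if ({name})            -> true when the value is present and non-empty
#   if ({name} == 'text')  -> true when the value equals the literal
#   if ({name} != 'text')  -> true when the value differs from the literal
OPERATOR = re.compile(r"^if \(\{([^{}]+)\}(?: (==|!=) '([^']*)')?\)$")


def evaluate(operator, values):
    """Decide whether a template should fire for one row.

    `values` maps column names and microsyntax element names to the
    values extracted from the row being processed.
    """
    match = OPERATOR.match(operator)
    if match is None:
        raise ValueError(f"unsupported operator: {operator!r}")
    name, comparison, literal = match.groups()
    value = values.get(name)
    if comparison is None:
        return value is not None and value != ""
    if comparison == "==":
        return value == literal
    return value != literal


print(evaluate("if ({bla})", {"bla": "some value"}))
print(evaluate("if ({bla})", {"bla": ""}))
```

Anything beyond these forms (boolean connectives, numeric comparison, else-branches) would need a real expression language, which is exactly the complexity concern raised in this thread.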
>
> But I agree that the conditional complicates the templates a lot. Here
> is where our use cases may have to switch in: do our use cases justify
> the need for conditionals (remembering that, though we are discussing
> turtle here, I do not see any difference between generating turtle and
> generating XML or JSON through the same mechanism).
The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1], motivated by the ["ExpressingHierarchyWithinOccupationalListings"][2] use case. This use case gives us two requirements:
i) triggering a template if a value of a cell is not null; e.g. to generate the SKOS concept scheme from the SOC structure ...
15-0000,,,,Computer and Mathematical Occupations,,,,,
,15-1100,,,Computer Occupations,,,,,
,,15-1110,,Computer and Information Research Scientists,,,,,
,,,15-1111,Computer and Information Research Scientists,,,,,
Here we can see that I only want a ex:SOC-MajorGroup entity created on the first row shown above (where col 1 is populated).
ii) triggering a template if the value of a cell equals a particular string (or does not); e.g. when the value of "onetsoc-occupation" = "00", as in the example [earlier in this email thread][3]. ...
"operator": "if ({onetsoc-occupation} == '00')"
Perhaps there are cases for more complex operations? I don't know. Perhaps this is where call-back functions or promises could be used to parse a row and provide a Boolean response as to whether the template should be triggered? Again, I don't know ... and some considerable thought would be required to work out the details of such.
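To show what the call-back option could look like, here is a minimal sketch in which each template is paired with a function that receives a row (or a cell value) and returns a Boolean. The function names are hypothetical; the regular expression is the "onetsoc-occupation" pattern from earlier in the thread, with the dot escaped so that it matches only a literal ".".

```python
import csv
import io
import re

# The four SOC structure rows quoted above.
SOC_ROWS = """\
15-0000,,,,Computer and Mathematical Occupations,,,,,
,15-1100,,,Computer Occupations,,,,,
,,15-1110,,Computer and Information Research Scientists,,,,,
,,,15-1111,Computer and Information Research Scientists,,,,,
"""


def major_group_present(row):
    """Requirement (i): fire only when the first cell is populated."""
    return row[0] != ""


fired = [major_group_present(row) for row in csv.reader(io.StringIO(SOC_ROWS))]
print(fired)

# "onetsoc-occupation" microsyntax element (assumption: dot escaped).
ONETSOC_OCCUPATION = re.compile(r"^\d{2}-\d{4}\.(\d{2})$")


def is_soc_term(code):
    """Requirement (ii): fire when "onetsoc-occupation" equals "00"."""
    match = ONETSOC_OCCUPATION.match(code)
    return match is not None and match.group(1) == "00"


print(is_soc_term("15-1199.00"), is_soc_term("15-1199.03"))
```

The attraction is that the metadata document would only need to name the call-back; the cost is that the condition is no longer declarative, so it cannot be validated or ported between implementations as easily.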
Jeremy
[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
[2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
[3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
>
> My 2 cents...
>
> Ivan
>
>
>
>
> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
>
> >> -----Original Message-----
> >> From: Dan Brickley [mailto:danbri@google.com]
> >> Sent: 18 June 2014 12:46
> >> To: Tandy, Jeremy
> >> Cc: CSV on the Web Working Group
> >> Subject: Re: Attempted example CSV metadata document and template
> >>
> >> On 12 June 2014 12:57, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> >> wrote:
> >>> All -
> >>>
> >>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >> Observation" example. I've tried to create a CSV metadata document
> >> following the rules in the [Metadata Vocabulary for Tabular Data][2]
> >> and [Generating RDF from Tabular Data on the Web][3] documents.
> >>>
> >>> I would be particularly interested in:
> >>>
> >>> - corrections to errors!
> >>> - comments on additional proposed properties in the metadata
> >>> document ("short-name", "template", "microsyntax")
> >>> - use of "hasFormat" to specify the Content-Type associated with a
> >>> Template
> >>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
> >>> to a simplified form
> >>
> >> I don't completely understand this mechanism yet, but do you think
> it
> >> could be stretched to address the SKOS/codes issue in
> >> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-
> >> ExpressingHierarchyWithinOccupationalListings
> >> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >> and emit triples like 'broader' when certain patterns matched?
> >>
> >> Dan
> >>
> >
> > OK ... let's have a go.
> >
> > Here's the header and a line of data:
> >
> > ---
> > O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> > 15-1199.03,Web Administrators,"Manage web environment design,
> deployment, development and maintenance activities. [...]"
> > ---
> >
> > Here's a guess at the CSV metadata description in which I am using
> the ["multiple regexp each extracting a single value" pattern][1]:
> >
> > ---
> > {
> > "name": "2010_Occupations",
> > "title": "O*NET-SOC Occupational listing for 2010",
> > "publisher": [{
> > "name": "O*Net Resource Center",
> > "web": "http://www.onetcenter.org/"
> > }],
> > "resources": [{
> > "name": "2010_Occupations-csv",
> > "path": "2010_Occupations.csv",
> > "schema": {"columns": [
> > {
> > "name": "onet-soc-2010-code",
> > "title": "O*NET-SOC 2010 Code",
> > "description": "O*NET Standard Occupational
> Classification Code (2010).",
> > "type": "string",
> > "required": true,
> > "unique": true,
> > "microsyntax": [{
> > "name": "soc-major-group",
> > "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
> > },{
> > "name": "soc-minor-group",
> > "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
> > },{
> > "name": "soc-broad-group",
> > "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
> > },{
> > "name": "soc-detailed-occupation",
> > "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
> > },{
> > "name": "onetsoc-occupation",
> > "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
> > }
> >
> > ]
> > },
> > {
> > "name": "title",
> > "title": "O*NET-SOC 2010 Title",
> > "description": "Title of occupational classification.",
> > "type": "string",
> > "required": true
> > },
> > {
> > "name": "description",
> > "title": "O*NET-SOC 2010 Description",
> > "description": "Description of occupational
> classification.",
> > "type": "string",
> > "required": true
> > }
> > ]},
> > "template": {
> > "name": "2010_Occupations-csv-to-ttl",
> > "description": "Template converting CSV content to SKOS/RDF
> (expressed in Turtle syntax).",
> > "type": "template",
> > "path": "2010_Occupations-csv-to-ttl.ttl",
> > "hasFormat": "text/turtle"
> > }
> > }]
> > }
> > ---
> >
> > You can see that I've used the `microsyntax` object to capture the 5
> independent elements of the O*NET-SOC code each with its own regexp:
> "soc-major-group", "soc-minor-group", "soc-broad-group", "soc-detailed-
> occupation" and "onetsoc-occupation". Whether this is the _best_ way to
> do it, I don't know ... it's just an idea to get us talking about
> possibilities and options!
> >
> > The template (prefixes etc. intentionally left out) might then be:
> >
> > ---
> > ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> > skos:notation "{onet-soc-2010-code}" ;
> > skos:prefLabel "{title}" ;
> > dct:description "{description}" ;
> > skos:broader ex:{soc-major-group}-0000,
> > ex:{soc-major-group}-{soc-minor-group}00,
> > ex:{soc-major-group}-{soc-minor-group}{soc-broad-
> group}0,
> > ex:{soc-major-group}-{soc-minor-group}{soc-broad-
> group}{soc-detailed-occupation} .
> > ---
> >
> > However, this does not help when we look at the required _conditional
> > behaviour_: when the value of "onetsoc-occupation" = "00" this is
> > identical to the term from the SOC taxonomy, and the template should
> > be more like
> >
> > ---
> > ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-
> occupation} a ex:SOC-DetailedOccupation ;
> > skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-
> group}{soc-detailed-occupation}" ;
> > skos:prefLabel "{title}" ;
> > dct:description "{description}" ;
> > skos:broader ex:{soc-major-group}-0000,
> > ex:{soc-major-group}-{soc-minor-group}00,
> > ex:{soc-major-group}-{soc-minor-group}{soc-broad-
> group}0 .
> > ---
> >
> > It occurs to me that we may wish to trigger different templates based
> on a conditional response - or even whether we wish to trigger a
> template at all for a given line!
> >
> > Thinking out of the box (is that a euphemism for "making it up as I
> go along"?), it would seem that each "template" block in the CSV
> metadata might have a "condition" statement that tells it when to fire
> - using values of column names or microsyntax element names? e.g.
> >
> > ---
> > "template": {
> > "name": "2010_Occupations-csv-to-ttl",
> > "description": "Template converting CSV content to SKOS/RDF
> (expressed in Turtle syntax).",
> > "type": "template",
> > "path": "2010_Occupations-csv-to-ttl.ttl",
> > "hasFormat": "text/turtle",
> > "condition": "if {onetsoc-occupation} != '00'"
> > }
> > ---
> >
> > Default behaviour (if no "condition" statement included) would be
> _always_ to trigger the template for each row.
> >
> > However, looking at this, I am immediately concerned that including
> if-then-else blocks and comparison operators hugely increases the
> complexity of our work. Perhaps this is a good point to "bug out" to
> some external agent (e.g. call-back function or promise).
> >
> > Jeremy
> >
> > [1]:
> > https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-
> te
> > mplate-for-simple-weather-obs-example.md#multiple-regexp-each-
> extracti
> > ng-single-value
> >
> >>
> >>> - thoughts about a way to describe that microsyntax format within
> >>> the
> >> metadata document (see [CellMicrosyntax requirement][4]), e.g. to
> >> define the sub-elements within the microsyntax that may be extracted
> >> for use later - see [Parsing cell microsyntax][5].
> >>>
> >>> Comments welcome.
> >>>
> >>> Jeremy
> >>>
> >>>
> >>> [1]:
> >>> https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-
> and-
> >> te
> >>> mplate-for-simple-weather-obs-example.md
> >>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>> [4]:
> >>> http://w3c.github.io/csvw/use-cases-and-requirements/#R-
> >> CellMicrosynta
> >>> x
> >>> [5]:
> >>> https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-
> and-
> >> te
> >>> mplate-for-simple-weather-obs-example.md#parsing-cell-microsyntax
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
>
>
>
>
Received on Monday, 23 June 2014 16:04:24 UTC