RE: Attempted example CSV metadata document and template

Just to make things easier to follow, I have added this worked example onto [GitHub][1]

Therein I have also added an alternative where we match on a "name" value (e.g. from a column or microsyntax element), which might be a bit easier for data users to follow, e.g.

               "conditional-match": {
                    "target": "soc-major-group-code",
                    "regexp": "^\\d{2}-0{4}$"
                    }
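
To make the idea concrete, here is a minimal Python sketch of how a processor might evaluate such a block. The function name `template_fires` and the representation of a row as a dict keyed by column (or microsyntax element) name are my assumptions for illustration only; nothing here is defined by the metadata vocabulary.

```python
import json
import re

# The "conditional-match" block from above, parsed as JSON so that "\\d" becomes \d.
condition = json.loads(r'''
{
    "target": "soc-major-group-code",
    "regexp": "^\\d{2}-0{4}$"
}
''')


def template_fires(row, condition):
    """Return True if the named value in `row` matches the condition's regexp.

    `row` maps column (or microsyntax element) names to cell values;
    a missing or empty value never matches this particular regexp.
    """
    value = row.get(condition["target"]) or ""
    return re.search(condition["regexp"], value) is not None


major_group_row = {"soc-major-group-code": "15-0000",
                   "soc-title": "Computer and Mathematical Occupations"}
minor_group_row = {"soc-major-group-code": "",
                   "soc-minor-group-code": "15-1100"}

print(template_fires(major_group_row, condition))  # True
print(template_fires(minor_group_row, condition))  # False
```

Matching on a named value rather than on the raw row means the regexp does not need to count commas or worry about quoted fields.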

Jeremy

[1]: https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md 


> -----Original Message-----
> From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
> Sent: 24 June 2014 11:34
> To: Ivan Herman
> Cc: Dan Brickley; W3C CSV on the Web Working Group
> Subject: RE: Attempted example CSV metadata document and template
> 
> > -----Original Message-----
> > From: Ivan Herman [mailto:ivan@w3.org]
> > Sent: 23 June 2014 17:35
> > To: Tandy, Jeremy
> > Cc: Dan Brickley; W3C CSV on the Web Working Group
> > Subject: Re: Attempted example CSV metadata document and template
> >
> >
> > On 23 Jun 2014, at 18:03 , Tandy, Jeremy
> > <jeremy.tandy@metoffice.gov.uk> wrote:
> >
> > >> -----Original Message-----
> > >> From: Ivan Herman [mailto:ivan@w3.org]
> > >> Sent: 21 June 2014 08:38
> > >> To: Tandy, Jeremy
> > >> Cc: Dan Brickley; W3C CSV on the Web Working Group
> > >> Subject: Re: Attempted example CSV metadata document and template
> > >>
> > >> Jeremy,
> > >>
> > >> one thing that I was wondering about was that the simple naming
> > >> mechanism for the various microsyntaxes may not work out. Consider
> > >>
> > >> 	"columns" : [
> > >> 		{ "name" : "datetime",
> > >> 		  ...
> > >>                  "microsyntax": [
> > >> 			{ "name" : N1,
> > >> 			  "regexp" : "...."
> > >> 			},
> > >> 			.....
> > >>                  ]
> > >> 		},
> > >> 		{ "name" : "anothercolumn",
> > >> 		  ...
> > >> 		  "microsyntax": [
> > >> 			{ "name" : N1,
> > >> 			  "regexp" : "...."
> > >> 			},
> > >> 			.....
> > >> 		  ]
> > >> 		}
> > >>
> > >> 	]
> > >>
> > >>
> > >> When working through the cells in a row, what would 'N1' refer to?
> > >> Unless we want to require the unicity of the microsyntax names, we
> > >> may hit an issue. And I do not think requiring a unique name is a
> > >> good idea; if the metadata becomes big, this may become a
> > >> nuisance.
> > >
> > > Agreed. I made the assumption that all instances of "name" within a
> > > given metadata document would need to be unique. I had not considered
> > > any mechanisms to make this easy for users; e.g. using the "name" from
> > > an enclosing object to automatically _namespace_ sub-names.
> > >
> > > We could leave it to the user to ensure uniqueness (easy for us;
> > > adds load to the end user which is less good); in which case the example
> > > above would fail to validate.
> > >
> > > Alternatively, we could apply a form of name-spacing; e.g.
> > > "datetime/N1" and "anothercolumn/N1" within your example above.
> > >
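
A rough Python sketch of the name-spacing option, purely for illustration: each microsyntax name is prefixed with the "name" of its enclosing column, and a clash within one column is reported as an error. The helper name `namespace_microsyntax_names` and the two regexps are hypothetical placeholders (the example above elides them as "....").

```python
def namespace_microsyntax_names(columns):
    """Return {qualified_name: regexp}, prefixing each microsyntax name
    with the name of its enclosing column, e.g. "datetime/N1"."""
    qualified = {}
    for column in columns:
        for element in column.get("microsyntax", []):
            key = f'{column["name"]}/{element["name"]}'
            if key in qualified:
                # Names still have to be unique within a single column.
                raise ValueError(f"duplicate microsyntax name: {key}")
            qualified[key] = element["regexp"]
    return qualified


# Placeholder regexps; the real ones are not given in the example above.
columns = [
    {"name": "datetime", "microsyntax": [{"name": "N1", "regexp": "^(\\d{4})-"}]},
    {"name": "anothercolumn", "microsyntax": [{"name": "N1", "regexp": "^(\\w+)"}]},
]

print(sorted(namespace_microsyntax_names(columns)))
# ['anothercolumn/N1', 'datetime/N1']
```

With this approach the user only has to keep names unique within one column, at the cost of longer names inside templates.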
> > >>
> > >> What this means is that the syntax becomes more complicated.
> > >> Something like {datetime:N1} or something similar (which raises the
> > >> issue of escape characters, too:-(
> > >
> > > Agreed! I chose a different separator character to you, but the same
> > > issue applies.
> > >
> > >>
> > >> As for the conditionals: mustache has some syntax for this which is a
> > >> bit different
> > >>
> > >> {{#bla}}
> > >>   .. any template here
> > >> {{/bla}}
> > >>
> > >> although the mustache semantics is a bit different (afaik it relies
> > >> on the existence or not of a key in an object). We could use the
> > >> mustache semantics but we probably need something more, too, like "if
> > >> 'bla' is a microsyntax name and the value of the cell matches the
> > >> regexp, then it is true".
> > >
> > > Syntax-wise, we want our metadata document to be valid JSON, so we
> > > would need something different to mustache. However, I agree that our
> > > use cases call for similar semantics. Perhaps the syntax might be
> > > something like:
> > >
> > > "condition": {
> > >    "operator": "if ({bla})",
> > >    "template": {
> > >        "name": "2010_Occupations-csv-to-ttl",
> > >        "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> > >        "type": "template",
> > >        "path": "2010_Occupations-csv-to-ttl.ttl",
> > >        "hasFormat": "text/turtle"
> > >    }
> > > }
> > >
> > > In this case, I'm trying to say that the template will be triggered
> > > if the value of {bla} is true / not null etc. ... the value of {bla}
> > > is taken by evaluating the column (or microsyntax element) with "name"
> > > = "bla" for the row being processed. Like you say: """it relies on the
> > > existence or not of a key in an object"""
> > >
> > > (I don't really like the syntax; I guess that others can come up with
> > > better.)
> >
> > Ouch, you are right, I forgot about the fact that we want templates
> > for conditionals:-(
> >
> > But before getting into the boring issue of syntax we have to decide
> > whether we need them...
> 
> Syntax, boring ... no never! FWIW, it occurs to me that the conditional
> match might do better inside the "template" object, but more on that
> below.
> 
> >
> > >
> > >>
> > >> But I agree that the conditional complicates the templates a lot.
> > >> Here is where our use cases may have to switch in: do our use cases
> > >> justify the need for conditionals (remembering that, though we are
> > >> discussing turtle here, I do not see any difference between
> > >> generating turtle and generating XML or JSON through the same
> > >> mechanism).
> > >
> > > The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
> > > motivated by the ExpressingHierarchyWithinOccupationalListings use
> > > case. This use case gives us two requirements:
> > >
> > > i) triggering a template if a value of a cell is not null; e.g. to
> > > generate the SKOS concept scheme from the SOC structure ...
> > >
> > > 15-0000,,,,Computer and Mathematical Occupations,,,,,
> > > ,15-1100,,,Computer Occupations,,,,,
> > > ,,15-1110,,Computer and Information Research Scientists,,,,,
> > > ,,,15-1111,Computer and Information Research Scientists,,,,,
> > >
> > > Here we can see that I only want a ex:SOC-MajorGroup entity created
> > > on the first row shown above (where col 1 is populated).
> > >
> > > ii) triggering a template if a value of a cell equates to a
> > > particular string (or the opposite); e.g. when the value of
> > > "onetsoc-occupation" = "00" as shown in the example [earlier in this
> > > email thread][3]. ...
> > >
> > > "operator": "if ({onetsoc-occupation} == '00')"
> > >
> > > Perhaps there are cases for more complex operations? I don't know.
> > > Perhaps this is where call-back functions or promises could be used to
> > > parse a row and provide a Boolean response as to whether the template
> > > should be triggered? Again, I don't know ... and some considerable
> > > thought would be required to work out the details of such.
> >
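
For what it's worth, the call-back idea mentioned above could look something like the following Python sketch. It assumes (my assumption, not anything agreed in this thread) that a processor hands each row to a user-supplied predicate as a dict of named values and fires the template when the predicate returns True. The names `is_main_category`, `templates_to_fire` and the template labels are hypothetical.

```python
def is_main_category(row):
    """Hypothetical predicate: True when the O*NET-SOC code ends in '.00'."""
    return row.get("onet-soc-2010-code", "").endswith(".00")


def templates_to_fire(row, registrations):
    """Return the names of the templates whose predicate accepts the row."""
    return [name for name, predicate in registrations if predicate(row)]


# Each template is registered together with the call-back that guards it.
registrations = [
    ("soc-occupation-category", is_main_category),
    ("onet-soc-occupation-subcategory", lambda row: not is_main_category(row)),
]

print(templates_to_fire({"onet-soc-2010-code": "15-1199.00"}, registrations))
# ['soc-occupation-category']
print(templates_to_fire({"onet-soc-2010-code": "15-1199.03"}, registrations))
# ['onet-soc-occupation-subcategory']
```

The obvious drawback is that the condition then lives in code rather than in the metadata document, so it is no longer declarative or portable between implementations.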
> > For me these seem to be convincing that we need something. My
> > preference would be, though, to avoid all the issues about defining
> > 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
> > on regular expressions ('match'-'not match') simply because regular
> > expressions are used elsewhere already. Would that be enough?
> 
> I think that this would provide sufficient functionality for the two
> example requirements I listed.
> 
> Below, I've tried to provide worked examples for each of these
> requirements showing how such regexp conditional matching might be
> implemented ...
> 
> 1) triggering a template if a column in the row being processed is not
> empty (or null):
> 
> data snippet (from [soc_structure_2010.csv][1]):
> ---
> Major Group,Minor Group,Broad Group,Detailed Occupation,,,,,,
> ,,,,,,,,,
> {snip}
> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> ,15-1100,,,Computer Occupations,,,,,
> {snip}
> ,,15-1190,,Miscellaneous Computer Occupations,,,,,
> ,,,15-1199,"Computer Occupations, All Other",,,,,
> {snip}
> ---
> 
> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/soc_structure_2010.csv
> 
> Let's assume that I want to trigger a template to create "Detailed
> Occupation" entities - I only want to trigger this when the 4th column
> is populated. Note that I have used "conditional-match" within the
> template blocks to provide a REGEXP that is assessed against the
> _ENTIRE_ row to determine if the template is triggered. Again, I'm not
> wedded to the names or syntax - just trying to express the idea.
> 
> (Aside 1: in creating this example, I have blundered into the
> challenges of wanting to repeatedly use same "name" within microsyntax
> blocks ... I got around the need for uniqueness using "/" as a pseudo
> path separator, but it feels clunky and ends up with long names!)
> 
> (Aside 2: I also noticed that my REGEXP weren't valid when embedding
> them in JSON as the "\" character needed escaping - hence the use of
> "\\" below ... I am assuming that any JSON processor will parse the
> literal _before_ trying to process the REGEXP)
> 
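
A small Python check of the assumption in Aside 2, using only the standard `json` and `re` modules: the JSON parser turns the doubled backslashes into single ones before the regexp engine ever sees the pattern, and the single-backslash form is rejected as invalid JSON. The variable names are illustrative.

```python
import json
import re

# Text exactly as it would appear in the metadata file: note the doubled backslashes.
metadata_text = r'{"regexp": "^(\\d{2})-\\d{4}$"}'

pattern = json.loads(metadata_text)["regexp"]
print(pattern)  # ^(\d{2})-\d{4}$

match = re.search(pattern, "15-1100")
print(match.group(1))  # 15

# With single backslashes the document is not valid JSON at all,
# because \d is not one of the escape sequences JSON allows.
try:
    json.loads(r'{"regexp": "^(\d{2})-\d{4}$"}')
except json.JSONDecodeError as error:
    print("invalid JSON:", error.msg)
```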
> Here's the metadata description for the resource:
> 
> ---
> {
>     "name": "soc-2010",
>     "title": "Standard Occupational Classification (2010)",
>     "publisher": [{
>         "name": "US Bureau of Labor Statistics",
>         "web": "http://www.bls.gov/ "
>     }],
>     "resources": [{
>         "name": "soc-2010-csv",
>         "path": "soc_structure_2010.csv",
>         "schema": {"columns": [
>             {
>                 "name": "soc-major-group-code",
>                 "title": "Major Group",
>                 "type": "string"
>             },
>             {
>                 "name": "soc-minor-group-code",
>                 "title": "Minor Group",
>                 "type": "string",
>                 "microsyntax": [{
>                     "name": "soc-minor-group-code/major-group-element",
>                     "regexp": "^(\\d{2})-\\d{4}$"
>                 }]
>             },
>             {
>                 "name": "soc-broad-group-code",
>                 "title": "Broad Group",
>                 "type": "string",
>                 "microsyntax": [
>                     {
>                         "name": "soc-broad-group-code/major-group-element",
>                         "regexp": "^(\\d{2})-\\d{4}$"
>                     },
>                     {
>                         "name": "soc-broad-group-code/minor-group-element",
>                         "regexp": "^\\d{2}-(\\d{2})\\d{2}$"
>                     }
>                 ]
>             },
>             {
>                 "name": "soc-detailed-occupation-code",
>                 "title": "Detailed Occupation",
>                 "type": "string",
>                 "microsyntax": [
>                     {
>                         "name": "soc-detailed-occupation-code/major-group-element",
>                         "regexp": "^(\\d{2})-\\d{4}$"
>                     },
>                     {
>                         "name": "soc-detailed-occupation-code/minor-group-element",
>                         "regexp": "^\\d{2}-(\\d{2})\\d{2}$"
>                     },
>                     {
>                         "name": "soc-detailed-occupation-code/broad-group-element",
>                         "regexp": "^\\d{2}-\\d{2}(\\d)\\d$"
>                     }
>                 ]
>             },
>             {
>                 "name": "soc-title",
>                 "title": "",
>                 "type": "string"
>             },
>             {"name": "empty(1)"},
>             {"name": "empty(2)"},
>             {"name": "empty(3)"},
>             {"name": "empty(4)"},
>             {"name": "empty(5)"}
>         ]},
>         "template": [
>             {
>                 "conditional-match": "^\\d{2}-0{4},{4}.*",
>                 "name": "major-group-template-ttl",
>                 "description": "Template converting Major Group content from SOC structure CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "major-group-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             },
>             {
>                 "conditional-match": "^,\\d{2}-\\d{2}0{2},{3}.*",
>                 "name": "minor-group-template-ttl",
>                 "description": "Template converting Minor Group content from SOC structure CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "minor-group-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             },
>             {
>                 "conditional-match": "^,{2}\\d{2}-\\d{3}0,{2}.*",
>                 "name": "broad-group-template-ttl",
>                 "description": "Template converting Broad Group content from SOC structure CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "broad-group-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             },
>             {
>                 "conditional-match": "^,{3}\\d{2}-\\d{4},.*",
>                 "name": "detailed-occupation-template-ttl",
>                 "description": "Template converting Detailed Occupation content from SOC structure CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "detailed-occupation-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             }
>         ]
>     }]
> }
> ---
> 
> (Apologies if the REGEXP has errors - not one of my strengths!)
> 
> My "detailed-occupation-csv-to-ttl-template.ttl" would be:
> ---
> ex:{soc-detailed-occupation-code} a ex:SOC-DetailedOccupation ;
>     skos:notation "{soc-detailed-occupation-code}" ;
>     skos:prefLabel "{soc-title}" ;
>     skos:broader ex:{soc-detailed-occupation-code/major-group-element}-0000,
>                  ex:{soc-detailed-occupation-code/major-group-element}-{soc-detailed-occupation-code/minor-group-element}00,
>                  ex:{soc-detailed-occupation-code/major-group-element}-{soc-detailed-occupation-code/minor-group-element}{soc-detailed-occupation-code/broad-group-element}0 .
> ---
> 
> Thus, given the input row below:
> ---
> ,,,15-1199,"Computer Occupations, All Other",,,,,
> ---
> 
> ... the "detailed-occupation-template-ttl" should be triggered, based
> on the conditional match REGEXP, and provide the following TTL snippet:
> ---
> ex:15-1199 a ex:SOC-DetailedOccupation ;
>     skos:notation "15-1199" ;
>     skos:prefLabel "Computer Occupations, All Other" ;
>     skos:broader ex:15-0000,
>                  ex:15-1100,
>                  ex:15-1190 .
> ---
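
The steps of worked example 1 can be sketched end to end in Python. This is only an illustration under stated assumptions: the function name `render` is hypothetical, the row-level condition is the detailed-occupation regexp with its trailing "match anything" part dropped, only the first five columns are named, and `{name}` placeholders are substituted by plain string replacement with no escaping.

```python
import csv
import re

# Microsyntax elements for the 4th column, with names and regexps as in the metadata above.
MICROSYNTAX = {
    "soc-detailed-occupation-code/major-group-element": r"^(\d{2})-\d{4}$",
    "soc-detailed-occupation-code/minor-group-element": r"^\d{2}-(\d{2})\d{2}$",
    "soc-detailed-occupation-code/broad-group-element": r"^\d{2}-\d{2}(\d)\d$",
}
COLUMN_NAMES = ["soc-major-group-code", "soc-minor-group-code", "soc-broad-group-code",
                "soc-detailed-occupation-code", "soc-title"]
# Row-level condition: three empty columns, then a populated 4th column.
CONDITIONAL_MATCH = r"^,{3}\d{2}-\d{4},"

TEMPLATE = """ex:{soc-detailed-occupation-code} a ex:SOC-DetailedOccupation ;
    skos:notation "{soc-detailed-occupation-code}" ;
    skos:prefLabel "{soc-title}" ;
    skos:broader ex:{soc-detailed-occupation-code/major-group-element}-0000,
                 ex:{soc-detailed-occupation-code/major-group-element}-{soc-detailed-occupation-code/minor-group-element}00,
                 ex:{soc-detailed-occupation-code/major-group-element}-{soc-detailed-occupation-code/minor-group-element}{soc-detailed-occupation-code/broad-group-element}0 ."""


def render(raw_row):
    """Return the filled-in template, or None if the condition does not match."""
    if re.search(CONDITIONAL_MATCH, raw_row) is None:
        return None
    cells = next(csv.reader([raw_row]))      # handles the quoted title field
    values = dict(zip(COLUMN_NAMES, cells))  # trailing empty columns are ignored
    code = values["soc-detailed-occupation-code"]
    for name, regexp in MICROSYNTAX.items():
        values[name] = re.search(regexp, code).group(1)
    # Replace each {name} placeholder with the corresponding value.
    return re.sub(r"\{([^{}]+)\}", lambda m: values[m.group(1)], TEMPLATE)


print(render(',,,15-1199,"Computer Occupations, All Other",,,,,'))
print(render("15-0000,,,,Computer and Mathematical Occupations,,,,,"))  # None
```

The first call prints the TTL snippet shown above; the second row is a Major Group row, so the detailed-occupation template is not triggered.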
> 
> 2) triggering a template given a specific value within a microsyntax
> element:
> 
> data snippet (from [2010_Occupations.csv][2]):
> ---
> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> {snip}
> 15-1199.00,"Computer Occupations, All Other",All computer occupations not listed separately.
> {snip}
> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities.[...]"
> {snip}
> {snip}
> ---
> 
> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/2010_Occupations.csv
> 
> This time I want to trigger one template if the Occupation is a main
> category (e.g. Code = "15-1199.00"), else I want to trigger a different
> template. A main category is denoted by the final two digits of the
> code being "00".
> 
> (Aside 3: of course, as these two files are likely to be packaged
> together, I could have had just a single metadata description
> describing _both_ resources!)
> 
> (Aside 4: I've assumed that the conditional match is assessed against
> the entire row; whilst it's not impossible to deal with, I note that
> the need to potentially escape fields to count the columns is an added
> complexity!)
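
The complexity mentioned in Aside 4 is easy to demonstrate with Python's standard `csv` module: counting commas in the raw row gives the wrong number of columns as soon as a quoted field contains a comma. The variable names are illustrative.

```python
import csv

raw_row = ',,,15-1199,"Computer Occupations, All Other",,,,,'

naive_fields = raw_row.split(",")
parsed_fields = next(csv.reader([raw_row]))

print(len(naive_fields))   # 11: the comma inside the quoted title counts as a separator
print(len(parsed_fields))  # 10
print(parsed_fields[4])    # Computer Occupations, All Other
```

A regexp assessed against the entire raw row has the same problem as `split(",")`: it has to allow for quoted fields explicitly, whereas a regexp assessed against a parsed, named value does not.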
> 
> Here's the metadata description for the resource:
> 
> ---
> {
>     "name": "2010_Occupations",
>     "title": "O*NET-SOC Occupational listing for 2010",
>     "publisher": [{
>         "name": "O*Net Resource Center",
>         "web": " http://www.onetcenter.org/ "
>     }],
>     "resources": [{
>         "name": "2010_Occupations-csv",
>         "path": "2010_Occupations.csv",
>         "schema": {"columns": [
>             {
>                 "name": "onet-soc-2010-code",
>                 "title": "O*NET-SOC 2010 Code",
>                 "description": "O*NET Standard Occupational Classification Code (2010).",
>                 "type": "string",
>                 "required": true,
>                 "unique": true,
>                 "microsyntax": [
>                     {
>                         "name": "soc-major-group",
>                         "regexp": "^(\\d{2})-\\d{4}\\.\\d{2}$"
>                     },
>                     {
>                         "name": "soc-minor-group",
>                         "regexp": "^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$"
>                     },
>                     {
>                         "name": "soc-broad-group",
>                         "regexp": "^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$"
>                     },
>                     {
>                         "name": "soc-detailed-occupation",
>                         "regexp": "^\\d{2}-\\d{3}(\\d)\\.\\d{2}$"
>                     }
>                 ]
>             },
>             {
>                 "name": "title",
>                 "title": "O*NET-SOC 2010 Title",
>                 "description": "Title of occupational classification.",
>                 "type": "string",
>                 "required": true
>             },
>             {
>                 "name": "description",
>                 "title": "O*NET-SOC 2010 Description",
>                 "description": "Description of occupational classification.",
>                 "type": "string",
>                 "required": true
>             }
>         ]},
>         "template": [
>             {
>                 "conditional-match": "^\\d{2}-\\d{4}\\.00,.*",
>                 "name": "soc-occupation-category-template-ttl",
>                 "description": "Template converting SOC occupation category CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "soc-occupation-category-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             },
>             {
>                 "conditional-match": "^\\d{2}-\\d{4}\\.(?!00)\\d{2},.*",
>                 "name": "onet-soc-occupation-subcategory-template-ttl",
>                 "description": "Template converting O*NET SOC occupation sub-category CSV content to SKOS/RDF (expressed in Turtle syntax).",
>                 "type": "template",
>                 "path": "onet-soc-occupation-subcategory-csv-to-ttl-template.ttl",
>                 "hasFormat": "text/turtle"
>             }
>         ]
>     }]
> }
> ---
> 
> My TTL templates would be:
> ---soc-occupation-category-csv-to-ttl-template.ttl
> ex:{onet-soc-2010-code} a ex:SOC-DetailedOccupation ;
>     skos:notation "{onet-soc-2010-code}" ;
>     skos:prefLabel "{title}" ;
>     dct:description "{description}" ;
>     skos:exactMatch ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} ;
>     skos:broader ex:{soc-major-group}-0000,
>                  ex:{soc-major-group}-{soc-minor-group}00,
>                  ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> ---
> 
> ---onet-soc-occupation-subcategory-csv-to-ttl-template.ttl
> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
>     skos:notation "{onet-soc-2010-code}" ;
>     skos:prefLabel "{title}" ;
>     dct:description "{description}" ;
>     skos:broader ex:{soc-major-group}-0000,
>                  ex:{soc-major-group}-{soc-minor-group}00,
>                  ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
>                  ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> ---
> 
> Thus, the input row below:
> ---
> 15-1199.00,"Computer Occupations, All Other",All computer occupations not listed separately.
> ---
> 
> ... would generate the following TTL snippet:
> ---
> ex:15-1199.00 a ex:SOC-DetailedOccupation ;
>     skos:notation "15-1199.00" ;
>     skos:prefLabel "Computer Occupations, All Other" ;
>     dct:description "All computer occupations not listed separately." ;
>     skos:exactMatch ex:15-1199 ;
>     skos:broader ex:15-0000,
>                  ex:15-1100,
>                  ex:15-1190 .
> ---
> 
> And this row:
> ---
> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities.[...]"
> ---
> 
> ... would generate this TTL snippet:
> ---
> ex:15-1199.03 a ex:ONETSOC-Occupation ;
>     skos:notation "15-1199.03" ;
>     skos:prefLabel "Web Administrators" ;
>     dct:description "Manage web environment design, deployment, development and maintenance activities.[...]" ;
>     skos:broader ex:15-0000,
>                  ex:15-1100,
>                  ex:15-1190,
>                  ex:15-1199 .
> ---
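
A short Python sketch of the "either/or" selection in worked example 2. The two regexps are written the way I would expect them to need to be for the selection to work: the dot is escaped so that it matches only a literal ".", and the negative lookahead is followed by the two suffix digits it guards. Treat both patterns, and the function name `select_templates`, as assumptions for illustration.

```python
import re

# Each template name is paired with the row-level condition that triggers it.
TEMPLATES = [
    ("soc-occupation-category-template-ttl",
     r"^\d{2}-\d{4}\.00,"),
    ("onet-soc-occupation-subcategory-template-ttl",
     r"^\d{2}-\d{4}\.(?!00)\d{2},"),
]


def select_templates(raw_row):
    """Return the names of all templates whose condition matches the raw row."""
    return [name for name, regexp in TEMPLATES if re.search(regexp, raw_row)]


main_row = ('15-1199.00,"Computer Occupations, All Other",'
            'All computer occupations not listed separately.')
sub_row = ('15-1199.03,Web Administrators,"Manage web environment design, '
           'deployment, development and maintenance activities.[...]"')

print(select_templates(main_row))
# ['soc-occupation-category-template-ttl']
print(select_templates(sub_row))
# ['onet-soc-occupation-subcategory-template-ttl']
```

Because the two conditions are mutually exclusive, exactly one template fires for each data row; the header row matches neither.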
> 
> And I think that just about wraps it up.
> 
> Jeremy
> 
> >
> > Ivan
> >
> > >
> > > Jeremy
> > >
> > >
> > >
> > > [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
> > > [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
> > > [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
> > >
> > >>
> > >> My 2 cents...
> > >>
> > >> Ivan
> > >>
> > >>
> > >>
> > >>
> > >> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> > >> <jeremy.tandy@metoffice.gov.uk> wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: Dan Brickley [mailto:danbri@google.com]
> > >>>> Sent: 18 June 2014 12:46
> > >>>> To: Tandy, Jeremy
> > >>>> Cc: CSV on the Web Working Group
> > >>>> Subject: Re: Attempted example CSV metadata document and template
> > >>>>
> > >>>> On 12 June 2014 12:57, Tandy, Jeremy
> > >>>> <jeremy.tandy@metoffice.gov.uk>
> > >>>> wrote:
> > >>>>> All -
> > >>>>>
> > >>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> > >>>>> Observation" example. I've tried to create a CSV metadata
> > >>>>> document following the rules in the [Metadata Vocabulary for
> > >>>>> Tabular Data][2] and [Generating RDF from Tabular Data on the
> > >>>>> Web][3] documents.
> > >>>>>
> > >>>>> I would be particularly interested in:
> > >>>>>
> > >>>>> - corrections to errors!
> > >>>>> - comments on additional proposed properties in the metadata
> > >>>>> document ("short-name", "template", "microsyntax")
> > >>>>> - use of "hasFormat" to specify the Content-Type associated with a
> > >>>>> Template
> > >>>>> - use of a REGEXP within a URI Template to convert ISO 8601 syntax
> > >>>>> to a simplified form
> > >>>>
> > >>>> I don't completely understand this mechanism yet, but do you think it
> > >>>> could be stretched to address the SKOS/codes issue in
> > >>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> > >>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> > >>>> and emit triples like 'broader' when certain patterns matched?
> > >>>>
> > >>>> Dan
> > >>>>
> > >>>
> > >>> OK ... let's have a go.
> > >>>
> > >>> Here's the header and a line of data:
> > >>>
> > >>> ---
> > >>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> > >>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> > >>> ---
> > >>>
> > >>> Here's a guess at the CSV metadata description in which I am using
> > >>> the ["multiple regexp each extracting a single value" pattern][1]:
> > >>>
> > >>> ---
> > >>> {
> > >>>  "name": "2010_Occupations",
> > >>>  "title": "O*NET-SOC Occupational listing for 2010",
> > >>>  "publisher": [{
> > >>>      "name": "O*Net Resource Center",
> > >>>      "web": " http://www.onetcenter.org/ "
> > >>>  }],
> > >>>  "resources": [{
> > >>>      "name": "2010_Occupations-csv",
> > >>>      "path": "2010_Occupations.csv",
> > >>>      "schema": {"columns": [
> > >>>          {
> > >>>              "name": "onet-soc-2010-code",
> > >>>              "title": "O*NET-SOC 2010 Code",
> > >>>              "description": "O*NET Standard Occupational Classification Code (2010).",
> > >>>              "type": "string",
> > >>>              "required": true,
> > >>>              "unique": true,
> > >>>              "microsyntax": [{
> > >>>                      "name": "soc-major-group",
> > >>>                      "regexp": "/^(\d{2})-\d{4}.\d{2}$/"
> > >>>                  },{
> > >>>                      "name": "soc-minor-group",
> > >>>                      "regexp": "/^\d{2}-(\d{2})\d{2}.\d{2}$/"
> > >>>                  },{
> > >>>                      "name": "soc-broad-group",
> > >>>                      "regexp": "/^\d{2}-\d{2}(\d)\d.\d{2}$/"
> > >>>                  },{
> > >>>                      "name": "soc-detailed-occupation",
> > >>>                      "regexp": "/^\d{2}-\d{3}(\d).\d{2}$/"
> > >>>                  },{
> > >>>                      "name": "onetsoc-occupation",
> > >>>                      "regexp": "/^\d{2}-\d{4}.(\d{2})$/"
> > >>>                  }
> > >>>
> > >>>              ]
> > >>>          },
> > >>>          {
> > >>>              "name": "title",
> > >>>              "title": "O*NET-SOC 2010 Title",
> > >>>              "description": "Title of occupational classification.",
> > >>>              "type": "string",
> > >>>              "required": true
> > >>>          },
> > >>>          {
> > >>>              "name": "description",
> > >>>              "title": "O*NET-SOC 2010 Description",
> > >>>              "description": "Description of occupational classification.",
> > >>>              "type": "string",
> > >>>              "required": true
> > >>>          }
> > >>>      ]},
> > >>>      "template": {
> > >>>          "name": "2010_Occupations-csv-to-ttl",
> > >>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> > >>>          "type": "template",
> > >>>          "path": "2010_Occupations-csv-to-ttl.ttl",
> > >>>          "hasFormat": "text/turtle"
> > >>>      }
> > >>>  }]
> > >>> }
> > >>> ---
> > >>>
> > >>> You can see that I've used the `microsyntax` object to capture the 5
> > >>> independent elements of the O*NET-SOC code, each with its own regexp:
> > >>> "soc-major-group", "soc-minor-group", "soc-broad-group",
> > >>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this
> > >>> is the _best_ way to do it, I don't know ... it's just an idea to get
> > >>> us talking about possibilities and options!
> > >>>
> > >>> The template (prefixes etc. intentionally left out) might then be:
> > >>>
> > >>> ---
> > >>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> > >>>   skos:notation "{onet-soc-2010-code}" ;
> > >>>   skos:prefLabel "{title}" ;
> > >>>   dct:description "{description}" ;
> > >>>   skos:broader ex:{soc-major-group}-0000,
> > >>>                ex:{soc-major-group}-{soc-minor-group}00,
> > >>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> > >>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> > >>> ---
> > >>>
> > >>> However, this does not help when we look at the required
> > >>> _conditional behaviour_: when the value of "onetsoc-occupation" = "00"
> > >>> this is identical to the term from the SOC taxonomy, and the template
> > >>> should be more like
> > >>>
> > >>> ---
> > >>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> > >>>   skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> > >>>   skos:prefLabel "{title}" ;
> > >>>   dct:description "{description}" ;
> > >>>   skos:broader ex:{soc-major-group}-0000,
> > >>>                ex:{soc-major-group}-{soc-minor-group}00,
> > >>>                ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> > >>> ---
> > >>>
> > >>> It occurs to me that we may wish to trigger different templates
> > >>> based on a conditional response - or even whether we wish to trigger
> > >>> a template at all for a given line!
> > >>>
> > >>> Thinking out of the box (is that a euphemism for "making it up as
> > >>> I go along"?), it would seem that each "template" block in the CSV
> > >>> metadata might have a "condition" statement that tells it when to
> > >>> fire - using values of column names or microsyntax element names? e.g.
> > >>>
> > >>> ---
> > >>>      "template": {
> > >>>          "name": "2010_Occupations-csv-to-ttl",
> > >>>          "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> > >>>          "type": "template",
> > >>>          "path": "2010_Occupations-csv-to-ttl.ttl",
> > >>>          "hasFormat": "text/turtle",
> > >>>          "condition": "if {onetsoc-occupation} != '00'"
> > >>>      }
> > >>> ---
> > >>>
> > >>> Default behaviour (if no "condition" statement included) would be
> > >>> _always_ to trigger the template for each row.
> > >>>
> > >>> However, looking at this, I am immediately concerned that including
> > >>> if-then-else blocks and comparison operators hugely increases the
> > >>> complexity of our work. Perhaps this is a good point to "bug out"
> > >>> to some external agent (e.g. call-back function or promise).
> > >>>
> > >>> Jeremy
> > >>>
> > >>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> > >>>
> > >>>>
> > >>>>> - thoughts about a way to describe that microsyntax format
> > >>>>> within the metadata document (see [CellMicrosyntax requirement][4]),
> > >>>>> e.g. to define the sub-elements within the microsyntax that may be
> > >>>>> extracted for use later - see [Parsing cell microsyntax][5].
> > >>>>>
> > >>>>> Comments welcome.
> > >>>>>
> > >>>>> Jeremy
> > >>>>>
> > >>>>>
> > >>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> > >>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
> > >>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
> > >>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> > >>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> > >>
> > >>
> > >> ----
> > >> Ivan Herman, W3C
> > >> Digital Publishing Activity Lead
> > >> Home: http://www.w3.org/People/Ivan/
> > >> mobile: +31-641044153
> > >> GPG: 0x343F1A3D
> > >> WebID: http://www.ivan-herman.net/foaf#me
> >
> >
> 

Received on Tuesday, 24 June 2014 11:06:35 UTC