- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Tue, 24 Jun 2014 14:24:39 +0000
- To: Ivan Herman <ivan@w3.org>
- CC: Andy Seaborne <andy@apache.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 24 June 2014 13:40
> To: Tandy, Jeremy
> Cc: Andy Seaborne; W3C CSV on the Web Working Group
> Subject: Re: Attempted example CSV metadata document and template
>
> Hi Jeremy,
>
> I think I get it:-)
>
> But... I see two problems with this approach.
>
> - (This is the lesser one): do we really want to require the references
> to the templates to be part of the Metadata?

I never saw a problem with this - but I didn't think about it very hard!

> I would guess that,
> typically (although not exclusively) the metadata is provided by the
> data publisher.

Anyone can provide metadata ... see [R-IndependentMetadataPublication][1]. So even if the original data publisher did not provide templates, a third party could publish their own metadata description including references to the templates.

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-IndependentMetadataPublication

> But why would that be the case for the templates? The same
> data set may be converted to different, say, XML depending on the
> application; ie, the templates may very well be end-user specific.

Also true, that's why my template blocks include the `hasFormat` key to say what the intended target format is. This then enables conversion software to offer the user a choice of conversions based on the formats expressed in the templates. I agree that this will create an increasingly large number of template blocks for a small number of "power-user" cases.

> But
> if the user defines his/her template, adding those metadata entries
> might be an extra load...
>
> - If my understanding is correct, the model you have is {Condition via
> regexp} -> {one particular template} (Andy, is this also what you
> referred to?).

I think so ...

> Although you do not have an example like that,

Could you provide an example?

> but I
> also presume one can extend that to an array of conditions to provide
> conjunction of conditions. However, that means I would have to provide
> a set of templates for the different cases, which means I would have to
> repeat the common parts over many templates. This looks fairly error
> prone to me:-(

This is true. My goal so far was to articulate the problem with worked examples ... in the hope that we can iterate toward a solution that is elegant. I still anticipate a few more steps along that road!

>
> Moving the conditions from the metadata into the templates themselves
> seems to be less error prone (although ending up with essentially
> if-then-else structures which may be a bit more complicated to implement).
> (Of course, we have the syntax issue on how to define the templates so
> that it would also work well with XML, Turtle, and JSON as a targeted
> output; lots of escape characters ahead...)

It would be good to see these ideas encapsulated in examples; I think it makes them easier to discuss!

>
> Another possibility may be to have some sort of an include facility.
> Much like #include in cpp...
>

Ah, the possibilities ... the trick is, as Einstein is reported to have said, to ensure that "everything [is] made as simple as possible, but not simpler." :-)

Jeremy

> Ivan
>
>
> On 24 Jun 2014, at 13:11 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
>
> > Hi Andy -
> >
> > Hopefully the [worked example][1] that I've just created illustrates
> > your point. Please feel free to fix / amend / re-write as necessary!!!
> >
> > In both examples, I've created multiple templates, which are
> > configured to be triggered on a matching condition.
> >
> > Jeremy
> >
> > [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/conditional-matching-in-occupational-listing-hierarchy.md
> >
> >> -----Original Message-----
> >> From: Andy Seaborne [mailto:andy@apache.org]
> >> Sent: 24 June 2014 11:39
> >> To: public-csv-wg@w3.org
> >> Subject: Re: Attempted example CSV metadata document and template
> >>
> >> (general observation)
> >>
> >> There are ways to get conditional effects without explicit
> >> "if-then-else"
> >>
> >> 1/ Apply different templates: that is multiple passes with different
> >> matching conditions.
> >>
> >> 2/ A template is valid if and only if all its associated templates
> >> are defined (the template may not actually be used) so that a
> >> (non-)matching regex is controlling whether the template is applied.
> >>
> >> These might be applicable separately or together.
> >>
> >> Andy
> >>
> >> On 23/06/14 17:35, Ivan Herman wrote:
> >>>
> >>> On 23 Jun 2014, at 18:03 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Ivan Herman [mailto:ivan@w3.org]
> >>>>> Sent: 21 June 2014 08:38
> >>>>> To: Tandy, Jeremy
> >>>>> Cc: Dan Brickley; W3C CSV on the Web Working Group
> >>>>> Subject: Re: Attempted example CSV metadata document and template
> >>>>>
> >>>>> Jeremy,
> >>>>>
> >>>>> one thing that I was wondering about was that the simple naming
> >>>>> mechanism for the various microsyntaxes may not work out. Consider
> >>>>>
> >>>>> "columns" : [
> >>>>>     { "name" : "datetime",
> >>>>>       ...
> >>>>>       "microsyntax": [
> >>>>>           { "name" : N1,
> >>>>>             "regexp" : "...."
> >>>>>           },
> >>>>>           .....
> >>>>>       ]
> >>>>>     },
> >>>>>     { "name" : "anothercolumn",
> >>>>>       ...
> >>>>>       "microsyntax": [
> >>>>>           { "name" : N1,
> >>>>>             "regexp" : "...."
> >>>>>           },
> >>>>>           .....
> >>>>>       ]
> >>>>>     }
> >>>>> ]
> >>>>>
> >>>>>
> >>>>> When working through the cells in a row, what would 'N1' refer to?
> >>>>> Unless we want to require the uniqueness of the microsyntax names, we
> >>>>> may hit an issue. And I do not think requiring a unique name is a
> >>>>> good idea; if the metadata becomes big, this may become a nuisance.
> >>>>
> >>>> Agreed. I made the assumption that all instances of "name" within a
> >>>> given metadata document would need to be unique. I had not considered
> >>>> any mechanisms to make this easy for users; e.g. using the "name"
> >>>> from an enclosing object to automatically _namespace_ sub-names.
> >>>>
> >>>> We could leave it to the user to ensure uniqueness (easy for us;
> >>>> adds load to the end user which is less good); in which case the
> >>>> example above would fail to validate.
> >>>>
> >>>> Alternatively, we could apply a form of name-spacing; e.g.
> >>>> "datetime/N1" and "anothercolumn/N1" within your example above.
> >>>>
> >>>>>
> >>>>> What this means is that the syntax becomes more complicated.
> >>>>> Something like {datetime:N1} or something similar (which raises
> >>>>> the issue of escape characters, too:-(
> >>>>
> >>>> Agreed! I chose a different separator character to you, but the
> >>>> same issue applies.
> >>>>
> >>>>>
> >>>>> As for the conditionals: mustache has some syntax for this which
> >>>>> is a bit different
> >>>>>
> >>>>> {{#bla}}
> >>>>> .. any template here
> >>>>> {{/bla}}
> >>>>>
> >>>>> although the mustache semantics is a bit different (afaik it
> >>>>> relies on the existence or not of a key in an object). We could
> >>>>> use the mustache semantics but we probably need something more,
> >>>>> too, like "if 'bla' is a microsyntax name and is true if the value
> >>>>> of the cell matches the regexp then it is true".
> >>>>
> >>>> Syntax-wise, we want our metadata document to be valid JSON, so we
> >>>> would need something different to mustache. However, I agree that our
> >>>> use cases call for similar semantics. Perhaps the syntax might be
> >>>> something like:
> >>>>
> >>>> "condition": {
> >>>>     "operator": "if ({bla})",
> >>>>     "template": {
> >>>>         "name": "2010_Occupations-csv-to-ttl",
> >>>>         "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>         "type": "template",
> >>>>         "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>         "hasFormat": "text/turtle"
> >>>>     }
> >>>> }
> >>>>
> >>>> In this case, I'm trying to say that the template will be triggered
> >>>> if the value of {bla} is true / not null etc. ... the value of {bla}
> >>>> is taken by evaluating the column (or microsyntax element) with
> >>>> "name" = "bla" for the row being processed. Like you say: """it
> >>>> relies on the existence or not of a key in an object"""
> >>>>
> >>>> (I don't really like the syntax; I guess that others can come up
> >>>> with better.)
> >>>
> >>> Ouch, you are right, I forgot about the fact that we want templates
> >>> for conditionals:-(
> >>>
> >>> But before getting into the boring issue of syntax we have to decide
> >>> whether we need them...
> >>>
> >>>>
> >>>>>
> >>>>> But I agree that the conditional complicates the templates a lot.
> >>>>> Here is where our use cases may have to switch in: do our use
> >>>>> cases justify the need for conditionals (remembering that, though
> >>>>> we are discussing turtle here, I do not see any difference between
> >>>>> generating turtle and generating XML or JSON through the same
> >>>>> mechanism).
> >>>>
> >>>> The requirement is ["R-ConditionalProcessingBasedOnCellValues"][1],
> >>>> motivated by the [ExpressingHierarchyWithinOccupationalListings][2]
> >>>> use case. This use case gives us two requirements:
> >>>>
> >>>> i) triggering a template if a value of a cell is not null; e.g. to
> >>>> generate the SKOS concept scheme from the SOC structure ...
> >>>>
> >>>> 15-0000,,,,Computer and Mathematical Occupations,,,,,
> >>>> ,15-1100,,,Computer Occupations,,,,,
> >>>> ,,15-1110,,Computer and Information Research Scientists,,,,,
> >>>> ,,,15-1111,Computer and Information Research Scientists,,,,,
> >>>>
> >>>> Here we can see that I only want an ex:SOC-MajorGroup entity created
> >>>> on the first row shown above (where col 1 is populated).
> >>>>
> >>>> ii) triggering a template if a value of a cell equates to a
> >>>> particular string (or the opposite); e.g. when the value of
> >>>> "onetsoc-occupation" = "00" as shown in the example [earlier in this
> >>>> email thread][3]. ...
> >>>>
> >>>> "operator": "if ({onetsoc-occupation} == '00')"
> >>>>
> >>>> Perhaps there are cases for more complex operations? I don't know.
> >>>> Perhaps this is where call-back functions or promises could be used
> >>>> to parse a row and provide a Boolean response as to whether the
> >>>> template should be triggered? Again, I don't know ... and some
> >>>> considerable thought would be required to work out the details of such.
> >>>
> >>> For me these seem to be convincing that we need something. My
> >>> preference would be, though, to avoid all the issues about defining
> >>> 'if'-s and 'else'-s and comparison operators, etc, etc, and fall back
> >>> on regular expressions ('match'-'not match') simply because regular
> >>> expressions are used elsewhere already. Would that be enough?
> >>>
> >>> Ivan
> >>>
> >>>>
> >>>> Jeremy
> >>>>
> >>>>
> >>>>
> >>>> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-ConditionalProcessingBasedOnCellValues
> >>>> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-ExpressingHierarchyWithinOccupationalListings
> >>>> [3]: http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0127.html
> >>>>
> >>>>>
> >>>>> My 2 cents...
> >>>>>
> >>>>> Ivan
> >>>>>
> >>>>>
> >>>>> On 19 Jun 2014, at 14:36 , Tandy, Jeremy
> >>>>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Dan Brickley [mailto:danbri@google.com]
> >>>>>>> Sent: 18 June 2014 12:46
> >>>>>>> To: Tandy, Jeremy
> >>>>>>> Cc: CSV on the Web Working Group
> >>>>>>> Subject: Re: Attempted example CSV metadata document and template
> >>>>>>>
> >>>>>>> On 12 June 2014 12:57, Tandy, Jeremy
> >>>>>>> <jeremy.tandy@metoffice.gov.uk> wrote:
> >>>>>>>> All -
> >>>>>>>>
> >>>>>>>> I've just uploaded to [GitHub][1] a rework of the "Simple Weather
> >>>>>>>> Observation" example. I've tried to create a CSV metadata document
> >>>>>>>> following the rules in the [Metadata Vocabulary for Tabular
> >>>>>>>> Data][2] and [Generating RDF from Tabular Data on the Web][3]
> >>>>>>>> documents.
> >>>>>>>>
> >>>>>>>> I would be particularly interested in:
> >>>>>>>>
> >>>>>>>> - corrections to errors!
> >>>>>>>> - comments on additional proposed properties in the metadata
> >>>>>>>>   document ("short-name", "template", "microsyntax")
> >>>>>>>> - use of "hasFormat" to specify the Content-Type associated
> >>>>>>>>   with a Template
> >>>>>>>> - use of a REGEXP within a URI Template to convert ISO 8601
> >>>>>>>>   syntax to a simplified form
> >>>>>>>
> >>>>>>> I don't completely understand this mechanism yet, but do you think
> >>>>>>> it could be stretched to address the SKOS/codes issue in
> >>>>>>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >>>>>>> where we'd want to explode strings like "15-1199.00", "15-1199.01"
> >>>>>>> and emit triples like 'broader' when certain patterns matched?
> >>>>>>>
> >>>>>>> Dan
> >>>>>>>
> >>>>>>
> >>>>>> OK ... let's have a go.
> >>>>>>
> >>>>>> Here's the header and a line of data:
> >>>>>>
> >>>>>> ---
> >>>>>> O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2010 Description
> >>>>>> 15-1199.03,Web Administrators,"Manage web environment design, deployment, development and maintenance activities. [...]"
> >>>>>> ---
> >>>>>>
> >>>>>> Here's a guess at the CSV metadata description in which I am
> >>>>>> using the ["multiple regexp each extracting a single value" pattern][1]:
> >>>>>>
> >>>>>> ---
> >>>>>> {
> >>>>>>     "name": "2010_Occupations",
> >>>>>>     "title": "O*NET-SOC Occupational listing for 2010",
> >>>>>>     "publisher": [{
> >>>>>>         "name": "O*Net Resource Center",
> >>>>>>         "web": "http://www.onetcenter.org/"
> >>>>>>     }],
> >>>>>>     "resources": [{
> >>>>>>         "name": "2010_Occupations-csv",
> >>>>>>         "path": "2010_Occupations.csv",
> >>>>>>         "schema": {"columns": [
> >>>>>>             {
> >>>>>>                 "name": "onet-soc-2010-code",
> >>>>>>                 "title": "O*NET-SOC 2010 Code",
> >>>>>>                 "description": "O*NET Standard Occupational Classification Code (2010).",
> >>>>>>                 "type": "string",
> >>>>>>                 "required": true,
> >>>>>>                 "unique": true,
> >>>>>>                 "microsyntax": [{
> >>>>>>                     "name": "soc-major-group",
> >>>>>>                     "regexp": "/^(\\d{2})-\\d{4}\\.\\d{2}$/"
> >>>>>>                 },{
> >>>>>>                     "name": "soc-minor-group",
> >>>>>>                     "regexp": "/^\\d{2}-(\\d{2})\\d{2}\\.\\d{2}$/"
> >>>>>>                 },{
> >>>>>>                     "name": "soc-broad-group",
> >>>>>>                     "regexp": "/^\\d{2}-\\d{2}(\\d)\\d\\.\\d{2}$/"
> >>>>>>                 },{
> >>>>>>                     "name": "soc-detailed-occupation",
> >>>>>>                     "regexp": "/^\\d{2}-\\d{3}(\\d)\\.\\d{2}$/"
> >>>>>>                 },{
> >>>>>>                     "name": "onetsoc-occupation",
> >>>>>>                     "regexp": "/^\\d{2}-\\d{4}\\.(\\d{2})$/"
> >>>>>>                 }]
> >>>>>>             },
> >>>>>>             {
> >>>>>>                 "name": "title",
> >>>>>>                 "title": "O*NET-SOC 2010 Title",
> >>>>>>                 "description": "Title of occupational classification.",
> >>>>>>                 "type": "string",
> >>>>>>                 "required": true
> >>>>>>             },
> >>>>>>             {
> >>>>>>                 "name": "description",
> >>>>>>                 "title": "O*NET-SOC 2010 Description",
> >>>>>>                 "description": "Description of occupational classification.",
> >>>>>>                 "type": "string",
> >>>>>>                 "required": true
> >>>>>>             }
> >>>>>>         ]},
> >>>>>>         "template": {
> >>>>>>             "name": "2010_Occupations-csv-to-ttl",
> >>>>>>             "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>>>             "type": "template",
> >>>>>>             "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>>>             "hasFormat": "text/turtle"
> >>>>>>         }
> >>>>>>     }]
> >>>>>> }
> >>>>>> ---
> >>>>>>
> >>>>>> You can see that I've used the `microsyntax` object to capture
> >>>>>> the 5 independent elements of the O*NET-SOC code each with its own
> >>>>>> regexp: "soc-major-group", "soc-minor-group", "soc-broad-group",
> >>>>>> "soc-detailed-occupation" and "onetsoc-occupation". Whether this is
> >>>>>> the _best_ way to do it, I don't know ... it's just an idea to get us
> >>>>>> talking about possibilities and options!
> >>>>>>
> >>>>>> The template (prefixes etc. intentionally left out) might then be:
> >>>>>>
> >>>>>> ---
> >>>>>> ex:{onet-soc-2010-code} a ex:ONETSOC-Occupation ;
> >>>>>>     skos:notation "{onet-soc-2010-code}" ;
> >>>>>>     skos:prefLabel "{title}" ;
> >>>>>>     dct:description "{description}" ;
> >>>>>>     skos:broader ex:{soc-major-group}-0000,
> >>>>>>         ex:{soc-major-group}-{soc-minor-group}00,
> >>>>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0,
> >>>>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} .
> >>>>>> ---
> >>>>>>
> >>>>>> However, this does not help when we look at the required
> >>>>>> _conditional behaviour_: when the value of "onetsoc-occupation" =
> >>>>>> "00" this is identical to the term from the SOC taxonomy, and the
> >>>>>> template should be more like
> >>>>>>
> >>>>>> ---
> >>>>>> ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation} a ex:SOC-DetailedOccupation ;
> >>>>>>     skos:notation "{soc-major-group}-{soc-minor-group}{soc-broad-group}{soc-detailed-occupation}" ;
> >>>>>>     skos:prefLabel "{title}" ;
> >>>>>>     dct:description "{description}" ;
> >>>>>>     skos:broader ex:{soc-major-group}-0000,
> >>>>>>         ex:{soc-major-group}-{soc-minor-group}00,
> >>>>>>         ex:{soc-major-group}-{soc-minor-group}{soc-broad-group}0 .
> >>>>>> ---
> >>>>>>
> >>>>>> It occurs to me that we may wish to trigger different templates
> >>>>>> based on a conditional response - or even whether we wish to trigger
> >>>>>> a template at all for a given line!
> >>>>>>
> >>>>>> Thinking out of the box (is that a euphemism for "making it up as I
> >>>>>> go along"?), it would seem that each "template" block in the CSV
> >>>>>> metadata might have a "condition" statement that tells it when to
> >>>>>> fire - using values of column names or microsyntax element names? e.g.
> >>>>>>
> >>>>>> ---
> >>>>>> "template": {
> >>>>>>     "name": "2010_Occupations-csv-to-ttl",
> >>>>>>     "description": "Template converting CSV content to SKOS/RDF (expressed in Turtle syntax).",
> >>>>>>     "type": "template",
> >>>>>>     "path": "2010_Occupations-csv-to-ttl.ttl",
> >>>>>>     "hasFormat": "text/turtle",
> >>>>>>     "condition": "if {onetsoc-occupation} != '00'"
> >>>>>> }
> >>>>>> ---
> >>>>>>
> >>>>>> Default behaviour (if no "condition" statement included) would be
> >>>>>> _always_ to trigger the template for each row.
> >>>>>>
> >>>>>> However, looking at this, I am immediately concerned that including
> >>>>>> if-then-else blocks and comparison operators hugely increases the
> >>>>>> complexity of our work. Perhaps this is a good point to "bug out" to
> >>>>>> some external agent (e.g. call-back function or promise).
> >>>>>>
> >>>>>> Jeremy
> >>>>>>
> >>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#multiple-regexp-each-extracting-single-value
> >>>>>>
> >>>>>>>
> >>>>>>>> - thoughts about a way to describe that microsyntax format within
> >>>>>>>>   the metadata document (see [CellMicrosyntax requirement][4]), e.g.
> >>>>>>>>   to define the sub-elements within the microsyntax that may be
> >>>>>>>>   extracted for use later - see [Parsing cell microsyntax][5].
> >>>>>>>>
> >>>>>>>> Comments welcome.
> >>>>>>>>
> >>>>>>>> Jeremy
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md
> >>>>>>>> [2]: http://w3c.github.io/csvw/metadata/index.html
> >>>>>>>> [3]: http://w3c.github.io/csvw/csv2rdf/
> >>>>>>>> [4]: http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellMicrosyntax
> >>>>>>>> [5]: https://github.com/w3c/csvw/blob/gh-pages/examples/csv-metadata-and-template-for-simple-weather-obs-example.md#parsing-cell-microsyntax
> >>>>>
> >>>>>
> >>>>> ----
> >>>>> Ivan Herman, W3C
> >>>>> Digital Publishing Activity Lead
> >>>>> Home: http://www.w3.org/People/Ivan/
> >>>>> mobile: +31-641044153
> >>>>> GPG: 0x343F1A3D
> >>>>> WebID: http://www.ivan-herman.net/foaf#me
> >>>
> >>>
> >>> ----
> >>> Ivan Herman, W3C
> >>> Digital Publishing Activity Lead
> >>> Home: http://www.w3.org/People/Ivan/
> >>> mobile: +31-641044153
> >>> GPG: 0x343F1A3D
> >>> WebID: http://www.ivan-herman.net/foaf#me
> >>>
> >>
> >
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
>
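
A minimal sketch of the regexp-only approach floated in this thread ('match' / 'not match' deciding which template fires), in Python. It is an illustration under stated assumptions, not anything the group agreed: the microsyntax names and patterns follow the "onet-soc-2010-code" example (with the literal dot escaped), while the template file names, the `TEMPLATES` table and the helper functions are hypothetical.

```python
import re

# Microsyntax elements of the "onet-soc-2010-code" column, one capture each.
# Patterns follow the thread's example; the literal "." is escaped here.
MICROSYNTAX = {
    "soc-major-group": r"^(\d{2})-\d{4}\.\d{2}$",
    "soc-minor-group": r"^\d{2}-(\d{2})\d{2}\.\d{2}$",
    "soc-broad-group": r"^\d{2}-\d{2}(\d)\d\.\d{2}$",
    "soc-detailed-occupation": r"^\d{2}-\d{3}(\d)\.\d{2}$",
    "onetsoc-occupation": r"^\d{2}-\d{4}\.(\d{2})$",
}

# Hypothetical template table: (template path, element name, regexp that must match).
TEMPLATES = [
    ("soc-detailed-occupation.ttl", "onetsoc-occupation", r"^00$"),
    ("onetsoc-occupation.ttl", "onetsoc-occupation", r"^(?!00$)\d{2}$"),
]


def extract(code):
    """Return each microsyntax element captured from a cell value (None if no match)."""
    values = {}
    for name, pattern in MICROSYNTAX.items():
        match = re.match(pattern, code)
        values[name] = match.group(1) if match else None
    return values


def select_templates(values):
    """Return the templates whose condition regexp matches the named element."""
    return [
        path
        for path, name, pattern in TEMPLATES
        if values.get(name) is not None and re.match(pattern, values[name])
    ]


if __name__ == "__main__":
    for code in ("15-1199.03", "15-1199.00"):
        values = extract(code)
        print(code, values, select_templates(values))
```

Expressing "not equal to 00" as a second regexp keeps the condition language down to a single match test, at the cost of one template entry per case; whether that trade-off is acceptable is exactly the open question above.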
Received on Tuesday, 24 June 2014 14:25:08 UTC