Re: Template as mechanism for CSV conversion. from Christopher Gutteridge on 2014-05-15 (public-csv-wg@w3.org from May 2014)

From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Date: Thu, 15 May 2014 08:58:18 +0100
To: Alfredo Serafini <seralf@gmail.com>
CC: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
Message-ID: <EMEW3|c7fe857d2c0de85d10d2a149b7115dfdq4E8wU03cjg|ecs.soton.ac.uk|5374739A.8080>
What is notable about all of these systems is that they don't accept 
tabular data as an input (that I know of).

It might be useful to have a standard mapping from tabular data to JSON 
and XML suitable for feeding templating engines, but indexing columns by 
heading is a very different beast from indexing them by column number.
(apologies if I have come late and there's a lot of backstory to the 
group; if there's a document I should read before commenting further, 
that's cool, just URL-me)
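
To make the heading-versus-number distinction concrete, here is a minimal
Python sketch. It is only an illustration, not Grinder and not any proposed
standard mapping; the names SAMPLE, rows_by_heading and rows_by_number are
hypothetical. It reads the same small table twice with the standard csv
module, once keyed by heading label and once keyed by 1-based column number,
and prints the first row of each as JSON.

```python
import csv
import io
import json

# Hypothetical sample table with a heading row.
SAMPLE = "colour,age,top speed,id\nblue,10,100,1\ngreen,2,80,2\n"


def rows_by_heading(text):
    """Return one dict per data row, keyed by the heading-row labels."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_by_number(text):
    """Return one dict per data row, keyed by 1-based column number."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the heading row; it carries no data here
    return [
        {str(index): value for index, value in enumerate(row, start=1)}
        for row in reader
    ]


by_heading = rows_by_heading(SAMPLE)
by_number = rows_by_number(SAMPLE)

print(json.dumps(by_heading[0]))
print(json.dumps(by_number[0]))
```

A template written against the first structure would refer to "top speed",
while one written against the second would refer to column "3", so the two
are not interchangeable without an agreed convention.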

On 15/05/2014 08:27, Alfredo Serafini wrote:
> Hi
> Mustache is a good solution indeed for generating CSV formats... for
> other formats it is useful but can sometimes be counterproductive. For
> example, for generating XML based on templates a good option is also
> Thymeleaf.
>
> What is interesting about Thymeleaf is the concept of "natural
> templates": it would be great if it were possible to apply a similar
> approach to other formats
>
>
> Alfredo
>
>
> 2014-05-15 9:22 GMT+02:00 Christopher Gutteridge <cjg@ecs.soton.ac.uk>:
>
>     Hi. What we've done is use XSLT, which is mostly suitable for XML
>     output, but we were focusing on RDF output. I wrote a bit of
>     software called Grinder (never name a tool without doing a bit of
>     googling, eh?) which takes tabular data with a heading row and
>     outputs XML suitable for passing to XSLT.
>     eg.
>     colour, age, top speed, id
>     blue,10,100,1
>     green,2,80,2
>
>     becomes
>     <rows>
>       <row>
>          <colour>blue</colour>
>          <age>10</age>
>          <topSpeed>100</topSpeed>
>          <id>1</id>
>       </row>
>       <row>
>          <colour>green</colour>
>          <age>2</age>
>          <topSpeed>80</topSpeed>
>          <id>2</id>
>       </row>
>     </rows>
>
>     We found a number of cases where it was useful to add more processing:
>     - multiple values in a field, separated with a delimiter -- this
>     is quite common in tabular data and allows more complex data
>     without using a separate table eg.
>     building,occupants
>     1,"chemistry,physics"
>     2, physics
>     - skipping blank leading rows and columns
>     - carrying a value down to the next row if the cell below is blank;
>     some reporting software only outputs an ID on the first row of a
>     block.
>     - various ways to clean up the values that were tricky in XSLT;
>     md5, sha1, camelcaps (make it suitable for being part of a URI)
>
>     For a complex example see
>     http://data.southampton.ac.uk/dumps/catering/2014-05-15/catering.cfg
>     and
>     http://data.southampton.ac.uk/dumps/catering/2014-05-15/openorg-pos.xsl
>     More details here:
>     https://github.com/cgutteridge/Grinder/blob/master/bin/grinder#L91
>
>     I am not recommending XSLT, but we've solved a bunch of real world
>     CSV=>RDF cases so it may give you some ideas. It does have the
>     merit of being a standard, at least.
>
>     What was most missed was a regular expression search & replace
>     which would have massively reduced the complexity.
>
>     I recently did a templating project using Mustache and we had to
>     keep tweaking our data structures to make it understand them,
>     which was not ideal.
>
>
>     On 14/05/2014 19:15, Jeni Tennison wrote:
>
>         Thanks Andy,
>
>         I think it makes a lot of sense to have a general purpose
>         template for mapping CSV to other formats (eg YAML, HTML).
>         Open Refine does something similar as described here [1],
>         which enables you to define:
>
>            * a prefix
>            * a template for each row
>            * a row separator
>            * a suffix
>
>         What about using an existing templating system such as
>         Mustache [2] which has the advantage of being implemented
>         across lots of programming languages? Then you only have to
>         define how the variables that get passed into the template get
>         set up, not the syntax. (I’m not fixated on Mustache — I’d
>         much prefer something more standard — it’s just that I’d
>         really prefer not to have this Working Group invent a new
>         syntax for templates.)
>
>         I have three areas of concerns which mostly relate to the
>         limited flexibility that something like Mustache gives you:
>
>         1. In all the real-life conversions I’ve ever done I’ve always
>         ended up needing conditional statements of some sort. Which
>         means having some kind of logical statements, which means
>         adopting a particular programming language to express them in.
>
>         2. In all the real-life conversions I’ve ever done I’ve always
>         ended up needing to process individual values in some way (ie
>         some level of string parsing), which means defining functions.
>
>         3. In all the real-life conversions I’ve ever done that have
>         involved text-based templating languages that need to produce
>         something with a defined structured syntax I’ve always gotten
>         it wrong and produced non-well-formed/valid output.
>
>         All of which means that while I’m sure that templating is a
>         useful thing to provide for general-purpose conversions, I
>         still think there’s a need for more general purpose languages
>         to “bug out” to. And I’m not 100% convinced (but we’ll only
>         see by doing) that it will be possible to define useful
>         conversions to other formats using templates. For example,
>         just naming things like elements/attributes in XML and things
>         like properties in JSON will require different approaches, I
>         think, that it will be hard to express in generic templates.
>
>         Cheers,
>
>         Jeni
>
>         [1] https://github.com/OpenRefine/OpenRefine/wiki/Exporters
>         [2] http://mustache.github.io/
>
>         ------------------------------------------------------
>         From: Andy Seaborne <andy@apache.org>
>         Reply: Andy Seaborne <andy@apache.org>
>         Date: 14 May 2014 at 18:16:10
>         To: CSV on the Web Working Group <public-csv-wg@w3.org>
>         Subject:  Template as mechanism for CSV conversion.
>
>             (from the telecon - JeniT asked for this to be made more
>             visible on the list)
>
>             Gregg has suggested that if all the conversions are based
>             around the template mechanism, then there could be one
>             conversions document for all of RDF, JSON and XML.
>
>             That makes sense to me, although I also think that if
>             someone arrives at the doc wanting, say, the details of
>             JSON conversion, having them all in one place makes for a
>             less focused document.
>
>             e.g. RDF:
>             http://w3c.github.io/csvw/csv2rdf/#graph-template
>
>             The templating mechanism is text-based and does not
>             require parsing of some variant of the output syntax
>             ("variant" because of the need for template slots). A
>             processor may provide additional validation of the output
>             but, at a minimum, it can generate output just by text
>             processing (and potentially get illegal syntax due to the
>             lightweight nature of the process).
>
>             A starting point for templates is URI Templates
>             http://tools.ietf.org/html/rfc6570
>             although there needs to be per-syntax escaping support.
>
>             (*nix) Shell parameter expansion is a similar mechanism.
>             http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
>             (not the array bits)
>
>             ${parameter/pattern/string} is a regex replace, for example.
>
>             Andy
>
>         --
>         Jeni Tennison
>         http://www.jenitennison.com/
>
>
>     -- 
>     Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
>     University of Southampton Open Data Service:
>     http://data.southampton.ac.uk/
>     You should read the ECS Web Team blog:
>     http://blogs.ecs.soton.ac.uk/webteam/
>
>
>

-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/
Received on Thursday, 15 May 2014 07:59:26 UTC