Re: Template as mechanism for CSV conversion. from Alfredo Serafini on 2014-05-15 (public-csv-wg@w3.org from May 2014)

From: Alfredo Serafini <seralf@gmail.com>
Date: Thu, 15 May 2014 09:27:30 +0200
To: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Cc: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
Message-ID: <CADawF4MqwP3Mk5uW0q_onijj46AxL_as4mCYxBdRwh7s5i6fZQ@mail.gmail.com>
Hi,

Mustache is indeed a good solution for generating CSV formats. For other
formats it is useful, but it can sometimes be counterproductive. For
example, for generating XML from templates, Thymeleaf is also a good option.

What is interesting about Thymeleaf is the concept of "natural templates"
(templates that are themselves valid documents in the output format): it
would be great if a similar approach were possible for other formats.


Alfredo


2014-05-15 9:22 GMT+02:00 Christopher Gutteridge <cjg@ecs.soton.ac.uk>:

> Hi. What we've done is use XSLT, which is mostly suitable for XML output,
> but we were focusing on RDF output. I wrote a bit of software called
> Grinder (never name a tool without doing a bit of googling, eh?) which
> takes tabular data with a heading row and outputs XML suitable for passing
> to XSLT.
> eg.
> colour, age, top speed, id
> blue,10,100,1
> green,2,80,2
>
> becomes
> <rows>
>   <row>
>      <colour>blue</colour>
>      <age>10</age>
>      <topSpeed>100</topSpeed>
>      <id>1</id>
>   </row>
>   <row>
>      <colour>green</colour>
>      <age>2</age>
>      <topSpeed>80</topSpeed>
>      <id>2</id>
>   </row>
> </rows>
>
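The conversion quoted above (a table with a heading row turned into
<rows>/<row> XML) can be sketched in Python roughly as follows. This is not
Grinder's actual code: the function names camel_case and table_to_xml are
made up for illustration, and the sketch assumes that every heading yields a
valid XML element name once camel-cased.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def camel_case(heading: str) -> str:
    """Turn a heading such as 'top speed' into a name such as 'topSpeed'."""
    words = [w for w in re.split(r"[^A-Za-z0-9]+", heading.strip()) if w]
    if not words:
        raise ValueError("heading has no usable characters")
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def table_to_xml(text: str) -> ET.Element:
    """Build <rows><row>...</row></rows> from CSV text with a heading row."""
    # skipinitialspace drops the blank after each comma in the heading row.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    headings = [camel_case(h) for h in next(reader)]
    rows = ET.Element("rows")
    for record in reader:
        row = ET.SubElement(rows, "row")
        for name, value in zip(headings, record):
            # ElementTree escapes the text on output, so the XML stays well-formed.
            ET.SubElement(row, name).text = value.strip()
    return rows


sample = "colour, age, top speed, id\nblue,10,100,1\ngreen,2,80,2\n"
xml_text = ET.tostring(table_to_xml(sample), encoding="unicode")
print(xml_text)
```

The output is unindented, but it has the same structure as the example
above and could be handed to an XSLT processor.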
> We found a number of cases where it was useful to add more processing:
> - multiple values in a field, separated with a delimiter -- this is quite
> common in tabular data and allows more complex data without using a
> separate table, eg.
> building,occupants
> 1,"chemistry,physics"
> 2, physics
> - skipping blank leading rows and columns
> - carrying a value down to the next row if the cell below is blank. Some
> reporting software only outputs an ID on the first row of a block.
> - various ways to clean up the values that were tricky in XSLT: md5, sha1,
> camelcaps (make it suitable for being part of a URI)
>
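Three of the extra processing steps listed above (multi-valued cells,
carrying a value down over blank cells, and hashing a value) might look
like this as a minimal Python sketch. The helper names split_values,
fill_down and sha1_token are hypothetical and are not taken from Grinder.

```python
import hashlib


def split_values(cell: str, delimiter: str = ",") -> list[str]:
    """Split a multi-valued cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


def fill_down(rows: list[list[str]], column: int) -> list[list[str]]:
    """Copy the last non-blank value of a column into the blank cells below it."""
    filled, last = [], ""
    for row in rows:
        row = list(row)  # copy, so the caller's data is left untouched
        if row[column].strip():
            last = row[column]
        else:
            row[column] = last
        filled.append(row)
    return filled


def sha1_token(value: str) -> str:
    """Hash a value into a hex string that is safe to use inside a URI."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


occupants = split_values("chemistry,physics")
report = fill_down([["1", "a"], ["", "b"], ["2", "c"], ["", "d"]], column=0)
print(occupants, report, sha1_token("blue"))
```

Here fill_down returns new rows instead of editing in place, which keeps
the original table available for comparison.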
> For a complex example see
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/catering.cfg and
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/openorg-pos.xsl
> More details here:
> https://github.com/cgutteridge/Grinder/blob/master/bin/grinder#L91
>
> I am not recommending XSLT, but we've solved a bunch of real world
> CSV=>RDF cases so it may give you some ideas. It does have the merit of
> being a standard, at least.
>
> What was most missed was a regular expression search & replace which would
> have massively reduced the complexity.
>
> I recently did a templating project using Mustache, and we had to keep
> tweaking our data structures to make it understand them, which was not
> ideal.
>
>
> On 14/05/2014 19:15, Jeni Tennison wrote:
>
>> Thanks Andy,
>>
>> I think it makes a lot of sense to have a general purpose template for
>> mapping CSV to other formats (eg YAML, HTML). OpenRefine does something
>> similar as described here [1], which enables you to define:
>>
>>    * a prefix
>>    * a template for each row
>>    * a row separator
>>    * a suffix
>>
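The four parts listed above (prefix, row template, row separator, suffix)
can be sketched with Python's standard string.Template. This illustrates
the idea only: it does not use OpenRefine's own template syntax, the render
function is a hypothetical name, and it assumes the column headings are
valid $-style placeholder names.

```python
import csv
import io
from string import Template


def render(text: str, prefix: str, row_template: str,
           separator: str, suffix: str) -> str:
    """Render CSV text as prefix + rows joined by separator + suffix."""
    reader = csv.DictReader(io.StringIO(text))
    row = Template(row_template)
    # Values are inserted verbatim: nothing is escaped for the output syntax.
    body = separator.join(row.substitute(record) for record in reader)
    return prefix + body + suffix


sample = "colour,age\nblue,10\ngreen,2\n"
html = render(
    sample,
    prefix="<ul>\n",
    row_template="  <li>$colour is $age</li>",
    separator="\n",
    suffix="\n</ul>\n",
)
print(html)
```

The same call with a different prefix, row template and suffix could emit
YAML or JSON, which is what makes the approach format-neutral.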
>> What about using an existing templating system such as Mustache [2] which
>> has the advantage of being implemented across lots of programming
>> languages? Then you only have to define how the variables that get passed
>> into the template get set up, not the syntax. (I’m not fixated on Mustache
>> — I’d much prefer something more standard — it’s just that I’d really
>> prefer not to have this Working Group invent a new syntax for templates.)
>>
>> I have three areas of concern, which mostly relate to the limited
>> flexibility that something like Mustache gives you:
>>
>> 1. In all the real-life conversions I’ve ever done I’ve always ended up
>> needing conditional statements of some sort. Which means having some kind
>> of logical statements, which means adopting a particular programming
>> language to express them in.
>>
>> 2. In all the real-life conversions I’ve ever done I’ve always ended up
>> needing to process individual values in some way (ie some level of string
>> parsing), which means defining functions.
>>
>> 3. In all the real-life conversions I’ve ever done that have involved
>> text-based templating languages that need to produce something with a
>> defined structured syntax I’ve always gotten it wrong and produced
>> non-well-formed/valid output.
>>
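The third concern above is easy to reproduce. In this small Python sketch
(the template string and the sample value are invented for illustration), a
purely text-based template yields non-well-formed XML as soon as a value
contains "&", and escaping the value first avoids that.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


template = "<occupant>{name}</occupant>"
value = "Physics & Astronomy"

naive = template.format(name=value)            # raw text substitution
escaped = template.format(name=escape(value))  # '&' becomes '&amp;'

print(is_well_formed(naive), is_well_formed(escaped))
```

A text-only processor has no way to notice the first case unless it also
validates its output.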
>> All of which means that while I’m sure that templating is a useful thing
>> to provide for general-purpose conversions, I still think there’s a need
>> for more general purpose languages to “bug out” to. And I’m not 100%
>> convinced (but we’ll only see by doing) that it will be possible to define
>> useful conversions to other formats using templates. For example, just
>> naming things like elements/attributes in XML and things like properties in
>> JSON will require different approaches, I think, which will be hard to
>> express in generic templates.
>>
>> Cheers,
>>
>> Jeni
>>
>> [1] https://github.com/OpenRefine/OpenRefine/wiki/Exporters
>> [2] http://mustache.github.io/
>>
>> ------------------------------------------------------
>> From: Andy Seaborne andy@apache.org
>> Reply: Andy Seaborne andy@apache.org
>> Date: 14 May 2014 at 18:16:10
>> To: CSV on the Web Working Group public-csv-wg@w3.org
>> Subject:  Template as mechanism for CSV conversion.
>>
>>> (from the telecon - JeniT asked for this to be made more visible on the
>>> list)
>>>
>>> Gregg has suggested that if all the conversions are based around the
>>> template mechanism, then there could be one conversions document for all
>>> of RDF, JSON and XML.
>>>
>>> That makes sense to me, although I also think that if someone arrives at
>>> the doc wanting, say, the details of JSON conversion, having them all in
>>> one place makes for a less focused document.
>>>
>>> e.g. RDF:
>>> http://w3c.github.io/csvw/csv2rdf/#graph-template
>>>
>>> The templating mechanism is text-based and does not require parsing of
>>> some variant of the output syntax ("variant" because of the need for
>>> template slots). A processor may provide additional validation of the
>>> output but, at a minimum, it can generate output just by text processing
>>> (and potentially get illegal syntax due to the lightweight nature of the
>>> process).
>>>
>>> A starting point for templates is URI Templates
>>> http://tools.ietf.org/html/rfc6570
>>> although there needs to be support for per-syntax escaping.
>>>
>>> (*nix) Shell parameter expansion is a similar mechanism.
>>> http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
>>> (not the array bits)
>>>
>>> ${parameter/pattern/string} is a pattern search and replace (shell glob
>>> patterns rather than full regular expressions), for example.
>>> Andy
>>>
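The URI Templates starting point mentioned in Andy's message can be
illustrated with a minimal Python sketch of the simplest form of expansion
in RFC 6570, where each {name} slot is replaced by a percent-encoded value.
It covers only plain {name} slots, not the operators of the higher levels;
the expand function and the example.org URI are hypothetical.

```python
import re
from urllib.parse import quote

# A slot is a variable name in braces, e.g. {id}.
SLOT = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand(template: str, row: dict[str, str]) -> str:
    """Replace each {name} slot with the percent-encoded value from the row."""
    def fill(match: re.Match) -> str:
        # Undefined variables expand to an empty string, as in RFC 6570.
        return quote(row.get(match.group(1), ""), safe="")
    return SLOT.sub(fill, template)


uri = expand(
    "http://example.org/vehicle/{id}/{colour}",
    {"id": "1", "colour": "light blue"},
)
print(uri)
```

Swapping quote for another escaping function is one way to get the
per-syntax escaping that the message asks for.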
>>>
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>>
>>
> --
> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
> University of Southampton Open Data Service:
> http://data.southampton.ac.uk/
> You should read the ECS Web Team blog:
> http://blogs.ecs.soton.ac.uk/webteam/
>
>
>
Received on Thursday, 15 May 2014 07:27:57 UTC