- From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
- Date: Thu, 15 May 2014 08:58:18 +0100
- To: Alfredo Serafini <seralf@gmail.com>
- CC: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
- Message-ID: <EMEW3|c7fe857d2c0de85d10d2a149b7115dfdq4E8wU03cjg|ecs.soton.ac.uk|5374739A.8080>
What is notable about all of these systems is that they don't accept
tabular data as an input (that I know of).
It might be useful to have a standard mapping from tabular data to JSON
and XML suitable for feeding templating engines, but columns indexed by
heading is a very different beast than columns indexed by column number.
(apologies if I have come late and there's a lot of backstory to the
group; if there's a document I should read before commenting further,
that's cool, just URL-me)
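For readers new to the distinction drawn above, here is a minimal Python sketch using the standard `csv` module that addresses the same data by column number and by heading. The sample reuses the colour/age example quoted further down. Passing `skipinitialspace=True` is an assumption made to cope with the spaces after the commas in that heading row.

```python
import csv
import io

# Heading row copied from the colour/age example quoted later in this thread.
SAMPLE = "colour, age, top speed, id\nblue,10,100,1\ngreen,2,80,2\n"


def rows_by_position(text):
    """Return each data row as a list: columns are addressed by number."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the heading row
    return list(reader)


def rows_by_heading(text):
    """Return each data row as a dict keyed by the heading-row labels."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(rows_by_position(SAMPLE)[0][2])           # third column of first row
    print(rows_by_heading(SAMPLE)[0]["top speed"])  # same cell, by heading
```

Both lookups return "100". The heading-keyed form survives column reordering, but it depends on the headings being present and unique.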
On 15/05/2014 08:27, Alfredo Serafini wrote:
> Hi
> Mustache is indeed a good solution for generating CSV formats... for
> other formats it is useful, but it can sometimes be counterproductive.
> For example, for generating XML from templates, Thymeleaf is also a
> good option.
>
> What is interesting about Thymeleaf is the concept of "natural
> templates": it would be great if a similar approach could be applied
> to other formats.
>
>
> Alfredo
>
>
> 2014-05-15 9:22 GMT+02:00 Christopher Gutteridge <cjg@ecs.soton.ac.uk>:
>
> Hi. What we've done is use XSLT, which is mostly suitable for XML
> output, but we were focusing on RDF output. I wrote a bit of
> software called Grinder (never name a tool without doing a bit of
> googling, eh?) which takes tabular data with a heading row and
> outputs XML suitable for passing to XSLT.
> e.g.
> colour, age, top speed, id
> blue,10,100,1
> green,2,80,2
>
> becomes
> <rows>
> <row>
> <colour>blue</colour>
> <age>10</age>
> <topSpeed>100</topSpeed>
> <id>1</id>
> </row>
> <row>
> <colour>green</colour>
> <age>2</age>
> <topSpeed>80</topSpeed>
> <id>2</id>
> </row>
> </rows>
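The transformation above can be approximated in a few lines of Python. This is a minimal sketch, not Grinder's actual code. The heading-to-element-name rule (lower-case the first word and capitalise the rest, so `top speed` becomes `topSpeed`) is inferred from the example, and `camel_case` and `csv_to_rows_xml` are hypothetical names.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def camel_case(heading):
    """'top speed' -> 'topSpeed' (rule inferred from the example above)."""
    words = re.findall(r"[A-Za-z0-9]+", heading)
    if not words:
        return "column"
    name = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    # XML element names may not start with a digit.
    return "_" + name if name[0].isdigit() else name


def csv_to_rows_xml(text):
    """Convert CSV with a heading row into <rows><row>...</row></rows>."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    names = [camel_case(h) for h in next(reader)]
    root = ET.Element("rows")
    for record in reader:
        row = ET.SubElement(root, "row")
        for name, value in zip(names, record):
            ET.SubElement(row, name).text = value  # ElementTree escapes & and <
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_rows_xml("colour, age, top speed, id\nblue,10,100,1\ngreen,2,80,2\n"))
```

Building the tree with ElementTree, rather than concatenating strings, means values containing `&` or `<` are escaped automatically. The output is compact, without the indentation shown above.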
>
> We found a number of cases where it was useful to add more processing:
> - multiple values in a field, separated by a delimiter -- this
> is quite common in tabular data and allows more complex data
> without using a separate table, e.g.
> building,occupants
> 1,"chemistry,physics"
> 2,physics
> - skipping blank leading rows and columns
> - carrying a value down to the next row if the cell below is blank;
> some reporting software only outputs an ID on the first row of a
> block.
> - various ways to clean up the values that were tricky in XSLT:
> md5, sha1, camelcaps (to make a value suitable for being part of a URI)
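Here is a minimal Python sketch of the preprocessing steps in the list above. It assumes rows are already parsed into lists of strings. The function names are hypothetical, and each function is one plausible reading of its bullet, not Grinder's implementation. `uri_slug` reuses the camel-case rule inferred from the earlier example.

```python
import hashlib
import re


def strip_blank_leading(rows):
    """Drop blank leading rows, then blank leading columns."""
    rows = [list(r) for r in rows]
    while rows and all(not c.strip() for c in rows[0]):
        rows.pop(0)
    # Stop once every row is empty, otherwise slicing would loop forever.
    while any(rows) and all(not r or not r[0].strip() for r in rows):
        rows = [r[1:] for r in rows]
    return rows


def fill_down(rows, col):
    """Copy the last non-blank value in column `col` into blank cells below it."""
    last, out = "", []
    for r in rows:
        r = list(r)
        if col < len(r):
            if r[col].strip():
                last = r[col]
            else:
                r[col] = last
        out.append(r)
    return out


def split_multi(value, sep=","):
    """'chemistry,physics' -> ['chemistry', 'physics']."""
    return [v.strip() for v in value.split(sep) if v.strip()]


def md5_hex(value):
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def sha1_hex(value):
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def uri_slug(value):
    """'Top Speed' -> 'topSpeed', safe to embed in a URI path segment."""
    words = re.findall(r"[A-Za-z0-9]+", value)
    if not words:
        return ""
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])
```

For the building/occupants example, `split_multi("chemistry,physics")` gives `['chemistry', 'physics']`. The CSV parser has already kept the quoted field intact, so splitting happens as a second step.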
>
> For a complex example see
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/catering.cfg
> and
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/openorg-pos.xsl
> More details here:
> https://github.com/cgutteridge/Grinder/blob/master/bin/grinder#L91
>
> I am not recommending XSLT, but we've solved a bunch of real world
> CSV=>RDF cases so it may give you some ideas. It does have the
> merit of being a standard, at least.
>
> What we missed most was a regular expression search & replace,
> which would have massively reduced the complexity.
>
> I recently did a templating project using Mustache, and we had to
> keep tweaking our data structures to make it understand them,
> which was not ideal.
>
>
> On 14/05/2014 19:15, Jeni Tennison wrote:
>
> Thanks Andy,
>
> I think it makes a lot of sense to have a general purpose
> template for mapping CSV to other formats (eg YAML, HTML).
> Open Refine does something similar as described here [1],
> which enables you to define:
>
> * a prefix
> * a template for each row
> * a row separator
> * a suffix
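The four-part structure listed above can be sketched with Python's standard `string.Template`. This illustrates only the prefix/row-template/separator/suffix shape. It is not OpenRefine's own template syntax, and `render` is a hypothetical name. Headings are assumed to be valid identifiers so that `$colour` can refer to them.

```python
import csv
import io
from string import Template


def render(text, prefix, row_template, separator, suffix):
    """Emit prefix, then each row through row_template joined by separator, then suffix."""
    tmpl = Template(row_template)
    rows = csv.DictReader(io.StringIO(text))
    # substitute() raises KeyError if the template names a missing column.
    body = separator.join(tmpl.substitute(row) for row in rows)
    return prefix + body + suffix


if __name__ == "__main__":
    data = "colour,age\nblue,10\ngreen,2\n"
    print(render(data, "<ul>\n", "  <li>$colour ($age)</li>", "\n", "\n</ul>"))
```

Nothing here escapes the values, so a value containing `<` would break the HTML. That is the weakness raised in point 3 below.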
>
> What about using an existing templating system such as
> Mustache [2] which has the advantage of being implemented
> across lots of programming languages? Then you only have to
> define how the variables that get passed into the template get
> set up, not the syntax. (I’m not fixated on Mustache — I’d
> much prefer something more standard — it’s just that I’d
> really prefer not to have this Working Group invent a new
> syntax for templates.)
>
> I have three areas of concern, which mostly relate to the
> limited flexibility that something like Mustache gives you:
>
> 1. In all the real-life conversions I’ve ever done I’ve always
> ended up needing conditional statements of some sort. Which
> means having some kind of logical statements, which means
> adopting a particular programming language to express them in.
>
> 2. In all the real-life conversions I’ve ever done I’ve always
> ended up needing to process individual values in some way (i.e.
> some level of string parsing), which means defining functions.
>
> 3. In all the real-life conversions I’ve ever done that have
> involved text-based templating languages that need to produce
> something with a defined structured syntax I’ve always gotten
> it wrong and produced non-well-formed/valid output.
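Concern 3 is easy to reproduce. In this minimal Python sketch, `ROW_TEMPLATE` is a hypothetical text template. Filling it naively with a value containing `&` and `<` yields XML that does not parse, while escaping the value with the standard `xml.sax.saxutils.escape` keeps it well-formed. Escaping only fixes text content. It does nothing for conditionals (concern 1) or value processing (concern 2).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ROW_TEMPLATE = "<row><name>{name}</name></row>"  # hypothetical row template


def fill_naively(value):
    """Plain text substitution, as a lightweight template processor might do."""
    return ROW_TEMPLATE.format(name=value)


def fill_escaped(value):
    """Escape &, < and > before substitution so the result stays well-formed."""
    return ROW_TEMPLATE.format(name=escape(value))


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    value = "R&D <lab>"
    print(is_well_formed(fill_naively(value)))  # False
    print(is_well_formed(fill_escaped(value)))  # True
```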
>
> All of which means that while I’m sure that templating is a
> useful thing to provide for general-purpose conversions, I
> still think there’s a need for more general purpose languages
> to “bug out” to. And I’m not 100% convinced (but we’ll only
> see by doing) that it will be possible to define useful
> conversions to other formats using templates. For example,
> just naming things like elements/attributes in XML and properties in
> JSON will, I think, require different approaches that will be hard to
> express in generic templates.
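To make the naming point concrete, a CSV heading such as `top speed` can be used verbatim as a JSON property name but not as an XML element name. Here is a minimal Python sketch. `XML_NAME` is a deliberately conservative, ASCII-only approximation of the XML Name production. This is an assumption, since the real production allows many more characters. `xml_name` is a hypothetical mapping rule.

```python
import json
import re

# Conservative ASCII-only subset of the XML Name production (assumption).
XML_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def json_row(headings, values):
    """JSON object keys may be any string, so headings can be used as-is."""
    return json.dumps(dict(zip(headings, values)))


def xml_name(heading):
    """Map a heading to an element name: replace disallowed characters, fix the start."""
    name = re.sub(r"[^A-Za-z0-9._-]", "_", heading.strip())
    return name if XML_NAME.match(name) else "_" + name
```

`json_row(["top speed", "id"], ["100", "1"])` keeps the heading as-is, while `xml_name("top speed")` returns `top_speed`. This differs from the camel-case rule in the Grinder example above, which shows that any such mapping is a design choice the template alone cannot express.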
>
> Cheers,
>
> Jeni
>
> [1] https://github.com/OpenRefine/OpenRefine/wiki/Exporters
> [2] http://mustache.github.io/
>
> ------------------------------------------------------
> From: Andy Seaborne <andy@apache.org>
> Reply: Andy Seaborne <andy@apache.org>
> Date: 14 May 2014 at 18:16:10
> To: CSV on the Web Working Group <public-csv-wg@w3.org>
> Subject: Template as mechanism for CSV conversion.
>
> (From the telecon: JeniT asked for this to be made more visible on the
> list.)
> Gregg has suggested that if all the conversions are based around the
> template mechanism, then there could be one conversions document for
> all of RDF, JSON and XML.
> That makes sense to me, although I also think that if someone arrives
> at the doc wanting, say, the details of JSON conversion, having them
> all in one place makes for a less focused document.
> e.g. RDF:
> http://w3c.github.io/csvw/csv2rdf/#graph-template
> The templating mechanism is text-based and does not require parsing of
> some variant of the output syntax ("variant" because of the need for
> template slots). A processor may provide additional validation of the
> output but, at a minimum, it can generate output just by text
> processing (and potentially produce illegal syntax due to the
> lightweight nature of the process).
> A starting point for templates is URI Templates
> http://tools.ietf.org/html/rfc6570
> although escaping would need to be supported for each output syntax.
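As a rough illustration of the URI Templates starting point, here is a minimal Python sketch of RFC 6570 Level 1 (simple string expansion only). Each `{var}` is replaced by its value, with every character outside the unreserved set percent-encoded, and undefined variables expand to the empty string. The sketch simplifies variable-name syntax and does not handle the operators and modifiers from higher levels. The example URI is hypothetical.

```python
import re
from urllib.parse import quote

# Simplified varname syntax; RFC 6570 also allows '.' and pct-encoded names.
_EXPR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template, variables):
    """RFC 6570 Level 1: {var} -> value with non-unreserved chars percent-encoded."""
    def replace(match):
        value = variables.get(match.group(1))
        # quote(..., safe="") leaves only ALPHA / DIGIT / "-" / "." / "_" / "~" unescaped.
        return "" if value is None else quote(str(value), safe="")
    return _EXPR.sub(replace, template)


if __name__ == "__main__":
    print(expand_level1("http://example.org/building/{id}", {"id": "1"}))
```

Percent-encoding is right for URIs but wrong for, say, XML text or JSON strings, which is why each output syntax needs its own escaping.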
> (*nix) Shell parameter expansion is a similar mechanism:
> http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
> (not the array bits).
> ${parameter/pattern/string} is a pattern replace, for example (bash
> matches glob-style patterns there, not regular expressions).
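The `${name/pattern/string}` form could be borrowed for row templates. The minimal Python sketch below assumes a hypothetical syntax: `${name}` inserts a column value, and `${name/pattern/string}` replaces the first match of `pattern` with the literal `string`. Bash matches glob patterns, but here `pattern` is treated as a Python regular expression, since regex search & replace was what was missed in the XSLT work earlier in the thread. Patterns cannot contain `/` or `}`.

```python
import re

# ${name} or ${name/pattern/replacement}; pattern may not contain '/' or '}'.
_PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?:/([^/}]*)/([^}]*))?\}")


def expand(template, row):
    """Expand ${name} and ${name/regex/replacement} against a dict of column values."""
    def replace(match):
        name, pattern, replacement = match.groups()
        value = row.get(name, "")
        if pattern is None:
            return value
        # Literal replacement (no backreferences), first match only, like bash's single '/'.
        return re.sub(pattern, lambda _m: replacement, value, count=1)
    return _PARAM.sub(replace, template)


if __name__ == "__main__":
    print(expand("id-${code/^0+/}", {"code": "007"}))
```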
> Andy
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
>
>
> --
> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
> University of Southampton Open Data Service:
> http://data.southampton.ac.uk/
> You should read the ECS Web Team blog:
> http://blogs.ecs.soton.ac.uk/webteam/
>
>
>
Received on Thursday, 15 May 2014 07:59:26 UTC