- From: Alfredo Serafini <seralf@gmail.com>
- Date: Thu, 15 May 2014 09:27:30 +0200
- To: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
- Cc: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
- Message-ID: <CADawF4MqwP3Mk5uW0q_onijj46AxL_as4mCYxBdRwh7s5i6fZQ@mail.gmail.com>
Hi,

Mustache is indeed a good solution for generating CSV formats. For other
formats it is useful, but it can sometimes be counterproductive. For
generating XML from templates, for example, Thymeleaf is also a good option.
What is interesting about Thymeleaf is its concept of "natural templates": it
would be great if a similar approach were possible for other formats.

Alfredo

2014-05-15 9:22 GMT+02:00 Christopher Gutteridge <cjg@ecs.soton.ac.uk>:

> Hi. What we've done is use XSLT, which is mostly suitable for XML output,
> but we were focusing on RDF output. I wrote a bit of software called
> Grinder (never name a tool without doing a bit of googling, eh?) which
> takes tabular data with a heading row and outputs XML suitable for passing
> to XSLT.
> eg.
> colour, age, top speed, id
> blue,10,100,1
> green,2,80,2
>
> becomes
> <rows>
>   <row>
>     <colour>blue</colour>
>     <age>10</age>
>     <topSpeed>100</topSpeed>
>     <id>1</id>
>   </row>
>   <row>
>     <colour>green</colour>
>     <age>2</age>
>     <topSpeed>80</topSpeed>
>     <id>2</id>
>   </row>
> </rows>
>
> We found a number of cases where it was useful to add more processing:
> - multiple values in a field, separated with a delimiter -- this is quite
>   common in tabular data and allows more complex data without using a
>   separate table eg.
>   building,occupants
>   1,"chemistry,physics"
>   2, physics
> - skipping blank leading rows and columns
> - carrying a value to the next row if the cell below is blank. Some
>   reporting software only outputs an ID on the first row of a block.
> - various ways to clean up the values that were tricky in XSLT: md5, sha1,
>   camelcaps (make it suitable for being part of a URI)
>
> For a complex example see
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/catering.cfg and
> http://data.southampton.ac.uk/dumps/catering/2014-05-15/openorg-pos.xsl
> More details here:
> https://github.com/cgutteridge/Grinder/blob/master/bin/grinder#L91
>
> I am not recommending XSLT, but we've solved a bunch of real world
> CSV=>RDF cases so it may give you some ideas. It does have the merit of
> being a standard, at least.
>
> What was most missed was a regular expression search & replace which would
> have massively reduced the complexity.
>
> I recently did a project templating using Mustache and we had to keep
> tweaking our data structures to make it understand them, which was not
> ideal.
>
>
> On 14/05/2014 19:15, Jeni Tennison wrote:
>
>> Thanks Andy,
>>
>> I think it makes a lot of sense to have a general purpose template for
>> mapping CSV to other formats (eg YAML, HTML). OpenRefine does something
>> similar as described here [1], which enables you to define:
>>
>> * a prefix
>> * a template for each row
>> * a row separator
>> * a suffix
>>
>> What about using an existing templating system such as Mustache [2] which
>> has the advantage of being implemented across lots of programming
>> languages? Then you only have to define how the variables that get passed
>> into the template get set up, not the syntax. (I’m not fixated on Mustache;
>> I’d much prefer something more standard. It’s just that I’d really
>> prefer not to have this Working Group invent a new syntax for templates.)
>>
>> I have three areas of concern which mostly relate to the limited
>> flexibility that something like Mustache gives you:
>>
>> 1. In all the real-life conversions I’ve ever done I’ve always ended up
>> needing conditional statements of some sort.
>> Which means having some kind of logical statements, which means adopting a
>> particular programming language to express them in.
>>
>> 2. In all the real-life conversions I’ve ever done I’ve always ended up
>> needing to process individual values in some way (ie some level of string
>> parsing), which means defining functions.
>>
>> 3. In all the real-life conversions I’ve ever done that have involved
>> text-based templating languages that need to produce something with a
>> defined structured syntax I’ve always gotten it wrong and produced
>> non-well-formed/valid output.
>>
>> All of which means that while I’m sure that templating is a useful thing
>> to provide for general-purpose conversions, I still think there’s a need
>> for more general purpose languages to “bug out” to. And I’m not 100%
>> convinced (but we’ll only see by doing) that it will be possible to define
>> useful conversions to other formats using templates. For example, just
>> naming things like elements/attributes in XML and things like properties in
>> JSON will require different approaches, I think, that it will be hard to
>> express in generic templates.
>>
>> Cheers,
>>
>> Jeni
>>
>> [1] https://github.com/OpenRefine/OpenRefine/wiki/Exporters
>> [2] http://mustache.github.io/
>>
>> ------------------------------------------------------
>> From: Andy Seaborne andy@apache.org
>> Reply: Andy Seaborne andy@apache.org
>> Date: 14 May 2014 at 18:16:10
>> To: CSV on the Web Working Group public-csv-wg@w3.org
>> Subject: Template as mechanism for CSV conversion.
>>
>>> (from the telecon - JeniT asked for this to be made more visible on the
>>> list)
>>>
>>> Gregg has suggested that if all the conversions are based around the
>>> template mechanism, then there could be one conversions document for all
>>> of RDF, JSON and XML.
>>> That makes sense to me although I also think that if someone arrives at
>>> the doc wanting, say, the details of JSON conversion, having them all in
>>> one place makes for a less focused document.
>>>
>>> e.g. RDF:
>>> http://w3c.github.io/csvw/csv2rdf/#graph-template
>>>
>>> The templating mechanism is text-based and does not require parsing of
>>> some variant of the output syntax ("variant" because of the need for
>>> template slots). A processor may provide additional validation of the
>>> output but, at a minimum, it can generate output just by text processing
>>> (and potentially get illegal syntax due to the lightweight nature of the
>>> process).
>>>
>>> A starting point for templates is URI Templates
>>> http://tools.ietf.org/html/rfc6570
>>> although there needs to be support for per-syntax escaping.
>>>
>>> (*nix) Shell parameter expansion is a similar mechanism.
>>> http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
>>> (not the array bits)
>>>
>>> ${parameter/pattern/string} is a pattern-based replace (shell glob
>>> patterns rather than full regular expressions), for example.
>>>
>>> Andy
>>
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>
> --
> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
> University of Southampton Open Data Service:
> http://data.southampton.ac.uk/
> You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/
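
As a rough illustration of the Grinder-style step quoted above (tabular data
with a heading row turned into XML suitable for passing to XSLT), here is a
minimal Python sketch. It is not Grinder's actual code: the function names
camel_case and rows_to_xml are hypothetical, and it assumes that every heading
reduces to a valid XML element name (for instance, that none starts with a
digit).

```python
import csv
import io
import re
from xml.sax.saxutils import escape


def camel_case(heading: str) -> str:
    """Turn a heading such as 'top speed' into 'topSpeed' (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9]+", heading)
    if not words:
        return "field"  # assumed fallback name for an empty heading
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def rows_to_xml(text: str) -> str:
    """Convert CSV text with a heading row into a <rows>/<row> XML document."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    headings = [camel_case(h) for h in next(reader)]
    out = ["<rows>"]
    for record in reader:
        if not record:
            continue  # skip blank lines
        out.append("  <row>")
        for name, value in zip(headings, record):
            # Escape &, < and > so the values cannot break well-formedness.
            out.append(f"    <{name}>{escape(value.strip())}</{name}>")
        out.append("  </row>")
    out.append("</rows>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "colour, age, top speed, id\nblue,10,100,1\ngreen,2,80,2\n"
    print(rows_to_xml(sample))
```

Building the elements from parsed cells and escaping each value, rather than
pasting raw text into a template, is one way to avoid the non-well-formed
output that the thread warns about with purely text-based templating.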
Received on Thursday, 15 May 2014 07:27:57 UTC