- From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
- Date: Thu, 15 May 2014 08:58:18 +0100
- To: Alfredo Serafini <seralf@gmail.com>
- CC: Jeni Tennison <jeni@jenitennison.com>, CSV on the Web Working Group <public-csv-wg@w3.org>, Andy Seaborne <andy@apache.org>
- Message-ID: <EMEW3|c7fe857d2c0de85d10d2a149b7115dfdq4E8wU03cjg|ecs.soton.ac.uk|5374739A.8080>
What is notable about all of these systems is that they don't accept tabular data as an input (that I know of). It might be useful to have a standard mapping from tabular data to JSON and XML suitable for feeding templating engines, but columns indexed by heading are a very different beast from columns indexed by column number.

(Apologies if I have come late and there's a lot of backstory to the group; if there's a document I should read before commenting further, that's cool, just URL-me.)

On 15/05/2014 08:27, Alfredo Serafini wrote:
> Hi
> Mustache is a good solution indeed for generating CSV formats... for
> other formats it is useful but can sometimes be counterproductive. For
> example, for generating XML based on templates a good option is also
> Thymeleaf.
>
> What is interesting about Thymeleaf is the concept of "natural
> templates": it would be great if it were possible to produce a similar
> approach for other formats.
>
>
> Alfredo
>
>
> 2014-05-15 9:22 GMT+02:00 Christopher Gutteridge <cjg@ecs.soton.ac.uk
> <mailto:cjg@ecs.soton.ac.uk>>:
>
>     Hi. What we've done is use XSLT, which is mostly suitable for XML
>     output, but we were focusing on RDF output. I wrote a bit of
>     software called Grinder (never name a tool without doing a bit of
>     googling, eh?) which takes tabular data with a heading row and
>     outputs XML suitable for passing to XSLT.
>     eg.
>     colour, age, top speed, id
>     blue,10,100,1
>     green,2,80,2
>
>     becomes
>     <rows>
>       <row>
>         <colour>blue</colour>
>         <age>10</age>
>         <topSpeed>100</topSpeed>
>         <id>1</id>
>       </row>
>       <row>
>         <colour>green</colour>
>         <age>2</age>
>         <topSpeed>80</topSpeed>
>         <id>2</id>
>       </row>
>     </rows>
>
>     We found a number of cases where it was useful to add more processing:
>     - multiple values in a field, separated with a delimiter -- this
>       is quite common in tabular data and allows more complex data
>       without using a separate table eg.
>       building,occupants
>       1,"chemistry,physics"
>       2, physics
>     - skipping blank leading rows and columns
>     - carrying a value to the next row if the cell below is blank.
>       Some reporting software only outputs an ID on the
>       first row of a block.
>     - various ways to clean up the values that were tricky in XSLT;
>       md5, sha1, camelcaps (make it suitable for being part of a URI)
>
>     For a complex example see
>     http://data.southampton.ac.uk/dumps/catering/2014-05-15/catering.cfg
>     and
>     http://data.southampton.ac.uk/dumps/catering/2014-05-15/openorg-pos.xsl
>     More details here:
>     https://github.com/cgutteridge/Grinder/blob/master/bin/grinder#L91
>
>     I am not recommending XSLT, but we've solved a bunch of real world
>     CSV=>RDF cases so it may give you some ideas. It does have the
>     merit of being a standard, at least.
>
>     What was most missed was a regular expression search & replace
>     which would have massively reduced the complexity.
>
>     I recently did a project templating using Mustache and we had to
>     keep tweaking our data structures to make it understand them,
>     which was not ideal.
>
>
>     On 14/05/2014 19:15, Jeni Tennison wrote:
>
>         Thanks Andy,
>
>         I think it makes a lot of sense to have a general purpose
>         template for mapping CSV to other formats (eg YAML, HTML).
>         Open Refine does something similar as described here [1],
>         which enables you to define:
>
>           * a prefix
>           * a template for each row
>           * a row separator
>           * a suffix
>
>         What about using an existing templating system such as
>         Mustache [2] which has the advantage of being implemented
>         across lots of programming languages? Then you only have to
>         define how the variables that get passed into the template get
>         set up, not the syntax. (I’m not fixated on Mustache, and I’d
>         much prefer something more standard; it’s just that I’d
>         really prefer not to have this Working Group invent a new
>         syntax for templates.)
>
>         I have three areas of concern which mostly relate to the
>         limited flexibility that something like Mustache gives you:
>
>         1. In all the real-life conversions I’ve ever done I’ve always
>         ended up needing conditional statements of some sort. Which
>         means having some kind of logical statements, which means
>         adopting a particular programming language to express them in.
>
>         2. In all the real-life conversions I’ve ever done I’ve always
>         ended up needing to process individual values in some way (ie
>         some level of string parsing), which means defining functions.
>
>         3. In all the real-life conversions I’ve ever done that have
>         involved text-based templating languages that need to produce
>         something with a defined structured syntax I’ve always gotten
>         it wrong and produced non-well-formed/valid output.
>
>         All of which means that while I’m sure that templating is a
>         useful thing to provide for general-purpose conversions, I
>         still think there’s a need for more general purpose languages
>         to “bug out” to. And I’m not 100% convinced (but we’ll only
>         see by doing) that it will be possible to define useful
>         conversions to other formats using templates. For example,
>         just naming things like elements/attributes in XML and things
>         like properties in JSON will require different approaches, I
>         think, that will be hard to express in generic templates.
>
>         Cheers,
>
>         Jeni
>
>         [1] https://github.com/OpenRefine/OpenRefine/wiki/Exporters
>         [2] http://mustache.github.io/
>
>         ------------------------------------------------------
>         From: Andy Seaborne andy@apache.org <mailto:andy@apache.org>
>         Reply: Andy Seaborne andy@apache.org <mailto:andy@apache.org>
>         Date: 14 May 2014 at 18:16:10
>         To: CSV on the Web Working Group public-csv-wg@w3.org
>         <mailto:public-csv-wg@w3.org>
>         Subject: Template as mechanism for CSV conversion.
>
>             (from the telecon - JeniT asked for this to be made more
>             visible on the list)
>
>             Gregg has suggested that if all the conversions are based
>             around the template mechanism, then there could be one
>             conversions document for all of RDF, JSON and XML.
>
>             That makes sense to me, although I also think that when
>             someone arrives at the doc wanting, say, the details of JSON
>             conversion, having them all in one place makes for a less
>             focused document.
>
>             e.g. RDF:
>             http://w3c.github.io/csvw/csv2rdf/#graph-template
>
>             The templating mechanism is text-based and does not require
>             parsing of some variant of the output syntax ("variant"
>             because of the need for template slots). A processor may
>             provide additional validation of the output but, at a
>             minimum, it can generate output just by text processing
>             (and potentially get illegal syntax due to the lightweight
>             nature of the process).
>
>             A starting point for templates is URI Templates
>             http://tools.ietf.org/html/rfc6570
>             although there needs to be per-syntax escaping support.
>
>             (*nix) Shell parameter expansion is a similar mechanism.
>             http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
>             (not the array bits)
>
>             ${parameter/pattern/string} is a regex replace, for example.
>
>             Andy
>
>         --
>         Jeni Tennison
>         http://www.jenitennison.com/
>
>
>     --
>     Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
>     University of Southampton Open Data Service:
>     http://data.southampton.ac.uk/
>     You should read the ECS Web Team blog:
>     http://blogs.ecs.soton.ac.uk/webteam/
>
>
>

-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/
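
For readers who want to try the heading-row mapping quoted in this thread (tabular data with a heading row turned into <rows>/<row> XML, one element per column), here is a minimal Python sketch. It is not Grinder's code: the function names camel_case and table_to_xml, the "column" fallback for an empty heading, and the rule for turning headings into element names are all assumptions made for illustration, and it assumes no heading starts with a digit (which would not be a valid XML name).

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def camel_case(heading):
    """Turn a heading such as ' top speed' into a name like 'topSpeed'.

    Hypothetical helper; the real tool's naming rule may differ.
    """
    words = re.findall(r"[A-Za-z0-9]+", heading)
    if not words:
        return "column"  # assumed fallback for an empty heading
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def table_to_xml(text):
    """Convert CSV text with a heading row into a <rows> element."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    names = [camel_case(h) for h in next(reader)]
    rows = ET.Element("rows")
    for record in reader:
        if not record:  # skip completely blank lines
            continue
        row = ET.SubElement(rows, "row")
        # Columns are matched to headings by position within the row.
        for name, value in zip(names, record):
            ET.SubElement(row, name).text = value.strip()
    return rows


sample = "colour, age, top speed, id\nblue,10,100,1\ngreen,2,80,2\n"
xml_text = ET.tostring(table_to_xml(sample), encoding="unicode")
print(xml_text)
```

The result is a tree keyed by heading rather than by column number, so a downstream XSLT stylesheet or template can refer to topSpeed directly and is unaffected if columns are reordered.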
Received on Thursday, 15 May 2014 07:59:26 UTC