Re: Reflection on the special telco of CSVW from Ivan Herman on 2014-09-10 (public-csv-wg@w3.org from September 2014)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 10 Sep 2014 10:40:30 +0200
To: Dan Brickley <danbri@google.com>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <98E5864A-128A-4623-973D-7DA119F5E03B@w3.org>
On 10 Sep 2014, at 10:18 , Dan Brickley <danbri@google.com> wrote:

> On 10 September 2014 08:51, Ivan Herman <ivan@w3.org> wrote:
>> Guys,
>> 
>> reflecting on Jeni's mail...
>> 
>> I think there is a fundamental decision to make on the possible CSV->* approaches; at the moment, there is a feeling of a lack of direction in the group that is beginning to bite. It was discussed at the telco on the 3rd September, and there was a feeling that a clear direction has to be set soon, otherwise we will get into trouble.
>> 
>> We have had quite a few discussions over the last few months, which were actually useful insofar as we explored one approach, namely the usage of some sort of a templating language. The writeup I had done, which was discussed on the call ([1]), attempted to give a reasonable summary of where we are. I think, without going into the details, that we may have two (maybe three, see below) different general alternatives to choose from, and it would be important that we try to set a clear direction for the group. In what follows I try to summarize the alternatives; I would think that we should discuss them among ourselves, reach a consensus, and propose the result to the group as a way to go forward.
> 
> Thanks for this, Ivan. A couple of small points interjected below.
> 
>> ----
>> 
>> *Alternative1:* define a suitable template language. A general outline of what that would mean, based on earlier discussions, is in [1]. (There are, obviously, possible design variants within this alternative, but let us try not to go into the details right now.)
>> 
>> *Pro1:* it seems that what is in [1] covers a reasonable percentage of our use cases directly; Jeremy's exploration and his knowledge of the use cases seem to indicate that.
>> 
>> *Pro2:* it is one language to rule them all. Ie, the same mechanism/template language can be used to generate JSON, XML, Turtle, or any other syntax.
>> 
>> *Con1:* It became obvious that the template language may easily become complicated. The introduction of if-then-else, variables, etc, as well as other entries like {{#repeat}} make the structure far from simple. The precise specification of the language becomes more demanding, and the implementation is not obvious either (ie, it is not a simple regexp-like transformation any more). Do we have the energy and time to go there?
>> 
>> *Con2:* How to justify W3C defining a new language (with all that it requires in terms of testing, conformance, etc)? We already have different transformation languages (defined at W3C or elsewhere) for different syntaxes (XSLT, SPARQL, RIF, SPIN, XPROC, you-name-it); does the community really need a new one? Note that we cannot simply rely on, say, Mustache[2]; the latter can change at any time, which does not work for a standard. Ie, we have to provide a fixed specification. Is the community really ready for Yet Another Transformation Language Standard from W3C?
> 
> I'd suggest a requirement here: if the group does come up with such a
> template language, it should be deployed in a manner that doesn't
> privilege it over potential successors/rivals. For example, a
> mime-typed reference from a JSON-LD metadata file could point to such
> a template. But it could also point to other approaches to the
> problem. In other words, provide an extension point for a templating
> language but be neutral about what that templating language is.

Yes, even if we engage in Alternative 1, we should have extension points so that people can defer to other means. The question is whether we should get into the definition of a full template language in the first place.
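
To make the extension-point idea concrete, here is a minimal sketch, not a proposal for the actual metadata vocabulary. It assumes a metadata document that lists several transformation references, each tagged with a media type, and a processor that simply picks the first one it knows how to run. The property names ("templates", "url", "format"), the file names, and the "text/x-mustache" media type are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical metadata: the "templates", "url" and "format" properties are
# assumptions made for this sketch, not part of any agreed vocabulary.
METADATA = json.loads("""
{
  "templates": [
    {"url": "to-json.mustache", "format": "text/x-mustache"},
    {"url": "to-xml.xsl", "format": "application/xslt+xml"}
  ]
}
""")


def pick_template(metadata, supported_formats):
    """Return the first template reference whose media type the processor supports."""
    for ref in metadata.get("templates", []):
        if ref.get("format") in supported_formats:
            return ref
    # No usable reference: the caller would fall back to the core mapping.
    return None


if __name__ == "__main__":
    # A processor that only understands XSLT skips the Mustache entry.
    print(pick_template(METADATA, {"application/xslt+xml"}))
```

The point of the sketch is only that the metadata stays neutral: it names a resource and a type, and says nothing about how any particular template language works.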


> 
>> -----
>> 
>> *Alternative2:* define a simple, procedural mapping from CSV+Metadata to XML/JSON/RDF, much like the Direct Mapping for RDB; plus have a metadata entry that may refer to procedural engines to transform the result for different formats. Ie, refer to an XSLT script, a Javascript callback function, a SPIN specification, whatever. The standard would not define the engine formats, just the 'hook' in the metadata.
>> 
>> *Pro1:* The core conversion can be described in a procedural specification once and for all. An attempt has been made for RDF[3,4], it is fairly straightforward, the same structure can be defined for JSON and XML as well. The implementation is also (probably) easy.
>> 
>> *Pro2:* The simple case is covered without any further step for the user (in contrast to Alternative1/Con3)
>> 
>> *Con2:* Although it may cover lots of simple use cases, it clearly falls short for slightly more complex ones. Put another way, the end user community will have to settle on some other transformations in a large percentage of the cases, and that may be seen as a shortcoming.
>> 
>> -----
>> 
>> Actually, after writing this mail, I just realized that there is also an
>> 
>> *Alternative2.5:* define a *simple* template language *without* any if-then-else structure, any regexp based variables; essentially stopping at [5], and add the hooks for further processing just like in Alternative2. It is a kind of a mixture of the two previous alternatives (the template may easily cover the procedural mapping).
> 
> It's worth noting that people seem to get quite a lot of functionality
> out of Mustache, even though it proclaims itself to be "logic-less":
> 
> "We call it "logic-less" because there are no if statements, else
> clauses, or for loops. Instead there are only tags. Some tags are
> replaced with a value, some nothing, and others a series of values.
> This document explains the different types of Mustache tags."
> (http://mustache.github.io/mustache.5.html)
> 
> In fact Mustache does have a few pieces of minimal logic (e.g. for
> missing values). Perhaps enough for us?
> 
> Also relevant - we have had a little discussion here on use of regex
> patterns to 'break up' complex CSV cell values into constituent
> sub-fields. I would take as a working hypothesis that we could address
> several use cases via an inspired-by-Mustache language, whose input
> wasn't raw CSV cells but regex-expanded cell sub-structure. By
> providing richer input, we can take some of the work off the shoulders
> of the core templating language.
> 
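
As an illustration of the regex idea above, here is a rough sketch of how a cell could be expanded into sub-fields before it reaches the template. It is only a sketch under stated assumptions: the function name, the "column.group" naming of the new keys, and the sample pattern are hypothetical and do not come from any draft.

```python
import re


def expand_cell(row, column, pattern):
    """Return a copy of row with the pattern's named groups added as 'column.group' keys.

    If the cell does not match the pattern, the row is returned unchanged
    (apart from being copied).
    """
    expanded = dict(row)
    match = re.fullmatch(pattern, row.get(column) or "")
    if match:
        for group, value in match.groupdict().items():
            # An optional group that did not participate yields None; use "".
            expanded[f"{column}.{group}"] = value or ""
    return expanded


if __name__ == "__main__":
    row = {"station": "NL-042", "period": "2014-09"}
    print(expand_cell(row, "period", r"(?P<year>\d{4})-(?P<month>\d{2})"))
```

A template could then refer to something like {{period.year}} without the template language itself needing any regexp machinery.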

The question is also how we interpret the use cases. So far, we have looked at the use cases in terms of what features are necessary. However, I do not have a feel for what the 'weight' of a specific requirement is, ie, what the distribution of simple vs. complex CSV data on the Web is. If an overwhelming majority of the files have a simple structure for which the simple version of the template is enough (a bit like what I hacked), then this is what we should do...


> My feeling from some similar hacking to Ivan's is that re-using or
> tweaking existing template languages may be feasible. Even if speccing
> is a daunting task. I'd like us to verify whether we think a useful
> set of our use cases for mappings can be addressed via existing
> opensource libs (or modest hacks to them, for some debatable sense of
> 'modest'). If not, I think we have to seriously consider postponing
> template language work.

+1. I may not have emphasized in my mail that my implementation simply uses Mustache underneath, and only at a very simple level: essentially a plain {{name}}->{{value}} replacement and nothing else. (To increase efficiency, the latest version handles {{#row}} itself; that could also be left to Mustache, but it would be inefficient for large files.)
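
For readers who want to see how little is involved at this level, here is a minimal sketch of the {{name}}->{{value}} idea. It is not the implementation mentioned above (that one is Javascript on top of Mustache); it is a stand-alone approximation that assumes a per-row template and column names taken from the CSV header, and the function names are hypothetical.

```python
import csv
import io
import re

# Matches plain {{name}} placeholders only; section tags such as {{#row}}
# or {{/row}} are deliberately not handled in this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}#/]+?)\s*\}\}")


def render_row(template, row):
    """Replace each {{name}} with the row's value; unknown names become ''."""
    return PLACEHOLDER.sub(lambda m: row.get(m.group(1)) or "", template)


def render_csv(template, csv_text):
    """Apply the row template once per CSV record, reading one row at a time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(render_row(template, row) for row in reader)


if __name__ == "__main__":
    data = "name,population\nAmsterdam,810000\nUtrecht,330000\n"
    print(render_csv('{"city": "{{name}}", "pop": {{population}}}\n', data), end="")
```

Iterating over the rows outside the template engine, as done here, is one way to avoid loading the whole file into a single template context.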

Ivan

> 
> Dan
> 
>> *Pro1:* Simple cases can be done easily, albeit getting a bit more structure in the output than just the core mapping, and it is also easier to implement and understand. It still provides somewhat more flexibility than Alternative2.
>> 
>> *Pro2:* Same as Alternative1/Pro2.
>> 
>> *Con1:* Same as Alternative2/Con2.
>> 
>> *Con2:* Same as Alternative1/Con3.
>> 
>> -----
>> 
>> That is where we are imho...
>> 
>> I will send out a separate mail that outlines a simple version of Alternative2.5 which can be implemented on top of an engine like Mustache; as an exercise (and as a way of learning Javascript...) I have implemented this on top of jQuery. See separate mail.
>> 
>> Ivan
>> 
>> P.S. At this moment, I am tempted to vote for Alternative 2.5. But not 100% sure yet...
>> 
>> 
>> [1] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status
>> [2] https://mustache.github.io/
>> [3] https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
>> [4] http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
>> [5] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status#Simple_example_and_template
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me


Received on Wednesday, 10 September 2014 08:41:09 UTC