Re: Reflection on the special telco of CSVW

On 10 September 2014 08:51, Ivan Herman <ivan@w3.org> wrote:
> Guys,
>
> reflecting on Jeni's mail...
>
> I think there is a fundamental decision to make on the possible CSV->* approaches; at the moment, there is a feeling of a lack of direction in the group that is beginning to bite. It was discussed at the telco on the 3rd September, and there was a feeling that a clear direction has to be set soon, otherwise we will get into trouble.
>
> We have had quite a few discussions over the last few months, which were actually useful insofar as we explored one approach, namely the usage of some sort of templating language. The writeup I did, which was discussed on the call ([1]), attempted to give a reasonable summary of where we are. I think, without going into the details, that we may have two (maybe three, see below) different general alternatives to choose from, and it would be important that we try to set a clear direction for the group. In what follows I try to summarize the alternatives; I would think that we should discuss them among ourselves, get a consensus, and propose the result to the group as a way to go forward.

Thanks for this, Ivan. A couple of small points interjected below.

> ----
>
> *Alternative1:* define a suitable template language. A general outline of what that would mean, based on earlier discussions, is in [1]. (There are, obviously, possible design variants within this alternative, but let us try not to go into the details right now.)
>
> *Pro1:* it seems that what is in [1] covers a reasonable percentage of our use cases directly; Jeremy's exploration and his knowledge of the use cases seem to indicate that.
>
> *Pro2:* it is one language to rule them all. Ie, the same mechanism/template language can be used to generate JSON, XML, Turtle, or any other syntax.
>
> *Con1:* It became obvious that the template language may easily become complicated. The introduction of if-then-else, variables, etc, as well as other entries like {{#repeat}} make the structure far from simple. The precise specification of the language becomes more demanding, and the implementation is not obvious either (ie, it is not a simple regexp-like transformation any more). Do we have the energy and time to go there?
>
> *Con2:* How to justify W3C defining a new language (with all that it requires in terms of testing, conformance, etc)? We already have different transformation languages (defined at W3C or elsewhere) for different syntaxes (XSLT, SPARQL, RIF, SPIN, XPROC, you-name-it); does the community really need a new one? Note that we cannot simply rely on, say, Mustache[2]; the latter can change at any time, which does not work for a standard. Ie, we have to provide a fixed specification. Is the community really ready for Yet Another Transformation Language Standard from W3C?

I'd suggest a requirement here: if the group does come up with such a
template language, it should be deployed in a manner that doesn't
privilege it over potential successors/rivals. For example, a
mime-typed reference from a JSON-LD metadata file could point to such
a template. But it could also point to other approaches to the
problem. In other words, provide an extension point for a templating
language while remaining neutral about what that templating language is.
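
To make the extension point a bit more concrete, here is a rough
Python sketch of how a processor might choose among several
media-typed template references. Everything specific in it is an
assumption for illustration only: the "template" key, the example.org
URLs, and "text/x-mustache" (which is not a registered media type).

```python
# Hypothetical metadata fragment. The "template" key, the URLs and the
# "text/x-mustache" media type are illustrative assumptions, not
# anything the group has agreed on.
metadata = {
    "template": [
        {"url": "http://example.org/t/events.mustache",
         "mediaType": "text/x-mustache"},
        {"url": "http://example.org/t/events.xsl",
         "mediaType": "application/xslt+xml"},
    ],
}


def pick_template(metadata, supported):
    """Return the first template reference whose media type the
    processor supports, or None if there is no usable one."""
    for ref in metadata.get("template", []):
        if ref.get("mediaType") in supported:
            return ref
    return None


# A processor that only understands XSLT skips the Mustache entry.
chosen = pick_template(metadata, {"application/xslt+xml"})
```

The point of the sketch is that the metadata names no privileged
language: a processor simply ignores references it cannot handle, so a
successor or rival format can be added alongside the existing ones.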

> -----
>
> *Alternative2:* define a simple, procedural mapping from CSV+Metadata to XML/JSON/RDF, much like the Direct Mapping for RDB; plus have a metadata entry that may refer to procedural engines that transform the result for different formats. Ie, refer to an XSLT script, a Javascript callback function, a SPIN specification, whatever. The standard would not define the engine formats, just the 'hook' in the metadata.
>
> *Pro1:* The core conversion can be described in a procedural specification once and for all. An attempt has been made for RDF[3,4], it is fairly straightforward, the same structure can be defined for JSON and XML as well. The implementation is also (probably) easy.
>
> *Pro2:* The simple case is covered without any further step for the user (in contrast to Alternative1/Con3)
>
> *Con2:* Although it may cover lots of simple use cases, it clearly falls short for slightly more complex ones. Put another way, the end user community will have to settle on some other transformations in a large percentage of the cases, and that may be seen as a shortcoming.
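
For readers who want a feel for how small such a direct mapping could
be, here is a minimal Python sketch for the JSON case. It is not the
mapping drafted in [3,4]; the "columns" metadata key and the
`transform` argument (standing in for the metadata 'hook') are
hypothetical names chosen for this example.

```python
import csv
import io


def direct_map(csv_text, metadata=None, transform=None):
    """Map each CSV row to a dict keyed by column name.

    metadata["columns"] (a hypothetical key) optionally renames
    columns. `transform` stands in for the metadata 'hook': any
    callable applied to the mapped rows, whose format the mapping
    itself knows nothing about.
    """
    rename = (metadata or {}).get("columns", {})
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{rename.get(key, key): value for key, value in row.items()}
            for row in reader]
    return transform(rows) if transform else rows


csv_text = "name,pop\nParis,2200000\nLyon,500000\n"
rows = direct_map(csv_text, {"columns": {"pop": "population"}})
names = direct_map(csv_text, transform=lambda rs: [r["name"] for r in rs])
```

The simple case needs nothing beyond the CSV and (optionally) the
metadata; anything more elaborate is pushed into the hook.
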
>
> -----
>
> Actually, after writing this mail, I just realized that there is also an
>
> *Alternative2.5:* define a *simple* template language *without* any if-then-else structure or regexp-based variables; essentially stopping at [5], and add the hooks for further processing just like in Alternative2. It is a kind of mixture of the two previous alternatives (the template may easily cover the procedural mapping).

It's worth noting that people seem to get quite a lot of functionality
out of Mustache, even though it proclaims itself to be "logic-less":

"We call it "logic-less" because there are no if statements, else
clauses, or for loops. Instead there are only tags. Some tags are
replaced with a value, some nothing, and others a series of values.
This document explains the different types of Mustache tags."
(http://mustache.github.io/mustache.5.html)

In fact Mustache does have a few pieces of minimal logic (e.g. for
missing values). Perhaps enough for us?
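
As an illustration of that minimal logic, the following Python sketch
implements a toy subset of Mustache-style tags: {{name}} variables,
{{#name}} sections and {{^name}} inverted sections. It is not the
Mustache library and skips much of the real behaviour (HTML escaping,
partials, dotted names, same-name nesting); `render` is a name made up
for this example.

```python
import re

# {{#name}}...{{/name}} (section) or {{^name}}...{{/name}} (inverted)
SECTION = re.compile(r"\{\{([#^])(\w+)\}\}(.*?)\{\{/\2\}\}", re.S)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template, ctx):
    """Render a toy subset of Mustache-style tags against dict ctx."""
    def section(match):
        kind, name, body = match.groups()
        value = ctx.get(name)
        if kind == "^":
            # Inverted section: shown only for missing/empty values.
            return render(body, ctx) if not value else ""
        if not value:
            return ""
        if isinstance(value, list):
            # A list of dicts repeats the body once per item.
            return "".join(render(body, {**ctx, **item}) for item in value)
        return render(body, ctx)

    out = SECTION.sub(section, template)
    return VARIABLE.sub(lambda m: str(ctx.get(m.group(1), "")), out)


template = ("{{name}}: {{#phone}}tel {{phone}}{{/phone}}"
            "{{^phone}}no phone{{/phone}}")
with_phone = render(template, {"name": "Ann", "phone": "123"})
without_phone = render(template, {"name": "Bob", "phone": ""})
```

Here the section and inverted-section pair acts as an if/else on
whether the cell has a value, and a list-valued section acts as a
loop, without either construct appearing in the template syntax.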

Also relevant - we have had a little discussion here on use of regex
patterns to 'break up' complex CSV cell values into constituent
sub-fields. I would take as a working hypothesis that we could address
several use cases via an inspired-by-Mustache language, whose input
wasn't raw CSV cells but regex-expanded cell sub-structure. By
providing richer input, we can take some of the work off the shoulders
of the core templating language.
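
A small Python sketch of that working hypothesis: per-column regexes
with named groups expand a cell into sub-fields, and the template only
ever sees the expanded row. The patterns, the "column_subfield" naming
scheme and the use of str.format as a stand-in for the template
language are all assumptions made for this example.

```python
import re

# Hypothetical per-column patterns; in a real design these would
# presumably be declared in the metadata rather than in code.
PATTERNS = {
    "date": re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"),
    "coords": re.compile(
        r"(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)"),
}


def expand_row(row):
    """Return a copy of row with extra 'column_subfield' entries for
    every cell that fully matches its column's pattern."""
    expanded = dict(row)
    for column, pattern in PATTERNS.items():
        match = pattern.fullmatch(row.get(column, ""))
        if match:
            for key, value in match.groupdict().items():
                expanded[f"{column}_{key}"] = value
    return expanded


row = {"name": "Launch", "date": "2014-09-10", "coords": "52.37, 4.89"}
expanded = expand_row(row)

# str.format stands in for the template language here.
line = "{name}: year {date_year}, latitude {coords_lat}".format(**expanded)
```

Cells that do not match their pattern are passed through unchanged, so
the template language itself needs no regex support at all.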

My feeling from some similar hacking to Ivan's is that re-using or
tweaking existing template languages may be feasible, even if speccing
is a daunting task. I'd like us to verify whether we think a useful
set of our use cases for mappings can be addressed via existing
open source libs (or modest hacks to them, for some debatable sense of
'modest'). If not, I think we have to seriously consider postponing
template language work.

Dan

> *Pro1:* Simple cases can be done easily, albeit getting a bit more structure in the output than just the core mapping, and it is also easier to implement and understand. It still provides somewhat more flexibility than Alternative2.
>
> *Pro2:* Same as Alternative1/Pro2.
>
> *Con1:* Same as Alternative2/Con2.
>
> *Con2:* Same as Alternative1/Con3.
>
> -----
>
> That is where we are imho...
>
> I will send out a separate mail that outlines a simple version of Alternative2.5 which can be implemented on top of an engine like Mustache; as an exercise (and as a way of learning JavaScript...) I have implemented this on top of jQuery. See separate mail.
>
> Ivan
>
> P.S. At this moment, I am tempted to vote for Alternative 2.5. But not 100% sure yet...
>
>
> [1] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status
> [2] https://mustache.github.io/
> [3] https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
> [4] http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
> [5] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status#Simple_example_and_template
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me

Received on Wednesday, 10 September 2014 08:19:12 UTC