Reflection on the special telco of CSVW

Guys,

reflecting on Jeni's mail...

I think there is a fundamental decision to make on the possible CSV->* approaches; at the moment, there is a lack of direction in the group that is beginning to bite. This was discussed at the telco on 3 September, and the feeling was that a clear direction has to be set soon, otherwise we will get into trouble.

We have had quite a few discussions over the last few months, which were actually useful insofar as we explored one approach, namely the use of some sort of templating language. The writeup I did, which was discussed on the call ([1]), attempted to give a reasonable summary of where we are. I think, without going into the details, that we have two (maybe three, see below) general alternatives to choose from, and it is important that we set a clear direction for the group. In what follows I try to summarize the alternatives; I think we should discuss them among ourselves, reach a consensus, and propose the result to the group as a way forward.

----

*Alternative1:* define a suitable template language. A general outline of what that would mean, based on earlier discussions, is in [1]. (There are, obviously, possible design variants within this alternative, but let us try not to go into the details right now.)

*Pro1:* it seems that what is in [1] covers a reasonable percentage of our use cases directly; Jeremy's exploration and his knowledge of the use cases seem to indicate that. 

*Pro2:* it is one language to rule them all. Ie, the same mechanism/template language can be used to generate JSON, XML, Turtle, or any other syntax.

*Con1:* It became obvious that the template language may easily become complicated. The introduction of if-then-else, variables, etc, as well as other entries like {{#repeat}} make the structure far from simple. The precise specification of the language becomes more demanding, and the implementation is not obvious either (ie, it is not a simple regexp-like transformation any more). Do we have the energy and time to go there?

*Con2:* How do we justify W3C defining a new language (with all that it requires in terms of testing, conformance, etc)? We already have different transformation languages (defined at W3C or elsewhere) for different syntaxes (XSLT, SPARQL, RIF, SPIN, XPROC, you-name-it); does the community really need a new one? Note that we cannot simply rely on, say, Mustache[2]; the latter can change at any time, which does not work for a standard. Ie, we have to provide a fixed specification. Is the community really ready for Yet Another Transformation Language Standard from W3C?

-----

*Alternative2:* define a simple, procedural mapping from CSV+Metadata to XML/JSON/RDF, much like the Direct Mapping for RDB; plus have a metadata entry that may refer to procedural engines that transform the result into different formats. Ie, refer to an XSLT script, a JavaScript callback function, a SPIN specification, whatever. The standard would not define the engine formats, just the 'hook' in the metadata.

*Pro1:* The core conversion can be described in a procedural specification once and for all. An attempt has been made for RDF[3,4]; it is fairly straightforward, and the same structure can be defined for JSON and XML as well. The implementation is also (probably) easy.

*Pro2:* The simple case is covered without any further step for the user (in contrast to Alternative1/Con3).

*Con2:* Although it may cover lots of simple use cases, it clearly falls short for slightly more complex ones. Put another way, the end user community will have to settle on some other transformations in a large percentage of the cases, and that may be seen as a shortcoming.
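
To make the "core mapping plus hook" idea of Alternative2 more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a proposal for the actual specification: the metadata keys "column_names" and "hook", the function names, and the sample data are all invented for this example, and the output is a plain list of JSON-like objects rather than RDF or XML.

```python
import csv
import io
import json


def direct_mapping(csv_text, metadata=None):
    """Core mapping: one object per CSV row, keyed by column name."""
    metadata = metadata or {}
    # "column_names" is a hypothetical metadata entry that renames headers.
    renames = metadata.get("column_names", {})
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{renames.get(key, key): value for key, value in row.items()}
            for row in reader]


def convert(csv_text, metadata=None, hooks=None):
    """Run the core mapping, then an optional post-processing hook."""
    metadata = metadata or {}
    hooks = hooks or {}
    rows = direct_mapping(csv_text, metadata)
    # "hook" is a hypothetical metadata entry naming an external transform;
    # the mapping itself does not care how that transform is implemented.
    hook_name = metadata.get("hook")
    if hook_name is not None:
        rows = hooks[hook_name](rows)
    return rows


def ages_as_integers(rows):
    """Example hook (invented): turn the 'age' strings into integers."""
    return [{**row, "age": int(row["age"])} for row in rows]


if __name__ == "__main__":
    sample = "name,age\nalice,30\n"  # made-up sample data
    meta = {"column_names": {"name": "person"}, "hook": "ints"}
    print(json.dumps(convert(sample, meta, {"ints": ages_as_integers})))
```

Without any metadata the call degrades to the bare core mapping, which is the "simple case" of Pro2; anything more elaborate is delegated to whatever the hook refers to.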

-----

Actually, after writing this mail, I just realized that there is also an

*Alternative2.5:* define a *simple* template language *without* any if-then-else structure or regexp-based variables, essentially stopping at [5], and add the hooks for further processing just like in Alternative2. It is a kind of mixture of the two previous alternatives (the template may easily cover the procedural mapping).

*Pro1:* Simple cases can be done easily, albeit getting a bit more structure in the output than just the core mapping, and it is also easier to implement and understand. It still provides somewhat more flexibility than Alternative2.

*Pro2:* Same as Alternative1/Pro2.

*Con1:* Same as Alternative2/Con2.

*Con2:* Same as Alternative1/Con3.
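
As a rough illustration of how small such a template language can stay, the following Python sketch applies a template once per CSV row and only replaces plain placeholders with cell values. The double-brace placeholder syntax, the function names, the choice to replace unknown column names with an empty string, and the sample data are assumptions made for this example; the syntax actually described in [5] may differ.

```python
import csv
import io
import re

# Matches {{column}} but deliberately not section tags such as {{#repeat}},
# since this sketch has no control structures at all.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}#/\s]+)\s*\}\}")


def render_row(template, row):
    """Replace every {{column}} placeholder with the row's cell value."""
    def substitute(match):
        # Assumption: an unknown column name becomes an empty string.
        return row.get(match.group(1), "")
    return PLACEHOLDER.sub(substitute, template)


def render_csv(template, csv_text):
    """Apply the template to each data row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [render_row(template, row) for row in reader]


if __name__ == "__main__":
    sample = "id,name\na1,Alice\n"  # made-up sample data
    for line in render_csv('<{{id}}> ex:name "{{name}}" .', sample):
        print(line)
```

Because the output is plain text, the same mechanism can emit JSON, XML or Turtle fragments; anything requiring conditions or repetition would go to the hooks, as in Alternative2.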

-----

That is where we are imho...

I will send out a separate mail that outlines a simple version of Alternative2.5 which can be implemented on top of an engine like Mustache; as an exercise (and as a way of learning JavaScript...) I have implemented this on top of jQuery.

Ivan

P.S. At this moment, I am tempted to vote for Alternative 2.5. But not 100% sure yet...


[1] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status
[2] https://mustache.github.io/
[3] https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
[4] http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
[5] https://www.w3.org/2013/csvw/wiki/CSVTemplating_status#Simple_example_and_template

----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me

Received on Wednesday, 10 September 2014 07:52:10 UTC