Some comments on the RDF->CSV document from Ivan Herman on 2014-04-23 (public-csv-wg@w3.org from April 2014)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 23 Apr 2014 17:13:14 +0200
To: Andy Seaborne <andy@apache.org>, Gregg Kellogg <gregg@greggkellogg.net>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <C22A6CE5-51CC-4B50-B98E-2611978601C8@w3.org>

(To avoid any misunderstandings, I looked at http://w3c.github.io/csvw/csv2rdf/)

I am o.k. with the general approach, and with the level of simplicity/complexity of the templates. I would probably want each feature in the templates to be backed up with a reasonable use case (ideally, a use case in real use), but the 'melody', as it is documented now, is fine to me. My litmus test is whether the mapping is implementable in a simple and small JS library running on the client side (not exclusively there, but also there). I think this is essential if we want any acceptance of this by client-side web apps, i.e., if we want to maintain a minimal level of hope that client-side applications would use this :-).

For the syntax question: I think my litmus test also means that a JSON syntax is almost a must: I do not expect anybody to start writing a Turtle parser in JS for the purpose of an RDF mapping. The template seems to be fairly simple and probably has a straightforward description in JSON, i.e., I do not believe that to be an issue.
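To make the litmus test concrete, here is a minimal sketch of what a JSON-expressed template and a small client-side mapper could look like. The property names (`subject`, `columns`, `column`, `predicate`, `datatype`), the `{name}` placeholder convention, and the example.org IRIs are assumptions made up for illustration; they are not the syntax of the csv2rdf draft.

```typescript
// Hypothetical JSON form of a template; the property names are illustrative
// assumptions, not syntax taken from the csv2rdf draft.
interface ColumnTemplate {
  column: string;     // CSV header name
  predicate: string;  // predicate IRI to emit
  datatype?: string;  // optional datatype IRI
}

interface TableTemplate {
  subject: string;    // IRI pattern with {column} placeholders
  columns: ColumnTemplate[];
}

type Row = Record<string, string>;

// Replace {name} placeholders with the percent-encoded cell value.
function expand(pattern: string, row: Row): string {
  return pattern.replace(/\{([^}]+)\}/g, (_m: string, name: string) =>
    encodeURIComponent(row[name] ?? ""));
}

// Escape a cell value for use inside an N-Triples string literal.
function escapeLiteral(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Map one CSV row to N-Triples lines according to the template.
function rowToNTriples(template: TableTemplate, row: Row): string[] {
  const subject = `<${expand(template.subject, row)}>`;
  const triples: string[] = [];
  for (const col of template.columns) {
    const value = row[col.column];
    if (value === undefined || value === "") continue; // skip empty cells
    const suffix = col.datatype ? `^^<${col.datatype}>` : "";
    triples.push(`${subject} <${col.predicate}> "${escapeLiteral(value)}"${suffix} .`);
  }
  return triples;
}

// The template itself is plain JSON, so no Turtle parser is needed.
const template: TableTemplate = JSON.parse(`{
  "subject": "http://example.org/city/{id}",
  "columns": [
    { "column": "name", "predicate": "http://example.org/def#name" },
    { "column": "population", "predicate": "http://example.org/def#population",
      "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
  ]
}`);

const triples = rowToNTriples(template, { id: "42", name: "Amsterdam", population: "810000" });
console.log(triples.join("\n"));
```

The whole mapper relies only on `JSON.parse` and string handling, which is roughly the size of library I have in mind for the client side.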

---

The templates are defined on rows and columns, which presupposes a homogeneity of the table; again, I would want to check that against use cases. In particular, I wonder whether a template that sets the language tag for a whole column is o.k. (e.g., if the column is something like 'native name' for cities, then each cell may have a different language tag; I am not sure how we would handle that.)
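The 'native name' problem can be illustrated with a small sketch. The `lang` property stands for a column-wide language tag; `langColumn`, which reads the tag from a companion column of the same row, is purely a hypothetical alternative invented here for comparison and is not something the draft defines.

```typescript
// Hypothetical template shape: `lang` fixes one tag for the whole column;
// `langColumn` (an assumption for illustration) names a column holding per-row tags.
interface LangTemplate {
  column: string;
  lang?: string;
  langColumn?: string;
}

type Row = Record<string, string>;

// Build a language-tagged literal for one cell.
function literalFor(t: LangTemplate, row: Row): string {
  const value = row[t.column];
  // Prefer the per-row tag when a language column is configured and non-empty.
  const tag = (t.langColumn ? row[t.langColumn] : undefined) || t.lang;
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return tag ? `"${escaped}"@${tag}` : `"${escaped}"`;
}

const rows: Row[] = [
  { city: "Vienna", native: "Wien", nativeLang: "de" },
  { city: "Budapest", native: "Budapest", nativeLang: "hu" },
];

// A column-wide tag labels every cell as German, which is wrong for the second row.
const columnWide = rows.map(r => literalFor({ column: "native", lang: "de" }, r));

// A per-row tag keeps each cell's own language.
const perCell = rows.map(r => literalFor({ column: "native", langColumn: "nativeLang" }, r));

console.log(columnWide, perCell);
```

Whether the extra column is acceptable depends on whether real CSV files carry such information at all, which is again a question for the use cases.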

---

From a more general point of view, an obvious issue on which we will have to give an answer is the relationship of the template language to R2RML. As far as I could see, the features in the current template language are an almost strict subset of R2RML (I am not sure about the datatype mappings; R2RML makes use of SQL datatypes which we do not want to refer to).

That being said, if we just referred to R2RML in our spec we would scare away a lot of people, meaning that we should probably not do it. However, a precise mapping to R2RML may still need to be written down in the document, in case somebody wants to use an existing R2RML engine. We should also check that the simple (template-less) mapping is similarly a subset of Direct Mapping, and document that.

---

I was also wondering on the call, whether the template is RDF specific, or whether at least the general direction could be reused for a JSON mapping or, if needed, XML. I guess this is certainly true for JSON: the templates to use the right predicate names can be reused to generate the keys, for example. But I have not done a detailed analysis on this, and there are, almost surely, RDF specific features. But we should probably try to factor out the common parts.

(Of course, there is a question whether we need a separate JSON mapping, or whether the current mapping would simply produce JSON-LD, i.e., JSON. I am a little bit afraid that RDF features, like blank nodes or @type, would transpire into generic JSON, which people may not want...)
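As a rough illustration of reusing the predicate part of a template to generate JSON keys, here is a minimal sketch. The template shape, the `keyFor` rule (take the part of the predicate IRI after the last `#` or `/`), and the example.org IRIs are assumptions for illustration only; I have not checked them against the draft.

```typescript
// Hypothetical column template, shared between an RDF mapping and a JSON mapping.
interface ColumnTemplate {
  column: string;     // CSV header name
  predicate: string;  // predicate IRI used by the RDF mapping
}

type Row = Record<string, string>;

// Derive a short JSON key from a predicate IRI: the part after the last '#' or '/'.
function keyFor(predicate: string): string {
  const cut = Math.max(predicate.lastIndexOf("#"), predicate.lastIndexOf("/"));
  return predicate.slice(cut + 1);
}

// Produce a plain JSON object for one row; no @type, no blank nodes.
function rowToJson(columns: ColumnTemplate[], row: Row): Record<string, string> {
  const out: Record<string, string> = {};
  for (const col of columns) {
    if (col.column in row) out[keyFor(col.predicate)] = row[col.column];
  }
  return out;
}

const columns: ColumnTemplate[] = [
  { column: "Name", predicate: "http://example.org/def#name" },
  { column: "Pop.", predicate: "http://example.org/def#population" },
];

const obj = rowToJson(columns, { Name: "Amsterdam", "Pop.": "810000" });
console.log(JSON.stringify(obj));
```

The output here is generic JSON with cleaned-up keys; anything RDF specific (datatypes, language tags, subject IRIs) would have to be layered on top, which is where the common parts and the RDF-only parts would need to be separated.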

---

Minor issue: the automatic numbering/naming of predicates should take into account RTL writing direction, see Yakov's examples for CSV files in Arabic or Hebrew...

Ivan

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Wednesday, 23 April 2014 15:13:44 UTC