Re: Some comments on the RDF->CSV document from Ivan Herman on 2014-04-27 (public-csv-wg@w3.org from April 2014)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 27 Apr 2014 14:26:52 +0200
To: Andy Seaborne <andy@apache.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <E13084BB-8E94-4F31-BE96-143FAA21E583@w3.org>
On 27 Apr 2014, at 12:56 , Andy Seaborne <andy@apache.org> wrote:

> On 23/04/14 16:13, Ivan Herman wrote:
>> (To avoid any misunderstandings, I looked at
>> http://w3c.github.io/csvw/csv2rdf/)
> 
>> I am o.k. with the general approach, and with the level of
>> simplicity/complexity of the templates. I would probably want each
>> feature in the templates to be backed up with a reasonable use case
>> (ideally, a use case in real use), but the 'melody', as documented
>> now, is fine with me. My litmus test is whether the mapping is
>> implementable in a simple and small JS library running on the client side
>> (not exclusively there, but also there). I think this is essential if
>> we want any acceptance of this by client side web apps, ie, if we
>> want to maintain a minimal level of hope that client side
>> applications would use this:-).
> 
> FWIW My litmus test is bulk conversion of large CSV files (e.g. inside a DB loading pipeline).
> 

Fair enough...

>> 
>> For the syntax question: I think my litmus test also means that a
>> JSON syntax is almost a must:
> 
> The doc is "CSV2RDF" :-)
> 
> Did you have in mind that your small JS library is working in the RDF data model or JSON?  So while I agree JSON is a "must" for the WG, for your case, the CSV->JSON is the need.  This doc you reviewed may not be the one you want.
> 
> Maybe we end up with a lot of sharing (good) but we don't know yet.

By 'syntax' I meant the syntax used for the templates themselves. Ie, it is still a CSV->RDF mapping (whether it produces Turtle or JSON-LD is a secondary issue at this point).

> 
>> I do not expect anybody to start
>> writing a Turtle parser in JS for the purpose of an RDF mapping. The
>> template seems to be fairly simple and probably has a straightforward
>> description in JSON, ie, I do not believe that to be an issue...
> 
> When I read the template description, I thought of it as text processing, not parsing as Turtle.

Well, if the templates are themselves described in Turtle, then I would need a Turtle parser to be able to use them. That was my problem... Ie, I think using JSON for the definition of the templates would be better.
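
To illustrate the point, here is a minimal Python sketch, assuming a hypothetical JSON template format. The keys `subject`, `properties`, `predicate` and `value`, the `{column}` placeholder syntax, and the example.org URIs are all invented for this example and are not taken from the csv2rdf draft. It only shows that a template written in JSON can be loaded with a stock JSON parser and applied to each row by plain text substitution, with no Turtle parser involved.

```python
import csv
import io
import json

# Hypothetical template format, invented for illustration only.
# Placeholders such as {id} refer to CSV column names.
TEMPLATE_JSON = """
{
  "subject": "http://example.org/city/{id}",
  "properties": [
    {"predicate": "http://example.org/vocab#name", "value": "{name}"},
    {"predicate": "http://example.org/vocab#population", "value": "{population}"}
  ]
}
"""

CSV_TEXT = "id,name,population\n1,Amsterdam,810000\n2,Utrecht,330000\n"


def escape_literal(text):
    # Minimal escaping for a quoted literal; a real serializer would
    # also have to handle newlines and other control characters.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_triples(template, csv_text):
    # Apply the template to every row by text substitution and emit
    # one N-Triples-style line per property.
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = template["subject"].format(**row)
        for prop in template["properties"]:
            value = escape_literal(prop["value"].format(**row))
            lines.append(f'<{subject}> <{prop["predicate"]}> "{value}" .')
    return lines


template = json.loads(TEMPLATE_JSON)  # a stock JSON parser is enough
triples = rows_to_triples(template, CSV_TEXT)
print("\n".join(triples))
```

The same loop could just as well emit JSON keys instead of triples, which is where sharing with a CSV->JSON conversion might come from.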

Ivan

> 
> The process is to produce an output file/stream by text processing, not data structure manipulation. Hence there is potential for sharing with a JSON conversion.
> 
>> 
>> ---
>> 
>> The templates are on rows and columns, which presupposes a homogeneity
>> of the table; again, I would want to check that against use cases. In
>> particular, I wonder whether the template that sets the language tag
>> for a whole column is o.k. (e.g., if the column is something like
>> 'native name' for cities, then each cell may have a different
>> language tag; I am not sure how we would handle that.)
>> 
>> ---
>> 
>> From a more general point of view, an obvious issue that we will
>> have to address is the relationship of the template
>> language to R2RML. As far as I could see, the features in the current
>> template language are an almost strict subset of R2RML (I am not sure
>> about the datatype mappings; R2RML makes use of SQL datatypes which
>> we do not want to refer to).
>> 
>> That being said, if we just referred to R2RML in our spec we would
>> scare away a lot of people; meaning that we should probably not do
>> it. However, a precise mapping to R2RML may still be necessary to be
>> written down in the document, in case somebody wants to use an
>> existing R2RML engine. We should also check that the simple
>> (template-less) mapping is similarly a subset of Direct Mapping, and
>> document that.
>> 
>> ---
>> 
>> I was also wondering on the call, whether the template is RDF
>> specific, or whether at least the general direction could be reused
>> for a JSON mapping or, if needed, XML. I guess this is certainly true
>> for JSON: the templates to use the right predicate names can be
>> reused to generate the keys, for example. But I have not done a
>> detailed analysis on this, and there are, almost surely, RDF specific
>> features. But we should probably try to factor out the common parts.
>> 
>> (Of course, there is a question whether we need a separate JSON
>> mapping, or whether the current mapping would simply produce JSON-LD,
>> ie, JSON. I am a little bit afraid that RDF features, like blank
>> nodes or @type, would leak into generic JSON, which people may not
>> want...)
>> 
>> ---
>> 
>> Minor issue: the automatic numbering/naming of predicates should take
>> into account RTL writing direction, see Yakov's examples for CSV
>> files in Arabic or Hebrew...
> 
> One possibility is to define a "canonicalization" step, which is CSV to Tabular Data Model that puts the CSV into some sort of expected form.
> 
> This step would include data cleaning and generally fixing things, dealing with alternative separators, dealing with new lines, and could be the place to deal with RTL.
> 
> 	Andy
> 
>> 
>> Ivan
>> 
>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Sunday, 27 April 2014 12:27:23 UTC