RE: Some comments on the RDF->CSV document from Tandy, Jeremy on 2014-04-28 (public-csv-wg@w3.org from April 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Mon, 28 Apr 2014 13:56:09 +0000
To: Andy Seaborne <andy@apache.org>, Ivan Herman <ivan@w3.org>
CC: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20883AC50@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
> One possibility is to define a "canonicalization" step, which is CSV 
> to Tabular Data Model that puts the CSV into some sort of expected form.

> This step would include data cleaning and generally fixing things, 
> dealing with alternative separator, dealing with new lines, and could 
> be the place to deal with RTL.

I agree with Andy - I think that some form of text-based processing to get the "tabular data" into a well-formed state will be required in a lot of cases. This is something that I think should occur _before_ any transformation to RDF (or JSON or another format) is attempted.

Jeremy

-----Original Message-----
From: Andy Seaborne [mailto:andy@apache.org] 
Sent: 27 April 2014 11:57
To: Ivan Herman
Cc: W3C CSV on the Web Working Group
Subject: Re: Some comments on the RDF->CSV document

On 23/04/14 16:13, Ivan Herman wrote:
> (To avoid any misunderstandings, I looked at
> http://w3c.github.io/csvw/csv2rdf/)

> I am o.k. with the general approach, and with the level of
> simplicity/complexity of the templates. I would probably want each
> feature in the templates to be backed up with a reasonable use case
> (ideally, a use case in real use), but the 'melody', as is documented
> now, is fine to me. My litmus test is whether the mapping is
> implementable in a simple and small JS library running on client side
> (not exclusively there, but also there). I think this is essential if
> we want any acceptance of this by client side web apps, ie, if we
> want to maintain a minimal level of hope that client side
> applications would use this:-).

FWIW My litmus test is bulk conversion of large CSV files, (e.g. inside 
a DB loading pipeline).

>
> For the syntax question: I think my litmus test also means that a
> JSON syntax is almost a must:

The doc is "CSV2RDF" :-)

Did you have in mind that your small JS library is working in the RDF 
data model or JSON?  So while I agree JSON is a "must" for the WG, for 
your case, the CSV->JSON is the need.  This doc you reviewed may not be 
the one you want.

Maybe we end up with a lot of sharing (good) but we don't know yet.

> I do not expect anybody to start
> writing a turtle parser in JS for the purpose of an RDF mapping. The
> template seems to be fairly simple and probably has a straightforward
> description in JSON, ie, I do not believe that to be an issue...

When I read the template description, I thought of it as text 
processing, not parsing as Turtle.

The process is to produce an output file/stream by text processing, not 
data structure manipulation. Hence sharing with a JSON conversion is 
potentially there.
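
As a rough illustration of that text-processing reading (a sketch only, not the template syntax of the CSV2RDF draft): the "$name" placeholders below are Python's string.Template syntax, and the example.org IRIs, column names and helper names are all made up for the example. The same per-row substitution emits either Turtle-like lines or JSON lines; only the template and the escaping differ.

```python
import csv
import io
import json
from string import Template

# Hypothetical per-row templates. The placeholders name CSV columns.
TURTLE_TEMPLATE = Template(
    '<http://example.org/city/$id> <http://example.org/name> "$name" .'
)
JSON_TEMPLATE = Template('{"id": $id, "name": $name}')


def turtle_fields(row):
    # Escape backslashes and double quotes so values fit inside "..." literals.
    # IRI escaping of $id is deliberately left out of this sketch.
    return {k: v.replace("\\", "\\\\").replace('"', '\\"') for k, v in row.items()}


def json_fields(row):
    # json.dumps turns each value into a quoted, escaped JSON string.
    return {k: json.dumps(v) for k, v in row.items()}


def expand_rows(csv_text, template, escape):
    """Yield one output line per CSV row by plain string substitution."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield template.substitute(escape(row))


if __name__ == "__main__":
    sample = "id,name\n1,Paris\n2,Lyon\n"
    for line in expand_rows(sample, TURTLE_TEMPLATE, turtle_fields):
        print(line)
    for line in expand_rows(sample, JSON_TEMPLATE, json_fields):
        print(line)
```

No RDF graph or JSON object is built at any point; rows stream through one at a time, which is also what a bulk-load pipeline would want.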

>
> ---
>
> The templates are on rows and columns, which presupposes a homogeneity
> of the table; again, I would want to check that against use cases. In
> particular, I wonder whether the template that sets the language tag
> for a whole column is o.k. (e.g., if the column is something like
> 'native name' for cities, then each cell may have a different
> language tag; I am not sure how we would handle that.)
>
> ---
>
> From a more general point of view, an obvious issue that we will
> have to answer is the relationship of the template
> language to R2RML. As far as I could see, the features in the current
> template language are an almost strict subset of R2RML (I am not sure
> about the datatype mappings; R2RML makes use of SQL datatypes which
> we do not want to refer to).
>
> That being said, if we just referred to R2RML in our spec we would
> scare away a lot of people; meaning that we should probably not do
> it. However, a precise mapping to R2RML may still be necessary to be
> written down in the document, in case somebody wants to use an
> existing R2RML engine. We should also check that the simple
> (template-less) mapping is similarly a subset to Direct Mapping, and
> document that.
>
> ---
>
> I was also wondering on the call, whether the template is RDF
> specific, or whether at least the general direction could be reused
> for a JSON mapping or, if needed, XML. I guess this is certainly true
> for JSON: the templates to use the right predicate names can be
> reused to generate the keys, for example. But I have not done a
> detailed analysis on this, and there are, almost surely, RDF specific
> features. But we should probably try to factor out the common parts.
>
> (Of course, there is a question whether we need a separate JSON mapping, or
> whether the current mapping would simply produce JSON-LD, ie, JSON. I
> am a little bit afraid that the RDF features, like blank nodes or
> @type, transpire into generic JSON which people may not want...)
>
> ---
>
> Minor issue: the automatic numbering/naming of predicates should take
> into account RTL writing direction, see Yakov's examples for CSV
> files in Arabic or Hebrew...

One possibility is to define a "canonicalization" step, which is CSV to 
Tabular Data Model that puts the CSV into some sort of expected form.

This step would include data cleaning and generally fixing things, 
dealing with alternative separator, dealing with new lines, and could be 
the place to deal with RTL.
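
A minimal sketch of what such a step might look like, assuming the "expected form" is simply a list of rows of trimmed strings. The function names, the candidate separators and the "most frequent character in the first line" heuristic are assumptions for illustration; RTL handling is not attempted here.

```python
import csv
import io


def guess_delimiter(first_line, candidates=(",", ";", "\t", "|")):
    # Naive heuristic: pick the candidate that occurs most often in the
    # first line. Ties (including "none found") fall back to the comma.
    return max(candidates, key=first_line.count)


def canonicalize(raw_text):
    """Return the table as a list of rows, each a list of trimmed strings."""
    # Drop a leading byte-order mark if one survived decoding.
    text = raw_text.lstrip("\ufeff")
    first_line = text.splitlines()[0] if text else ""
    delimiter = guess_delimiter(first_line)
    # newline="" leaves line endings to the csv module, which copes with
    # CRLF and LF and keeps newlines that sit inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip blank lines and all-empty rows
            rows.append(cells)
    return rows


if __name__ == "__main__":
    messy = '\ufeffid;name\r\n1; Paris \r\n\r\n2;"Lyon; FR"\r\n'
    for row in canonicalize(messy):
        print(row)
```

Any conversion (to RDF, JSON or anything else) would then start from the cleaned rows rather than from the raw file.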

	Andy

>
> Ivan
>
> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
> FOAF: http://www.ivan-herman.net/foaf
>
Received on Monday, 28 April 2014 13:56:39 UTC