Re: CSV2RDF and R2RML from Ivan Herman on 2014-02-21 (public-csv-wg@w3.org from February 2014)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 21 Feb 2014 10:05:54 +0100
To: Gregg Kellogg <gregg@greggkellogg.net>
CC: Andy Seaborne <andy@apache.org>, Juan Sequeda <juanfederico@gmail.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <530716F2.40001@w3.org>

Gregg,

see below


Gregg Kellogg wrote:
>>>
>> You will have to explain this. Is this something that JSON-LD already has (and then I missed it) or is it something you just created?
> 
> JSON-LD has the concept of frames, in the context of a not-yet-completed spec [1]. The purpose of framing is to take a JSON-LD document, which is flattened, and re-create a structured JSON-LD document based on the framing instructions. In the example above, we define a frame with a single chained relationship. Without the clutter, it looks like the following:
> 
> {
>   "@type": "ex:SalesRegion",
>   "Sales Region": null,
>   "ex:period": {
>     "@type": "ex:SalesPeriod",
>     "Quarter": null,
>     "Sales": null
>   }
> }
> 
> This defines a type on the outer node definition and relates it to the inner node definition using a made-up ex:period property. Both node definitions are given types, but that's not strictly necessary.
> 
> The notion is that, when processing a row from a CSV, I map column values to instances of this frame, substituting the column values where the "null" values exist. Nominally, I'd need a way either to map Sales Region to a unique identifier (say, a BNode ID) or to describe the Sales Region property as being unique. This doesn't exist, either in JSON-LD framing or in my existing proposal, but we might find a way in which a templating system could be used to create IRIs or BNode identifiers through suitable substitution of column values.
> 
> I've been hand-waving around this, but one way to do this might be to extend the context definition to describe identifier templates:
> 
> {
>   "region_id": {"@id": "_:{Sales Region}", "@type": "@idTemplate"}
> }
> 
> I'm sure we can do much better, but the basic idea is that column values can be used within a template used to construct an IRI or BNode identifier, using some suitable rules. We could then use "region_id" in the frame, with the understanding that it will be expanded using the template defined in the context.
> 
> {
>   "@id": "region_id",
>   "@type": "ex:SalesRegion",
>   "Sales Region": null,
>   "ex:period": {
>     "@type": "ex:SalesPeriod",
>     "Quarter": null,
>     "Sales": null
>   }
> }
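
The substitution step described above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the "@idTemplate" type and the "region_id" term come from the proposal quoted above and are not part of JSON-LD, the helper names expand_template and fill_frame are hypothetical, and the two CSV rows are reconstructed from the values used later in this message. Column names are kept as keys here; mapping them to terms such as dc:title would be the job of the context.

```python
import csv
import io
import re

# Context entry and frame copied from the quoted proposal (not standard JSON-LD).
CONTEXT = {"region_id": {"@id": "_:{Sales Region}", "@type": "@idTemplate"}}
FRAME = {
    "@id": "region_id",
    "@type": "ex:SalesRegion",
    "Sales Region": None,
    "ex:period": {"@type": "ex:SalesPeriod", "Quarter": None, "Sales": None},
}


def expand_template(template, row):
    """Replace each {Column Name} placeholder with that column's value."""
    return re.sub(r"\{([^}]+)\}", lambda m: row[m.group(1)], template)


def fill_frame(frame, row, context):
    """Return a copy of the frame with nulls and @id templates filled from row."""
    node = {}
    for key, value in frame.items():
        if isinstance(value, dict):
            # Nested node definition: fill it from the same row.
            node[key] = fill_frame(value, row, context)
        elif value is None:
            # A null marks the place where the column value goes.
            node[key] = row[key]
        elif (
            key == "@id"
            and isinstance(value, str)
            and context.get(value, {}).get("@type") == "@idTemplate"
        ):
            # Hypothetical extension: the term names an identifier template.
            node[key] = expand_template(context[value]["@id"], row)
        else:
            node[key] = value
    return node


# Sample rows reconstructed for illustration; csv yields every value as a string.
CSV_TEXT = "Sales Region,Quarter,Sales\nNorth,Q1,10\nNorth,Q2,15\n"
nodes = [
    fill_frame(FRAME, row, CONTEXT)
    for row in csv.DictReader(io.StringIO(CSV_TEXT))
]
```

Note that the sketch does no escaping of column values, so a value containing spaces would not give a valid BNode label; that is where the "suitable rules" mentioned above would have to come in.
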
> 
> This also handles composite key creation, although it's not really required in this case. If I ran my hypothetical algorithm over each line of input, I'd get JSON-LD node definitions such as the following (would actually use full IRIs):
> 
> [
> { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 10}},
> { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
> { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q3", "ex:value": 7}},
> { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q4", "ex:value": 25}},
> { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 9}},
> { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
> { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q3", "ex:value": 16}},
> { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South", "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q4", "ex:value": 31}}
> ]
> 
> This can then be compacted to get a hierarchical JSON-LD document, or just directly turned into RDF using the JSON-LD to RDF algorithm.
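
To make the "hierarchical document" idea concrete: the per-row node definitions above share "@id" values, so they can be folded into one node per region. The Python below is a plain dictionary merge written for illustration only. It is not the JSON-LD compaction, flattening, or framing algorithm, and the name merge_by_id is hypothetical.

```python
def merge_by_id(nodes):
    """Fold node definitions sharing an @id into one node per identifier.

    Properties that take different values across rows are collected in a list.
    """
    merged = {}
    for node in nodes:
        target = merged.setdefault(node["@id"], {})
        for key, value in node.items():
            if key not in target:
                target[key] = value
            elif target[key] != value:
                # Promote to a list and append values not seen before.
                existing = target[key] if isinstance(target[key], list) else [target[key]]
                if value not in existing:
                    existing.append(value)
                target[key] = existing
    return list(merged.values())


# Three of the node definitions listed in the message above.
sample = [
    {"@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 10}},
    {"@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
    {"@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 9}},
]
merged = merge_by_id(sample)
```

With this sample, the North node ends up with a list of two ex:period values, while the South node keeps its single ex:period object.
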
> 
> Gregg
> 

I am torn:-(

- On the one hand, I can see the potential, in general, of this line of work in
manipulating JSON-LD, essentially defining some sort of a JSON-LD
transformation. Yes, this may go in the direction of what Andy was calling
'uplifting', if I remember correctly.

- However... I do not think we should do this in this Working Group. This is
where I think I agree with what (I think...) Andy was saying: we should define
something relatively simple that covers lots of needs (and I believe a simple
mapping to JSON-LD and a usage of today's @context may go a long way).
Data consumers _may_ want to use other tools (transformation via JSON-LD tricks,
transformation to RDF and use of SPARQL, SPIN, whatever), but I do not think this
Working Group should define those. If we are not careful, the "mapping to RDF"
part of the work would consume most of the WG's energy and this is not, I
believe, the main goal of this group.


Ivan
Received on Friday, 21 February 2014 09:06:29 UTC