Re: CSV2RDF and R2RML from Dan Brickley on 2014-02-21 (public-csv-wg@w3.org from February 2014)

From: Dan Brickley <danbri@google.com>
Date: Fri, 21 Feb 2014 12:03:03 +0000
To: Ivan Herman <ivan@w3.org>
Cc: public-csv-wg@w3.org, Gregg Kellogg <gregg@greggkellogg.net>, Juan Sequeda <juanfederico@gmail.com>, Andy Seaborne <andy@apache.org>
Message-ID: <CAK-qy=4e_P1kf5YCSLQ5ynCUFPbCSbDgh085Pno38WXkzn9a-g@mail.gmail.com>
Excuse the top-posting, I'm writing this on a portable telephone.

Can we please stop comparing possible solutions before we finish collecting
and documenting (with full examples) our use cases and requirements?

Cheers,

Dan
On 21 Feb 2014 01:06, "Ivan Herman" <ivan@w3.org> wrote:

> Gregg,
>
> see below
>
>
> Gregg Kellogg wrote:
> >>>
> >> You will have to explain this. Is this something that JSON-LD already
> >> has (and then I missed it) or is it something you just created?
> >
> > JSON-LD has the concept of frames, in the context of a not-yet-completed
> > spec [1]. The purpose of framing is to take a JSON-LD document, which is
> > flattened, and re-create a structured JSON-LD document based on the
> > framing instructions. In the example above, we define a frame with a
> > single changed relationship. Without the clutter, it's like the following:
> >
> > {
> >   "@type": "ex:SalesRegion",
> >   "Sales Region": null,
> >   "ex:period": {
> >     "@type": "ex:SalesPeriod",
> >     "Quarter": null,
> >     "Sales": null
> >   }
> > }
> >
> > This defines a type on the outer node definition, and relates it to the
> > inner node definition using a made-up ex:period property, giving both
> > node definitions types, but that's not strictly necessary.
> >
> > The notion is that, when processing a row from a CSV, I map column
> > values to instances of this frame, substituting the column values where
> > the "null" values exist. Nominally, I'd need a way to either map Sales
> > Region to a unique identifier (say a BNode ID), or describe the Sales
> > Region property as being unique. This doesn't exist, either in JSON-LD
> > framing or in my existing proposal, but we might see a way where a
> > templating system could be used to create IRIs or BNode identifiers
> > through suitable substitution of column values.
> >
> > I've been hand-waving around this, but one way to do this might be to
> > extend the context definition to describe identifier templates:
> >
> > {
> >   "region_id": {"@id": "_:{Sales Region}", "@type": "@idTemplate"}
> > }
> >
> > I'm sure we can do much better, but the basic idea is that column values
> > can be used within a template used to construct an IRI or BNode
> > identifier, using some suitable rules. We could then use "region_id" in
> > the frame, with the understanding that it will be expanded using the
> > template defined in the context.
> >
> > {
> >   "@id": "region_id",
> >   "@type": "ex:SalesRegion",
> >   "Sales Region": null,
> >   "ex:period": {
> >     "@type": "ex:SalesPeriod",
> >     "Quarter": null,
> >     "Sales": null
> >   }
> > }
> >
> > This also handles composite key creation, although it's not really
> > required in this case. If I ran my hypothetical algorithm over each line
> > of input, I'd get JSON-LD node definitions such as the following (would
> > actually use full IRIs):
> >
> > [
> > { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 10}},
> > { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
> > { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q3", "ex:value": 7}},
> > { "@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q4", "ex:value": 25}},
> > { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 9}},
> > { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
> > { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q3", "ex:value": 16}},
> > { "@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
> >   "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q4", "ex:value": 31}}
> > ]
> >
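The per-row substitution described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the JSON-LD framing draft or of any proposal in this thread: the helper names (expand_template, fill_frame, rows_to_nodes), the {Column Name} placeholder syntax, and the sample CSV are all hypothetical. The sketch keeps the column names as keys rather than mapping them to dc:title and ex:value, since that mapping would come from a context that is not shown here.

```python
import csv
import io
import re

# Hypothetical sample data; the column names match the frame in the message.
CSV_TEXT = """Sales Region,Quarter,Sales
North,Q1,10
North,Q2,15
South,Q1,9
"""

# Identifier template from the proposed context extension.
# "@idTemplate" is an assumption taken from the message, not a JSON-LD feature.
CONTEXT = {"region_id": {"@id": "_:{Sales Region}", "@type": "@idTemplate"}}

# The frame from the message; JSON null becomes Python None.
FRAME = {
    "@id": "region_id",
    "@type": "ex:SalesRegion",
    "Sales Region": None,
    "ex:period": {"@type": "ex:SalesPeriod", "Quarter": None, "Sales": None},
}


def expand_template(template, row):
    """Replace each {Column Name} placeholder with that column's value."""
    return re.sub(r"\{([^{}]+)\}", lambda m: row[m.group(1)], template)


def fill_frame(frame, row, context):
    """Return a copy of the frame with None slots filled from the row."""
    node = {}
    for key, value in frame.items():
        if isinstance(value, dict):
            # Nested node definition: fill it from the same row.
            node[key] = fill_frame(value, row, context)
        elif value is None:
            # A null slot takes the value of the column with the same name.
            node[key] = row[key]
        elif key == "@id" and context.get(value, {}).get("@type") == "@idTemplate":
            # The @id names a template term defined in the context.
            node[key] = expand_template(context[value]["@id"], row)
        else:
            node[key] = value
    return node


def rows_to_nodes(csv_text, frame, context):
    """Build one node definition per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [fill_frame(frame, row, context) for row in reader]


nodes = rows_to_nodes(CSV_TEXT, FRAME, CONTEXT)
```

Cell values stay strings here (for example "Sales": "10"); turning them into numbers would need datatype information that the frame does not carry.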
> > This can then be compacted to get a hierarchical JSON-LD document, or
> > just directly turned into RDF using the JSON-LD to RDF algorithm.
> >
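Node definitions that share an @id describe the same node, which is why the repeated "_:North" entries above can be folded into one region with several periods. A rough sketch of that folding in plain Python follows. It is not the JSON-LD flattening, compaction, or to-RDF algorithm; merge_nodes is a hypothetical helper, and it assumes the input property values are not already lists.

```python
def merge_nodes(nodes):
    """Merge node definitions sharing an @id; differing values become lists."""
    merged = {}
    for node in nodes:
        target = merged.setdefault(node["@id"], {"@id": node["@id"]})
        for key, value in node.items():
            if key == "@id":
                continue
            if key not in target:
                target[key] = value
            elif isinstance(target[key], list):
                # Already multi-valued: add the value if it is new.
                if value not in target[key]:
                    target[key].append(value)
            elif target[key] != value:
                # Second distinct value: promote the property to a list.
                target[key] = [target[key], value]
    return list(merged.values())


# A subset of the node definitions from the message.
nodes = [
    {"@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 10}},
    {"@id": "_:North", "@type": "ex:SalesRegion", "dc:title": "North",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q2", "ex:value": 15}},
    {"@id": "_:South", "@type": "ex:SalesRegion", "dc:title": "South",
     "ex:period": {"@type": "ex:SalesPeriod", "dc:title": "Q1", "ex:value": 9}},
]

regions = merge_nodes(nodes)
```

With this input, regions holds one entry for "_:North" carrying both periods and one entry for "_:South" carrying a single period.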
> > Gregg
> >
>
> I am torn:-(
>
> - On the one hand, I can see the potential, in general, of this line of
> work in manipulating JSON-LD, essentially defining some sort of JSON-LD
> transformation. Yes, this may go in the direction of what Andy was
> calling 'uplifting', if I remember correctly.
>
> - However... I do not think we should do this in this Working Group. This
> is where I think I agree with what (I think...) Andy was saying: we
> should define something relatively simple that covers lots of needs (and
> I believe a simple mapping to JSON-LD and a usage of today's @context may
> go a long way). Data consumers _may_ want to use other tools
> (transformation via JSON-LD tricks, transformation to RDF and use of
> SPARQL, SPIN, whatever), but I do not think this Working Group should
> define those. If we are not careful, the "mapping to RDF" part of the
> work would consume most of the WG's energy, and this is not, I believe,
> the main goal of this group.
>
>
> Ivan
>
>
>
>
Received on Friday, 21 February 2014 12:03:31 UTC