Re: Blank Node Identifiers and RDF Dataset Normalization from Steve Harris on 2013-03-01 (public-linked-json@w3.org from March 2013)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 1 Mar 2013 20:24:53 +0000
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, RDF WG <public-rdf-wg@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-Id: <E4296087-64A9-428C-9B65-261EF5746377@garlik.com>

On 1 Mar 2013, at 16:10, Kingsley Idehen wrote:

> On 3/1/13 6:51 AM, Steve Harris wrote:
>> On 2013-02-27, at 16:36, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> 
>>> On 2/27/13 10:37 AM, Steve Harris wrote:
>>>> I don't want to throw numbers about, but for us the cost of anything that significantly decreases the efficiency of our RDF storage carries a huge monetary cost - we couldn't justify it without a significant upside.
>>> This is a very important point, and from the DBMS engineering perspective it's true. There are costs to existing RDF stores and DBMS engines.
>>> 
>>> A suggestion:
>>> 
>>> Manu: JSON-LD should include a note about the use of bnodes to denote graphs. That note could then home in on the special use-case scenarios, e.g., where there's high-velocity data with little mass.
>>> 
>>> Steve:
>>> As already acknowledged above, you are correct about the optimization cost to existing RDF stores and DBMS engines (it will hit Virtuoso too). Thus, when our engines encounter such data, we could simply remap the IRIs as part of our data ingestion (insert | import) routines. That's what we'll end up doing.
>>> 
>>> Naturally, this means tweaking existing code for data import, ingestion, creation, etc. Personally, I believe we can close out this matter without holding up the various working groups, i.e., RDF 1.1 stays as is, and JSON-LD includes a fleshed-out version of the note I suggested to Manu.
>>> 
>>> Manu/Steve:
>>> 
>>> What do you think?
>> I believe that would be equivalent to defining the syntactic construct to generate Skolem URIs at parse time, but I've not thought about it too deeply.
>> 
>> - Steve
>> 
> 
> Yes, so a little work, but worthwhile, since it keeps the data being loaded distinct from the store and its specific data management functionality. This also means that loaders can be crafted with switches to control the load modality, etc.
> 
> Ultimately, this also means that the RDF model can evolve separately from notations (e.g., Turtle, TriG, JSON-LD, etc.), consumer and client apps, and data stores. Basically, everything remains (or becomes) loosely coupled.

I don't agree with that at all: the parser becomes much more tightly coupled to the processing engine. But the advantage is that all the work happens in a new module (the JSON-LD parser), which has to be developed in any case if you want to support JSON-LD.
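To make the "Skolem URIs at parse time" idea concrete, here is a minimal sketch of what such a step in a parser or loader might look like. It is not taken from any JSON-LD processor. It assumes quads are plain Python tuples `(subject, predicate, object, graph)`, with blank nodes written as `_:label` and literals wrapped in double quotes, and it uses the `/.well-known/genid/` path that RDF 1.1 Concepts recommends for Skolem IRIs. The function name `skolemize_graph_names`, the `authority` default, and the tuple layout are hypothetical.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a quad is (subject, predicate, object, graph),
# with graph None for the default graph, blank nodes written as "_:label",
# and literals written with surrounding double quotes, e.g. '"Alice"'.
Quad = Tuple[str, str, str, Optional[str]]


def is_bnode(term: Optional[str]) -> bool:
    return term is not None and term.startswith("_:")


def skolemize_graph_names(
    quads: List[Quad],
    authority: str = "http://example.org",  # hypothetical minting authority
    new_id: Callable[[], str] = lambda: uuid.uuid4().hex,
) -> List[Quad]:
    """Replace each blank node used as a graph name with a Skolem IRI.

    The label is rewritten wherever it occurs in this document, so a graph
    name that is also described by other quads stays connected.
    """
    graph_bnodes = {q[3] for q in quads if is_bnode(q[3])}
    # One fresh IRI per distinct label; labels are scoped to this parse only.
    mapping: Dict[str, str] = {
        label: f"{authority}/.well-known/genid/{new_id()}"
        for label in sorted(graph_bnodes)
    }

    def remap(term: Optional[str]) -> Optional[str]:
        return mapping.get(term, term) if term is not None else None

    return [(remap(s), remap(p), remap(o), remap(g)) for s, p, o, g in quads]


# Usage: "_:g1" names a graph and is also described in the default graph.
quads = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"', "_:g1"),
    ("_:g1", "http://purl.org/dc/terms/created", '"2013-03-01"', None),
    ("_:b0", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob", None),
]
out = skolemize_graph_names(quads)
fixed = skolemize_graph_names(quads, new_id=lambda: "g1")  # deterministic IDs
```

Only blank nodes in graph-name position are rewritten, so ordinary blank nodes such as `_:b0` still reach the store unchanged and keep whatever efficient handling the store already has for them. Because the mapping is built per call, the same label appearing in two separately parsed documents gets two distinct IRIs, preserving blank-node scoping.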

- Steve

-- 
Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ

Received on Friday, 1 March 2013 20:25:18 UTC