Re: Blank Node Identifiers and RDF Dataset Normalization

On 3/1/13 3:24 PM, Steve Harris wrote:
> On 1 Mar 2013, at 16:10, Kingsley Idehen wrote:
>
>> On 3/1/13 6:51 AM, Steve Harris wrote:
>>> On 2013-02-27, at 16:36, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>>>
>>>> On 2/27/13 10:37 AM, Steve Harris wrote:
>>>>> I don't want to throw numbers about, but for us the cost of 
>>>>> anything that significantly decreases the efficiency of our RDF 
>>>>> storage carries a huge monetary cost - we couldn't justify it 
>>>>> without a significant upside.
>>>> This is a very important point, and from the DBMS engineering 
>>>> perspective it's true. There are costs to existing RDF stores and 
>>>> DBMS engines.
>>>>
>>>> A suggestion:
>>>>
>>>> Manu: JSON-LD should include a note about the use of bnodes to denote 
>>>> graphs. That note could then home in on its special use-case 
>>>> scenarios, e.g., high-velocity data with little mass.
>>>>
>>>> Steve:
>>>> As already acknowledged above, you are correct about the 
>>>> optimization cost to existing RDF stores and DBMS engines (it will 
>>>> hit Virtuoso too). Thus, when our engines encounter such data, we 
>>>> could simply remap the blank nodes to IRIs as part of our data 
>>>> ingestion (insert | import) routines. That's what we'll end up doing.
>>>>
>>>> Naturally, this means tweaking existing code for data import, 
>>>> ingestion, creation, etc. Personally, I believe we have the 
>>>> ability to close out this matter without holding up the various 
>>>> working groups, i.e., RDF 1.1 stays as is, and JSON-LD gets a 
>>>> fleshed-out version of the note I suggested to Manu, etc.
>>>>
>>>> Manu/Steve:
>>>>
>>>> What do you think?
>>> I believe that would be equivalent to defining the syntactic 
>>> construct to generate Skolem URIs at parse time, but I've not 
>>> thought about it too deeply.
>>>
>>> - Steve
>>>
>> Yes, so a little work, but worthwhile, since it keeps the data being 
>> loaded distinct from the store and its specific data-management 
>> functionality. This also means that loaders can be built with 
>> switches to control the load modality, etc.
>>
>> Ultimately, this also means that the RDF model can evolve separately 
>> from notations (e.g., Turtle, TriG, JSON-LD, etc.), consumer and 
>> client apps, and data stores. Basically, everything remains (or 
>> becomes) loosely coupled.
> I don't agree about that at all - the parser becomes much more tightly 
> coupled to the processing engine.

The parser is part of the engine, but it doesn't have to be tightly 
coupled to it per se. Our parsers aren't tightly coupled, which is why 
we can transform roughly 100 or more different data sources into 
RDF-based Linked Data via ETL processing pipelines that terminate in 
DBMS storage.

The data isn't part of the engine.

The engine has its parsers and their associated processing rules.

The engine has the option to accommodate perceived idiosyncrasies in 
the data it ingests.
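As a rough illustration of that ingestion-time option (a minimal sketch, not OpenLink's or Virtuoso's actual loader), the Python below rewrites blank-node graph names as Skolem IRIs before quads reach the store. The quad tuple layout, the `skolemize` function, the `all_bnodes` switch, and the `example.org` authority are all hypothetical; only the `/.well-known/genid/` path follows the RDF 1.1 Concepts recommendation for Skolem IRIs. Every occurrence of a graph-naming label is replaced, so statements made about the graph keep pointing at it.

```python
import uuid
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Assumed quad layout: (subject, predicate, object, graph). graph is None for the
# default graph; blank nodes are strings starting with "_:"; anything else is an
# IRI or a literal. This is a hypothetical representation, not a real parser API.
Quad = Tuple[str, str, str, Optional[str]]

# Hypothetical authority; RDF 1.1 suggests the /.well-known/genid/ path for Skolem IRIs.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def is_bnode(term: Optional[str]) -> bool:
    return term is not None and term.startswith("_:")


def skolemize(quads: Iterable[Quad], all_bnodes: bool = False,
              base: str = SKOLEM_BASE) -> List[Quad]:
    """Rewrite blank-node graph names as Skolem IRIs at load time.

    By default only labels that name a graph are replaced, but they are replaced
    in every position (subject, object, graph) to preserve co-reference.
    With all_bnodes=True (a hypothetical "load modality" switch), every blank
    node is replaced instead.
    """
    quads = list(quads)  # two passes are needed, so materialize the input
    graph_labels: Set[str] = {g for _, _, _, g in quads if is_bnode(g)}
    mapping: Dict[str, str] = {}

    def rewrite(term: Optional[str]) -> Optional[str]:
        if not is_bnode(term) or not (all_bnodes or term in graph_labels):
            return term
        if term not in mapping:
            # Fresh, globally unique IRI; the same label always gets the same IRI.
            mapping[term] = base + uuid.uuid4().hex
        return mapping[term]

    # Predicates cannot be blank nodes in RDF, so they pass through untouched.
    return [(rewrite(s), p, rewrite(o), rewrite(g)) for s, p, o, g in quads]


if __name__ == "__main__":
    # A named graph labelled _:g1, plus a default-graph statement about that graph.
    data = [
        ("_:b0", "http://xmlns.com/foaf/0.1/name", '"Alice"', "_:g1"),
        ("_:g1", "http://purl.org/dc/terms/creator", "http://example.org/steve", None),
    ]
    for quad in skolemize(data):
        print(quad)
```

Leaving the switch off touches only blank nodes that name graphs, so ordinary blank nodes still pass through unchanged for stores that handle them natively; the store itself never sees a blank-node graph name.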

>   But the advantage is that all the work happens in a new module (the 
> JSON-LD parser) which has to be developed in any case, if you want to 
> support JSON-LD.

Yes, for JSON-LD or any other data representation format that's close 
enough to the RDF data model.

Thus, a change in JSON-LD or RDF shouldn't create monumental ripple 
effects on stores or DBMS products.

That's what I mean by keeping the following loosely coupled:

1. RDF model
2. RDF syntax
3. RDF syntax notation
4. RDF stores and DBMS engines.

>
> - Steve
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Friday, 1 March 2013 20:42:12 UTC