Re: RDF graph merging: How useful is it really? (was Re: Blank Nodes Re: Toward easier RDF: a proposal)

From: Michael Brunnbauer <brunni@netestate.de>
Date: Wed, 28 Nov 2018 10:45:56 +0100
To: Dimitris Kontokostas <jimkont@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>, dsr@w3.org
Message-ID: <20181128094556.GA27511@netestate.de>

Hello Dimitris,

On Wed, Nov 28, 2018 at 07:24:30AM +0100, Dimitris Kontokostas wrote:
> Of course you could do that, RDF does not need to monopolize the data
> integration space. I could also argue that it could be easier in some
> cases, e.g. when you have 2 different sources of the same structure,
> prefixed rdbms tables could do the trick. But, what if you have 5 or 10 or
> 100 sources and the structure is not exactly the same?

I could argue that the data integration space is not something the average
developer is concerned with. I mean work that goes beyond occasionally
picking up a new dataset because it has relevant information. When you work
with 100+ sources, RDF looks nice for data exchange but you are still lost
when those sources are not compatible. RDF makes it *possible* for them to be
compatible. It's the idea of delegating data integration work to the data
producer. The existence of schema.org and the data produced with it show that
this does not happen by itself.

How data is saved internally is of course an entirely different topic.
Unless you're into serendipitous discovery (a phrase I had already forgotten
but I only had to google Kingsley), having a triple store is a bit like
putting the work off until later.

> How easy would it be to implement value level provenance on RDBMS if you
> had to?

You mean where provenance information is not coded into table/database names?
Not easy at all I guess.

> Yes, RDF does not magically deduplicate your duplicate data but why can't
> you create a new named graph (or a different RDF DB) to merge your entities
> and deal with duplicates later as well?

I can. It just does not look much easier with RDF.

> What will the merged RDBMS table(s) look like if the source structures do
> not exactly match?

Same problem with RDF - but on the "row" level instead of the schema level.

> will you use the minimal set of fields or maximal and
> use nulls on empty cells?

Will I use the minimal SPARQL query or pepper it with OPTIONALs?
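
A minimal sketch of that trade-off, with plain Python tuples standing in for
triples. The data, the ex: names and the query() helper are all made up for
illustration; this is not a SPARQL engine, it only mimics required versus
OPTIONAL patterns:

```python
# Toy triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),  # this source never recorded an email
]


def values(subject, predicate):
    """Return all objects for a subject/predicate pair."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def query(required, optional=()):
    """Mimic a graph pattern with required and OPTIONAL predicates.

    Simplification: only the first value per predicate is used.
    """
    rows = []
    for s in sorted({s for s, _, _ in TRIPLES}):
        row = {"s": s}
        for p in required:
            found = values(s, p)
            if not found:
                break  # a required pattern failed: drop this subject
            row[p] = found[0]
        else:
            for p in optional:
                found = values(s, p)
                row[p] = found[0] if found else None  # unbound, like a NULL
            rows.append(row)
    return rows


# "Minimal" query: every pattern is required, so ex:bob silently disappears.
minimal = query(required=["ex:name", "ex:email"])
# "Peppered" query: ex:email is optional, so ex:bob stays with an empty cell.
peppered = query(required=["ex:name"], optional=["ex:email"])
print(minimal)
print(peppered)
```

The second result is the graph-side equivalent of the maximal table with
nulls in the empty cells.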

> How would you deal with field name clashes that represent different things?

I would just rename them (remember I have to look at the schemas anyway).

> How would you deal with identifier clashes?

As I'm deduplicating I will probably create new ones (unless the identifiers
do not have local scope - in which case a clash is less probable).

[joins]
> How would you query multiple RDBMS DBs at the same time or how would you
> query hundreds of prefixed tables?

Why would I want to join hundreds of tables? A typical query will only join
a handful of tables.

> > And is the potential time saved relevant for the average developer? Who
> > will probably have to invest a lot of time anyway to make sure that the new
> > data does not screw up his app?
> 
> I agree, the learning curve of RDF is quite steep for an average developer

True but that was not my point. My point was that the dev usually has to spend 
a lot of time with data integration regardless of the technology used - 
learning RDF or relational databases comes on top of that.

I should be constructive. What has bothered me most in my work with RDF?

That certainly would be: SPARQL queries getting more complicated than necessary.

Most notably: The lack of persistent blank node identifiers for follow-up
queries. I know some triple stores can do this but it should really be
supported by all of them.
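
A toy illustration of the problem, assuming a store that hands out fresh
blank node labels with every result set (which SPARQL permits). The store,
the ex: names and the objects() helper are hypothetical and only serve to
show why the follow-up query fails:

```python
import itertools

# Toy store; "_:" marks a blank node. The internal label is private to the store.
STORE = [
    ("ex:alice", "ex:address", "_:internal7"),
    ("_:internal7", "ex:city", "Munich"),
]
_counter = itertools.count()


def export(term, mapping):
    """Relabel blank nodes per result set, as a store is allowed to do."""
    if term.startswith("_:"):
        if term not in mapping:
            mapping[term] = f"_:r{next(_counter)}"
        return mapping[term]
    return term


def objects(subject, predicate):
    """One 'query': objects of subject/predicate with result-scoped labels."""
    mapping = {}
    return [export(o, mapping) for s, p, o in STORE
            if s == subject and p == predicate]


# First query returns a label that is only meaningful inside this result.
first = objects("ex:alice", "ex:address")
# The follow-up query cannot find the node again under that label.
followup = objects(first[0], "ex:city")
# Workaround: do everything in one larger query that joins inside the store.
joined = [city for s, p, o in STORE
          if s == "ex:alice" and p == "ex:address"
          for city in objects(o, "ex:city")]
print(first, followup, joined)
```

Without stable identifiers the only portable option is the last one: folding
the follow-up into the original query, which is exactly what makes the
queries grow.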

Also, working with named graphs gets complicated quickly - like adding a
{ GRAPH ?graph { } } block for every bit of info that may come from a different
graph.
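
To show how quickly this grows, here is a small sketch that builds the same
query with and without per-pattern GRAPH blocks. The build_query() helper,
the ex: prefix and the example patterns are assumptions made up for
illustration:

```python
def build_query(patterns, per_graph):
    """Build a SELECT query from triple patterns given as strings.

    With per_graph=True every pattern gets its own GRAPH block and its own
    graph variable, because each bit of info may live in a different graph.
    """
    lines = ["PREFIX ex: <http://example.org/>", "SELECT * WHERE {"]
    for i, pattern in enumerate(patterns, start=1):
        if per_graph:
            lines.append(f"  GRAPH ?g{i} {{ {pattern} }}")
        else:
            lines.append(f"  {pattern} .")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical patterns; any three facts about one resource would do.
PATTERNS = [
    "?person ex:name ?name",
    "?person ex:email ?email",
    "?person ex:worksFor ?org",
]

plain_query = build_query(PATTERNS, per_graph=False)
graph_query = build_query(PATTERNS, per_graph=True)
print(plain_query)
print()
print(graph_query)
```

Some stores can be configured to treat the default graph as the union of all
named graphs, which avoids the extra blocks, but that behaviour depends on
the store and cannot be assumed in general.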

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  https://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Received on Wednesday, 28 November 2018 09:46:20 UTC