Re: RDF graph merging: How useful is it really? (was Re: Blank Nodes Re: Toward easier RDF: a proposal)

Hi Michael

I feel that the example you provide is quite simplified and does not
showcase the strengths of RDF. I'll try to raise some points on your
comments inline.

On Wed, Nov 28, 2018 at 3:07 AM Michael Brunnbauer <brunni@netestate.de>
wrote:

>
> hi all,
>
> mhmm... I just realized that maybe I should not call out people on visions
> while I spread negative visions about computer security myself :-)
>
> So change of topic:
>
> RDF graph merging has been named as one of the big pros of RDF. Does this
> stand up to scrutiny - especially for the "average 33%" use case?
>
> I can easily import relational data from different sources into a RDB
> server - one database per source or rename tables with a prefix. Am I
> really so much worse off than the guy with the triple store trying to make
> sense of his triple soup?
>

Of course you could do that; RDF does not need to monopolize the data
integration space. I could also argue that it could be easier in some
cases, e.g. when you have 2 different sources with the same structure,
prefixed RDBMS tables could do the trick. But what if you have 5 or 10 or
100 sources and the structure is not exactly the same?
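
To make the point concrete, here is a minimal sketch in plain Python (no RDF
library; the example.org URIs and the data are made up for illustration).
Each source is modelled as a set of triples, and merging is just set union,
even though the two sources describe different properties. The sketch
ignores blank nodes, which would have to be kept apart per source in a real
merge.

```python
# Hypothetical data: each source is a set of (subject, predicate, object) triples.
EX = "http://example.org/"

source_a = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "email", "alice@example.org"),
}
source_b = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "bob", EX + "phone", "+1-555-0100"),
}


def merge(*graphs):
    """Merge graphs by set union; no schema alignment step is involved."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return {predicate: sorted objects} for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in out.items()}


merged = merge(source_a, source_b)
alice = describe(merged, EX + "alice")
```

Adding a third or a hundredth source is another argument to merge(); whether
the result is useful still depends on the sources reusing the same URIs.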


> Depends on the quality of the data sources I guess.


Yes, RDF cannot spare you from bad source data.


> Triples - (re)use of well known ontologies and URIs. RDB - good
> documentation and common keys. But RDF won't free me from some detailed
> inspection and cleaning.


Yes, RDF cannot spare you from data quality assessment tasks.


> The RDB makes this easier because it has provenance information - unless I
> use quads in the triple store to keep provenance.
>

Yes, if you need provenance then some mechanism for provenance at the
source or triple level would be needed. Depending on which you choose, you
might indeed need quads.
How easy would it be to implement value-level provenance in an RDBMS if you
had to?
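
As a rough illustration of what quads buy you, here is a small plain-Python
sketch (the ex: names and the data are hypothetical). Each statement carries
a fourth element naming the source graph, so value-level provenance is a
simple lookup.

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph (here: the source) it came from.
Quad = namedtuple("Quad", "s p o g")

quads = [
    Quad("ex:alice", "ex:name", "Alice", "ex:sourceA"),
    Quad("ex:alice", "ex:name", "Alice Smith", "ex:sourceB"),
    Quad("ex:alice", "ex:email", "alice@example.org", "ex:sourceB"),
]


def provenance(quads, s, p):
    """Map each value of (s, p) to the set of source graphs asserting it."""
    result = {}
    for q in quads:
        if q.s == s and q.p == p:
            result.setdefault(q.o, set()).add(q.g)
    return result


names = provenance(quads, "ex:alice", "ex:name")
```

Here names tells you that the two spellings of the name come from different
sources, without any extra bookkeeping columns.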


> What about querying?
>
> RDF will shine when I want to query the combined pool for some entity -
> like persons from source A together with persons from source B. But I still
> may have duplicates! In the RDB, I would probably create a new table to
> merge the entities and address the problem with duplicates on the way.
>

Yes, RDF does not magically deduplicate your duplicate data, but why can't
you create a new named graph (or a different RDF DB) to merge your entities
and deal with duplicates later as well?
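
A sketch of that idea, again in plain Python with made-up identifiers: the
source graphs stay untouched, and the deduplication result goes into a
separate graph (called ex:merged here) as owl:sameAs links. Treating a
shared email address as proof of identity is an assumption for the example,
not a general rule.

```python
# Hypothetical quads: (subject, predicate, object, graph).
quads = [
    ("a:p1", "ex:email", "alice@example.org", "ex:sourceA"),
    ("b:p7", "ex:email", "alice@example.org", "ex:sourceB"),
    ("b:p8", "ex:email", "bob@example.org", "ex:sourceB"),
]


def same_as_links(quads, key_predicate, target_graph="ex:merged"):
    """Link subjects that share a value for key_predicate.

    The links are returned as new quads in target_graph, so the
    original source graphs are left as they were.
    """
    by_value = {}
    for s, p, o, g in quads:
        if p == key_predicate:
            by_value.setdefault(o, set()).add(s)
    links = []
    for value in sorted(by_value):
        ordered = sorted(by_value[value])
        canonical = ordered[0]  # arbitrary but deterministic choice
        for other in ordered[1:]:
            links.append((other, "owl:sameAs", canonical, target_graph))
    return links


links = same_as_links(quads, "ex:email")
```

If the matching rule turns out to be wrong, you drop ex:merged and start
over; nothing in the source graphs has to be repaired.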

What will the merged RDBMS table(s) look like if the source structures do
not exactly match? Will you use the minimal set of fields, or the maximal
set with nulls in the empty cells? What will be the semantics of the
filled-in nulls compared to the existing nulls you may have? How would you
deal with field name clashes that represent different things? How would you
deal with identifier clashes?
You can of course deal with all of these in an RDBMS, but the complexity of
the solution grows the more (diverse) sources you try to integrate.
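
A small sketch of the null and identifier questions, using Python's built-in
sqlite3 module (table names, columns and rows are invented for the example).
Two prefixed source tables with different columns are merged into one table
that uses the maximal set of fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source A has an email column, source B has a phone column.
    CREATE TABLE a_person (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE b_person (id INTEGER, name TEXT, phone TEXT);
    INSERT INTO a_person VALUES (1, 'Alice', NULL);
    INSERT INTO b_person VALUES (1, 'Bob', '+1-555-0100');

    -- Merged table: union of all columns, padded with NULLs.
    CREATE TABLE person AS
        SELECT 'A' AS source, id, name, email, NULL AS phone FROM a_person
        UNION ALL
        SELECT 'B' AS source, id, name, NULL AS email, phone FROM b_person;
""")

rows = conn.execute(
    "SELECT source, id, name, email, phone FROM person ORDER BY source"
).fetchall()
conn.close()
```

In the merged table, Alice's NULL email means "unknown in source A", while
Bob's NULL email means "source B has no such field", and the two cannot be
told apart. Both rows also carry id 1, so the extra source column is needed
just to keep the identifiers from clashing.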


> And joins? If I use quads for provenance, will my SPARQL queries be easier
> than my RDB joins? I doubt it.
>

How would you query multiple RDBMS DBs at the same time, or how would you
query hundreds of prefixed tables?


> And is the potential time saved relevant for the average developer? Who
> will probably have to invest a lot of time anyway to make sure that the new
> data does not screw up his app?
>

I agree, the learning curve of RDF is quite steep for an average developer.


> Maybe someone will mention SHACL now or some similar stuff. But aren't
> most of the problems addressed by that already solved out of the box in my
> RDB?
>

The point I am trying to make here is that you can indeed solve the
problems RDF tries to solve with different technologies, RDBMS being one
choice.
However, some would argue that RDF makes it (much) less painful,
especially as the diversity and/or size of the sources increases.


> Regards,
>
> Michael Brunnbauer
>
> On Wed, Nov 28, 2018 at 12:42:54AM +0100, Michael Brunnbauer wrote:
> >
> > Hello Dave,
> >
> > On Tue, Nov 27, 2018 at 09:31:46PM +0000, Dave Raggett wrote:
> > > This is the basis for the Web of Things :-)
> > > RDF as the basis for a) semantic descriptions of the kinds of things
> and their relationships to each other and to the context in which they
> reside, and b) describing software objects that applications can interact
> with locally independent of where the actual thing is or the means to
> communicate with it.
> >
> > This does not sound like something that is needed right now. This sounds
> like a vision.
> >
> > I hope we can describe what problems RDF+friends is meant to solve
> without resorting to visions. As one of our chancellors once said: "People
> with visions should go to the doctor" :-) Why? They are hard to get right.
> The bigger they are, the more likely it is you got it wrong. I think even
> the Web started with moderate ambitions - but then surprised everybody.
> >
> > Besides: IMO, everybody who thinks that connecting more "things" to the
> Internet is a good idea should read more security news. And we are even
> unable to get very basic and supposedly simple stuff right. Stuff like
> glibc strstr() for example.
> >
> > Regards,
> >
> > Michael Brunnbauer
> >
> > --
> > ++  Michael Brunnbauer
> > ++  netEstate GmbH
> > ++  Geisenhausener Straße 11a
> > ++  81379 München
> > ++  Tel +49 89 32 19 77 80
> > ++  Fax +49 89 32 19 77 89
> > ++  E-Mail brunni@netestate.de
> > ++  https://www.netestate.de/
> > ++
> > ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> > ++  USt-IdNr. DE221033342
> > ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> > ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>
>


-- 
Kontokostas Dimitris

Received on Wednesday, 28 November 2018 06:25:20 UTC