Re: RDF graph merging: How useful is it really? (was Re: Blank Nodes Re: Toward easier RDF: a proposal)

Michael,

You mention import into an RDB but do not address the merge itself, so
it becomes an apples-to-oranges comparison.

And merging is one of the points where an RDB breaks down. Say you have
two or more tables, possibly coming from different sources, with
essentially the same kind of information, but split differently into
columns. Maybe one contains FullName and the other one
LastName,FirstName. Because the schema is fixed and mandatory, even
such minor differences prevent you from automatically merging the
tables. And it gets worse as the number of sources grows. RDF does
not have this problem at the physical level (even though identifiers
and vocabularies might still need to be aligned).
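
To make the distinction concrete, here is a minimal sketch in plain
Python (no RDF library). Triples are modelled as tuples, and every
identifier and property name (ex:p1, ex:fullName, ...) is made up for
illustration. It also assumes there are no blank nodes, which a real
RDF merge would have to keep apart. The physical merge is just a set
union; aligning FullName with LastName,FirstName is a separate step
that can happen afterwards.

```python
# Triples are plain (subject, predicate, object) tuples.
# All identifiers and property names are hypothetical examples.
source_a = {
    ("ex:p1", "ex:fullName", "Ada Lovelace"),
}
source_b = {
    ("ex:p2", "ex:lastName", "Hopper"),
    ("ex:p2", "ex:firstName", "Grace"),
}


def merge(*graphs):
    """Physical merge: a set union, no shared schema required."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def full_names(graph):
    """Vocabulary alignment, done after the merge rather than before it."""
    names, first, last = {}, {}, {}
    for s, p, o in graph:
        if p == "ex:fullName":
            names[s] = o
        elif p == "ex:firstName":
            first[s] = o
        elif p == "ex:lastName":
            last[s] = o
    # Derive a full name only where both parts are present.
    for s in first.keys() & last.keys():
        names.setdefault(s, f"{first[s]} {last[s]}")
    return names


merged = merge(source_a, source_b)
aligned = full_names(merged)
```

The point of the sketch is only the ordering: the data sits in one
pool first, and the alignment is a query-time or post-processing
concern rather than a precondition for loading.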

Another issue with RDBs is storing semi-structured data and/or
open-ended "custom" fields. You end up with very sparse tables filled
mostly with NULLs, and I've seen some implementations with table
columns CUSTOM_FIELD_1, ..., CUSTOM_FIELD_99. This is just idiotic.
Again, RDF does not have this problem.
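
As a rough illustration of the custom-field case, again in plain
Python with made-up records and property names: each record
contributes only the properties it actually has, so there are no NULL
placeholders and no fixed list of spare columns.

```python
# Hypothetical records with different, open-ended sets of fields.
records = [
    {"id": "ex:c1", "name": "Acme", "vatNumber": "X123"},
    {"id": "ex:c2", "name": "Globex", "loyaltyTier": "gold",
     "referredBy": "ex:c1"},
]


def to_triples(record):
    """Emit one triple per field that is present; absent fields emit nothing."""
    subject = record["id"]
    return {
        (subject, f"ex:{key}", value)
        for key, value in record.items()
        if key != "id"
    }


def describe(graph, subject):
    """Collect the properties stored for one subject.

    Assumes at most one value per property, which holds for this example.
    """
    return {p: o for s, p, o in graph if s == subject}


triples = set()
for record in records:
    triples |= to_triples(record)

c2 = describe(triples, "ex:c2")
```

A new kind of custom field is just a new predicate on the subjects
that need it; nothing has to be altered for the other records.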

You can read more on "The Problem with Relational Databases" in
"Linked Data in the Enterprise":
https://www.topquadrant.com/docs/whitepapers/information-enlightenment-2.0-final.pdf

Martynas
On Wed, Nov 28, 2018 at 3:05 AM Michael Brunnbauer <brunni@netestate.de> wrote:
>
>
> hi all,
>
> mhmm... I just realized that maybe I should not call out people on visions while I spread negative visions about computer security myself :-)
>
> So change of topic:
>
> RDF graph merging has been named as one of the big pros of RDF. Does this stand up to scrutiny - especially for the "average 33%" use case?
>
> I can easily import relational data from different sources into an RDB server - one database per source or rename tables with a prefix. Am I really so much worse off than the guy with the triple store trying to make sense of his triple soup?
>
> Depends on the quality of the data sources I guess. Triples - (re)use of well known ontologies and URIs. RDB - good documentation and common keys. But RDF won't free me from some detailed inspection and cleaning. The RDB makes this easier because it has provenance information - unless I use quads in the triple store to keep provenance.
>
> What about querying?
>
> RDF will shine when I want to query the combined pool for some entity - like persons from source A together with persons from source B. But I still may have duplicates! In the RDB, I would probably create a new table to merge the entities and address the problem with duplicates on the way.
>
> And joins? If I use quads for provenance, will my SPARQL queries be easier than my RDB joins? I doubt it.
>
> And is the potential time saved relevant for the average developer? Who will probably have to invest a lot of time anyway to make sure that the new data does not screw up his app?
>
> Maybe someone will mention SHACL now or some similar stuff. But aren't most of the problems addressed by that already solved out of the box in my RDB?
>
> Regards,
>
> Michael Brunnbauer
>
> On Wed, Nov 28, 2018 at 12:42:54AM +0100, Michael Brunnbauer wrote:
> >
> > Hello Dave,
> >
> > On Tue, Nov 27, 2018 at 09:31:46PM +0000, Dave Raggett wrote:
> > > This is the basis for the Web of Things :-)
> > > RDF as the basis for a) semantic descriptions of the kinds of things and their relationships to each other and to the context in which they reside, and b) describing software objects that applications can interact with locally independent of where the actual thing is or the means to communicate with it.
> >
> > This does not sound like something that is needed right now. This sounds like a vision.
> >
> > I hope we can describe what problems RDF+friends is meant to solve without resorting to visions. As one of our chancellors once said: "People with visions should go to the doctor" :-) Why? They are hard to get right. The bigger they are, the more likely it is you got it wrong. I think even the Web started with moderate ambitions - but then surprised everybody.
> >
> > Besides: IMO, everybody who thinks that connecting more "things" to the Internet is a good idea should read more security news. And we are even unable to get very basic and supposedly simple stuff right. Stuff like glibc strstr() for example.
> >
> > Regards,
> >
> > Michael Brunnbauer
> >
> > --
> > ++  Michael Brunnbauer
> > ++  netEstate GmbH
> > ++  Geisenhausener Straße 11a
> > ++  81379 München
> > ++  Tel +49 89 32 19 77 80
> > ++  Fax +49 89 32 19 77 89
> > ++  E-Mail brunni@netestate.de
> > ++  https://www.netestate.de/
> > ++
> > ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> > ++  USt-IdNr. DE221033342
> > ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> > ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>

Received on Wednesday, 28 November 2018 10:38:08 UTC