Re: RDF graph merging: How useful is it really? (was Re: Blank Nodes Re: Toward easier RDF: a proposal) from Hugh Glaser on 2018-11-28 (semantic-web@w3.org from November 2018)

From: Hugh Glaser <hugh@glasers.org>
Date: Wed, 28 Nov 2018 14:15:58 +0000
To: Martynas Jusevičius <martynas@atomgraph.com>
Cc: Michael Brunnbauer <brunni@netestate.de>, Semantic Web <semantic-web@w3.org>, Dave Raggett <dsr@w3.org>
Message-Id: <39C55148-2DE9-4F9B-B222-A9E2D6C4738D@glasers.org>
Good topic.

> On 28 Nov 2018, at 10:37, Martynas Jusevičius <martynas@atomgraph.com> wrote:
> 
> Michael,
> 
> You mention import into RDB but do not address the merge itself, so it
> becomes apples/oranges comparison.
> 
> And merge is one of the points where RDB breaks apart. Say you have 2
> or more tables, possibly coming from different sources, with
> essentially the same kind of information, but split differently by
> columns. Maybe one contains FullName and the other one
> LastName,FirstName. Because the schema is fixed and mandatory, even
> such minor differences prevent you from automatically merging the
> tables. And it gets worse with the growing number of sources. RDF does
> not have this problem at the physical level (even though identifiers
> and vocabularies might need to be aligned).
Oh, but it does.
It just looks different because “identifiers and vocabularies might need to be aligned” :-)
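
To make that concrete, here is a minimal sketch using Martynas's FullName versus LastName,FirstName example. It assumes triples are modelled as plain Python tuples, with literals as bare strings and no blank nodes, rather than going through a real RDF library. The example.org URIs and the names are made up for illustration. The FOAF properties are real, but choosing them for the two sources is my assumption.

```python
# Hypothetical data: two sources describing people with differently split names.
FOAF = "http://xmlns.com/foaf/0.1/"

source_a = {
    ("http://example.org/a/alice", FOAF + "name", "Alice Smith"),
}
source_b = {
    ("http://example.org/b/bob", FOAF + "givenName", "Bob"),
    ("http://example.org/b/bob", FOAF + "familyName", "Jones"),
}

# The "physical" merge really is trivial: with no blank nodes it is set union.
merged = source_a | source_b


def full_names(graph):
    """Return every foaf:name value in the graph, sorted."""
    return sorted(o for (s, p, o) in graph if p == FOAF + "name")


# The merge succeeded, but a query written against source A's vocabulary
# does not see Bob until the vocabularies have been aligned.
print(len(merged))
print(full_names(merged))
```

The union goes through without complaint, but the query only finds the person from the first source, so the alignment work has moved rather than disappeared.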

Which leads to a question.
One of the first services we needed was an RDF transformation one.
That is (in the Linked Data world, cast in a RESTful style) a proxy/service that takes a Linked Data URI and a transformation specification, resolves the URI, and returns RDF transformed according to that specification.
(Of course the transformation specification is a Linked Data URI of a transformation written in RDF.)

This RDF -> RDF transformation is hugely important for building things: for removing triples you don’t want, or for converting data into your preferred ontologies.
There was a proposed standard at the time that we used, but I can’t recall what it was.
Nowadays I mostly handcraft things, so I don’t bother doing it in a principled fashion.

If there were good tools to do this (or even one, and maybe there is :-)) that integrated with what people use, would that be useful?
That would encourage a library of transformation specs, such as dc->dct, xxx->skos etc.
None of them would be perfect, of course, but if they did the job?
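
As a rough illustration of what such a spec might amount to, here is a minimal sketch of a dc->dct transformation. It assumes the spec can be reduced to a predicate-to-predicate mapping plus a set of predicates to drop, held in a Python dict rather than written in RDF and fetched by URI as in the service described above. The function name transform, the example.org URIs and the sample values are hypothetical. The two Dublin Core namespaces are the real ones.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"

# Hypothetical transformation spec: rewrite these predicates, leave others alone.
DC_TO_DCT = {
    DC + "title": DCT + "title",
    DC + "creator": DCT + "creator",
    DC + "date": DCT + "date",
}


def transform(triples, mapping, drop=frozenset()):
    """Rewrite predicates according to mapping and omit any predicate in drop."""
    out = set()
    for s, p, o in triples:
        if p in drop:
            continue  # "remove stuff"
        out.add((s, mapping.get(p, p), o))  # "convert into preferred ontologies"
    return out


# Hypothetical input, as if it had just been resolved from a Linked Data URI.
doc = "http://example.org/doc/1"
note = "http://example.org/vocab/internalNote"
sample = {
    (doc, DC + "title", "A Title"),
    (doc, DC + "creator", "Some Author"),
    (doc, note, "not for publication"),
}

result = transform(sample, DC_TO_DCT, drop={note})
for triple in sorted(result):
    print(triple)
```

A pure predicate rename like this is about the simplest spec imaginable, and it is certainly not perfect: it ignores any differences in how the two vocabularies expect values to be modelled.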

It might even help with merging data from other formats: if you have a DB or Gsheet that could easily go into a particular RDF vocabulary, you could do that, and then lean on the service to build RDF in something else (that you might not even know well).

I have a feeling that this won’t work, because in practice no-one is doing much merging(!), and the work would just wither on the vine and rot, like so many of the tools in the W3C pages, as has been pointed out.

But I thought it was worth mentioning.


> 
> Another issue with RDB is storing semi-structured data and/or
> open-ended "custom" fields. You end up with very sparse tables with
> mostly NULLs, and I've seen some implementations with table columns
> CUSTOM_FIELD_1, ..., CUSTOM_FIELD_99 etc. This is just idiotic. Again,
> RDF does not have this problem.
> 
> You can read more on "The Problem with Relational Databases" in
> "Linked Data in the Enterprise":
> https://www.topquadrant.com/docs/whitepapers/information-enlightenment-2.0-final.pdf
> 
> Martynas
> On Wed, Nov 28, 2018 at 3:05 AM Michael Brunnbauer <brunni@netestate.de> wrote:
>> 
>> 
>> hi all,
>> 
>> mhmm... I just realized that maybe I should not call out people on visions while I spread negative visions about computer security myself :-)
>> 
>> So change of topic:
>> 
>> RDF graph merging has been named as one of the big pros of RDF. Does this stand up to scrutiny - especially for the "average 33%" use case?
>> 
>> I can easily import relational data from different sources into a RDB server - one database per source or rename tables with a prefix. Am I really so much worse off than the guy with the triple store trying to make sense of his triple soup?
>> 
>> Depends on the quality of the data sources I guess. Triples - (re)use of well known ontologies and URIs. RDB - good documentation and common keys. But RDF won't free me from some detailed inspection and cleaning. The RDB makes this easier because it has provenance information - unless I use quads in the triple store to keep provenance.
>> 
>> What about querying?
>> 
>> RDF will shine when I want to query the combined pool for some entity - like persons from source A together with persons from source B. But I still may have duplicates! In the RDB, I would probably create a new table to merge the entities and address the problem with duplicates on the way.
>> 
>> And joins? If I use quads for provenance, will my SPARQL queries be easier than my RDB joins? I doubt it.
>> 
>> And is the potential time saved relevant for the average developer? Who will probably have to invest a lot of time anyway to make sure that the new data does not screw up his app?
>> 
>> Maybe someone will mention SHACL now or some similar stuff. But aren't most of the problems addressed by that already solved out of the box in my RDB?
>> 
>> Regards,
>> 
>> Michael Brunnbauer
>> 
>> On Wed, Nov 28, 2018 at 12:42:54AM +0100, Michael Brunnbauer wrote:
>>> 
>>> Hello Dave,
>>> 
>>> On Tue, Nov 27, 2018 at 09:31:46PM +0000, Dave Raggett wrote:
>>>> This is the basis for the Web of Things :-)
>>>> RDF as the basis for a) semantic descriptions of the kinds of things and their relationships to each other and to the context in which they reside, and b) describing software objects that applications can interact with locally independent of where the actual thing is or the means to communicate with it.
>>> 
>>> This does not sound like something that is needed right now. This sounds like a vision.
>>> 
>>> I hope we can describe what problems RDF+friends is meant to solve without resorting to visions. As one of our chancellors once said: "People with visions should go to the doctor" :-) Why? They are hard to get right. The bigger they are, the more likely it is you got it wrong. I think even the Web started with moderate ambitions - but then surprised everybody.
>>> 
>>> Besides: IMO, everybody who thinks that connecting more "things" to the Internet is a good idea should read more security news. And we are even unable to get very basic and supposedly simple stuff right. Stuff like glibc strstr() for example.
>>> 
>>> Regards,
>>> 
>>> Michael Brunnbauer
>>> 
>>> --
>>> ++  Michael Brunnbauer
>>> ++  netEstate GmbH
>>> ++  Geisenhausener Straße 11a
>>> ++  81379 München
>>> ++  Tel +49 89 32 19 77 80
>>> ++  Fax +49 89 32 19 77 89
>>> ++  E-Mail brunni@netestate.de
>>> ++  https://www.netestate.de/
>>> ++
>>> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
>>> ++  USt-IdNr. DE221033342
>>> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>>> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>> 
>> 
>> 
>
Received on Wednesday, 28 November 2018 14:18:44 UTC