Re: Trip Reports on Dagstuhl Seminar on Knowledge Graphs from thomas lörtsch on 2019-08-28 (semantic-web@w3.org from August 2019)

From: thomas lörtsch <tl@rat.io>
Date: Wed, 28 Aug 2019 15:22:29 +0200
To: Axel Polleres <axel.polleres@wu.ac.at>
Cc: William Waites <wwaites@tardis.ed.ac.uk>, Semantic Web <semantic-web@w3.org>
Message-Id: <CA756B5F-2DA9-48F4-98F3-1E100456029F@rat.io>
> On 28. Aug 2019, at 13:33, Axel Polleres <axel.polleres@wu.ac.at> wrote:
> 
>> There *are* fundamental problems with RDF. The main one is that it is
>> impossible to coherently make statements about statements.
>> [...]
>> but those areas remain out of reach 
>> for SW/LD/KG so long as the underlying RDF doesn't change to allow
>> it.
> 
> I beg to disagree: there are several proposals on reification for RDF, there is RDF*, so things are moving on in exactly this direction.

There is a fundamental gap in RDF reification semantics: it distinguishes abstract statement types and concrete statement tokens/occurrences but it provides no definition of what exactly constitutes concreteness or how to address a statement token. I think it is a gross oversight that this problem is still unresolved. 
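
To make that gap concrete, here is a minimal sketch in plain Python (no RDF library). The `reify` helper, the `ex:` names and the blank node label `_:stmt1` are hypothetical and only serve the illustration; the four reification terms are the standard RDF vocabulary. Two documents assert the same triple, merging them leaves one abstract statement, and the reification description neither contains the triple nor says which occurrence it is about:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, node):
    """Describe `triple` with the standard RDF reification vocabulary.

    `node` is the (hypothetical) identifier chosen for the statement resource.
    """
    s, p, o = triple
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }

triple = ("ex:alice", "ex:knows", "ex:bob")

# Two sources assert the same triple (two occurrences / tokens).
doc1 = {triple}
doc2 = {triple}

# A graph is a set of triples, so merging collapses both occurrences
# into a single statement type.
merged = doc1 | doc2

description = reify(triple, "_:stmt1")

print(len(merged))            # 1
print(triple in description)  # False: the description does not assert the triple
```

Nothing in `description` refers to `doc1` or `doc2`, so the sketch cannot tell whether `_:stmt1` stands for the occurrence in one document, in the other, or for the abstract type.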

RDF* in its current state is just syntactic sugar and remains silent on that crucial point. By other proposals I guess you mean Named Graphs, Singleton Properties and Fluents. Singleton Properties have other problems, but I still have to investigate how far their semantics actually reach. Neither Fluents nor Singleton Properties are really attribution technologies, in the sense that they would let you attribute an existing statement without changing it. Named Graphs would work if we could specify that some graph name is actually a name. I don’t know why the RDF 1.1 WG couldn’t add a property to that effect - the proposal was on the table. Sure, anybody can invent such a property, and the fact that there isn’t one in widespread use might be a hint that this problem is not important. Somehow I just can’t convince myself of that.

Pointing out this issue has nothing to do with throwing out the baby with the bathtub. This is an omission that reflects the underlying architectural assumptions of RDF, which IMHO are just too simplistic. I’m sure not everybody agrees with me here, but I’m also sure that everybody agrees with the 80/20 principle. IMO RDF lacks the 20% part with respect to reification/attribution/identification. That is a real problem, and sooner or later it will let the whole apparatus go stale if it isn’t resolved. Then there is no baby anymore. Because if the semantics aren’t complete anyway, why not go with Property Graphs? A lot of people have answered that question already.

> There are approaches working in practice, such as the one taken in Wikidata, which have been 
> shown to be workable within an RDF/SPARQL context, cf. Wikidata's query service.

Markus Krötzsch investigated Wikidata in depth, and his work is a very good description of the mess that statement attribution in RDF actually is if you want/need it to have sound formal semantics. 

> I.e., there is evolution and development, and that's a good thing, but this is an ongoing process and there is no sense in throwing away 
> all that has been done on RDF and SW and start from scratch... that was also the base message I wanted to convey in my talk slides, BTW.
> 
>> And so, we are stuck. We can fix it, or we can keep inventing new
>> names.
> 
> IMHO, it's not about inventing names, it's about recognizing gaps and closing them, about not throwing out the baby with the bathtub and re-inventing the wheel, about combining 
> and evolving successful approaches... whether terminology/naming evolves over time as well is secondary.

The RDF spec refers to out-of-band means to specify reification semantics, and that is of course always possible. Interoperability, however, is hard to achieve without some standard, and we are not talking about isolated applications, we are talking about the semantic WEB. Some standard wouldn’t be that hard to achieve, as it only has to extend, not change, the RDF spec. Standardizing the natural default - the immediate context of a statement in the form of the containing RDF snippet, Named Graph (labeled or named doesn’t matter in this respect), document, etc. - should be straightforward. So: yes, evolution, but please not in the form of another underspecified hackish pragmatism like the RDF 1.1 Named Graphs or RDF* in its current form.
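
One possible reading of that "natural default" can be sketched in plain Python. This is only an illustration under stated assumptions, not a proposal from any spec: the `Occurrence` type, the `occurrences` and `attribute` helpers, the `ex:` names and the `ex:assertedBy` property are all hypothetical. A statement token is addressed by the pair of its containing context and the triple, and annotations attach to that pair without touching the asserted data:

```python
from typing import NamedTuple

class Occurrence(NamedTuple):
    """A statement token: a triple together with its immediate context."""
    context: str   # name of the containing graph or document (assumed)
    triple: tuple  # (subject, predicate, object)

def occurrences(dataset):
    """List every statement token in a {context: set_of_triples} mapping."""
    return {Occurrence(ctx, t) for ctx, triples in dataset.items() for t in triples}

# Annotations live beside the data, keyed by occurrence.
annotations = {}

def attribute(occ, prop, value):
    """Attach (prop, value) to one occurrence without changing the dataset."""
    annotations.setdefault(occ, set()).add((prop, value))

triple = ("ex:alice", "ex:knows", "ex:bob")
dataset = {"ex:doc1": {triple}, "ex:doc2": {triple}}

tokens = occurrences(dataset)
attribute(Occurrence("ex:doc1", triple), "ex:assertedBy", "ex:carol")

print(len(tokens))                                    # 2
print(Occurrence("ex:doc2", triple) in annotations)   # False
```

The same triple in `ex:doc1` and `ex:doc2` yields two addressable tokens, and only the one in `ex:doc1` carries the attribution. Whether the context name actually denotes the graph is precisely the part that would need to be standardized.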

> anyway, I kinda hope/suppose we're (readers of this list, at least) on the same page here anyway

I think we are, and we don’t differ so much on what is preferable as on what is possible. I do see a path to solid attribution/reification semantics - actually even to solid identification semantics - with defaults that blend well into current practices. I see, however, little interest or even a sense of urgency in this community to tackle that task. And that, I might add, has been the case for decades.

Cheers,
Thomas


> best regards,
> Axel 
> --
> Prof. Dr. Axel Polleres
> Institute for Information Business, WU Vienna
> url: http://www.polleres.net/  twitter: @AxelPolleres
> 
>> On 28.08.2019, at 12:36, William Waites <wwaites@tardis.ed.ac.uk> wrote:
>> 
>>> Well, link works for me, but in fact that was the pre-print version
>>> of the report, the official link on the Dagstuhl page:
>> 
>> Thanks for the link, Axel. I have no idea why it didn't work for me
>> earlier, but it does now. I've read (quickly skimmed, really) the 
>> canonical version of the report. My tuppence worth follows.
>> 
>> I'm a little puzzled about the Knowledge Graph. Is it a marketing term?
>> The question is only a little facetious: quite a few of the reports are
>> struggling to define what it is. We know that graphs are general structures
>> for representing a variety of different things, that's a very old idea and
>> it's very powerful (think objects and arrows). We know that going from
>> discrete entities (e.g. labellings) to continuous ones is hard and unobvious
>> so we get divisions in fields between graphs and rules on the one hand and
>> statistics and neural networks on the other. Plenty of potentially productive
>> open problems and questions lie that way.
>> 
>> I think what Paola might be getting at is the way that we have continually
>> invented new words for whatever it is we are doing here. There's the 
>> Semantic Web, there's Linked Data, now there's the Knowledge Graph. Each
>> with a slightly different focus perhaps as you point out in your presentation,
>> but with little substantive change. That's what I mean by marketing terms
>> (easily recognised by the proper noun casing).
>> 
>> There *are* fundamental problems with RDF. The main one is that it is
>> impossible to coherently make statements about statements. Without that,
>> we can't build hierarchies of statements and things like time and provenance
>> and the like (mentioned in the report) can't be done. These are important
>> and fascinating areas to research, but those areas remain out of reach 
>> for SW/LD/KG so long as the underlying RDF doesn't change to allow
>> it. And so, we are stuck. We can fix it, or we can keep inventing new
>> names.
>> 
>> Best wishes,
>> 
>> William Waites | wwaites@inf.ed.ac.uk
>> Institute for Language, Cognition and Computation
>> School of Informatics, University of Edinburgh
>> 
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>> 
>> 
> 
>
Received on Wednesday, 28 August 2019 13:22:57 UTC