Re: Toward easier RDF: a proposal

On 26/11/2018 11:28 pm, Graham Klyne wrote:
> On 22/11/2018 23:22, Holger Knublauch wrote:
>> On 22/11/2018 10:21 PM, Graham Klyne wrote:
>>
>>> On 22/11/2018 00:38, Holger Knublauch wrote:
>>>> Would you mind clarifying this statement a bit? What practical 
>>>> benefits would
>>>> the foundation on formal logic add to a future (simplified) RDF, 
>>>> for average
>>>> users? I have seen plenty of evidence that some aspects of the 
>>>> semantic
>>>> technology stack are being regarded as too academic, and that the 
>>>> role of formal
>>>> logic has been one driver of this detachment. Related topics are the
>>>> non-unique-name-assumption and the open world assumption that are 
>>>> typically
>>>> neither understood nor expected by average users.
>>>
>>> Jumping in, if I may...
>>>
>>> My view is that the formal logic underpinning of RDF serves (at 
>>> least) one
>>> important (and not-so-academic) purpose:
>>>
>>> Given two distinct RDF graphs that are taken to be descriptions of 
>>> some world
>>> or context, following the procedure of RDF graph merging guarantees 
>>> that the
>>> resulting graph is true of that world exactly when the individual 
>>> graphs are
>>> true of that world.
>>
>> Sorry, I cannot follow this explanation. What do you mean with a 
>> graph being
>> true of a world? Could you maybe give a practical example?
>
> The obvious example is being true of the real world we live in - the 
> phrasing was just intended to allow for RDF use describing fictional 
> or potential worlds.
>
> So, to take an excerpt from TimBL's FOAF profile at [1]:
>
> [[
>     <rdf:Description 
> rdf:about="http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X">
>         <dc:creator 
> rdf:resource="http://www.w3.org/People/Berners-Lee/card#i"/>
>         <dc:title>Weaving the Web: The Original Design and Ultimate 
> Destiny of the World Wide Web</dc:title>
>     </rdf:Description>
> ]]
> -- [1] http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf
>
> We may have some common understanding that the URIs 
> http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X, 
> http://www.w3.org/People/Berners-Lee/card#i, dc:creator and dc:title 
> refer to a book, a person and two RDF properties respectively, and 
> that under that understanding they describe a true statement about 
> authorship of a book with a given title.
>
> Without this understanding, maybe the URIs can be interpreted as 
> referring to, say, a declaration of intent and a megalomaniac spider, 
> etc.  The graph might be true of this world too.
>
> But we don't need to know any of this to combine the graph from [1] 
> with a graph from another source that may use some of the same URIs.  
> As long as the interpretation of all the URIs used is consistent, and 
> both graphs make statements that are true under that interpretation, 
> then the resulting combined graph (constructed according to the rules 
> of RDF graph merging) is also true.
>
> This ability to combine independent graphs is, to my mind, what RDF 
> semantics gives us (while remaining completely agnostic about what the 
> URIs actually denote).  It's not something that concerns us much in 
> day-to-day work with RDF, but I feel that the semantics provide an 
> important underpinning for some of the things that we wish to do with 
> RDF.

Combining data from multiple sources is indeed one of the major 
strengths of RDF technology, and global identifiers (URIs) are the glue 
that holds the combined data together. My point is that these benefits 
do not necessarily require the currently specified formal RDF semantics. 
Putting sets of triples together into a new set already achieves most of 
that.
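
To illustrate what I mean, here is a minimal sketch in Python with 
rdflib (all ex: names are made up for illustration, and I am ignoring 
blank-node renaming):

[[
from rdflib import Graph

# Two independently published graphs, given inline as Turtle here.
g1 = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:book1 ex:title "Weaving the Web" .
""", format="turtle")

g2 = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:book1 ex:creator ex:timbl .
""", format="turtle")

# "Merging" is just the set union of the two triple sets; the shared
# URI ex:book1 is what connects the data, no entailment regime needed.
merged = g1 + g2
print(len(merged))   # 2
]]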

Looking at https://www.w3.org/TR/rdf11-mt/, the intention was that "All 
entailment regimes must be monotonic extensions of the simple entailment 
regime described in the document [...]", so, for example, the 
interpretation of blank nodes as existential placeholders is currently 
hard-coded into the very core of RDF. Furthermore, even RDF Schema gets 
special treatment in this document, with the rather unintuitive 
semantics of rdfs:range etc.
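
To make the rdfs:range point concrete, here is a hedged sketch of the 
corresponding entailment rule, hand-coded in Python with rdflib (the 
ex: names are invented). Rather than flagging an unexpected value, the 
range axiom infers a new type triple for it:

[[
from rdflib import Graph, RDF, RDFS, URIRef

g = Graph().parse(data="""
    @prefix ex:   <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:author rdfs:range ex:Person .
    ex:book1  ex:author  ex:acmeCorp .   # presumably not a person
""", format="turtle")

# RDFS entailment rule rdfs3:
#   (?p rdfs:range ?c) and (?s ?p ?o)  entails  (?o rdf:type ?c)
inferred = []
for p, _, c in g.triples((None, RDFS.range, None)):
    for _, _, o in g.triples((None, p, None)):
        inferred.append((o, RDF.type, c))
for t in inferred:
    g.add(t)

# No violation is reported; instead RDFS concludes that ex:acmeCorp
# is an ex:Person.
print((URIRef("http://example.org/acmeCorp"),
       RDF.type,
       URIRef("http://example.org/Person")) in g)   # True
]]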

I believe this needs to be decoupled in a future version. Those who find 
these semantics useful should continue to use them, but there are plenty 
of use cases where the given semantics are not just unnecessary but also 
an obstacle (performance of inferencing, merging of blank nodes etc).

IMHO a better stacking of RDF would have a simplified RDF (maybe called 
Graph Data Framework) at its very bottom, i.e. a definition of graphs 
consisting of triples, each with a subject, predicate and object, plus 
(hopefully) better support for reified statements, named graphs and 
lists. This core can be formalized entirely as a data structure/API, on 
a couple of pages. Ideally this opportunity would also be used to get 
rid of most of the XML datatypes, but that's another topic.
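
As a rough, purely illustrative sketch of how small such a core could 
be (Python; the names Triple, Graph and merge are of course made up):

[[
from typing import NamedTuple, Set

class Triple(NamedTuple):
    subject: str     # IRI or blank node identifier
    predicate: str   # IRI
    object: str      # IRI, blank node identifier or literal

Graph = Set[Triple]

def merge(g1: Graph, g2: Graph) -> Graph:
    # Plain set union of triples; no entailment regime attached.
    return g1 | g2
]]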

Then, RDFS and OWL could be grouped together into a single standard that 
offers a certain interpretation of these graphs that some applications 
may find useful. The topic there is mostly inferencing of new triples 
under the open world assumption and without the unique name assumption. 
In practice many people already mix RDFS (range/domain) with OWL 
(imports, minCardinality), so why not just group them together into a 
family of dialects?
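
For example, schemas of this kind are common in the wild, freely mixing 
the two vocabularies (Turtle embedded in Python via rdflib; all ex: 
names are invented):

[[
from rdflib import Graph

mixed_schema = Graph().parse(data="""
    @prefix ex:   <http://example.org/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:        a owl:Ontology ;
               owl:imports <http://example.org/other> .     # OWL

    ex:Person  a rdfs:Class .
    ex:parent  rdfs:domain ex:Person ;                      # RDFS
               rdfs:range  ex:Person .

    ex:Parent  rdfs:subClassOf [ a owl:Restriction ;        # OWL
                   owl:onProperty ex:parent ;
                   owl:minCardinality 1 ] .
""", format="turtle")
]]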

However, standards like Turtle, SPARQL and SHACL do not need formal 
logic and RDFS/OWL, nor should they. They simply operate on the graphs. 
Turtle offers parsing, SPARQL offers querying, SHACL offers validation 
and schema definitions, e.g. for building UI input forms. These 
standards overlap with RDFS/OWL on a small number of very frequently 
used URIs: rdf:type, rdfs:subClassOf, rdfs:Class. This overlap means 
that almost all existing RDF-based data can be used by applications that 
follow either paradigm. RDFS/OWL tools can ignore sh:property 
definitions, and SHACL tools can ignore owl:Restrictions, yet the names 
of classes and all instance data can be shared.
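
A hedged illustration of that overlap (again Python/rdflib with invented 
ex: names): the same class carries both a SHACL property shape and an 
OWL restriction, and tools of either paradigm can share the class name, 
the subclass link and the instance data while ignoring the axioms they 
do not understand:

[[
from rdflib import Graph

shared = Graph().parse(data="""
    @prefix ex:   <http://example.org/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    ex:Book a rdfs:Class, sh:NodeShape ;
        rdfs:subClassOf ex:Work ;
        sh:property [ sh:path ex:title ;                 # read by SHACL
                      sh:datatype xsd:string ;
                      sh:minCount 1 ] ;
        rdfs:subClassOf [ a owl:Restriction ;            # read by OWL
                          owl:onProperty ex:title ;
                          owl:minCardinality 1 ] .

    ex:weavingTheWeb a ex:Book ;
        ex:title "Weaving the Web" .
""", format="turtle")
]]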

Another extension built on top of a simplified RDF would be the whole 
idea of Linked Data, with its rules for resolving URLs etc. There is no 
need to make this a base feature of RDF for those who simply use local 
data. A good rule language should be another orthogonal extension.
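
As an optional layer, such resolution could look like this rough sketch 
(Python/rdflib; the URL is the FOAF profile cited above and may no 
longer resolve):

[[
from rdflib import Graph

local = Graph()

# Linked-Data-style dereferencing: fetch the document behind a URI,
# parse whatever RDF comes back, and merge it into the local graph.
remote = Graph().parse("http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf")
local += remote

print(len(local))
]]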

>
>>>
>>> To my mind, this underpins the (open-world?) idea of being able to
>>> meaningfully combine RDF information from independent sources. (Without
>>> implying any guarantee of absolute truth, whatever that may be.)
>>
>> In my viewpoint, an RDF graph is primarily a data structure - a set 
>> of triples.
>> Combining RDF triples produces another data structure as the union of 
>> these
>> triples.  That's it.
>
> If that's all it is, I think it has less real value than we like to 
> think.  What makes it better than XML or JSON or any other data 
> structure with some plausible merging rules?

XML and JSON are all about tree structures. RDF defines the more 
flexible data structure of graphs. RDF introduces globally unique 
identifiers for objects, which XML and JSON do not have. RDF offers a 
very simple data structure based on triples that can be queried 
consistently and from all directions. RDF comes with serializations 
(Turtle/JSON-LD) that allow for sharing and reuse of data, and 
recommended infrastructure for resolving URIs. Furthermore, there is a 
lot of value in existing data models (often, unfortunately, called 
ontologies), which define shareable identifiers for classes and 
properties, and these data models are just RDF data too.
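
To illustrate "from all directions", here is a small sketch 
(Python/rdflib) against the excerpt from TimBL's FOAF profile quoted 
above; I am assuming that dc: binds to the Dublin Core elements 
namespace, and the title is abbreviated:

[[
from rdflib import Graph

g = Graph().parse(data="""
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    <http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X>
        dc:creator <http://www.w3.org/People/Berners-Lee/card#i> ;
        dc:title   "Weaving the Web" .
""", format="turtle")

# The same triple can be queried from either end.
books_by_person = g.query("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?book WHERE {
        ?book dc:creator <http://www.w3.org/People/Berners-Lee/card#i> .
    }""")

creators_of_book = g.query("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?who WHERE {
        <http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X>
            dc:creator ?who .
    }""")

print(list(books_by_person), list(creators_of_book))
]]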

>
> I perceive the notion of permissionless innovation around RDF data 
> depends on being able to bring data together from independent sources, 
> without requiring specific agreement by every party about what all the 
> terms mean, while providing a guarantee that intended meanings are 
> maintained.

The semantics documents do not guarantee anything. Any application can 
ignore the rules, and most applications probably do. Some applications 
may, for example, just present triples in a graphical browser or in user 
input forms. Why would that require the formal semantics?

BTW if you are publishing data and want a reasonable guarantee that 
applications interpret the data correctly, then you may want to 
explicitly link your data to rules or constraints (e.g. using SHACL's 
sh:shapesGraph property). But even then, applications can find your data 
useful without following the official formal semantics that you have 
defined.
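
A rough sketch of such a link (Python/rdflib; the graph URIs and ex: 
names are made up):

[[
from rdflib import Graph

# The data graph declares which shapes graph is intended to constrain
# it, using SHACL's sh:shapesGraph property.  A SHACL processor can use
# this to pick up the shapes; other applications can simply ignore it.
data = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .

    <http://example.org/myDataGraph>
        sh:shapesGraph <http://example.org/myShapesGraph> .

    ex:book1 a ex:Book ;
        ex:title "Weaving the Web" .
""", format="turtle")
]]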


>
>> ... BTW neither SPARQL nor Turtle nor many other RDF-based
>> standards require "open world", so this interpretation could be made 
>> entirely
>> optional.
>
> Sure, not required.  But in locally closed contexts, I find it hard to 
> see why RDF is significantly better than the other formats and tools 
> that many developers prefer to work with.  Even if much (or even most) 
> use of RDF is in locally closed contexts, I feel that what sets it 
> apart is the capability to be used dependably across contexts in which 
> the assumed knowledge does not entirely overlap.

If this were the case, then why isn't RDF a mainstream technology 
alongside JSON and XML by now? Could it be that the well-intended 
decision to give "logic" a central role in the SW stack has contributed 
to making it a "difficult" niche technology? By re-branding an RDF stack 
based on a Graph Data Framework and making the rest optional, there may 
be a chance to attract more users.

Holger


>
> (Many years ago, Dan Brickley used the phrase "missing isn't broken" 
> [citation lost], which has stuck with me as a key differentiator for 
> RDF.  I perceive this idea as being underpinned by RDF's formal 
> semantics that allow it to be interpreted consistently with varying 
> levels of knowledge about the world.)
>
> #g
> -- 
>

Received on Tuesday, 27 November 2018 03:55:24 UTC