W3C home > Mailing lists > Public > semantic-web@w3.org > November 2018

Re: Toward easier RDF: a proposal

From: thomas lörtsch <tl@rat.io>
Date: Wed, 28 Nov 2018 00:00:46 +0100
Cc: Graham Klyne <gk@ninebynine.org>, semantic-web@w3.org
Message-Id: <92D56AEE-7905-4F6E-B3A7-8CF4D637F0CB@rat.io>
To: Holger Knublauch <holger@topquadrant.com>
I admit that my gut reaction is that the open world design is what makes the semantic web unique and what might eventually facilitate some decentralized, deeply interconnected, majestically glowing... Therefore I’m tempted to ask in return why you don’t just use SQL or MongoDB and some web framework for whatever closed world application you’re building. 
OTOH I agree that formal semantics - and logic in general - are often so unwieldy and so underpowered compared to what humans *just do* by intuition that it’s very tempting to get rid of them by closing the world and *just get it done*, *you know what I mean* style.

So, say we could fix some notoriously intimidating semantics/logic to sensible defaults - e.g. close the world, make identification default to indication/labeling, tame blank nodes, canonicalize all the things, keep out most of OWL etc - would the resulting closed world applications still be able to talk to each other in a meaningful way? Could they still participate in an open semantic web? What would it take to keep that possibility open - just a semweb head going over the finished code once, adjusting some levers, making some implicit semantics explicit? Then it might indeed be worth it. But if we just end up with even more closed world applications and the only gain is that they use a triple store instead of a NoSQL store, and RDF instead of JSON, then I wouldn’t be very enthusiastic.

Thomas

> On 27. Nov 2018, at 04:54, Holger Knublauch <holger@topquadrant.com> wrote:
> 
> 
> On 26/11/2018 11:28 pm, Graham Klyne wrote:
>> On 22/11/2018 23:22, Holger Knublauch wrote:
>>> On 22/11/2018 10:21 PM, Graham Klyne wrote:
>>> 
>>>> On 22/11/2018 00:38, Holger Knublauch wrote:
>>>>> Would you mind clarifying this statement a bit? What practical benefits would
>>>>> the foundation on formal logic add to a future (simplified) RDF, for average
>>>>> users? I have seen plenty of evidence that some aspects of the semantic
>>>>> technology stack are being regarded as too academic, and that the role of formal
>>>>> logic has been one driver of this detachment. Related topics are the
>>>>> non-unique-name-assumption and the open world assumption that are typically
>>>>> neither understood nor expected by average users.
>>>> 
>>>> Jumping in, if I may...
>>>> 
>>>> My view is that the formal logic underpinning of RDF serves (at least) one
>>>> important (and not-so-academic) purpose:
>>>> 
>>>> Given two distinct RDF graphs that are taken to be descriptions of some world
>>>> or context, following the procedure of RDF graph merging guarantees that the
>>>> resulting graph is true of that world exactly when the individual graphs are
>>>> true of that world.
>>> 
>>> Sorry, I cannot follow this explanation. What do you mean with a graph being
>>> true of a world? Could you maybe give a practical example?
>> 
>> The obvious example is being true of the real world we live in - the phrasing was just intended to allow for RDF use describing fictional or potential worlds.
>> 
>> So, to take an excerpt from TimBL's FOAF profile at [1]:
>> 
>> [[
>>     <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X">
>>         <dc:creator rdf:resource="http://www.w3.org/People/Berners-Lee/card#i"/>
>>         <dc:title>Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web</dc:title>
>>     </rdf:Description>
>> ]]
>> -- [1] http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf
>> 
>> We may have some common understanding that the URIs http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X, http://www.w3.org/People/Berners-Lee/card#i, dc:creator and dc:title refer to a book, a person and two RDF properties respectively, and that under that understanding they describe a true statement about authorship of a book with a given title.
>> 
>> Without this understanding, maybe the URIs can be interpreted as referring to, say, a declaration of intent and a megalomaniac spider, etc.  The graph might be true of this world too.
>> 
>> But we don't need to know any of this to combine the graph from [1] with a graph from another source that may use some of the same URIs.  As long as the interpretation of all the URIs used is consistent, and both graphs make statements that are true under that interpretation, then the resulting combined graph (constructed according to the rules of RDF graph merging) is also true.
>> 
>> This ability to combine independent graphs is, to my mind, what RDF semantics gives us (while remaining completely agnostic about what the URIs actually denote).  It's not something that concerns us much in day-to-day work with RDF, but I feel that the semantics provide an important underpinning for some of the things that we wish to do with RDF.
> 
> Combining data from multiple sources is indeed one of the major strengths of RDF technology, and global identifiers (URIs) are the glue that holds them together. My point is that these benefits do not necessarily require the currently specified formal RDF semantics. Putting sets of triples together into a new set already achieves most of that.
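> A rough sketch of that "merging is just set union" view, treating triples as plain tuples (all URIs and data below are made up for illustration):

```python
# Triples as plain (subject, predicate, object) tuples; a graph is a set of them.
g1 = {
    ("ex:book1", "dc:creator", "ex:timbl"),
    ("ex:book1", "dc:title", "Weaving the Web"),
}
g2 = {
    ("ex:book1", "dc:creator", "ex:timbl"),   # overlaps with g1
    ("ex:timbl", "rdf:type", "foaf:Person"),  # new information
}

# Combining data from both sources is just set union; the shared global
# identifier "ex:book1" makes the overlapping triple collapse to one.
combined = g1 | g2
```

> No entailment regime is involved here; the global identifiers alone do the joining.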
> 
> Looking at https://www.w3.org/TR/rdf11-mt/ the intention was that "All entailment regimes must be monotonic extensions of the simple entailment regime described in the document [...]", so for example the interpretation of blank nodes as existential placeholders is currently hard-coded into the very core of RDF. Furthermore, even RDF Schema gets special treatment in this document, with the rather unintuitive semantics of rdfs:range etc.
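> To make the blank-node point concrete (with made-up data): under the existential reading, merging has to rename blank nodes apart first, which is exactly the extra machinery beyond plain set union:

```python
# Two independent graphs each say "someone is the creator of this book",
# and both happen to use the same local blank-node label _:b.
g1 = {("_:b", "ex:creatorOf", "ex:book1")}
g2 = {("_:b", "ex:creatorOf", "ex:book2")}

# Naive set union conflates the two existentials: it now claims that a
# *single* someone created both books, which neither source asserted.
naive = g1 | g2

def standardize_apart(graph, tag):
    """Rename blank nodes so their labels are unique to one source graph."""
    ren = lambda term: term + tag if term.startswith("_:") else term
    return {(ren(s), p, ren(o)) for (s, p, o) in graph}

# RDF graph merging renames blank nodes first, preserving what each graph said.
merged = standardize_apart(g1, "#1") | standardize_apart(g2, "#2")
```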
> 
> I believe this needs to be decoupled in a future version. Those who find these semantics useful should continue to use them, but there are plenty of use cases where the given semantics are not just unnecessary but also an obstacle (performance of inferencing, merging of blank nodes etc).
> 
> IMHO a better stacking of RDF would have a simplified RDF (maybe called Graph Data Framework) at its very bottom, i.e. a definition of graphs that consist of triples that consist of subject, predicate and object, and (hopefully) better support for reified statements, named graphs and lists. These can be formalized entirely as a data structure/API, on a couple of pages. Ideally use this opportunity to get rid of most of the XML datatypes, but that's another topic.
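> As a sketch of how small such a "Graph Data Framework" core could be (names and API entirely hypothetical), a quad-based store covering triples plus named graphs fits in a few lines:

```python
class GraphStore:
    """Hypothetical minimal core: named graphs holding (s, p, o) triples."""

    def __init__(self):
        self.quads = set()  # (graph, subject, predicate, object)

    def add(self, g, s, p, o):
        self.quads.add((g, s, p, o))

    def triples(self, g):
        """All triples in one named graph."""
        return {(s, p, o) for (gg, s, p, o) in self.quads if gg == g}

store = GraphStore()
store.add("ex:g1", "ex:book1", "dc:title", "Weaving the Web")
store.add("ex:g2", "ex:book1", "dc:creator", "ex:timbl")
```

> Everything here is specified as a data structure; no interpretation, entailment or datatype machinery is needed at this layer.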
> 
> Then, RDFS and OWL could be grouped together into a single standard that offers a certain interpretation of these graphs that some applications may find useful. The topic there is mostly inferencing of new triples under open world and no UNA. In practice many people mix RDFS (range/domain) with OWL (imports, minCardinality) already, so why not just group them together into a family of dialects?
> 
> However, standards like Turtle, SPARQL and SHACL do not need formal logic and RDFS/OWL, and neither should they. They simply operate on the graphs. Turtle offers parsing, SPARQL offers querying, SHACL offers validation and schema definitions, e.g. for building UI input forms. These standards overlap with RDFS/OWL on a minimal set of very frequently used URIs: rdf:type, rdfs:subClassOf, rdfs:Class. This overlap means that almost all existing RDF-based data can be used for applications that follow either paradigm. RDFS/OWL tools can ignore sh:property definitions, and SHACL tools can ignore owl:Restrictions, yet the names of classes and all instance data can be shared.
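> A toy illustration of that overlap (the data and vocabulary partitions are made up): both paradigms read the same triple set, each ignoring the other's vocabulary while sharing the instance data:

```python
data = {
    ("ex:Alice", "rdf:type", "ex:Person"),             # shared instance data
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),      # RDFS-flavoured
    ("ex:PersonShape", "sh:targetClass", "ex:Person"), # SHACL-flavoured
}

RDFS_PREDICATES = {"rdf:type", "rdfs:subClassOf"}
SHACL_PREDICATES = {"rdf:type", "sh:targetClass"}

# Each tool keeps the triples it understands and ignores the rest.
rdfs_view = {t for t in data if t[1] in RDFS_PREDICATES}
shacl_view = {t for t in data if t[1] in SHACL_PREDICATES}
```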
> 
> Another extension built upon a simplified RDF would be the whole idea of Linked Data with its rules to resolve URLs etc. No need to make this a base feature of RDF for those who simply use local data. A good rule language should be another orthogonal extension.
> 
>> 
>>>> 
>>>> To my mind, this underpins the (open-world?) idea of being able to
>>>> meaningfully combine RDF information from independent sources. (Without
>>>> implying any guarantee of absolute truth, whatever that may be.)
>>> 
>>> In my viewpoint, an RDF graph is primarily a data structure - a set of triples.
>>> Combining RDF triples produces another data structure as the union of these
>>> triples.  That's it.
>> 
>> If that's all it is, I think it has less real value than we like to think.  What makes it better than XML or JSON or any other data structure with some plausible merging rules?
> 
> XML and JSON are all about tree structures. RDF defines the more flexible data structure of graphs. RDF introduces globally unique identifiers for objects, which is something that XML or JSON do not have. RDF offers a very simple data structure based on triples that can be queried consistently and from all directions. RDF comes with serializations (Turtle/JSON-LD) that allow for sharing and reuse of data, and recommended infrastructure for resolving URIs. Furthermore there is a lot of value in existing data models (often, unfortunately, called ontologies) which define shareable identifiers for classes and properties, and these data models are just RDF data too.
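> "Queried from all directions" can be sketched with a single pattern-matching function over triple sets, where None acts as a wildcard (a hypothetical sketch, not any particular store's API):

```python
def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None matches anything."""
    return {
        (ts, tp, to) for (ts, tp, to) in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    }

g = {
    ("ex:book1", "dc:creator", "ex:timbl"),
    ("ex:book2", "dc:creator", "ex:timbl"),
    ("ex:book1", "dc:title", "Weaving the Web"),
}

by_subject = match(g, s="ex:book1")  # everything about book1
by_object = match(g, o="ex:timbl")   # everything pointing at timbl
```

> A tree format would privilege one of these access paths; with triples, no direction is special.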
> 
>> 
>> I perceive the notion of permissionless innovation around RDF data depends on being able to bring data together from independent sources, without requiring specific agreement by every party about what all the terms mean, while providing a guarantee that intended meanings are maintained.
> 
> The semantics documents do not guarantee anything. Any application can ignore the rules, and most applications probably do. Some applications may for example just present triples in a graphical browser or user input forms. Why would that require the formal semantics?
> 
> BTW if you are publishing data and want a reasonable amount of guarantees that applications interpret the data correctly, then you may want to explicitly link your data to rules or constraints (e.g. using SHACL's sh:shapesGraph property). But even then, applications can find your data useful without following the official formal semantics that you have defined.
> 
> 
>> 
>>> ... BTW neither SPARQL nor Turtle nor many other RDF-based
>>> standards require "open world", so this interpretation could be made entirely
>>> optional.
>> 
>> Sure, not required.  But in locally closed contexts, I find it hard to see why RDF is significantly better than the other formats and tools that many developers prefer to work with.  Even if much (or even most) use of RDF is in locally closed contexts, I feel that what sets it apart is the capability to be used dependably across contexts in which the assumed knowledge does not entirely overlap.
> 
> If this were the case, then why isn't RDF a mainstream technology alongside JSON and XML now? Could it be that the well-intended decision of giving "logic" a central role in the SW stack has contributed to making it a "difficult" niche technology? By re-branding an RDF stack based on a Graph Data Framework and making the rest optional, there may be a chance to attract more users.
> 
> Holger
> 
> 
>> 
>> (Many years ago, Dan Brickley used the phrase "missing isn't broken" [citation lost], which has stuck with me as a key differentiator for RDF.  I perceive this idea as being underpinned by RDF's formal semantics that allow it to be interpreted consistently with varying levels of knowledge about the world.)
>> 
>> #g
>> -- 
>> 
> 
Received on Tuesday, 27 November 2018 23:01:22 UTC

This archive was generated by hypermail 2.4.0 : Thursday, 24 March 2022 20:42:03 UTC