
Re: Toward easier RDF: a proposal

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 27 Nov 2018 02:47:33 -0600
To: Holger Knublauch <holger@topquadrant.com>, Graham Klyne <gk@ninebynine.org>, semantic-web@w3.org
Message-ID: <8118485e-55bb-931b-b36c-651a4f1676c0@ihmc.us>
On 11/26/18 9:54 PM, Holger Knublauch wrote:
 > ...
 >
 > Combining data from multiple sources is indeed one of the major
 > strengths of RDF technology, and global identifiers (URIs) are
 > the glue that holds them together. My point is that these
 > benefits do not necessarily require the currently specified
 > formal RDF semantics. Putting sets of triples together into a new
 > set already achieves most of that.

But /why/ is it the case that simply putting sets of triples
together into a new set does in fact correctly combine data from 
multiple sources? Without a semantics, this question has no 
answer. Even if one claims it to be obvious, that kind of 
judgement is based on a semantic intuition (along the lines that 
each triple makes a separate assertion, and the graph is the 
conjunction of them.) Of course a semantics does not 'achieve' 
anything in the sense of specifying an operation, but it is a 
necessary connection between an uninterpreted data structure and 
any claim about how such a structure can be interpreted as, well, 
data.
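
To make that intuition concrete, here is a minimal Python sketch of the
"each triple is an assertion, the graph is their conjunction" reading. It
assumes ground triples only (no blank nodes) and uses a toy notion of a
"world" as a set of true facts; the names merge and holds are illustrative
and do not come from any spec or library.

```python
def merge(*graphs):
    """Combine ground graphs (sets of 3-tuples) by plain set union."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def holds(graph, world):
    """Conjunction reading: a graph is true in a world
    exactly when every one of its triples is true there."""
    return all(triple in world for triple in graph)


g1 = {("x:a", "x:p", "x:b")}
g2 = {("x:c", "x:q", "x:d")}
world = {("x:a", "x:p", "x:b"), ("x:c", "x:q", "x:d")}

# The union is true exactly when both sources are true.
print(holds(merge(g1, g2), world) == (holds(g1, world) and holds(g2, world)))
```

The point of the sketch is only that the claim "union combines the data
correctly" is itself a statement about holds, that is, about truth.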

One might respond that this much semantics is trivial and that 
the RDF model theory is overkill for such triviality. I would 
reply along the lines of: if you can do it more simply, by all 
means do so. Bear in mind however that the semantics is 
/required/ to make sense of blank nodes, assertions such as

rdfs:domain rdfs:domain rdfs:Property .

and the meanings of typed literals. The semantics in 
https://www.w3.org/TR/rdf11-mt/ is as simplified as we could make 
it, and keeping it from being a lot more complicated took 
considerable effort.

 >
 > Looking at https://www.w3.org/TR/rdf11-mt/ the intention was that
 > "All entailment regimes must be monotonic extensions of the
 > simple entailment regime described in the document [...]", so for
 > example the interpretation of blank nodes as existential
 > placeholders is currently hard-coded into the very core of RDF.

Indeed, it is. If you feel this is too restrictive, what weaker 
semantics would you suggest for blank nodes? It need not be
formalized, but what will RDF say about blank nodes to give users 
some idea what they are for?

 > Furthermore, even RDF Schema gets special treatment in this
 > document, with the rather unintuitive semantics of rdfs:range etc.
 >
 > I believe this needs to be decoupled in a future version. Those
 > who find these semantics useful should continue to use them, but
 > there are plenty of use cases where the given semantics are not
 > just unnecessary but also an obstacle (performance of
 > inferencing, merging of blank nodes etc).

So what would you suggest should be specified about how to treat 
blank nodes, in this new liberal semantics-free regime? Anything 
at all? Is there /any/ relationship between

x:a x:p x:b .

and

_:x x:p x:b .

? What is it?
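
For reference, under the simple entailment of
https://www.w3.org/TR/rdf11-mt/ the first graph entails the second but not
conversely: a graph with blank nodes follows from another graph when some
substitution for its blank nodes yields triples all found in that other
graph. A brute-force Python sketch of that check, assuming terms are plain
strings and that a leading "_:" marks a blank node (the function names are
hypothetical, not any library's API):

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def simply_entails(g, e):
    """True if some mapping of e's blank nodes to terms of g
    turns e into a subset of g."""
    blanks = sorted({t for triple in e for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if instance <= g:
            return True
    return False


named = {("x:a", "x:p", "x:b")}
anon = {("_:x", "x:p", "x:b")}

print(simply_entails(named, anon))  # True
print(simply_entails(anon, named))  # False
```

This enumerates every possible substitution, so it is only suitable for
very small graphs.
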
 >
 > IMHO a better stacking of RDF would have a simplified RDF (maybe
 > called Graph Data Framework) at its very bottom, i.e. a
 > definition of graphs that consist of triples that consist of
 > subject, predicate and object, and (hopefully) better support for
 > reified statements

I will here go on record as asserting that /any/ set of 
conventions for creating and manipulating reified statements will 
reflect some semantic intuitions; and that if those intuitions 
are not stated explicitly and made sharply clear, then they will 
be semantically confused and will lead to intractable problems of 
misinterpretation. Reification is /very/ tricky. (The only part 
of the reification semantics described in the RDF specs that I 
will defend is that we did not make it normative.)

 > , named graphs and lists. These can be
 > formalized entirely as a data structure/API, on a couple of
 > pages.

If you think this can be done so easily, then do it. Just knock 
out a draft and circulate it for comments and feedback. 
Seriously, this is by far the most effective way to get something 
done.

 > Ideally use this opportunity to get rid of most of the XML
 > datatypes, but that's another topic.
 >
 > Then, RDFS and OWL could be grouped together into a single
 > standard that offers a certain interpretation of these graphs
 > that some applications may find useful.

Getting a group of people to agree on what a single unified logic 
(as it now is) should be like has not been a trivial matter. Just 
ask anyone who has been an active member of one of the WGs. I can 
attest that getting a small group of academics, none of whom had 
any financial stake in the result, to agree simply on a basic set 
of connectives for a standard first-order logic was effectively 
impossible. Things get worse when there are substantial 
investments already made in software development. For example, 
the failure of RDF 1.1 to give a clear semantics to RDF datasets 
was largely due to the fact that several major implementations 
had already reached commercial use, all using datasets in 
different ways and none of them willing to give up their 
investment. So datasets still have no agreed meaning, and 
probably never will, so they cannot be used for web-wide 
interchange of data.

 > The topic there is mostly
 > inferencing of new triples under open world and no UNA. In
 > practice many people mix RDFS (range/domain) with OWL (imports,
 > minCardinality) already

rdfs:range, rdfs:domain and rdfs:subClassOf are incorporated into
the OWL 2 syntax and have the same meaning there as they have in 
RDFS. See 
https://www.w3.org/TR/2012/REC-owl2-quick-reference-20121211/#Class_Expressions. 
So to that extent, the required unification has already been 
achieved. It only took about 250 man-years.

 > , so why not just group them together into
 > a family of dialects.
 >
 > However, standards like Turtle, SPARQL and SHACL do not need
 > formal logic and RDFS/OWL, and neither should they. They simply
 > operate on the graphs. Turtle offers parsing, SPARQL offers
 > querying

Do you think that querying has no semantic implications? (This is 
a genuine question, not rhetorical. One can take either stance, 
but each one leads to a different answer to many issues in the 
design of a query language.)

 > , SHACL offers validation and schema definitions, e.g.
 > for building UI input forms. These standards overlap with
 > RDFS/OWL on a minimum number of very frequently used URIs:
 > rdf:type, rdfs:subClassOf, rdfs:Class. This overlap means that
 > almost all existing RDF-based data can be used for applications
 > that follow either paradigm. RDFS/OWL tools can ignore
 > sh:property definitions, and SHACL tools can ignore
 > owl:Restrictions, yet the names of classes and all instance data
 > can be shared.

That is certainly a happy vision, but I am unconvinced that it 
can be easily achieved in practice. The devils will be in the 
details.

 >
 > Another extension built upon a simplified RDF would be the whole
 > idea of Linked Data with its rules to resolve URLs etc. No need
 > to make this a base feature of RDF for those who simply use local
 > data. A good rule language should be another orthogonal extension.
 >
 >>
 >>>>
 >>>> To my mind, this underpins the (open-world?) idea of being
 >>>> able to
 >>>> meaningfully combine RDF information from independent
 >>>> sources. (Without
 >>>> implying any guarantee of absolute truth, whatever that may
 >>>> be.)
 >>>
 >>> In my viewpoint, an RDF graph is primarily a data structure -
 >>> a set of triples.
 >>> Combining RDF triples produces another data structure as the
 >>> union of these
 >>> triples.  That's it.
 >>
 >> If that's all it is, I think it has less real value than we
 >> like to think.  What makes it better than XML or JSON or any
 >> other data structure with some plausible merging rules?
 >
 > XML and JSON are all about tree structures. RDF defines the more
 > flexible data structure of graphs. RDF introduces globally unique
 > identifiers for objects, which is something that XML or JSON do
 > not have. RDF offers a very simple data structure based on
 > triples that can be queried consistently and from all directions.
 > RDF comes with serializations (Turtle/JSON-LD) that allow for
 > sharing and reuse of data, and recommended infrastructure on how
 > to resolve URIs. Furthermore there is a lot of value in existing
 > data models (often, unfortunately, called ontologies) which
 > define shareable identifiers for classes and properties, and
 > these data models are just RDF data too.
 >
 >>
 >> I perceive the notion of permissionless innovation around RDF
 >> data depends on being able to bring data together from independent
 >> sources, without requiring specific agreement by every party
 >> about what all the terms mean, while providing a guarantee that
 >> intended meanings are maintained.
 >
 > The semantics documents do not guarantee anything. Any
 > application can ignore the rules, and most applications probably
 > do. Some applications may for example just present triples in a
 > graphical browser or user input forms. Why would that require the
 > formal semantics?

Obviously it does not, but as soon as these applications do any 
reasoning, the semantics becomes relevant. For example, the 
ImageSnippets system which I helped design composes RDF and 
attaches it as metadata to images - no semantics needed so far, 
but it uses subclass reasoning in its image retrieval, returning 
images tagged with 'eagle' for a query for 'bird'. This is not in 
itself remarkable, but the almost invisible way in which 
reasoning fits into the workflow and implementation /is/ unique 
to the RDF stack, in my experience.
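
As an illustration only (this is not the ImageSnippets implementation, and
every name and identifier below is hypothetical), the kind of subclass
reasoning involved can be sketched in a few lines of Python, assuming the
subclass links and image tags have already been read out of the RDF:

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass links, cls included."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def retrieve(query_class, tags, subclass_of):
    """Images whose tag is the query class or one of its subclasses."""
    return sorted(
        image
        for image, cls in tags.items()
        if query_class in superclasses(cls, subclass_of)
    )


subclass_of = {"ex:Eagle": ["ex:Bird"], "ex:Bird": ["ex:Animal"]}
tags = {"img1.jpg": "ex:Eagle", "img2.jpg": "ex:Cat"}

print(retrieve("ex:Bird", tags, subclass_of))  # ['img1.jpg']
```

The retrieval step is justified only because a subclass link is read as
"every eagle is a bird", which is exactly what the semantics supplies.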

 >
 > BTW if you are publishing data and want a reasonable amount of
 > guarantees that applications interpret the data correctly, then
 > you may want to explicitly link your data to rules or constraints
 > (e.g. using SHACL's sh:shapesGraph property). But even then,
 > applications can find your data useful without following the
 > official formal semantics that you have defined.

Well, they might be able to draw conclusions and perform 
processing which goes beyond the rather elementary semantics, 
perhaps making assumptions which cannot be encoded directly in 
RDF itself (such as various closed-world assumptions about 
uniqueness of naming or completeness of data) but this is not 
incompatible with the official semantics, and may indeed rely on 
it in some ways. The notion of semantic extension is designed to 
allow for this kind of external non-RDF-sanctioned processing of 
RDF data. But I do not know of any examples of such processing 
which /denies/ or contravenes the RDF semantics. Do you?

 >
 >
 >>
 >>> ... BTW neither SPARQL nor Turtle nor many other RDF-based
 >>> standards require "open world", so this interpretation could
 >>> be made entirely
 >>> optional.
 >>
 >> Sure, not required.  But in locally closed contexts, I find it
 >> hard to see why RDF is significantly better than the other
 >> formats and tools that many developers prefer to work with.
 >> Even if much (or even most) use of RDF is in locally closed
 >> contexts, I feel that what sets it apart is the capability to
 >> be used dependably across contexts in which the assumed
 >> knowledge does not entirely overlap.
 >
 > If this were the case, then why isn't RDF a mainstream technology
 > alongside JSON and XML now? Could it be that the well-intended
 > decision of giving "logic" a central role in the SW stack has
 > contributed to making it a "difficult" niche technology? By
 > re-branding an RDF stack based on a Graph Data Framework and
 > making the rest optional there may be a chance to attract more
 > users.

Unfortunately, I tend to agree with you here. As a matter of 
branding, the use of words like 'logic' and 'semantics' seems to 
cause some people's brains to shut down, rather as 'calculus' 
does for large sections of the US adult population.

Pat Hayes


-- 
-----------------------------------
call or text to 850 291 0667
www.ihmc.us/groups/phayes/
www.facebook.com/the.pat.hayes
Received on Tuesday, 27 November 2018 08:48:08 UTC
