- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 27 Nov 2018 02:47:33 -0600
- To: Holger Knublauch <holger@topquadrant.com>, Graham Klyne <gk@ninebynine.org>, semantic-web@w3.org
On 11/26/18 9:54 PM, Holger Knublauch wrote:
> ...
>
> Combining data from multiple sources is indeed one of the major
> strengths of RDF technology, and global identifiers (URIs) are
> the glue that holds them together. My point is that these
> benefits do not necessarily require the currently specified
> formal RDF semantics. Putting sets of triples together into a new
> set already achieves most of that.

But /why/ is it the case that simply putting sets of triples together
into a new set does in fact correctly combine data from multiple
sources? Without a semantics, this question has no answer. Even if one
claims it to be obvious, that kind of judgement is based on a semantic
intuition (along the lines that each triple makes a separate
assertion, and the graph is the conjunction of them.) Of course a
semantics does not 'achieve' anything in the sense of specifying an
operation, but it is a necessary connection between an uninterpreted
data structure and any claim about how such a structure can be
interpreted as, well, data.

One might respond that this much semantics is trivial and that the RDF
model theory is overkill for such triviality. I would reply along the
lines of: if you can do it more simply, by all means do so. Bear in
mind however that the semantics is /required/ to make sense of blank
nodes, assertions such as

    rdfs:domain rdfs:domain rdfs:Property .

and the meanings of typed literals. The semantics in
https://www.w3.org/TR/rdf11-mt/ is as simplified as we could make it,
and keeping it from being a lot more complicated took considerable
effort.

>
> Looking at https://www.w3.org/TR/rdf11-mt/ the intention was that
> "All entailment regimes must be monotonic extensions of the
> simple entailment regime described in the document [...]", so for
> example the interpretation of blank nodes as existential
> placeholders is currently hard-coded into the very core of RDF.

Indeed, it is.
If you feel this is too restrictive, what weaker semantics would you
suggest for blank nodes? It need to be formalized, but what will RDF
say about blank nodes to give users some idea what they are for?

> Furthermore, even RDF Schema gets special treatment in this
> document, with the rather unintuitive semantics of rdfs:range etc.
>
> I believe this needs to be decoupled in a future version. Those
> who find these semantics useful should continue to use them, but
> there are plenty of use cases where the given semantics are not
> just unnecessary but also an obstacle (performance of
> inferencing, merging of blank nodes etc).

So what would you suggest should be specified about how to treat blank
nodes, in this new liberal semantics-free regime? Anything at all? Is
there /any/ relationship between

    x:a x:p x:b .

and

    _:x x:p x:b .

? What is it?

>
> IMHO a better stacking of RDF would have a simplified RDF (maybe
> called Graph Data Framework) at its very bottom, i.e. a
> definition of graphs that consist of triples that consist of
> subject, predicate and object, and (hopefully) better support for
> reified statements

I will here go on record as asserting that /any/ set of conventions
for creating and manipulating reified statements will reflect some
semantic intuitions; and that if those intuitions are not stated
explicitly and made sharply clear, then they will be semantically
confused and will lead to intractable problems of misinterpretation.
Reification is /very/ tricky. (The only part of the reification
semantics described in the RDF specs that I will defend is that we did
not make it normative.)

> , named graphs and lists. These can be
> formalized entirely as a data structure/API, on a couple of
> pages.

If you think this can be done so easily, then do it. Just knock out a
draft and circulate it for comments and feedback. Seriously, this is
by far the most effective way to get something done.

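As an illustration of the question about x:a x:p x:b . and
_:x x:p x:b ., here is a minimal Python sketch of the answer that
simple entailment in https://www.w3.org/TR/rdf11-mt/ gives: a graph G
simply entails a graph E when some instance of E (its blank nodes
replaced by terms) is a subgraph of G. The tuple encoding of triples,
the "_:" prefix test and the function names are assumptions made for
this sketch; they do not come from any RDF library, and the
brute-force search is only suitable for tiny graphs.

```python
from itertools import product


def is_bnode(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def simply_entails(g, e):
    """Return True if some instance of e is a subgraph of g.

    g and e are sets of (subject, predicate, object) string tuples.
    Every assignment of e's blank nodes to subject/object terms of g
    is tried in turn (brute force).
    """
    bnodes = sorted({t for triple in e for t in triple if is_bnode(t)})
    terms = sorted({t for (s, _, o) in g for t in (s, o)})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if instance <= g:
            return True
    return False


ground = {("x:a", "x:p", "x:b")}
existential = {("_:x", "x:p", "x:b")}

print(simply_entails(ground, existential))   # the ground triple entails the blank-node one
print(simply_entails(existential, ground))   # but not the other way round
```

Under these assumptions the relationship is one-directional: the
ground triple entails the blank-node triple, and the converse fails.
A regime that says nothing about blank nodes leaves even this much
undetermined.
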
> Ideally use this opportunity to get rid of most of the XML
> datatypes, but that's another topic.
>
> Then, RDFS and OWL could be grouped together into a single
> standard that offers a certain interpretation of these graphs
> that some applications may find useful.

Getting a group of people to agree on what a single unified logic (as
it now is) should be like has not been a trivial matter. Just ask
anyone who has been an active member of one of the WGs. I can attest
that getting a small group of academics, none of whom had any
financial stake in the result, to agree simply on a basic set of
connectives for a standard first-order logic was effectively
impossible. Things get worse when there are substantial investments
already made in software development. For example, the failure of RDF
1.1 to give a clear semantics to RDF datasets was largely due to the
fact that several major implementations had already reached commercial
use, all using datasets in different ways and none of them willing to
give up their investment. So datasets still have no agreed meaning,
and probably never will, so they cannot be used for web-wide
interchange of data.

> The topic there is mostly
> inferencing of new triples under open world and no UNA. In
> practice many people mix RDFS (range/domain) with OWL (imports,
> minCardinality) already

rdfs:range, rdfs:domain and rdfs:subClassOf are incorporated into the
OWL 2 syntax and have the same meaning there as they have in RDFS. See
https://www.w3.org/TR/2012/REC-owl2-quick-reference-20121211/#Class_Expressions.
So to that extent, the required unification has already been achieved.
It only took about 250 man-years.

> , so why not just group them together into
> a family of dialects.
>
> However, standards like Turtle, SPARQL and SHACL do not need
> formal logic and RDFS/OWL, and neither should they. They simply
> operate on the graphs. Turtle offers parsing, SPARQL offers
> querying

Do you think that querying has no semantic implications?
(This is a genuine question, not rhetorical. One can take either
stance, but each one leads to a different answer to many issues in the
design of a query language.)

> , SHACL offers validation and schema definitions, e.g.
> for building UI input forms. These standards overlap with
> RDFS/OWL on a minimum number of very frequently used URIs:
> rdf:type, rdfs:subClassOf, rdfs:Class. This overlap means that
> almost all existing RDF-based data can be used for applications
> that follow either paradigm. RDFS/OWL tools can ignore
> sh:property definitions, and SHACL tools can ignore
> owl:Restrictions, yet the names of classes and all instance data
> can be shared.

That is certainly a happy vision, but I am unconvinced that it can be
easily achieved in practice. The devils will be in the details.

>
> Another extension built upon a simplified RDF would be the whole
> idea of Linked Data with its rules to resolve URLs etc. No need
> to make this a base feature of RDF for those who simply use local
> data. A good rule language should be another orthogonal extension.
>
>>
>>>>
>>>> To my mind, this underpins the (open-world?) idea of being
>>>> able to meaningfully combine RDF information from independent
>>>> sources. (Without implying any guarantee of absolute truth,
>>>> whatever that may be.)
>>>
>>> In my viewpoint, an RDF graph is primarily a data structure -
>>> a set of triples. Combining RDF triples produces another data
>>> structure as the union of these triples. That's it.
>>
>> If that's all it is, I think it has less real value than we
>> like to think. What makes it better than XML or JSON or any
>> other data structure with some plausible merging rules?
>
> XML and JSON are all about tree structures. RDF defines the more
> flexible data structure of graphs. RDF introduces globally unique
> identifiers for objects, which is something that XML or JSON do
> not have.
> RDF offers a very simple data structure based on
> triples that can be queried consistently and from all directions.
> RDF comes with serializations (Turtle/JSON-LD) that allow for
> sharing and reuse of data, and recommended infrastructure on how
> to resolve URIs. Furthermore there is a lot of value in existing
> data models (often, unfortunately, called ontologies) which
> define shareable identifiers for classes and properties, and
> these data models are just RDF data too.
>
>>
>> I perceive the notion of permissionless innovation around RDF
>> data depends on being able to bring data together from independent
>> sources, without requiring specific agreement by every party
>> about what all the terms mean, while providing a guarantee that
>> intended meanings are maintained.
>
> The semantics documents do not guarantee anything. Any
> application can ignore the rules, and most applications probably
> do. Some applications may for example just present triples in a
> graphical browser or user input forms. Why would that require the
> formal semantics.

Obviously it does not, but as soon as these applications do any
reasoning, the semantics becomes relevant. For example, the
ImageSnippets system which I helped design composes RDF and attaches
it as metadata to images - no semantics needed so far, but it uses
subclass reasoning in its image retrieval, returning images tagged
with 'eagle' for a query for 'bird'. This is not in itself remarkable,
but the almost invisible way in which reasoning fits into the workflow
and implementation /is/ unique to the RDF stack, in my experience.

>
> BTW if you are publishing data and want a reasonable amount of
> guarantees that applications interpret the data correctly, then
> you may want to explicitly link your data to rules or constraints
> (e.g. using SHACL's sh:shapesGraph property). But even then,
> applications can find your data useful without following the
> official formal semantics that you have defined.

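The subclass reasoning in the image retrieval example above (a query
for 'bird' returning images tagged 'eagle') can be sketched in a few
lines of Python. This is not the ImageSnippets implementation, which
is not described in this message; the class names, image names and
both helper functions are hypothetical, and the dictionary stands in
for a set of rdfs:subClassOf triples.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def retrieve(query_class, tags, subclass_of):
    """Return images whose tag class is query_class or one of its subclasses."""
    return sorted(
        image
        for image, cls in tags.items()
        if query_class in superclasses(cls, subclass_of)
    )


# Hypothetical data: ex:Eagle rdfs:subClassOf ex:Bird, and so on.
subclass_of = {"ex:Eagle": ["ex:Bird"], "ex:Bird": ["ex:Animal"]}
tags = {"img1.jpg": "ex:Eagle", "img2.jpg": "ex:Car"}

print(retrieve("ex:Bird", tags, subclass_of))
```

The retrieval code never mentions eagles; the result follows from the
transitive reading of rdfs:subClassOf, which is exactly the part that
the semantics supplies.
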
Well, they might be able to draw conclusions and perform processing
which goes beyond the rather elementary semantics, perhaps making
assumptions which cannot be encoded directly in RDF itself (such as
various closed-world assumptions about uniqueness of naming or
completeness of data) but this is not incompatible with the official
semantics, and may indeed rely on it in some ways. The notion of
semantic extension is designed to allow for this kind of external
non-RDF-sanctioned processing of RDF data. But I do not know of any
examples of such processing which /denies/ or contravenes the RDF
semantics. Do you?

>
>
>>
>>> ... BTW neither SPARQL nor Turtle nor many other RDF-based
>>> standards require "open world", so this interpretation could
>>> be made entirely optional.
>>
>> Sure, not required. But in locally closed contexts, I find it
>> hard to see why RDF is significantly better than the other
>> formats and tools that many developers prefer to work with.
>> Even if much (or even most) use of RDF is in locally closed
>> contexts, I feel that what sets it apart is the capability to
>> be used dependably across contexts in which the assumed
>> knowledge does not entirely overlap.
>
> If this were the case, then why isn't RDF a mainstream technology
> alongside JSON and XML now? Could it be that the well-intended
> decision of giving "logic" a central role in the SW stack has
> contributed to making it a "difficult" niche technology? By
> re-branding an RDF stack based on a Graph Data Framework and
> making the rest optional there may be a chance to attract more
> users.

Unfortunately, I tend to agree with you here. As a matter of branding,
the use of words like 'logic' and 'semantics' seems to cause some
people's brains to shut down, rather as 'calculus' does for large
sections of the US adult population.

Pat Hayes

--
-----------------------------------
call or text to 850 291 0667
www.ihmc.us/groups/phayes/
www.facebook.com/the.pat.hayes
Received on Tuesday, 27 November 2018 08:48:08 UTC