Re: defining the semantics of lists from Patrick J Hayes on 2020-05-18 (semantic-web@w3.org from May 2020)

From: Patrick J Hayes <phayes@ihmc.us>
Date: Mon, 18 May 2020 11:43:33 -0500
To: Cory Casanave <cory-c@modeldriven.com>
CC: thomas lörtsch <tl@rat.io>, Semantic Web <semantic-web@w3.org>
Message-ID: <630A7464-EB5F-4492-BC1B-4458311B50D8@ihmc.us>
Hi Cory

> On May 18, 2020, at 11:26 AM, Cory Casanave <cory-c@modeldriven.com> wrote:
> 
> Pat,
> Your summary provides valuable and important context, but it seems to be missing an important aspect with respect to "It describes things". There are two aspects of "open-world": the one you cite, "never assume that all the relevant data is known about some topic", and the existence criteria for topics (things). For some things, their inclusion in some known set is their differentia.
> 
> For example, consider a N.Y. Stock exchange listed stock. The set of these stocks at any point in time is well known - you can't just "infer" one, it has to be listed - a very specific process and set of requirements. There is some information about that listed stock that is also curated by the exchange (it is a class, not just a predicate). So the inclusion in a closed set is part of the semantics of a listed stock. You can query the list of stocks, one is either listed or not. However, it is fine to say that anyone can say anything about such a stock.
> 
> Another example: a company with an LEI (https://www.gleif.org/en/) - same situation, it is a managed set. One of the facts managed by GLEIF is company parentage, as reported under strict legal guidelines. A derivative fact is then the "ultimate parent" of any GLEIF listed company (as their parents have to be listed as well). Under a strict "open-world assumption" we can't determine the ultimate parent as we can never know if there may be another "out in the wild" - yet we do know.
> 
> We could go on with examples; contracted customers of an organization, etc. There are also cases where we may choose to treat some set as closed for a particular reasoning task. Most examples seem to be based on social constructs.  Without closed sets there are needed inferences that can't be stated, the ultimate parent being just one example. This seems like more of an "open-world mandate" than assumption, as the assumption can't be changed. 
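
To make the "ultimate parent" point concrete, here is a minimal sketch in Python. The company identifiers and the parent_of mapping are invented for illustration and are not GLEIF data or a GLEIF API. The function only gives the right answer if the mapping is assumed to be complete, which is exactly the closed-set assumption under discussion.

```python
# Hypothetical parent records: child -> direct parent (not real GLEIF data).
parent_of = {
    "ex:SubCo": "ex:MidCo",
    "ex:MidCo": "ex:TopCo",
}

def ultimate_parent(parent_of, company):
    """Follow parent links until a company with no recorded parent is reached.

    "No recorded parent" is read as "has no parent", which is only valid
    if the records are complete (a closed-world reading).
    """
    seen = {company}
    current = company
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError("cycle in parent records")
        seen.add(current)
    return current

top = ultimate_parent(parent_of, "ex:SubCo")

# Adjoining one more record changes the answer, so under an open-world
# reading the earlier result was never entailed by the data alone.
extended = dict(parent_of, **{"ex:TopCo": "ex:HoldCo"})
top_extended = ultimate_parent(extended, "ex:SubCo")
```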

Closed world rather than closed set, but yes I agree. There are lots of examples of ‘closed’ collections of data out there. But RDF had to adopt an open-world stance towards data taken as a whole, rather than a closed-world stance towards the data that it happens to have at any given moment - the ‘current graph’, so to speak. And in any case, the key signal of closed-world reasoning is negation as failure: if you can't prove something, assume it is false (because in a closed world, if it were true you would be able to prove it). And nothing about ‘falsity’ can be expressed in RDF, since it does not have negation, of any kind. SPARQL can of course report that a query has failed relative to some graph, so one can use negation-by-failure reasoning with RDF data, just not express the result itself in RDF. 
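
As a rough illustration of the difference, the sketch below models a graph as a plain Python set of triples. The ex: names are made up, and this is not RDF or SPARQL machinery, just the two readings of "not found" side by side.

```python
# Hypothetical toy graph: a set of (subject, predicate, object) triples.
graph = {
    ("ex:ACME", "ex:listedOn", "ex:NYSE"),
    ("ex:Globex", "ex:listedOn", "ex:NYSE"),
}

def holds(graph, triple):
    """True if the triple is present in this particular graph."""
    return triple in graph

def closed_world_false(graph, triple):
    """Negation as failure: a triple that cannot be found is taken to be false."""
    return not holds(graph, triple)

def open_world_status(graph, triple):
    """Open-world reading: absence only means the graph does not say."""
    return "true" if holds(graph, triple) else "unknown"

query = ("ex:Initech", "ex:listedOn", "ex:NYSE")
nyse_says_no = closed_world_false(graph, query)   # closed-world conclusion
rdf_says = open_world_status(graph, query)        # open-world conclusion
```

Note that the closed-world conclusion lives outside the graph: the set of triples has no way to record "not listed", which mirrors the point that the result cannot itself be expressed in RDF.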
 
> 
> We have an existing capability for managing sets of facts - the graph. The graph is already a "closed" set of facts. What we can't express is that a graph may be complete for a specific set of types or a specific set of predicates about a specific set of types. With that capability we could express a closed set - something that is REAL in our world. 

Yes, that ability – to say explicitly, in the data, that a certain set of data is complete wrt some kinds of information – would enable closed worlds to be reasoned about in an open-world reasoning framework. It is not easy to see how to do this, however. I have thought about this on and off for about a decade or more, and have not come up with a workable general way to do it. 
> 
> A "list" is then just an ordering of the things in a closed graph.

Nope, that does not work. Just listing the things is not enough, you also need a way to say what kinds of facts are being ‘closed’. For example, a list of employees might be complete in the sense that it lists them all, but not in the sense that it says everything that can be said about them. And order is irrelevant (or at any rate is a different topic.)
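
A small sketch of what "saying what kinds of facts are closed" might look like, again with triples as Python tuples. The closed_patterns declaration is purely hypothetical; no such vocabulary exists in RDF, and this is not a proposal for one, only a way to see why listing the things is not enough.

```python
# Hypothetical data: who is an employee, plus one fact about one of them.
graph = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksIn", "ex:Sales"),
}

# Hypothetical closure declaration: the graph is complete only for
# triples matching these (predicate, object) patterns.
closed_patterns = {("rdf:type", "ex:Employee")}

def status(graph, closed_patterns, triple):
    """Return 'true', 'false' or 'unknown' for a triple.

    'false' is only ever concluded for patterns declared closed;
    everything else keeps the open-world reading.
    """
    _, predicate, obj = triple
    if triple in graph:
        return "true"
    if (predicate, obj) in closed_patterns:
        return "false"
    return "unknown"

is_carol_employee = status(graph, closed_patterns, ("ex:carol", "rdf:type", "ex:Employee"))
is_bob_in_sales = status(graph, closed_patterns, ("ex:bob", "ex:worksIn", "ex:Sales"))
```

Here the employee list is complete, so carol is definitely not an employee, but nothing follows about where bob works, because that kind of fact was never declared closed.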

Pat

> There are a few ways to express order, such as a value ordering by some predicate. The needed capability for closed sets and some ordering criteria would seem to satisfy the need for a semantics of lists.
> 
> -Cory Casanave
> 
>> -----Original Message-----
>> From: Patrick J Hayes <phayes@ihmc.us>
>> Sent: Friday, May 15, 2020 12:07 AM
>> To: thomas lörtsch <tl@rat.io>
>> Cc: Semantic Web <semantic-web@w3.org>
>> Subject: Re: defining the semantics of lists
>> 
>> Hi Thomas
>> 
>> Let me explain why the semantics of the RDF containers is the way it is.
>> Several members of the RDF WG were surprised by this, but it kind of
>> follows inevitably from other, more basic, design decisions of RDF.
>> 
>> First, RDF is NOT designed to be a datastructure language: it is a descriptive
>> language. It describes things. The semantics is entirely set up with this basic
>> design decision in mind. And second, it describes things under an open-
>> world assumption. That point (open-world vs. closed-world) was always
>> controversial, but it was thrashed out very early in the design process and
>> became a fundamental design choice, on the grounds that a Web-based
>> description language can never assume that all the relevant data is known
>> about some topic. So this means that given any piece of RDF, you can cut
>> out some piece of it, or adjoin some more RDF to it, without anything
>> breaking.
>> 
>> So now, how could rdfx:ClosedSeq work? Presumably it would come along
>> with a bunch of assertions about the first, second, third etc. elements of the
>> seq, and maybe a way of saying that one of them is the last item, so we
>> might need rdfx:LastItemIn. Suppose however that we simply don’t have a
>> triple that specifies the second element. Is this an error? Or just an
>> incomplete description? If the latter, what if we omit the LastItemIn triple;
>> then we don't know how long this seq is. Is that also an incomplete
>> description (as the open-world assumption requires) or is it an error? What
>> happens if we are told that A is the second item and also that B is the
>> second item? Is that an error, or can we conclude that owl:sameAs A B ?  If
>> we take the open-world choice in these cases then this is hardly
>> distinguishable from what we have already. But if we say that incomplete or
>> ‘excessive’ information is an error then we don’t really have an RDF graph,
>> since the extra constraints amount to a fundamental change to the idea of
>> graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global
>> constraints on what must be present or what is allowed to be present. This
>> is of course possible, but it would change RDF fundamentally.
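
The questions above can be sketched as a small check over one sequence. This is only an illustration: the (index, item) representation, the last_index argument (standing in for a hypothetical rdfx:LastItemIn triple), and both function names are assumptions, not part of any RDF specification.

```python
from collections import defaultdict

def check_seq(members, last_index=None):
    """Inspect (index, item) pairs describing one sequence.

    Returns (missing, conflicts): positions with no item, and positions
    with more than one item. If last_index is not given, the highest
    index seen is used as the upper bound.
    """
    by_index = defaultdict(set)
    for index, item in members:
        by_index[index].add(item)
    upper = last_index if last_index is not None else max(by_index, default=0)
    missing = [i for i in range(1, upper + 1) if i not in by_index]
    conflicts = {i: sorted(items) for i, items in by_index.items() if len(items) > 1}
    return missing, conflicts

def interpret(missing, conflicts, closed):
    """Closed reading: gaps and clashes are errors.
    Open reading: gaps are merely unstated, and a clash could be read
    as saying the two items denote the same thing."""
    if closed:
        return "error" if (missing or conflicts) else "ok"
    return "incomplete" if missing else "ok"

members = [(1, "A"), (3, "C"), (3, "D")]   # no second element, two third elements
missing, conflicts = check_seq(members)
closed_verdict = interpret(missing, conflicts, closed=True)
open_verdict = interpret(missing, conflicts, closed=False)
```

The closed verdict depends on what is absent from the data, which is the sense in which a ‘legal’ graph would stop being just a set of triples.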
>> 
>> Now, another way to go would be to say that RDF needs containers, but it
>> doesn’t need to describe them using triples. We could just allow a new kind
>> of construct as a node in a triple (in addition to IRIs, Bnodes and literals)
>> and give it its own definition. We would have to invent new syntax to
>> represent them, of course, which would break all known RDF engines, but
>> maybe it would be worth it (?) Then sequences (etc) would be much more
>> like conventional datastructures. Of course, the semantics would have to
>> say something about these things, but not much. (For example, we might
>> require that IRIs inside sequences denote the same thing as they do outside,
>> basic things like that.) This would not break the open world assumption.
>> 
>> Anyway, I hope this helps people think about what the issues are :-)
>> 
>> Best wishes
>> 
>> Pat
>> 
>> 
>>> On May 14, 2020, at 8:18 AM, thomas lörtsch <tl@rat.io> wrote:
>>> 
>>> I’m aware that the topic of lists in RDF can ignite lively debate nearly as
>> much as blank nodes so my apologies in advance. I have a very specific
>> question and I don’t intend to discuss the use of lists in OWL, syntactic
>> sugar in Turtle, querying in SPARQL or historic details about how some
>> decisions came to be (although I do find all that very interesting, but
>> another time...).
>>> 
>>> Lists from the RDF container vocabulary - rdf:Seq, rdf:Bag and rdf:Alt -
>> can’t be closed whereas lists from the collection vocabulary - rdf:List - are
>> always closed. Collections, in contrast to containers, are popularly
>> considered to "have semantics" because of this closing characteristic.
>>> The excruciatingly exact Lisp-style modelling of rdf:Lists through
>> rdf:first/rest/nil properties does indeed leave no room for
>> misunderstanding about the listiness and closedness of an rdf:List.
>>> What level of semantics could be provided for containers by explicitly
>> defining either a closed container, e.g. rdfx:ClosedSeq, or an appropriate
>> property, e.g. rdfx:hasLength, that implicitly closes a container?
>>> 
>>> I reckon that semantics introduced per definition are always somewhat
>> weaker than semantics that emanate naturally and unmistakably from a
>> datastructure itself. However collections have a lot of disadvantages (that I
>> promised above not to discuss) and I wonder how workable the semantics
>> provided by defining an rdfx:ClosedSeq class or an rdfx:hasLength property
>> would be. Would they be able to take some load, none at all, not enough, or
>> almost the same as collections?
>>> 
>>> Thomas
>> 
>
Received on Monday, 18 May 2020 16:43:54 UTC