Re: defining the semantics of lists from Patrick J Hayes on 2020-05-17 (semantic-web@w3.org from May 2020)

From: Patrick J Hayes <phayes@ihmc.us>
Date: Sun, 17 May 2020 12:29:20 -0500
To: thomas lörtsch <tl@rat.io>
CC: William Van Woensel <William.Van.Woensel@Dal.Ca>, Semantic Web <semantic-web@w3.org>
Message-ID: <F9D9B81B-17BB-4CC0-9614-E7F3E5388C53@ihmc.us>
> On May 17, 2020, at 4:40 AM, thomas lörtsch <tl@rat.io> wrote:
> 
> Hi all,
> 
> I’m really having trouble wrapping my head around this… 
> 
> Pat’s main argument centers around a very clear cut distinction between descriptions and constraints. But in reality that distinction is often not so clear but rather a perspective that one can choose [1].

The key point for RDF is that each triple should have a self-contained meaning, independent of other triples. Having constraints like the length of a list (described using triples) imposed by a different triple breaks this, which changes both the syntax and the semantics of RDF in fundamental ways. 

> I’ve got the feeling that I can evade the whole argument about a fundamental change to RDF by just proclaiming that the proposed list length attribute is merely a description of the intent behind a certain list. 

I would guess that putting intents into a semantics would be a horror show :-)

> E.g. I might want to publish a list of my favored colors and declare that I have exactly 3 of them. The list however contains only 2 because I can’t decide what the third one should be. That says something about my list (and my state of mind) but it certainly doesn’t break RDF.

It might, depending on how you map this doubt into an RDF graph. If the three-favorite-color observation is supposed to be a /syntactic/ constraint on a graph describing a proper list, then only having two listed would be a syntax error. But I presume you don’t intend that.

> But I fear that such a merely descriptive interpretation would then again amount to a very weak formal semantics, certainly weaker than that of collections.

Actually I think it would be similar to what we already have (informatively) as the intended meaning of collections.

> Insofar, yes, I see the dilemma. I wonder however how RDF could establish any semantics on such grounds. On the semantic web we are at any time talking about statements that could be wrong, misleading, grossly incomplete etc. None of that breaks the fundamentals.

Being wrong/misleading is irrelevant. The issue here is not whether any of this stuff is really true, but rather how the truth of one part interacts with the meaning or truth of another part. 
> 
> Let's take a more solidly worked out list vocabulary: to simplify things I refrain from the idea to add a length attribute to existing containers. Instead I define a new type of container, rdfx:Chain, and a new property rdfx:hasLength. I subclass rdfs:Container because I strive for lists that are easy to read and write by hand [0].
> 
>  rdfx:hasLength 
>      rdfs:domain rdfx:Chain ;
>      rdfs:range <http://www.w3.org/2001/XMLSchema#int> .
>  rdfx:Chain
>      rdfs:subClassOf rdfs:Container .
> 
> I'd like rdfx:Chain to be defined rather tightly:
> * an rdfx:Chain should have exactly as many entries as indicated by its length property.
> * entries should be assigned through rdfs:ContainerMembershipProperty properties, starting from 1 and without skipping numbers: an rdfx:Chain of length 3 is expected to be constructed of exactly the properties :_1, :_2 and :_3.
> * an rdfx:Chain without a length property is mostly equivalent to an rdf:Seq but is still required to be without gaps and consecutively numbered, starting from 1. Its length may therefore be calculated from an rdfx:Chain that meets all those requirements.
> 
> Any rdfx:Chain that breaks one of these rules is considered, ahem, problematic.

Is the chain problematic, or is the graph (mis)describing the chain problematic? In other words, is this a perfectly fine RDF description of a badly formed chain, or is it a problematic description? 

If the first, how does this differ in effect from the current advice regarding the list vocabulary? (see https://www.w3.org/TR/rdf11-mt/#rdf-collections: “Semantic extensions <https://www.w3.org/TR/rdf11-mt/#dfn-semantic-extension> may place extra syntactic well-formedness restrictions on the use of this vocabulary in order to rule out such graphs. They may exclude interpretations of the collection vocabulary which violate the convention that the subject of a 'linked' collection of two-triple items of the form described above, ending with an item ending with rdf:nil, denotes a totally ordered sequence whose members are the denotations of the rdf:first values of the items, in the order got by tracing the rdf:rest properties from the subject to rdf:nil.”)
In other words, you can define a semantic extension to RDF with the ‘sensible lists only’ constraint considered to be part of its syntax.

And if the second, problematic how? Is it syntactically illegal in some way? Should RDF engines refuse to process it and throw an error exception? Or should it be treated as semantically  inconsistent? Suppose this situation has arisen by merging two graphs that were each alone perfectly fine: should they not have been merged? But merging is a valid operation in RDF as currently defined. And so on. Sorry, a polite throat-clearing noise does not hack it. The hypothetical semantic extension just mentioned would have to deal with all those questions, of course.
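
To make the merging question concrete, here is a minimal sketch in Python rather than in any RDF toolkit. It assumes a graph is a plain set of (subject, predicate, object) tuples, that merging is set union (no blank nodes are involved), and that members are written with the standard rdf:_1, rdf:_2, ... property names. The helper chain_violations is hypothetical: it only checks the rules proposed above for rdfx:Chain and is not defined by any specification.

```python
import re

HAS_LENGTH = "rdfx:hasLength"                 # proposed property, not standard RDF
MEMBER = re.compile(r"^rdf:_([1-9][0-9]*)$")  # rdf:_1, rdf:_2, ...


def chain_violations(graph, chain):
    """Return the rule violations for `chain` in `graph` (hypothetical checker)."""
    indices = []
    for s, p, o in graph:
        m = MEMBER.match(p)
        if s == chain and m:
            indices.append(int(m.group(1)))
    indices.sort()
    lengths = {o for s, p, o in graph if s == chain and p == HAS_LENGTH}

    problems = []
    # Members must be numbered 1..n with no gaps and no repeated index.
    if indices != list(range(1, len(indices) + 1)):
        problems.append("members are not numbered consecutively from 1")
    # A stated length must equal the number of members.
    for n in sorted(lengths):
        if n != len(indices):
            problems.append(f"stated length {n} but {len(indices)} members")
    return problems


# Graph 1: a chain that states its length and has matching members.
g1 = {
    ("ex:c", "rdf:type", "rdfx:Chain"),
    ("ex:c", HAS_LENGTH, 2),
    ("ex:c", "rdf:_1", "ex:red"),
    ("ex:c", "rdf:_2", "ex:green"),
}
# Graph 2: the same chain without a stated length, with three members.
g2 = {
    ("ex:c", "rdf:type", "rdfx:Chain"),
    ("ex:c", "rdf:_1", "ex:red"),
    ("ex:c", "rdf:_2", "ex:green"),
    ("ex:c", "rdf:_3", "ex:blue"),
}
merged = g1 | g2  # merging as plain set union

print(chain_violations(g1, "ex:c"))      # no violations
print(chain_violations(g2, "ex:c"))      # no violations
print(chain_violations(merged, "ex:c"))  # one violation: length 2, three members
```

Each graph passes the checker on its own, yet their union does not. Which of the two sources is "wrong", and whether the merge should have been refused, is exactly what the rules leave unanswered.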

> Applications must decide how to handle it. 

No, the problems run deeper than this, see above. 

> More semantics are surely possible. A somewhat sensible set of rules could be that an unruly numbered rdfx:Chain is re-numbered, starting with :_1, missing members are augmented as blank nodes and surplus members cut or at least ignored.

So that would be… what? An inference rule that re-writes RDF graphs?? 
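
A rough sketch of what such a rule would have to do, again in Python with graphs as plain sets of tuples. The function repair_chain is hypothetical and encodes only one possible reading of the proposal: members numbered above the stated length are cut, and missing positions are padded with fresh blank-node labels. The _:padN labels are made up for illustration.

```python
import itertools
import re

MEMBER = re.compile(r"^rdf:_([1-9][0-9]*)$")  # rdf:_1, rdf:_2, ...
_fresh = itertools.count(1)                   # source of fresh blank-node labels


def repair_chain(graph, chain, length):
    """Hypothetical 'fixing' rule: cut surplus members, pad gaps with blank nodes."""
    repaired = set()
    present = set()
    for s, p, o in graph:
        m = MEMBER.match(p) if s == chain else None
        if m is None:
            repaired.add((s, p, o))            # not a member triple: keep
        elif int(m.group(1)) <= length:
            repaired.add((s, p, o))            # member within stated length: keep
            present.add(int(m.group(1)))
        # otherwise: surplus member, dropped
    for i in range(1, length + 1):
        if i not in present:
            repaired.add((chain, f"rdf:_{i}", f"_:pad{next(_fresh)}"))
    return repaired


original = {
    ("ex:c", "rdfx:hasLength", 2),
    ("ex:c", "rdf:_1", "ex:red"),
    ("ex:c", "rdf:_3", "ex:blue"),   # gap at position 2, surplus at position 3
}
repaired = repair_chain(original, "ex:c", 2)

# The repaired graph is not a superset of the original: a triple was deleted.
print(original <= repaired)
```

The output graph no longer contains the rdf:_3 triple that the input asserted. Ordinary RDF inference only ever adds triples to a graph, so a rule like this is a rewriting operation on graphs, not an entailment.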

> OTOH there’s so many things that can go wrong and with so many different consequences that it seems a bit risky to standardize such fixing arrangements. 

Indeed. And this is all on the Web, note. Suppose you got this unruly stuff from a website, does your re-writing propagate back to the source? No. So why even bother, when the next HTTP GET is going to screw it all up again?
> 
> The semantics of rdfx:Chain without any extra fixings are not much different from rdf:List except the one basic difference that an rdf:List that is broken because e.g. an element went missing is broken very obviously. The rdf:List doesn’t need any machinery that calculates if it's okay or not whereas an rdfx:Chain does need such machinery. I do however still have trouble accepting that bridging this difference would require fundamentally altering RDF. It seems like syntactic sugar to me, albeit on the model level. Well, maybe that makes all the difference?
> 
> However: let’s say I declare ex:house to be rdfs:subClassOf ex:car and consequently a reasoner starts to add four wheels to every house in my ontology. That may be unfortunate but it doesn’t break RDF.

True, it does not. 

> How would a moderately heavy handed fixing arrangement as outlined above (say we don’t delete surplus members but simply ignore them) be any different?

Because it, unlike the house/car/wheel case, is talking about the actual RDF syntax rules themselves. Look, you can have containers in the world being described, first-class objects, and put up with the possibility of RDF triples saying rubbish about them, just like anything else. Or, you can have lists playing a central role in the /way/ that RDF describes things, like literals do at present. But you can’t have this both ways (in RDF: in some much more expressive languages you can, but it's still very tricky to get it right and very prone to catastrophic errors if you don’t.)

> Does the problem stem from the fact that we are talking about RDF vocabulary, not instance data?

YES

> Pat argues that RDF is not designed to be a datastructure language, but does that mean that describing datastructures is off limits to RDF? Is it to be considered as a sort of meta modelling? Does it lead to paradoxes or intractability? 

Not /describing/ them, no. But we can do that with the current vocabulary. Remember that descriptions can always be incomplete or contradictory. 
> 
> Another aspect: Pat's reference to lists as a new node type seems to suggest that contradictions that are encapsulated in one statement are not a problem. The statement
>  ex:aChainOfLength_4 rdfx:hasLength "3"^^xsd:integer 
> would therefore not be problematic although ex:aChainOfLength_4 is clearly of length 4, not 3 as stated. This seems unfair and I’m again reduced to bickering, but not really understanding.

Well, the specification document for the (proposed extended) language would have to say what to do with that. I think it should be internally inconsistent, myself, but I would prefer to not use hasLength at all, but rather say that some entry was the last one. 

> BTW: I’m not convinced by that whole approach of a new node type for lists.

You are not alone. The WG rejected this idea. Twice. 

> I'd like lists to be integrated into RDF first class because they are such an important and ubiquitous datastructure.

Seems to me that this is a very tight form of integration. But YMMV, of course. 

> Describing lists in RDF is certainly not the most efficient way to implement and use them but I prefer a tight integration to an encapsulated one. I’d like to be able to spin graphs that relate and annotate items in lists and lists of lists (tables). That can more naturally be done when lists are expressed as graphs.

“Spin” here suggests to me that you are thinking about a programming language for manipulating and building graphs. But RDF is not such a language, and I do not think it would be a good idea to try to make it into one. 

Pat

> However I’ve never seen the approach with a new node type fleshed out in considerable detail. Maybe it has some advantages that I'm not aware of.
> 
> Best,
> Thomas
> 
> 
> [0] Collections took all the syntactic sugar in N3, Turtle and JSON-LD but that's another problem that for now shall be ignored.
> 
> [1] To elaborate: a statement can be considered a description, an axiom, a constraint - that often depends on the point of view taken and is rather an operational aspect. There can be value in adding a length property to a list for the purpose of indicating when the author thinks that the list is complete. I might want to publish a list with my favorite colors and express that I have exactly three of them. I would expect the open world to have little say in that matter but to happily receive my contribution. The list however contains only 2 items because I’m still undecided about the third color. 
> The list consumer then has a situation and will have to find a way to deal with it. However it’s often better to be made aware that there might be a problem. In such a case the added length attribute is a feature as it describes a discrepancy between what I said and what I intended to say. Only in a further step may an axiom derive that the third color is as undefined as my state of mind or may a constraint reject the list as incomplete. 
> 
> 
> 
> 
>> On 15. May 2020, at 06:07, Patrick J Hayes <phayes@ihmc.us <mailto:phayes@ihmc.us>> wrote:
>> 
>> Hi Thomas
>> 
>> Let me explain why the semantics of the RDF containers is the way it is. Several members of the RDF WG were surprised by this, but it kind of follows inevitably from other, more basic, design decisions of RDF. 
>> 
>> First, RDF is NOT designed to be a datastructure language: it is a descriptive language. It describes things. The semantics is entirely set up with this basic design decision in mind. And second, it describes things under an open-world assumption. That point (open-world vs. closed-world) was always controversial, but it was thrashed out very early in the design process and became a fundamental design choice, on the grounds that a Web-based description language can never assume that all the relevant data is known about some topic. So this means that given any piece of RDF, you can cut out some piece of it, or adjoin some more RDF to it, without anything breaking. 
>> 
>> So now, how could rdfx:ClosedSeq work? Presumably it would come along with a bunch of assertions about the first, second, third etc. elements of the seq, and maybe a way of saying that one of them is the last item, so we might need rdfx:LastItemIn. Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B? If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally. 
>> 
>> Now, another way to go would be to say that RDF needs containers, but it doesn’t need to describe them using triples. We could just allow a new kind of construct as a node in a triple (in addition to IRIs, Bnodes and literals) and give it its own definition. We would have to invent new syntax to represent them, of course, which would break all known RDF engines, but maybe it would be worth it (?) Then sequences (etc) would be much more like conventional datastructures. Of course, the semantics would have to say something about these things, but not much. (For example, we might require that IRIs inside sequences denote the same thing as they do outside, basic things like that.) This would not break the open world assumption. 
>> 
>> Anyway, I hope this helps people think about what the issues are :-)
>> 
>> Best wishes
>> 
>> Pat
>> 
>> 
>>> On May 14, 2020, at 8:18 AM, thomas lörtsch <tl@rat.io <mailto:tl@rat.io>> wrote:
>>> 
>>> I’m aware that the topic of lists in RDF can ignite lively debate nearly as much as blank nodes so my apologies in advance. I have a very specific question and I don’t intend to discuss the use of lists in OWL, syntactic sugar in Turtle, querying in SPARQL or historic details about how some decisions came to be (although I do find all that very interesting, but another time...).
>>> 
>>> Lists from the RDF container vocabulary - rdf:Seq, rdf:Bag and rdf:Alt - can’t be closed whereas lists from the collection vocabulary - rdf:List - are always closed. Collections, in contrast to containers, are popularly considered to "have semantics" because of this closing characteristic. 
>>> The excruciatingly exact Lisp-style modelling of rdf:Lists through rdf:first/rest/nil properties does indeed leave no room for misunderstanding about the listiness and closedness of an rdf:List.
>>> What level of semantics could be provided for containers by explicitly defining either a closed container, e.g. rdfx:ClosedSeq, or an appropriate property, e.g. rdfx:hasLength, that implicitly closes a container? 
>>> 
>>> I reckon that semantics introduced per definition are always somewhat weaker than semantics that emanate naturally and unmistakably from a datastructure itself. However collections have a lot of disadvantages (that I promised above not to discuss) and I wonder how workable the semantics provided by defining an rdfx:ClosedSeq class or an rdfx:hasLength property would be. Would they be able to take some load, none at all, not enough, or almost the same as collections?
>>> 
>>> Thomas
>> 
> 
> 
>> On 17. May 2020, at 05:53, Patrick J Hayes <phayes@ihmc.us> wrote:
>> 
>> 
>> 
>>> On May 16, 2020, at 11:01 AM, William Van Woensel <William.Van.Woensel@Dal.Ca> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> Some minor thoughts on this issue:
>>> 
>>> Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B?
>>> 
>>> Not sure whether this was meant to present a dichotomy between collections or containers – but it is the same for the RDFS collections, no? 
>> 
>> Yes, it is. And RDF is obliged to accept the open-world interpretation in such cases. 
>> 
>>> If the second item in the linked list would be missing, it's even worse since the rest of the list would simply be "lost"; or, the same resource could have two different "first" or "rest" items, possibly leading us to conclude they are equivalent (In fact, the latter example is given in the RDF 1.1 semantics document to illustrate the total lack of semantics for collections; which is the real underlying issue here I suppose)
>> 
>> Exactly.
>> 
>>> 
>>> If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally.
>>> 
>>> Perhaps I again misunderstand, but surely a semantic extension with extra assumptions for datastructures (regarding entailment, or consistency) would not break RDF?
>> 
>> Well, we certainly could have a semantic extension which imposes extra syntactic conditions of its own (just as OWL-RDF does) but then that would not be RDF. But yes, certainly such an extension - call it RDF-C, maybe -
> 
> For a moment I thought you’d introduce contexts to define surfaces on which lists can be closed. But you wouldn’t go that far, would you? Or would you? Well, it would certainly introduce some inflationary demand in context identifiers.
> 
>> could be defined and might be useful. My next question would be, what did you want it to mean? That is, what semantic conditions would you want to put on such closed collections and statements made about them?
>> 
>> Pat
>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> William
>>> 
Received on Sunday, 17 May 2020 17:29:42 UTC