RE: defining the semantics of lists from William Van Woensel on 2020-05-17 (semantic-web@w3.org from May 2020)

From: William Van Woensel <William.Van.Woensel@Dal.Ca>
Date: Sun, 17 May 2020 13:17:51 +0000
To: thomas lörtsch <tl@rat.io>, Patrick J Hayes <phayes@ihmc.us>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <YTOPR0101MB1530579F6B3B26A148732C43D4BB0@YTOPR0101MB1530.CANPRD01.PROD.OUTLOOK.>
Hi Thomas, Pat,



Regarding Pat's comment re possible semantic conditions, I think Thomas' message has some good examples :-)



I will point out that, in Notation 3, lists / collections are closer to first-class citizens than to syntactic sugar. The W3C Team Submission <https://www.w3.org/TeamSubmission/n3/#lists> specifies some extra axioms to deal with N3 lists, and mentions that they could be implemented as datatypes (in which case the first-rest ladder could be regarded as a reification). In fact, these axioms cover something similar to what Pat mentioned: if the same resource has multiple “first” arcs from it, then the nodes they point to must be equivalent.
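
To make that functional reading of “first” concrete, here is a minimal Python sketch. It is not taken from the Team Submission: the representation of triples as plain tuples and the function name infer_same_as are assumptions made for illustration only. It collects the objects of rdf:first per subject and reports every pair that would have to be equated.

```python
from itertools import combinations

def infer_same_as(triples):
    """Return the pairs of nodes that must be equal if rdf:first is functional."""
    firsts = {}
    for s, p, o in triples:
        if p == "rdf:first":
            firsts.setdefault(s, set()).add(o)
    pairs = set()
    for objects in firsts.values():
        # Every two distinct objects of the same subject have to denote the same thing.
        for a, b in combinations(sorted(objects), 2):
            pairs.add((a, b))
    return pairs

# Hypothetical data: one list cell with two different "first" arcs.
triples = [
    ("_:l1", "rdf:first", "ex:A"),
    ("_:l1", "rdf:first", "ex:B"),
    ("_:l1", "rdf:rest", "rdf:nil"),
]
same_as = infer_same_as(triples)
print(same_as)
```

Nothing is rejected here; the sketch only derives an equality, which is the open-world reading of the situation.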



We are working on an N3 semantics that extends the RDF semantics; there, N3 lists are considered elements in their own right (in addition to cited graphs and quantifiers, for instance). As opposed to static properties (such as rdfx:length), a set of built-ins has been defined for lists, such as member and append (e.g., see Eye <http://eulersharp.sourceforge.net/2003/03swap/eye-builtins.html>, Cwm <https://www.w3.org/2000/10/swap/doc/CwmBuiltins>). We have a built-in wish list going here <https://docs.google.com/document/d/1ByEWSADIvebHRrBda0HhdKhHhp444mYM7BAapXVIJoM/edit#heading=h.gjdgxs> (feel free to contribute!)





Regards,



William



-----Original Message-----

From: thomas lörtsch <tl@rat.io>

Sent: May-17-20 6:41 AM

To: Patrick J Hayes <phayes@ihmc.us>

Cc: William Van Woensel <William.Van.Woensel@Dal.Ca>; Semantic Web <semantic-web@w3.org>

Subject: Re: defining the semantics of lists




Hi all,



I’m really having trouble wrapping my head around this…



Pat’s main argument centers on a very clear-cut distinction between descriptions and constraints. But in reality that distinction is often not so clear; it is rather a perspective that one can choose [1]. I’ve got the feeling that I can evade the whole argument about a fundamental change to RDF by just proclaiming that the proposed list length attribute is merely a description of the intent behind a certain list.

E.g. I might want to publish a list of my favored colors and declare that I have exactly 3 of them. The list however contains only 2 because I can’t decide what the third one should be. That says something about my list (and my state of mind) but it certainly doesn’t break RDF.

But I fear that such a merely descriptive interpretation would then again amount to a very weak formal semantics, certainly weaker than that of collections.

Insofar, yes, I see the dilemma. I wonder, however, how RDF could establish any semantics on such grounds. On the Semantic Web we are at all times talking about statements that could be wrong, misleading, grossly incomplete etc. None of that breaks the fundamentals.



Let's take a more solidly worked out list vocabulary. To simplify things, I refrain from adding a length attribute to existing containers. Instead I define a new type of container, rdfx:Chain, and a new property, rdfx:hasLength. I subclass rdfs:Container because I strive for lists that are easy to read and write by hand [0].



        rdfx:hasLength
            rdfs:domain rdfx:Chain ;
            rdfs:range <http://www.w3.org/2001/XMLSchema#int> .

        rdfx:Chain
            rdfs:subClassOf rdfs:Container .



I'd like rdfx:Chain to be defined rather tightly:

* an rdfx:Chain should have exactly as many entries as indicated by its length property.

* entries should be assigned through rdfs:ContainerMembershipProperty properties, starting from 1 and without skipping numbers: an rdfx:Chain of length 3 is expected to be constructed of exactly the properties :_1, :_2 and :_3.

* an rdfx:Chain without a length property is mostly equivalent to an rdf:Seq but is still required to be without gaps and consecutively numbered, starting from 1. The length of an rdfx:Chain that meets all those requirements may therefore be calculated from its members.



Any rdfx:Chain that breaks one of these rules is considered, ahem, problematic. Applications must decide how to handle it.
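
As a rough illustration of the machinery these rules would need, the following Python sketch checks a single chain. It rests on several assumptions: triples are plain tuples, membership properties are spelled rdf:_1, rdf:_2 and so on, the length literal is already parsed to an int, and check_chain and its problem labels are hypothetical names.

```python
import re

# Container membership properties, assumed here to be written as rdf:_1, rdf:_2, ...
MEMBER = re.compile(r"^rdf:_([1-9][0-9]*)$")

def check_chain(node, triples):
    """Return the list of rule violations for the chain `node` (empty if none)."""
    indices, declared = [], []
    for s, p, o in triples:
        if s != node:
            continue
        match = MEMBER.match(p)
        if match:
            indices.append(int(match.group(1)))
        elif p == "rdfx:hasLength":
            declared.append(o)
    problems = []
    distinct = sorted(set(indices))
    if len(distinct) != len(indices):
        problems.append("duplicate-index")
    if distinct != list(range(1, len(distinct) + 1)):
        problems.append("gap-in-numbering")
    if any(length != len(distinct) for length in declared):
        problems.append("length-mismatch")
    return problems

# The favorite-colors case: three declared, two given.
colours = [
    ("ex:favs", "rdf:type", "rdfx:Chain"),
    ("ex:favs", "rdf:_1", "ex:red"),
    ("ex:favs", "rdf:_2", "ex:blue"),
    ("ex:favs", "rdfx:hasLength", 3),
]
print(check_chain("ex:favs", colours))
```

Whether a non-empty result counts as an error or merely as a description of the author's intent is exactly the open question; the sketch only reports it.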

More semantics are surely possible. A somewhat sensible set of rules could be that an irregularly numbered rdfx:Chain is renumbered, starting with :_1, missing members are filled in as blank nodes, and surplus members are cut or at least ignored. OTOH there are so many things that can go wrong, with so many different consequences, that it seems a bit risky to standardize such fixing arrangements.
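
One possible reading of such a fixing arrangement, sketched in Python. The function name repair_chain, the dict representation of the members and the blank node labels are assumptions for illustration; other readings of the rules are just as plausible.

```python
def repair_chain(members, declared_length):
    """Renumber, truncate and pad a chain.

    members maps the original index (1, 2, 5, ...) to the member value.
    The result is the list of member values in repaired order, so that
    position i in the list stands for the renumbered property _(i + 1).
    """
    # Renumber: keep the relative order, drop the gaps.
    values = [members[i] for i in sorted(members)]
    # Surplus members are cut.
    values = values[:declared_length]
    # Missing members are filled in as fresh blank nodes.
    values += [f"_:missing{n}" for n in range(len(values) + 1, declared_length + 1)]
    return values

print(repair_chain({2: "ex:red", 5: "ex:blue"}, 3))
```

Even this small sketch has to make choices (pad at the end rather than at the gaps, cut from the tail), which illustrates why standardizing such repairs looks risky.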



The semantics of rdfx:Chain without any extra fixings are not much different from those of rdf:List, except for one basic difference: an rdf:List that is broken because e.g. an element went missing is broken very obviously. The rdf:List doesn’t need any machinery that calculates whether it's okay or not, whereas an rdfx:Chain does need such machinery. I do however still have trouble accepting that bridging this difference would require fundamentally altering RDF. It seems like syntactic sugar to me, albeit on the model level. Well, maybe that makes all the difference?
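
To illustrate the "broken very obviously" point: a consumer that simply follows the rdf:first/rdf:rest ladder runs into a missing cell on its own, without a separate validation pass. A minimal Python sketch, assuming triples as plain tuples and a hypothetical function name read_list:

```python
def read_list(head, triples):
    """Follow rdf:first/rdf:rest from `head` to rdf:nil and return the items."""
    first, rest = {}, {}
    for s, p, o in triples:
        if p == "rdf:first":
            first.setdefault(s, []).append(o)
        elif p == "rdf:rest":
            rest.setdefault(s, []).append(o)
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        # A readable cell has exactly one rdf:first and exactly one rdf:rest.
        if len(first.get(node, [])) != 1 or len(rest.get(node, [])) != 1:
            raise ValueError(f"cannot read list cell {node}")
        items.append(first[node][0])
        node = rest[node][0]
    return items

good = [
    ("_:c1", "rdf:first", "ex:a"), ("_:c1", "rdf:rest", "_:c2"),
    ("_:c2", "rdf:first", "ex:b"), ("_:c2", "rdf:rest", "rdf:nil"),
]
print(read_list("_:c1", good))
```

Note that this is the view of an application that wants to read the list; under the open-world assumption the graph itself is merely an incomplete description.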



However: let’s say I declare ex:house to be rdfs:subClassOf ex:car and consequently a reasoner starts to add four wheels to every house in my ontology. That may be unfortunate but it doesn’t break RDF. How would a moderately heavy handed fixing arrangement as outlined above (say we don’t delete surplus members but simply ignore them) be any different? Does the problem stem from the fact that we are talking about RDF vocabulary, not instance data?
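
The inference in this example is ordinary RDFS subclass entailment (rule rdfs9 in the RDF 1.1 Semantics). A small Python sketch, assuming triples as plain tuples and a hypothetical function name apply_rdfs9; it derives only the rdf:type triple, since the four wheels would need further axioms beyond rdfs:subClassOf.

```python
def apply_rdfs9(triples):
    """Close a set of triples under rdfs9: (C subClassOf D), (x type C) => (x type D)."""
    closed = set(triples)
    while True:
        derived = {
            (x, "rdf:type", d)
            for (c, p1, d) in closed if p1 == "rdfs:subClassOf"
            for (x, p2, c2) in closed if p2 == "rdf:type" and c2 == c
        }
        if derived <= closed:
            return closed
        closed |= derived

# Hypothetical instance data for the house/car example.
graph = [
    ("ex:house", "rdfs:subClassOf", "ex:car"),
    ("ex:myHouse", "rdf:type", "ex:house"),
]
entailed = apply_rdfs9(graph)
print(("ex:myHouse", "rdf:type", "ex:car") in entailed)
```

The rule only ever adds triples; nothing in the input is rejected or removed, however unfortunate the modelling.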

Pat argues that RDF is not designed to be a datastructure language, but does that mean that describing datastructures is off limits to RDF? Is it to be considered as a sort of meta modelling? Does it lead to paradoxes or intractability?



Another aspect: Pat's reference to lists as a new node type seems to suggest that contradictions that are encapsulated in one statement are not a problem. The statement

        ex:aChainOfLength_4 rdfx:hasLength "3"^^xsd:integer .

would therefore not be problematic although ex:aChainOfLength_4 is clearly of length 4, not 3 as stated. This seems unfair and I’m again reduced to bickering, but not really understanding.

BTW: I’m not convinced by that whole approach of a new node type for lists. I'd like lists to be integrated into RDF as first-class citizens because they are such an important and ubiquitous datastructure. Describing lists in RDF is certainly not the most efficient way to implement and use them but I prefer a tight integration to an encapsulated one. I’d like to be able to spin graphs that relate and annotate items in lists and lists of lists (tables). That can more naturally be done when lists are expressed as graphs. However I’ve never seen the approach with a new node type fleshed out in considerable detail. Maybe it has some advantages that I'm not aware of.



Best,

Thomas





[0] Collections took all the syntactic sugar in N3, Turtle and JSON-LD but that's another problem that for now shall be ignored.



[1] To elaborate: a statement can be considered a description, an axiom, a constraint - that often depends on the point of view taken and is rather an operational aspect. There can be value in adding a length property to a list for the purpose of indicating when the author thinks that the list is complete. I might want to publish a list with my favorite colors and express that I have exactly three of them. I would expect the open world to have little say in that matter but to happily receive my contribution. The list however contains only 2 items because I’m still undecided about the third color.

The list consumer then has a situation and will have to find a way to deal with it. However it’s often better to be made aware that there might be a problem. In such a case the added length attribute is a feature as it describes a discrepancy between what I said and what I intended to say. Only in a further step may an axiom derive that the third color is as undefined as my state of mind or may a constraint reject the list as incomplete.









> On 15. May 2020, at 06:07, Patrick J Hayes <phayes@ihmc.us> wrote:

>

> Hi Thomas

>

> Let me explain why the semantics of the RDF containers is the way it is. Several members of the RDF WG were surprised by this, but it kind of follows inevitably from other, more basic, design decisions of RDF.

>

> First, RDF is NOT designed to be a datastructure language: it is a descriptive language. It describes things. The semantics is entirely set up with this basic design decision in mind. And second, it describes things under an open-world assumption. That point (open-world vs. closed-world) was always controversial, but it was thrashed out very early in the design process and became a fundamental design choice, on the grounds that a Web-based description language can never assume that all the relevant data is known about some topic. So this means that given any piece of RDF, you can cut out some piece of it, or adjoin some more RDF to it, without anything breaking.

>

> So now, how could rdfx:ClosedSeq work? Presumably it would come along with a bunch of assertions about the first, second, third etc. elements of the seq, and maybe a way of saying that one of them is the last item, so we might need rdfx:LastItemIn. Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B? If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally.

>

> Now, another way to go would be to say that RDF needs containers, but it doesn’t need to describe them using triples. We could just allow a new kind of construct as a node in a triple (in addition to IRIs, Bnodes and literals) and give it its own definition. We would have to invent new syntax to represent them, of course, which would break all known RDF engines, but maybe it would be worth it (?) Then sequences (etc) would be much more like conventional datastructures. Of course, the semantics would have to say something about these things, but not much. (For example, we might require that IRIs inside sequences denote the same thing as they do outside, basic things like that.) This would not break the open world assumption.

>

> Anyway, I hope this helps people think about what the issues are :-)

>

> Best wishes

>

> Pat

>

>

>> On May 14, 2020, at 8:18 AM, thomas lörtsch <tl@rat.io> wrote:

>>

>> I’m aware that the topic of lists in RDF can ignite lively debate nearly as much as blank nodes so my apologies in advance. I have a very specific question and I don’t intend to discuss the use of lists in OWL, syntactic sugar in Turtle, querying in SPARQL or historic details about how some decisions came to be (although I do find all that very interesting, but another time...).

>>

>> Lists from the RDF container vocabulary - rdf:Seq, rdf:Bag and rdf:Alt - can’t be closed whereas lists from the collection vocabulary - rdf:List - are always closed. Collections, in contrast to containers, are popularly considered to "have semantics" because of this closing characteristic.

>> The excruciatingly exact Lisp-style modelling of rdf:Lists through rdf:first/rest/nil properties does indeed leave no room for misunderstanding about the listiness and closedness of an rdf:List.

>> What level of semantics could be provided for containers by explicitly defining either a closed container, e.g. rdfx:ClosedSeq, or an appropriate property, e.g. rdfx:hasLength, that implicitly closes a container?

>>

>> I reckon that semantics introduced per definition are always somewhat weaker than semantics that emanate naturally and unmistakably from a datastructure itself. However collections have a lot of disadvantages (that I promised above not to discuss) and I wonder how workable the semantics provided by defining an rdfx:ClosedSeq class or an rdfx:hasLength property would be. Would they be able to take some load, none at all, not enough, or almost the same as collections?

>>

>> Thomas

>





> On 17. May 2020, at 05:53, Patrick J Hayes <phayes@ihmc.us> wrote:

>

>

>

>> On May 16, 2020, at 11:01 AM, William Van Woensel <William.Van.Woensel@Dal.Ca> wrote:

>>

>> Hi everyone,

>>

>> Some minor thoughts on this issue:

>>

>> Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B?

>>

>> Not sure whether this was meant to present a dichotomy between collections and containers – but it is the same for the RDFS collections, no?

>

> Yes, it is. And RDF is obliged to accept the open-world interpretation in such cases.

>

>> If the second item in the linked list would be missing, it's even

>> worse since the rest of the list would simply be "lost"; or, the same

>> resource could have two different "first" or "rest" items, possibly

>> leading us to conclude they are equivalent (In fact, the latter

>> example is given in the RDF 1.1 semantics document to illustrate the

>> total lack of semantics for collections; which is the real underlying

>> issue here I suppose)

>

> Exactly.

>

>>

>> If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally.

>>

>> Perhaps I again misunderstand, but surely a semantic extension with extra assumptions for datastructures (regarding entailment, or consistency) would not break RDF?

>

> Well, we certainly could have a semantic extension which imposes extra

> syntactic conditions of its own (just as OWL-RDF does) but then that

> would not be RDF. But yes, certainly such an extension - call it

> RDF-C, maybe -



For a moment I thought you’d introduce contexts to define surfaces on which lists can be closed. But you wouldn’t go that far, would you? Or would you? Well, it would certainly introduce some inflationary demand in context identifiers.



> could be defined and might be useful. My next question would be, what did you want it to mean? That is, what semantic conditions would you want to put on such closed collections and statements made about them?

>

> Pat

>

>>

>>

>> Regards,

>>

>> William

Received on Sunday, 17 May 2020 13:18:08 UTC