Re: defining the semantics of lists from Ivan Herman on 2020-05-30 (semantic-web@w3.org from May 2020)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 30 May 2020 12:00:36 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: thomas lörtsch <tl@rat.io>, Semantic Web <semantic-web@w3.org>
Message-Id: <FFC295D1-2827-499E-81FC-2CBF3EA6940A@w3.org>
> On 29 May 2020, at 20:59, Patrick J Hayes <phayes@ihmc.us> wrote:
> 
> 
> 

<snip>

>  And there may be other places where it will not fit into a widely used system or notation (JSON-LD for example?). You should probably check this before pushing this idea too strongly or devoting effort to implementing anything. 

Right. Both JSON-LD and Turtle have syntactic shorthands for RDF Lists, but none for containers. That also means that, whilst the triple level syntax for Lists is indeed a pain, that pain completely disappears when one uses Turtle or JSON-LD. Which makes me wonder whether this whole issue of Lists vs. Collections is still relevant for RDF users (I mean those that create and maintain datasets which is, after all, what it is all about), who may very well ignore the details just as most of us deliberately ignore what a programming construct looks like in assembly…
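
To make the difference concrete, here is a minimal Python sketch that writes out the triples hidden behind the Turtle shorthand ( "a" "b" "c" ) next to the equivalent container triples. It is not taken from any RDF library: the function names and the _:n1 style blank node labels are assumptions made for illustration only.

```python
def collection_triples(items):
    """Expand items into the rdf:first/rdf:rest chain that Turtle's ( ... ) hides.

    Returns the head node and the list of triples. Blank node labels
    (_:n1, _:n2, ...) are hypothetical; a real parser picks its own.
    """
    if not items:
        return "rdf:nil", []
    nodes = [f"_:n{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        # The last cell points at rdf:nil, which is what closes the list.
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples


def container_triples(subject, items):
    """Describe the same items with rdf:_1, rdf:_2, ... membership properties."""
    return [(subject, f"rdf:_{i}", item) for i, item in enumerate(items, start=1)]


if __name__ == "__main__":
    head, triples = collection_triples(["a", "b", "c"])
    for t in triples:
        print(*t, ".")
    for t in container_triples("_:L", ["a", "b", "c"]):
        print(*t, ".")
```

Three items cost six triples and three blank nodes as a Collection, but only three triples as a Container; the price of the shorter form is that nothing in it says where the list ends.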

Ivan


> 
> Pat
> 
> other responses in-line below.
> 
>> 
>> 
>> Some more questions below.
>> 
>>> On 26. May 2020, at 18:59, Patrick J Hayes <phayes@ihmc.us> wrote:
>>> 
>>> 
>>> 
>>>> On May 24, 2020, at 10:29 AM, thomas lörtsch <tl@rat.io> wrote:
>>>> 
>>>>> 
>>>>> On 19. May 2020, at 03:53, Patrick J Hayes <phayes@ihmc.us> wrote:
>>>>> 
>>>>> Quick response to what is (for me) the central issue:
>>>>> 
>>>>>> On May 18, 2020, at 5:28 PM, thomas lörtsch <tl@rat.io> wrote:
>>>>>> 
>>>>>> I’ve inserted comments on comments all over the place, but for better understanding let me try to provide a summary first:
>>>>>> 
>>>>>> 
>>>>>> My aim is to provide rdf:Containers with about the same (or a tiny bit more) semantics than rdf:Collections. If a simple rdfx:hasLength property and a few informative remarks can do that I would be all set and quite happy.
>>>>>> 
>>>>>> I’m striving for lists expressed in triples, not for lists as a new node type, because this way I see more potential for making assertions about lists and their entries and it feels more natural anyway. I want not only lists but the items in them to be part of the graph.
>>>>>> But I want to get rid of rdf:Collections as they are unbearable to write or even read  manually, without syntactic sugar. And syntactic sugar, although formidable in Turtle et al., is not available everywhere. Ultimate goal: let OWL DL have rdf:List and be done with it.
>>>>>> 
>>>>>> The most unobtrusive semantics that I can think of is that an rdfx:hasLength property describes a certain list, period.
>>>>> 
>>>>> Sure, we do that already. Mathspeak is ‘ordered sequence’ rather than ‘list’, but whatever. 
>>>>> 
>>>>>> If the list actually contains more or less items than stated the rdfx:hasLength statement is wrong.
>>>>> 
>>>>> You mean, false? Sure, that is trivial. (Well, perhaps not, in that it puts arithmetic into the semantics, but that is a logician's worry, so let's leave it aside.)
>>>>> 
>>>>> This doesn’t (by itself) support much inference, however. Also, it doesn’t make any graph “illegal”. 
>>>>> 
>>>>>> The actual list always wins: it is what it is, not what another statement says.
>>>>> 
>>>>> All we ever get is statements about anything, including lists. You can never examine the contents of the semantics ‘directly’. So this is meaningless. 
>>>>> 
>>>>>> But additionally "You have been warned!”.
>>>>> 
>>>>> I don't know what this means.
>>>> 
>>>> This is meant to mean that through this property an application outside of RDF is given a chance to spot and deal with a disturbance in the open world of RDF that RDF itself cannot express in formally sound terms.
>>>> 
>>>>> What you /will/ have with this semantics is a new way for a graph to be inconsistent (assuming that we also specify a sharp semantics for the :_n membership properties, and that for example a list of length n cannot have something in a position > n). So for example
>>>>> 
>>>>> _:L rdfx:hasLength "3"^^xsd:integer .
>>>>> _:L rdf:_4 _:x .
>>>>> 
>>>>> would be inconsistent. I am not sure why you think this would be useful, but whatever. 
>>>> 
>>>> Being able to express closedness was deemed useful enough to justify the introduction of rdf:Lists. I want the same for rdfs:Containers because they have a saner syntax. Also that way Collection semantics would be available in OWL DL outside of its axiomatization, which should be a good thing too.
>>>> 
>>>>> By the way, this would have some other interesting consequences. For example, just saying that a list has a length:
>>>>> 
>>>>> _:L rdfx:hasLength "3"^^xsd:integer .
>>>>> 
>>>>> would entail that all its slots up to that length had /something/ in them:
>>>>> 
>>>>> _:L  rdf:_1 _:x1 .
>>>>> _:L  rdf:_2 _:x2 .
>>>>> _:L  rdf:_3 _:x3 .
>>>>> 
>>>>> And if you know that something is at a place in a list
>>>>> 
>>>>> _:L  rdf:_17 _:x1 .
>>>>> 
>>>>> then that entails that the list must be at least that long:
>>>>> 
>>>>> _:L rdfx:hasLength _:length .
>>>>> _:length rdfx:greaterThan "16"^^xsd:integer .
>>>>> 
>>>>> (where I have invented a suitable arithmetic RDF property to express the inequality.)
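
The three consequences described above (the inconsistency, the entailed slots, and the entailed minimum length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples of prefixed names, rdfx:hasLength values are Python integers, and the helper names and the _:x1 style blank node labels are hypothetical.

```python
import re

# Matches container membership properties such as rdf:_1, rdf:_17.
MEMBER = re.compile(r"rdf:_([1-9][0-9]*)")


def member_positions(triples, node):
    """Return the set of positions n for which the graph has (node, rdf:_n, x)."""
    positions = set()
    for s, p, _ in triples:
        m = MEMBER.fullmatch(p)
        if s == node and m:
            positions.add(int(m.group(1)))
    return positions


def is_inconsistent(triples, node):
    """True if some member sits at a position beyond a stated rdfx:hasLength."""
    lengths = {o for s, p, o in triples if s == node and p == "rdfx:hasLength"}
    positions = member_positions(triples, node)
    return any(pos > n for pos in positions for n in lengths)


def entailed_slots(node, length):
    """A stated length entails that every slot up to it holds *something*."""
    return [(node, f"rdf:_{i}", f"_:x{i}") for i in range(1, length + 1)]


def minimum_length(triples, node):
    """A member at position n entails that the list is at least n long."""
    positions = member_positions(triples, node)
    return max(positions) if positions else 0
```

For example, is_inconsistent([("_:L", "rdfx:hasLength", 3), ("_:L", "rdf:_4", "_:x")], "_:L") is true, and a single rdf:_17 triple gives a minimum length of 17, i.e. greater than 16. Note that the blank nodes produced by entailed_slots assert that something exists in each slot; they are neither values nor defaults.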
>>>> 
>>>>>> This semantics could be extended a little, demanding that the list not only has said length but also no empty slots
>>>>> 
>>>>> If you mean, slots whose value is not specified in a graph? then no, you cannot do that. That would be a syntactic constraint on RDF graphs. 
>>>> 
>>>> Here I was meaning exactly what you did a few lines above: empty slots are not forbidden but may get filled with some default value.
>>> 
>>> But that isn't what I did. Using a blank node to say that something exists in a slot is not saying the slot is empty: quite the contrary, in fact. And I didn't say anything about a default value. Defaults would be nonmonotonic and would break the RDF logic completely.
>> 
>> 
>> 
>>>>>> and is numbered consecutively, starting with :_1.
>>>>> 
>>>>> How could RDF ever say it had some other order? 
>>>> 
>>>> Well, not necessarily some other order but rather no order at all. Given that the Containers semantics are so very sparing I thought it might be sensible to state the obvious. The real reason: I was just trying to not get picked at by Pat again ;-)
>>> 
>>> Hey, as the author of the semantics, it’s my job to be pedantic. That's what formal semantics are for :-)
>> 
>> Given my disturbing displays of ignorance I’m quite glad that you still care to harass me with your pedantry.
>> 
>>>>> But yes, with the caveat mentioned, this would be an easy change to RDF and would not seriously break anything. 
>>>>> 
>>>>> But notice how quickly putting numbers into the semantics imposes conditions on the language. In order to express this semantics adequately we need to have ways of talking about numbers and their relationships in the language. If you once swallow arithmetic, you gotta have ways of talking arithmetic. While this does not break RDF, it does take it to a whole new level of complexity.
>>>> 
>>>> These last two paragraphs leave me confused: it is an easy change but it needs an arithmetic extension that takes RDF to a whole new level of complexity? That seems contradictory. And that makes me quite unsure on how to proceed +->
>>> 
>>> All I meant is that hasLength introduces integers into the semantics of RDF. You might take integers (and basic arithmetic) as being so obvious that it is hardly worth mentioning, but for a logician this is a big step, because of Goedel’s theorem. Very few logics of any kind come with arithmetic built into the semantic equations. It can be done, of course, but for me it sets off alarm bells, in that it might have all kinds of downstream consequences that we have not thought of yet. The first sign of trouble is there already, in that we need a built-in ‘lessthan’ relation. What else do we need? I have no idea, but I’m sure it will be a lot. I like your rdfx:last suggestion, below, a lot better. 
>> 
>> Okay, then let’s go with that.
>> 
>>>> I do now understand how the OWA prohibits any explicit closing of a list in RDF, how RDF is all about _describing_ things, how only single triples can be a bearer of truth, how RDF terms themselves are not to be messed with and how the whole endeavour of formal semantics under an OWA is walking a very thin line between what may be inferred and what cannot be ruled out. Maybe. [0]
>>>> However I also lost practically all faith in the formal semantics of Collections and Containers alike. If not even the simplest syntactic constraints - only one head, no branching - can be enforced then why bother at all with the semantics of a length attribute? 
>>>> 
>>>> Why even consider an arithmetic extension? Notwithstanding its usefulness in other contexts I’m not convinced that some arithmetic extension can ground the semantics of an rdfx:hasLength property when the rdf:Container it describes has so little formal standing to build on.
>>>> 
>>>> One could make rdfx:hasLength an owl:AnnotationProperty so its semantics would definitely be reduced to handwaving, providing a hint to applications if some list probably is complete. Closing a list was deemed useful before but it was implemented with a verbose syntax and in OWL DL it's off limits for users. Lists are so important in practice that IMO that’s reason enough to introduce something along those lines, even with _very_ limited formal semantics.
>>> 
>>> Indeed. And this is exactly why we introduced them in the way we did in current RDF. The /formal/ semantics of lists (actually of descriptions of lists) is minimal, but we point out that you can define a semantic extension which imposes stronger and more, um, rational conditions on list descriptions (no pathological branches, no gaps, etc.)  and indeed OWL did do this.
>> 
>> Since you mention it I tried to look up those conditions in the OWL 2 specs but, no surprise, after an hour or so I gave up. Do you know where to look? 
> 
> It isn't very clear, I have to admit but it is implicit in the text here: 
> https://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/#Mapping_from_RDF_Graphs_to_the_Structural_Specification
> 
>> 
>>> You could also do something analogous for other datastructures such as tables.
>> 
>> 
>> 
>>>> I was also pondering the graph based approach that Cory proposed but for a basic construct like lists (and trees and tables that can easily be built from it) it seems a waste. Graphs should be used for all kinds of stuff, even for structural features like n-ary relations, but lists - rather not. At least that’s my current thinking.
>>>> I think it can be useful in a bigger context like being able to express that in some application/source/universeOfDiscourse all lists are closed. But I’d rather embed that in a semantic extension that fixes a few more things and formally defines the Closed World scenarios that applications often assume and require.
>>>> 
>>>> Pat has in earlier mails suggested to mark the last item of a list instead of providing a length attribute. That didn’t really catch on with me because I lacked an idea how to do it. Meanwhile the following vocabulary extension bubbled up in my head:
>>>> 
>>>>  rdfx:Chain rdfs:subClassOf rdfs:Container .
>>>>  rdfx:last rdfs:domain rdfx:Chain .
>>>>  rdfx:last rdfs:range rdfs:ContainerMembershipProperty .
>>>> 
>>>>  _:L  rdf:_1  "a" .
>>>>  _:L  rdf:_2  "b" .
>>>>  _:L  rdf:_3  "c" .
>>>>  _:L  rdfx:last rdf:_3 .
>>>> 
>>>> I sort of like it but I’m not convinced that it's really more elegant.
>>> 
>>> Yes, exactly what I would suggest. It avoids relying on integer arithmetic and IMO is cleaner.
>>> 
>>>> Fundamentally it doesn’t seem to make much difference:
>>>> - Containers still provide only a semantically weak base
>>>> - a missing 2nd slot would still need to be filled
>>> 
>>> Why? You could just have the last triple to specify the length without saying anything about what is in the slots. 
>> 
>> Feels like a ladder with missing steps.
> 
> I agree, but the point is, it is always /possible/ that a description may be incomplete.
> 
>> I think the question what the default value is would inevitably come up. You are probably very right about default values breaking the monotonicity of RDF but entailing a blank node from a missing slot is exactly what an application developer will expect and intuitively understand as a default value.
> 
> Aaaargh, that's about as wrong as it can be. First, it's not a value and second, it's not a default. But if you think people will be happy with it, I won’t argue.
> 
>> So we’re lucky.
>> 
>>>> - a surplus 4th slot would still need to be ignored
>>> 
>>> It should be an inconsistency. 
>> 
>> Because? I like this more stringent definition of the semantics of a finite list but it is a design decision, right? Or does it follow logically from something else?
> 
> It surely follows from the idea of these things having a definite length. If it has length 3 but it is OK to say there is something in position 4, what does the length assertion even mean? Surely we have to say, there /isn’t/ anything in position 4, so the triple that says that there is, must be false; or else the claim about it being only 3 long was false. Something must be false: which means we have an inconsistency.
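
The same reasoning carries over to the rdfx:last variant discussed further up, and a small Python sketch may make it tangible. Everything here is an assumption for illustration: triples are plain tuples of prefixed names, check_chain and index_of are hypothetical helper names, and taking the smallest marker when several rdfx:last triples exist is an arbitrary choice the thread does not settle.

```python
import re


def index_of(prop):
    """Return n for a membership property rdf:_n, or None for anything else."""
    m = re.fullmatch(r"rdf:_([1-9][0-9]*)", prop)
    return int(m.group(1)) if m else None


def check_chain(triples, node):
    """Compare the described slots of `node` against its rdfx:last marker.

    A member beyond the marked last position makes the description
    inconsistent. A slot with no triple is merely undescribed: the
    description may be incomplete, which is not an error.
    """
    markers = [index_of(o) for s, p, o in triples if s == node and p == "rdfx:last"]
    markers = [n for n in markers if n is not None]
    filled = {index_of(p) for s, p, _ in triples if s == node} - {None}
    if not markers:
        # No closing statement, so nothing can be contradicted.
        return {"inconsistent": False, "undescribed": []}
    end = min(markers)  # assumption: the smallest marker wins
    return {
        "inconsistent": any(i > end for i in filled),
        "undescribed": [i for i in range(1, end + 1) if i not in filled],
    }
```

With rdf:_1, rdf:_3 and rdfx:last rdf:_3 the result reports slot 2 as undescribed but consistent (the ladder with a missing step); adding an rdf:_4 triple flips it to inconsistent, because then either that triple or the rdfx:last statement must be false.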
> 
> Pat
> 
> 
>> 
>>>> And maybe the counting business on ContainerMembershipProperties would still require an arithmetic extension? Which would still not be worth the trouble because it would only stand on Collections’ shifting semantic sands?
>>>> 
>>>> 
>>>> BTW: I don’t like the name "Chain". I would prefer "Series" but I’m not a native speaker and not sure if it captures the intended purpose well enough. Also "Seq" and "Ser" are easy to confuse (but "Ser" gets filed one after  "Seq", so that’s good!). "Fin de Seq" would of course be even nicer.
>>> 
>>> FWIW, I am a native speaker and I don't like “chain” either. The only thing wrong with ’series’ is that it is a very generic word that gets used a lot, so there might be some confusion. But that's a very slight point.
>> 
>> Okay. But maybe that point is moot with the new proposal above.
>> 
>> 
>> Thomas
>> 
>> 
>>> Pat
>>> 
>>>> 
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>> [0] And that the RDF Semantics at https://www.w3.org/TR/rdf11-mt/ uses the term "intent" although I got ridiculed for introducing it a few mails ago: "The intended mode of use is that things of type rdf:Bag are considered to be… " etc. Ha!
>>>> 
>>>> 
>>>>> Pat
>>>>> 
>>>>> 
>>>>>> That would be something that I’d need to discuss and think about more. The basic mechanism remains the same: if the list doesn’t fit the description, the description is wrong, but proceed with care.
>>>>>> 
>>>>>> This semantics cannot be as self-evident as rdf:Collections but it is trying to get close. It is trying not to step on the toes of RDF itself by referring not to the list itself but to the list contents as explicitly as possible (maybe that helps a little).
>>>>>> 
>>>>>> 
>>>>>> I see a few possible alternative ways forward:
>>>>>> 
>>>>>> A - GIVE UP
>>>>>> Be content with nice closed lists in Turtle that can’t be used in OWL DL, hope for better rdf:List support in SPARQL 1.2. They surely will raise their ugly syntactic head now and then when needed least of all.
>>>>>> 
>>>>>> B - EXPRESS INTUITION, FORGET FORMAL SEMANTICS
>>>>>> Define a little hasLength property (as outlined above) that is purely descriptive but may hint at reasonable expectations and possible problems in a completely Open World. Its semantics are informational, just like the pile of warnings and downers that accompany Collections and Containers. Collections seem to do well enough, why strive for more.
>>>>>> Maybe even make hasLength a subproperty of OWL Annotation Property to definitely rule out any formal semantics.
>>>>>> 
>>>>>> C - SEMANTIC EXTENSION
>>>>>> Define one or more semantic extensions for RDF that specify the semantics of closed Containers more tightly. Maybe, while at it, define a Closed World Extension Set that provides not only very definite list semantics but also a few other fixings that application developers would love. Also define a process for registering extensions at the W3C, a vocabulary to describe them, a way to bind them into arbitrary snippets of RDF and set up a repository to host and serve them.
>>>>>> 
>>>>>> D - CLOSED GRAPHS
>>>>>> Per Cory Casanave's proposal in a parallel subthread rather concentrate on defining a locally closed world, a graph of some sort, and add ordering as a rather trivial feature. Sounds interesting. I have some vague ideas how Named Graphs could be pimped to be used for a lot more things than currently customary.
>>>>>> 
>>>>>> 
>>>>>> Of course you brought up a few more issues and this summary just aims to set the scene. So see below.
>>>>>> 
>>>>>> 
>>>>>>> On 17. May 2020, at 19:29, Patrick J Hayes <phayes@ihmc.us> wrote:
>>>>>> 
>>>>>>>> On May 17, 2020, at 4:40 AM, thomas lörtsch <tl@rat.io> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I’m really having trouble wrapping my head around this… 
>>>>>>>> 
>>>>>>>> Pat’s main argument centers around a very clear cut distinction between descriptions and constraints. But in reality that distinction is often not so clear but rather a perspective that one can choose [1].
>>>>>>> 
>>>>>>> The key point for RDF is that each triple should have a self-contained meaning, independent of other triples.
>>>>>> 
>>>>>> I had a vague feeling that I might still be underestimating how simplistic RDF is...
>>>>>> 
>>>>>>> Having constraints like the length of a list (described using triples) imposed by a different triple breaks this, which changes both the syntax and the semantics of RDF in fundamental ways. 
>>>>>> 
>>>>>> It’s not a constraint, it’s a description (that might be wrong). But let’s say it is, then what about n-ary relations? Does using a blank node as intermediary make a difference? E.g.:
>>>>>> 
>>>>>>  _:b1 rdf:_1 "rosé" ;
>>>>>>       rdf:_2 "blue" ;
>>>>>>       rdf:_3 "citron" ;
>>>>>>       rdfx:length 3 ;
>>>>>>       owl:sameAs ex:myList .
>>>>>> 
>>>>>> I now could happily add assertions about ex:myList and use it as a  subject or object in other statements. Of course this wouldn’t prevent 
>>>>>> 
>>>>>>  _:b2 rdf:_1 "rosé" ;
>>>>>>       rdf:_2 "blue" ;
>>>>>>       rdfx:length 3 ;  # bummer
>>>>>>       owl:sameAs ex:ohMyList .
>>>>>> 
>>>>>> Nah...
>>>>>> 
>>>>>>>> I’ve got the feeling that I can evade the whole argument about a fundamental change to RDF by just proclaiming that the proposed list length attribute is merely a description of the intent behind a certain list. 
>>>>>>> 
>>>>>>> I would guess that putting intents into a semantics would be a horror show :-)
>>>>>> 
>>>>>> Let me rephrase. Stating that a list is of a certain length means that:
>>>>>> - I want it to be of that length
>>>>>> - I made it to be of that length
>>>>>> - I believe it is of that length
>>>>>> - if it has a different length forget what I just said but
>>>>>> - proceed with care.
>>>>>> 
>>>>>> Properties have meanings that are usually defined in prose. They operationalize intentions.
>>>>>> 
>>>>>>>> E.g. I might want to publish a list of my favored colors and declare that I have exactly 3 of them. The list however contains only 2 because I can’t decide what the third one should be. That says something about my list (and my state of mind) but it certainly doesn’t break RDF.
>>>>>>> 
>>>>>>> It might, depending on how to map this doubt into an RDF graph. If the three-favorite-color observation is supposed to be a /syntactic/ constraint on a graph describing a proper list, then only having two listed would be a syntax error. But I presume you don’t intend that.
>>>>>> 
>>>>>> Indeed I didn’t put it as clearly as I thought: stating that a list of length 2 is of length 3 is an obvious contradiction, an error. What kind of error it is we can’t necessarily say. One item might have been lost in transmission, I might be confused or maybe I accidentally hit the Publish button too early. Anyway it’s now out there and a consumer of that data will have to find a way to deal with this contradiction. In that respect the description of a datastructure is no different from the description of anything else in this world and SNAFUs happen. Cory really made a good point about the Open World MANDATE. I had something similar on my lips about a world that is not really open when it forbids me to close my list of favorite colors - but I didn’t want to sound pert.
>>>>>> 
>>>>>>>> But I fear that such a merely descriptive interpretation would then again amount to a very weak formal semantics, certainly weaker than that of collections.
>>>>>>> 
>>>>>>> Actually I think it would be similar to what we already have (informatively) as the intended meaning of collections.
>>>>>> 
>>>>>> Well, that is what I’m aiming for. That would be fine! Away with Collections, here come Closed Containers! I have to admit I had drafted a much longer initial mail, explaining the reasoning behind my question but then I shortened it to keep the discussion focused - maybe too much.
>>>>>> My problem with Collections is not their semantics but their obese syntax. I always wondered why we can’t have the simple syntax of Containers for closed lists. Okay, the numbered membership properties aren’t _that_ elegant either but very reasonable compared to rdf:List. If one looks at
>>>>>>  ex:mylist rdf:_1 "rosé" ;
>>>>>>            rdf:_2 "blue" ;
>>>>>>            rdf:_3 "citron" .
>>>>>> one immediately sees a list. Even nested Containers would still be readable. Querying is much easier too. The syntactic sugar for Collections in N3 and Turtle is of course very good but the underlying triple structure can not always be hidden - see SPARQL - and then Containers have the edge big time. That’s my primary motivation.
>>>>>> Second: with easy lists it’s also easy to build lists of lists - trees and tables - or even cubes, and have all relevant datastructures covered. The aim is not to make graphs rule them all, but occasionally such a facility is nice and helpful. And it avoids frustrations: 
>>>>>> "LISTS for fuck’s sake!" [0]
>>>>>> 
>>>>>>>> Insofar, yes, I see the dilemma. I wonder however how RDF could establish any semantics on such grounds. On the semantic web we are at any time talking about statements that could be wrong, misleading, grossly incomplete etc. None of that breaks the fundamentals.
>>>>>>> 
>>>>>>> Being wrong/misleading is irrelevant. The issue here is not whether any of this stuff is really true, but rather how the truth of one part interacts with the meaning or truth of another part. 
>>>>>> 
>>>>>>>> Let's take a more solidly worked out list vocabulary: to simplify things I refrain from the idea to add a length attribute to existing containers. Instead I define a new type of container, rdfx:Chain, and a new property rdfx:hasLength. I subclass rdfs:Container because I strive for lists that are easy to read and write by hand [0].
>>>>>>>> 
>>>>>>>>  rdfx:hasLength 
>>>>>>>>      rdfs:domain rdfx:Chain ;
>>>>>>>>      rdfs:range <http://www.w3.org/2001/XMLSchema#int> .
>>>>>>>>  rdfx:Chain
>>>>>>>>      rdfs:subClassOf rdfs:Container .
>>>>>>>> 
>>>>>>>> I'd like rdfx:Chain to be defined rather tight:
>>>>>>>> * an rdfx:Chain should have exactly as many entries as indicated by its length property. 
>>>>>>>> * entries should be assigned through rdfs:ContainerMembershipProperty properties, starting from 1 and without skipping numbers: an rdfx:Chain of length 3 is expected to be constructed of exactly the properties :_1, :_2 and :_3.
>>>>>>>> * an rdfx:Chain without a length property is mostly equivalent to an rdf:Seq but is still required to be without gaps and consecutively numbered, starting from 1. Its length may therefore be calculated for any rdfx:Chain that meets all those requirements.
>>>>>>>> 
>>>>>>>> Any rdfx:Chain that breaks one of these rules is considered, ahem, problematic.
>>>>>>> 
>>>>>>> Is the chain problematic, or is the graph (mis)describing the chain problematic? In other words, is this a perfectly fine RDF description of a badly formed chain, or is it a problematic description? 
>>>>>> 
>>>>>> That’s the point of this example: we have to assume that such a situation can emerge and that we can’t answer that very question. So we have a problem. I want to figure out what the possible solutions are or if there aren’t any and we are doomed.
>>>>>> 
>>>>>>> If the first, how does this differ in effect from the current advice regarding the list vocabulary? (see https://www.w3.org/TR/rdf11-mt/#rdf-collections): "Semantic extensions may place extra syntactic well-formedness restrictions on the use of this vocabulary in order to rule out such graphs. They may exclude interpretations of the collection vocabulary which violate the convention that the subject of a 'linked' collection of two-triple items of the form described above, ending with an item ending with rdf:nil, denotes a totally ordered sequence whose members are the denotations of the rdf:first values of the items, in the order got by tracing the rdf:rest properties from the subject to rdf:nil."
>>>>>>> In other words, you can define a semantic extension to RDF with the ’sensible lists only’ constraint considered to be part of its syntax.
>>>>>> 
>>>>>> I’m not so much concerned with the enforcement of that rule through a semantic extension. Right now we don’t even have a standard vocabulary to express such a "rule" (or rather, as we are not able to enforce much yet, an "intent") in the first place. RDF Collections did get by fine without such enforcements and so should the all new hasLength property. At least that issue was and is at the heart of my question.
>>>>>> 
>>>>>> Digression: it might be a good idea to take a look at all the loose ends in RDF and define such extensions, like e.g. an "RDF/CWA extension set" that makes application developers happy and helps them exchange data in strictly defined environments. It could lead to balkanization but I’d rather think of it as a variation of the OWL profiles. Paving the cow paths in which RDF is already used in CWA environments and making the differences explicit and comprehensible might be better than strictly sticking to a paradigm that is not for everybody and everything and leads to a lot of undocumented and poorly understood deviations in practice. But this is just a further idea. Currently any containers length thingy has to fit into the current RDF, with OWA, NUNA, ETC. End of digression.
>>>>>> 
>>>>>>> And if the second, problematic how? Is it syntactically illegal in some way? Should RDF engines refuse to process it and throw an error exception? Or should it be treated as semantically  inconsistent? Suppose this situation has arisen by merging two graphs that were each alone perfectly fine: should they not have been merged? But merging is a valid operation in RDF as currently defined. And so on. Sorry, a polite throat-clearing noise does not hack it. The hypothetical semantic extension just mentioned would have to deal with all those questions, of course.
>>>>>>> 
>>>>>>>> Applications must decide how to handle it. 
>>>>>>> 
>>>>>>> No, the problems run deeper than this, see above. 
>>>>>> 
>>>>>> One more No than your No: that would really not be the first merging conflict ever for which no general, standardized algorithm exists.
>>>>>> 
>>>>>>>> More semantics are surely possible. A somewhat sensible set of rules could be that an unruly numbered rdfx:Chain is re-numbered, starting with :_1, missing members are augmented as blank nodes and surplus members cut or at least ignored.
>>>>>>> 
>>>>>>> So that would be… what? An inference rule that re-writes RDF graphs?? 
>>>>>> 
>>>>>> If it doesn’t cut but simply ignores surplus members it would only infer new statements. 
>>>>>> 
>>>>>>>> OTOH there’s so many things that can go wrong and with so many different consequences that it seems a bit risky to standardize such fixing arrangements. 
>>>>>>> 
>>>>>>> Indeed. And this is all on the Web, note. Suppose you got this unruly stuff from a website, does your re-writing propagate back to the source? No. So why even bother, when the next http GET is going to screw it all up again?
>>>>>> 
>>>>>> It doesn’t make much sense to standardize such an algorithm in RDF as there are so many plausible options. A vocabulary of different preferred or prefabricated algorithms to choose from and to hint at might be useful, so that e.g. a data exchange could state: "At this place we have a very strict dangling list policy. RTFM.".
>>>>>> 
>>>>>> I often felt unsatisfied when RDF didn’t provide tight rules but resorts to informative semantics, warnings etc. I’m beginning to see the beauty in it ;-)
>>>>>> 
>>>>>>>> The semantics of rdfx:Chain without any extra fixings are not much different from rdf:List except the one basic difference that an rdf:List that is broken because e.g. an element went missing is broken very obviously. The rdf:List doesn’t need any machinery that calculates if it's okay or not whereas an rdfx:Chain does need such machinery. I do however still have trouble accepting that bridging this difference would require to fundamentally alter RDF. It seems like syntactic sugar to me, albeit on the model level. Well, maybe that makes all the difference?
>>>>>>>> 
>>>>>>>> However: let’s say I declare ex:house to be rdfs:subClassOf ex:car and consequently a reasoner starts to add four wheels to every house in my ontology. That may be unfortunate but it doesn’t break RDF.
>>>>>>> 
>>>>>>> True, it does not. 
>>>>>>> 
>>>>>>>> How would a moderately heavy handed fixing arrangement as outlined above (say we don’t delete surplus members but simply ignore them) be any different?
>>>>>>> 
>>>>>>> Because it, unlike the house/car/wheel case, is talking about the actual RDF syntax rules themselves. Look, you can have containers in the world being described, first-class objects, and put up with the possibility of RDF triples saying rubbish about them, just like anything else. Or, you can have lists playing a central role in the /way/ that RDF describes things, like literals do at present. But you can’t have this both ways (in RDF: in some much more expressive languages you can, but it's still very tricky to get it right and very prone to catastrophic errors if you don’t.)
>>>>>> 
>>>>>> Well, what if I say: I take the risks. I can have it both ways as: in the Open World (TM) I can’t get a tight and secure environment where a list stated to be of length 3 is guaranteed to be of said length - but I don’t care. The prudent consumer that I am, I check every list before I use it. And if the check returns a difference between stated and effective length I know that I have a problem and knowing that is often a good thing. But I’m starting to repeat myself.
>>>>>> 
>>>>>>>> Does the problem stem from the fact that we are talking about RDF vocabulary, not instance data?
>>>>>>> 
>>>>>>> YES
>>>>>> 
>>>>>> Good, but this is a difficult position. Stating that a list is of a certain length doesn’t speak about lists per se. The purpose of the length property is obviously to speak about the number of entries in a specific list. 
>>>>>> The situation would be totally different if I spoke about lists in general and aspired to state that in general lists with 2 entries are of length 3. That would indeed interfere with RDF itself in a quite unreasonable way.
>>>>>> I admit that this is a borderline situation and maybe that’s exactly why you are urging for caution so strongly. 
>>>>>> 
>>>>>> I haven’t looked into the way OWL DL uses and reserves the Collections vocabulary for axiomatic statements. Ideally a length property for containers would work just the other way round.
>>>>>> 
>>>>>> 
>>>>>>>> Pat argues that RDF is not designed to be a datastructure language, but does that mean that describing datastructures is off limits to RDF? Is it to be considered as a sort of meta modelling? Does it lead to paradoxes or intractability? 
>>>>>>> 
>>>>>>> Not /describing/ them, no. But we can do that with the current vocabulary.
>>>>>> 
>>>>>> Not in containers, which is my point.
>>>>>> 
>>>>>>> Remember that descriptions can always be incomplete or contradictory. 
>>>>>>>> 
>>>>>>>> Another aspect: Pat's reference to lists as a new node type seems to suggest that contradictions that are encapsulated in one statement are not a problem. The statement
>>>>>>>>  ex:aChainOfLength_4 rdfx:hasLength "3"^^xsd:integer 
>>>>>>>> would therefore not be problematic, although ex:aChainOfLength_4 is clearly of length 4, not 3 as stated. This seems unfair and I’m again reduced to bickering, but not really understanding.
>>>>>>> 
>>>>>>> Well, the specification document for the (proposed extended) language would have to say what to do with that. I think it should be internally inconsistent, myself, but I would prefer to not use hasLength at all, but rather say that some entry was the last one. 
>>>>>> 
>>>>>> How would you do that? Reify the last statement? Use a special (to be defined) lastMember property?
>>>>>> 
>>>>>>>> BTW: I’m not convinced by that whole approach of a new node type for lists.
>>>>>>> 
>>>>>>> You are not alone. The WG rejected this idea. Twice. 
>>>>>>> 
>>>>>>>> I'd like lists to be integrated into RDF first class because they are such an important and ubiquitous datastructure.
>>>>>>> 
>>>>>>> Seems to me that this is a very tight form of integration. But YMMV, of course. 
>>>>>>> 
>>>>>>>> Describing lists in RDF is certainly not the most efficient way to implement and use them but I prefer a tight integration to an encapsulated one. I’d like to be able to spin graphs that relate and annotate items in lists and lists of lists (tables). That can more naturally be done when lists are expressed as graphs.
>>>>>>> 
>>>>>>> “Spin” here suggests to me that you are thinking about a programming language for manipulating and building graphs. But RDF is not such a language, and I do not think it would be a good idea to try to make it into one. 
>>>>>> 
>>>>>> "Spin" wasn’t the best wording, let's change it to "weave". Like: the above list about my favorite colors contains the entry "blue". Now I want to say something about blue contained in that list - as opposed to saying something about the color blue itself. Addressing a node in a statement needs even more machinery (more mails to come…), but having the node already in reach as opposed to hidden in a somewhat opaque list-node would be a precondition.
>>>>>> 
>>>>>> 
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>>>> [0] Manu Sporny’s rant is definitely worth a read, http://manu.sporny.org/2014/json-ld-origins-2/
>>>>>> 
>>>>>> 
>>>>>>> Pat
>>>>>>> 
>>>>>>>> However I’ve never seen the approach with a new node type fleshed out in considerable detail. Maybe it has some advantages that I'm not aware of.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Thomas
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [0] Collections took all the syntactic sugar in N3, Turtle and JSON-LD but that's another problem that for now shall be ignored.
>>>>>>>> 
>>>>>>>> [1] To elaborate: a statement can be considered a description, an axiom, a constraint - that often depends on the point of view taken and is rather an operational aspect. There can be value in adding a length property to a list for the purpose of indicating when the author thinks that the list is complete. I might want to publish a list with my favorite colors and express that I have exactly three of them. I would expect the open world to have little say in that matter but to happily receive my contribution. The list however contains only 2 items because I’m still undecided about the third color. 
>>>>>>>> The list consumer then has a situation and will have to find a way to deal with it. However, it’s often better to be made aware that there might be a problem. In such a case the added length attribute is a feature, as it describes a discrepancy between what I said and what I intended to say. Only in a further step may an axiom derive that the third color is as undefined as my state of mind, or may a constraint reject the list as incomplete.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 15. May 2020, at 06:07, Patrick J Hayes <phayes@ihmc.us> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Thomas
>>>>>>>>> 
>>>>>>>>> Let me explain why the semantics of the RDF containers is the way it is. Several members of the RDF WG were surprised by this, but it kind of follows inevitably from other, more basic, design decisions of RDF. 
>>>>>>>>> 
>>>>>>>>> First, RDF is NOT designed to be a datastructure language: it is a descriptive language. It describes things. The semantics is entirely set up with this basic design decision in mind. And second, it describes things under an open-world assumption. That point (open-world vs. closed-world) was always controversial, but it was thrashed out very early in the design process and became a fundamental design choice, on the grounds that a Web-based description language can never assume that all the relevant data is known about some topic. So this means that given any piece of RDF, you can cut out some piece of it, or adjoin some more RDF to it, without anything breaking. 
>>>>>>>>> 
>>>>>>>>> So now, how could rdfx:ClosedSeq work? Presumably it would come along with a bunch of assertions about the first, second, third etc. elements of the seq, and maybe a way of saying that one of them is the last item, so we might need rdfx:LastItemIn. Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B? If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally.
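The three situations raised in this paragraph (a missing position, an unknown length, two values for one position) can be made concrete with a small Python sketch. It only reports what a description says and deliberately takes no stand on whether any of it is an error. The input shape, the function name, and the `last_index` parameter (standing in for a hypothetical rdfx:LastItemIn triple) are assumptions made for this illustration.

```python
from collections import defaultdict


def describe_seq(members, last_index=None):
    """Summarise a sequence description without judging it.

    members:    iterable of (index, value) pairs taken from rdf:_n triples
    last_index: position named by a hypothetical rdfx:LastItemIn triple,
                or None when no such triple is present
    """
    by_index = defaultdict(set)
    for index, value in members:
        by_index[index].add(value)

    # Without a stated last item, the highest index seen is only a lower bound.
    upper = last_index if last_index is not None else max(by_index, default=0)

    return {
        # Positions with no triple: incomplete description, or error?
        "gaps": [i for i in range(1, upper + 1) if i not in by_index],
        # Positions with several values: error, or evidence they are the same thing?
        "conflicts": {i: sorted(vs) for i, vs in by_index.items() if len(vs) > 1},
        # Members past the stated end: the 'excessive' information case.
        "beyond_last": sorted(
            i for i in by_index if last_index is not None and i > last_index
        ),
        "length_known": last_index is not None,
    }
```

Under the open-world reading every field of the result is just information. A closed reading would have to turn some of these fields into rejection criteria, which is the global constraint on the graph that the paragraph warns about.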
>>>>>>>>> 
>>>>>>>>> Now, another way to go would be to say that RDF needs containers, but it doesn’t need to describe them using triples. We could just allow a new kind of construct as a node in a triple (in addition to IRIs, Bnodes and literals) and give it its own definition. We would have to invent new syntax to represent them, of course, which would break all known RDF engines, but maybe it would be worth it (?) Then sequences (etc) would be much more like conventional datastructures. Of course, the semantics would have to say something about these things, but not much. (For example, we might require that IRIs inside sequences denote the same thing as they do outside, basic things like that.) This would not break the open world assumption. 
>>>>>>>>> 
>>>>>>>>> Anyway, I hope this helps people think about what the issues are :-)
>>>>>>>>> 
>>>>>>>>> Best wishes
>>>>>>>>> 
>>>>>>>>> Pat
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On May 14, 2020, at 8:18 AM, thomas lörtsch <tl@rat.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> I’m aware that the topic of lists in RDF can ignite lively debate nearly as much as blank nodes, so my apologies in advance. I have a very specific question and I don’t intend to discuss the use of lists in OWL, syntactic sugar in Turtle, querying in SPARQL or historic details about how some decisions came to be (although I do find all that very interesting, but another time...).
>>>>>>>>>> 
>>>>>>>>>> Lists from the RDF container vocabulary - rdf:Seq, rdf:Bag and rdf:Alt - can’t be closed whereas lists from the collection vocabulary - rdf:List - are always closed. Collections, in contrast to containers, are popularly considered to "have semantics" because of this closing characteristic.
>>>>>>>>>> The excruciatingly exact Lisp-style modelling of rdf:Lists through rdf:first/rest/nil properties does indeed leave no room for misunderstanding about the listiness and closedness of an rdf:List.
>>>>>>>>>> What level of semantics could be provided for containers by explicitly defining either a closed container, e.g. rdfx:ClosedSeq, or an appropriate property, e.g. rdfx:hasLength, that implicitly closes a container? 
>>>>>>>>>> 
>>>>>>>>>> I reckon that semantics introduced by definition are always somewhat weaker than semantics that emanate naturally and unmistakably from a datastructure itself. However, collections have a lot of disadvantages (that I promised above not to discuss) and I wonder how workable the semantics provided by defining an rdfx:ClosedSeq class or an rdfx:hasLength property would be. Would they be able to take some load, none at all, not enough, or almost the same as collections?
>>>>>>>>>> 
>>>>>>>>>> Thomas
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 17. May 2020, at 05:53, Patrick J Hayes <phayes@ihmc.us> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On May 16, 2020, at 11:01 AM, William Van Woensel <William.Van.Woensel@Dal.Ca> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi everyone,
>>>>>>>>>> 
>>>>>>>>>> Some minor thoughts on this issue:
>>>>>>>>>> 
>>>>>>>>>> Suppose however that we simply don’t have a triple that specifies the second element. Is this an error? Or just an incomplete description? If the latter, what if we omit the LastItemIn triple; then we don't know how long this seq is. Is that also an incomplete description (as the open-world assumption requires) or is it an error? What happens if we are told that A is the second item and also that B is the second item? Is that an error, or can we conclude that owl:sameAs A B?
>>>>>>>>>> 
>>>>>>>>>> Not sure whether this was meant to present a dichotomy between collections and containers – but it is the same for the RDFS collections, no?
>>>>>>>>> 
>>>>>>>>> Yes, it is. And RDF is obliged to accept the open-world interpretation in such cases. 
>>>>>>>>> 
>>>>>>>>>> If the second item in the linked list would be missing, it's even worse since the rest of the list would simply be "lost"; or, the same resource could have two different "first" or "rest" items, possibly leading us to conclude they are equivalent (In fact, the latter example is given in the RDF 1.1 semantics document to illustrate the total lack of semantics for collections; which is the real underlying issue here I suppose)
>>>>>>>>> 
>>>>>>>>> Exactly.
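The two failure modes mentioned for rdf:List (a missing rdf:rest that loses the tail, and a cell with more than one rdf:first or rdf:rest) show up directly when walking the chain. The sketch below is a minimal illustration, assuming triples are plain (subject, predicate, object) string tuples; the function name and the wording of the problem messages are invented for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def walk_list(triples, head):
    """Follow rdf:first/rdf:rest from `head` and return (items, problems)."""
    items, problems, seen = [], [], set()
    node = head
    while node != NIL:
        if node in seen:
            problems.append(f"cycle at {node}")
            break
        seen.add(node)
        firsts = sorted({o for s, p, o in triples if s == node and p == FIRST})
        rests = sorted({o for s, p, o in triples if s == node and p == REST})
        if len(firsts) != 1:
            problems.append(f"{node} has {len(firsts)} rdf:first values")
        # With several candidates the first one in sort order is kept, arbitrarily.
        items.append(firsts[0] if firsts else None)
        if len(rests) != 1:
            # Zero rests: the tail is unreachable. Several rests: no single tail.
            problems.append(f"{node} has {len(rests)} rdf:rest values")
            break
        node = rests[0]
    return items, problems
```

Nothing in plain RDF entailment rules out the graphs that produce a non-empty `problems` list; the walker merely makes visible that the "closedness" of a collection depends on the graph happening to be well formed.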
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If we take the open-world choice in these cases then this is hardly distinguishable from what we have already. But if we say that incomplete or ‘excessive’ information is an error then we don’t really have an RDF graph, since the extra constraints amount to a fundamental change to the idea of graph syntax. A ‘legal’ RDF graph now is not just a set of triples: it has global constraints on what must be present or what is allowed to be present. This is of course possible, but it would change RDF fundamentally.
>>>>>>>>>> 
>>>>>>>>>> Perhaps I again misunderstand, but surely a semantic extension with extra assumptions for datastructures (regarding entailment, or consistency) would not break RDF?
>>>>>>>>> 
>>>>>>>>> Well, we certainly could have a semantic extension which imposes extra syntactic conditions of its own (just as OWL-RDF does) but then that would not be RDF. But yes, certainly such an extension - call it RDF-C, maybe -
>>>>>>>> 
>>>>>>>> For a moment I thought you’d introduce contexts to define surfaces on which lists can be closed. But you wouldn’t go that far, would you? Or would you? Well, it would certainly introduce some inflationary demand in context identifiers.
>>>>>>>> 
>>>>>>>>> could be defined and might be useful. My next question would be, what did you want it to mean? That is, what semantic conditions would you want to put on such closed collections and statements made about them?
>>>>>>>>> 
>>>>>>>>> Pat
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> 
>>>>>>>>>> William
>>>>>>>>>> 
> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Saturday, 30 May 2020 10:00:49 UTC