- From: David Booth <david@dbooth.org>
- Date: Wed, 28 Sep 2022 20:41:12 -0400
- To: semantic-web@w3.org
- Cc: Pat Hayes <phayes@ihmc.us>
Hi Pat!

On 9/28/22 01:42, Patrick J. Hayes wrote:
>> On Sep 27, 2022, at 1:32 PM, David Booth <david@dbooth.org> wrote:
>> On 9/27/22 09:58, Pierre-Antoine Champin wrote:
>>> lists do not only give you order, they give you "closedness": the
>>> first/rest ladder captures the fact that the list contains all
>>> these elements *and only them* (and in this particular order).
>>
>> Small but important clarification: currently RDF lists do NOT give
>> "closedness",
>
> Sure they do, if you use the vocabulary properly.

But that was my point: RDF does not currently *enforce* that it be used properly. A future RDF should support arrays that are impossible to malform.

> :thislist rdf:first :A .
> :thislist rdf:rest :x1 .
> :x1 rdf:first :B .
> :x1 rdf:rest :x2 .
> :x2 rdf:first :C .
> :x2 rdf:rest rdf:nil .
>
> tells you that :thislist is exactly ( :A :B :C ), with three
> items. You can't express this kind of 'closedness' with the
> container vocabulary because it provides no way to say 'and
> no more'.
>
> Now, of course RDF does not impose 'proper' usage of the collection
> vocabulary, because it imposes hardly any syntactic restrictions
> of how any vocabulary is used. But you can define a semantic
> extension of RDF which does impose this, and indeed the specs
> mention this possibility explicitly.

Right, that was my point. More below . . .

>> but "closedness" is definitely what we want (for the vast majority
>> of use cases). That is precisely one of the reasons I and others
>> feel such a need for (closed) arrays.
>>
>>> . . .
>>>> Hugh Glaser wrote:
>>>> I also worry that if I assert exactly the
>>>> same knowledge twice, a paper could end up with two authorLists,
>>>> certainly if bonds got involved.
>>> Indeed... That's actually something that "lists as first class
>>> citizens" could help solve -- that is, if they were defined in
>>> such a way that two lists with exactly the same elements are
>>> in fact one and the same object.
>
> Whoo, wait a minute.
> Do you really want that kind of extensionality
> condition? That's not true in LISP, for example, and I am pretty
> sure its not true in any programming language that uses linked
> lists as a data structure.

YES, we want that kind of extensionality! But we want it for *arrays* -- not linked lists. The fact that RDF only provided a linked list is a historical artifact that only muddies the waters. The point is that a future RDF should offer arrays that have array semantics -- not linked list semantics: each element should have an implied consecutive integer index. And that index should start from 0, by the way -- not 1. Software developers have to deal with these things, and developers have learned over several decades of experience that 0-based indexing requires less programming work than 1-based indexing. An array *could* be implemented as a linked list, but the implementation is completely immaterial -- and an unfortunate distraction -- because it should be hidden from the user.

>> Yes, that is exactly what's needed, and it is readily attainable
>> if we eliminate *explicit* blank nodes. By explicit blank
>> nodes, I mean blank nodes that are written like _:b42 in Turtle.
>> Implicit blank nodes, written with square brackets like "[ ... ]",
>> do not cause problems because they are guaranteed to be acyclic.
>
> Wait, wait, the nonsequiteurs are making me dizzy. First,
> extensionality (ie the condition same elements => same list)
> has nothing to do with blank nodes.

Unfortunately it has a *lot* to do with blank nodes, because "sameness" is not easy to determine in the presence of unrestricted blank nodes. It is the graph isomorphism problem, as Jeremy Carroll pointed out two decades ago:
https://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf
https://en.wikipedia.org/wiki/Graph_isomorphism_problem

That's why after 20+ years we still do not have a standard for RDF graph canonicalization, though a W3C group is now working on it.
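To make the difficulty concrete, here is a minimal Python sketch that compares two tiny graphs the naive way. It is only an illustration. It is not Carroll's algorithm or any canonicalization standard. It assumes triples are tuples of strings and that any term written `_:label` is a blank node, and all function names are hypothetical. Blank node labels carry no meaning, so the only fully general test is to search for a relabeling that makes the two graphs equal. That search grows factorially with the number of blank nodes. Practical algorithms prune it heavily, but the worst case remains tied to graph isomorphism.

```python
from itertools import permutations
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    """Sketch convention: a term written as _:label is a blank node."""
    return term.startswith("_:")


def bnodes(graph: Set[Triple]) -> List[str]:
    """Distinct blank node labels used anywhere in the graph, sorted."""
    return sorted({term for triple in graph for term in triple if is_bnode(term)})


def naive_same_graph(g1: Set[Triple], g2: Set[Triple]) -> bool:
    """Decide "sameness" (isomorphism) by trying every blank node bijection.

    The loop visits up to n! mappings for n blank nodes, so this is only
    usable on toy graphs; it illustrates the difficulty, nothing more.
    """
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        # Rename g1's blank nodes; IRIs and literals map to themselves.
        relabeled = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if relabeled == g2:
            return True
    return False


# Two spellings of the same blank node cycle, and a graph that differs.
cycle_a = {("_:a", ":knows", "_:b"), ("_:b", ":knows", "_:a")}
cycle_x = {("_:x", ":knows", "_:y"), ("_:y", ":knows", "_:x")}
loop = {("_:x", ":knows", "_:y"), ("_:y", ":knows", "_:y")}

print(naive_same_graph(cycle_a, cycle_x))  # True
print(naive_same_graph(cycle_a, loop))     # False
```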
But if blank nodes are restricted to being acyclic -- i.e., no blank node cycles -- then it is trivially easy to compare two graphs for "sameness". Aidan Hogan did a wonderfully comprehensive analysis of blank nodes in his paper "Everything You've Always Wanted to Know About Blank Nodes":
https://aidanhogan.com/docs/blank_nodes_jws.pdf

> Second, even implicit blank nodes are still blank nodes, so how
> can their explicitness be important?

Because eliminating explicit blank nodes is an easy way to guarantee that the graph has no blank node cycles. And the vast majority of blank node usage can be done with implicit blank nodes.

> Third, how did being acyclic enter into the discussion suddenly?

I'm referring to blank node cycles. As explained above, they are the source of the problem.

>> This means that it would be easy for tools to generate a
>> consistent internal identifier for them, based recursively on
>> their constituent elements.
>
> And those would be blank node identifiers, right?

No. They should be some kind of internal identifiers that the tool can use to easily determine "sameness": if two objects have the same internal identifier, then they denote the same object. Those internal identifiers should not be exposed to the user.

> So what you are suggesting here, if I follow you, is a scheme
> for letting systems generate their own bnodeIDs for the innards
> of list structures. Fine, but this is not a modification to RDF.

Close, but with internal identifiers rather than bnode IDs, and arrays rather than list structures. If those arrays happen to be implemented as list structures, that's fine but immaterial, because the implementation should not be exposed to the user.

>> (This is closely related to RDF canonicalization, which becomes
>> trivially easy without explicit blank nodes.) Eliminating
>> explicit blank nodes would mean that we'd lose the convenience
>> of not having to allocate an IRI.
>> I think there are ways to
>> address that loss in other ways, but that's another topic.
>>
>>> But that would depart from their current interpretation, and
>>> not necessarily fit all use-cases,
>>
>> Agreed, but it would fit the most common use cases. It doesn't
>> have to fit *all* use cases.
>>
>>> so this is not something to decide lightly. This is the kind of
>>> semantic rabbit hole that Pat Hayes was warning about earlier
>>> in this thread (if I got his point correctly).
>>
>> I hope Pat will correct me if I'm wrong, but my read of
>> the discussion so far is that the semantics would not be a
>> big problem: both arrays and composite objects can have very
>> straight-forward -- and very similar -- semantics. And it's clear
>> to me at least that although the rdf:aboutEach functionality
>> could be useful in some cases, it is not what we need as the
>> basic array functionality. The basic functionality that we need
>> is for an assertion about an array to *only* be about that array
>> -- not about every element in that array:
>>
>> ("apples" "bananas" "peaches") ex:length 3 .
>
> If indeed that is all you want, Pat agrees this would be trivially
> straightforward. Pat is however very suspicious that this is
> in fact not all that people want, and they they will be writing
> things like
>
> :PatHayes :fatherOf (:SimonHayes :RobinHayes)
>
> before the ink is dry on the specification document.

That's okay! There would be nothing wrong with writing that, *provided* that you define the :fatherOf property to mean that :PatHayes is the father of every person in that list. It may then be nonsensical to assert the following:

  :PatHayes :fatherOf :SimonHayes .

But even that could be perfectly fine to write if you instead define the :fatherOf property to have *conditional* meaning: if the object of the assertion is a person, then it asserts fathership about that person.
But if the object of the assertion is an array, then it asserts fathership about every person listed in that array.

>> And one other comment . . .
>>
>> On 9/27/22 09:43, Pierre-Antoine Champin wrote:
>>> If we can design other efficient design patterns for conveying
>>> order and "closedness" (such as the one proposed above), I
>>> believe that the need for representing lists would not be as
>>> pressing as suggested in this thread.
>>
>> Possibly. But software developers have been using arrays for
>> 60+ years, and they *expect* them. So as a practical matter,
>> I think the straight-forward solution is to add proper support
>> for arrays, perhaps in a new higher-level RDF 2.0 syntax, to
>> avoid breaking any existing RDF or tools.
>>
>> As Manu Sporny put it, by not having proper array support in RDF,
>> we're currently "giving developers a big fat middle finger in
>> that area".
>
> Both you and Manu miss the central point here. RDF is not intended
> to be a notation for software developers to create new structures
> with. It is not a developer toolkit.

That's exactly the weakness that we need to address.

> It was designed and intended to be an information exchange
> notation. As soon as you give it to developers and say, in effect,
> go ahead and play with this and build things with it, then all
> of its utility as an information exchange notation is lost,
> because whatever meaning one developer intends to express using
> the structures she develops will be opaque to any other developer
> and any user in a different development environment.

They will only be opaque if the predicates that use them are not documented. If they are properly documented, then the meaning will be clear. This is already true of *all* of RDF: adding features like arrays and composite objects does not fundamentally change the need to define the meaning of the predicates used. The example that you gave above would be meaningless to others if the :fatherOf relation were not defined.
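A *conditional* definition of the kind described for :fatherOf can be pictured as an interpretation function attached to each documented predicate. The Python sketch below is a toy model under stated assumptions. `father_of`, `DEFINITIONS`, and `interpret` are hypothetical names. Arrays are modeled as tuples. `ex:length` stands for any predicate whose triples are only about the array itself.

```python
from typing import Any, Callable, Dict, List, Tuple

# A fact is a (subject, predicate, object) triple. Arrays are modeled as
# Python tuples; IRIs and literals are strings or numbers.
Fact = Tuple[Any, str, Any]
Definition = Callable[[Any, Any], List[Fact]]


def father_of(subject: Any, obj: Any) -> List[Fact]:
    """Hypothetical conditional definition of :fatherOf.

    Object is a single person: asserts fathership of that person.
    Object is an array: asserts fathership of every person listed in it.
    """
    people = obj if isinstance(obj, tuple) else (obj,)
    return [(subject, ":fatherOf", person) for person in people]


def about_array_itself(predicate: str) -> Definition:
    """Default reading: the triple says something about the array as a whole."""
    def read(subject: Any, obj: Any) -> List[Fact]:
        return [(subject, predicate, obj)]
    return read


# Stand-in for each predicate's documentation; all names are illustrative.
DEFINITIONS: Dict[str, Definition] = {
    ":fatherOf": father_of,
    "ex:length": about_array_itself("ex:length"),
}


def interpret(subject: Any, predicate: str, obj: Any) -> List[Fact]:
    """Expand one asserted triple into the facts its predicate's definition implies."""
    return DEFINITIONS[predicate](subject, obj)


# An array object distributes over its members for :fatherOf ...
print(interpret(":PatHayes", ":fatherOf", (":SimonHayes", ":RobinHayes")))
# ... a single person is handled by the other branch of the same definition ...
print(interpret(":PatHayes", ":fatherOf", ":SimonHayes"))
# ... and ex:length stays a statement about the array itself.
print(interpret(("apples", "bananas", "peaches"), "ex:length", 3))
```

In this model the array carries no distributive meaning of its own. Only the predicate's definition decides whether a triple is about the array or about each member.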
And as I pointed out, :fatherOf can just as well be defined to make an assertion about one person as to make an assertion about every person in a list.

> There is a kind of universal understanding about what an RDF triple
> 'means': it says that a relation holds between two things, and
> it is therefore an assertion about a world consisting of these
> things with relations holding between them. This consensus of
> intended meaning is perhaps a bit shaky at the edges and under
> strain in some places, but still is kind of universally understood
> and accepted. But there is no such consensus AT ALL about what a
> list is supposed to mean, or what an array is supposed to mean,
> when used as part of an assertion about this common 'world' of
> things bound together by relations. And because there is no such
> consensus, as soon as these structures are given to developers to
> build with, what they build will have, at best, an idiosyncratic
> meaning private to the community in which the developer is
> working. At which point, the entire purpose of RDF is lost.

Wait, let's not throw the baby out with the bathwater. The meaning will only be private if it is undocumented. But as I pointed out, that is already true of *all* of RDF. You have no idea what <http://dbooth.org/fribjam> means unless I document its meaning. The purpose of RDF is *not* lost just because we need to define the meaning of the predicates that use composite objects and arrays, just as we need to define the meaning of everything else we use in RDF.

To further clarify, an array such as the following (reusing the Turtle syntax):

  ( :SimonHayes :RobinHayes )

should have no intrinsic meaning beyond the fact that it is an array of two elements denoted by :SimonHayes and :RobinHayes, in that order. This allows it to be the exact *same* array -- extensionality -- in the following two assertions:

  :myStore :customers ( :SimonHayes :RobinHayes ) .
  :PatHayes :fatherOf ( :SimonHayes :RobinHayes ) .
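A tool-generated internal identifier can be derived recursively from an array's constituents and never shown to the user. The Python sketch below illustrates that idea under stated assumptions and is not a proposed RDF API. IRIs are plain strings and arrays are tuples. `internal_id` is an invented helper that hashes constituents with SHA-256. The sketch shows the array in the two assertions above receiving a single identifier.

```python
import hashlib
from typing import Tuple, Union

# A term is an IRI or literal (a plain string here) or an array of terms.
# Arrays are immutable tuples, so nested arrays can never form a cycle.
Term = Union[str, Tuple["Term", ...]]


def internal_id(term: Term) -> str:
    """Derive an internal identifier from a term's contents, recursively.

    Arrays with equal elements in the same order always get the same
    identifier, and the recursion terminates because nothing is cyclic.
    """
    if isinstance(term, str):
        payload = "T" + term
    else:
        # Combine the children's identifiers in index order (0, 1, 2, ...).
        payload = "A" + ",".join(internal_id(child) for child in term)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


kids = (":SimonHayes", ":RobinHayes")
assertions = [
    (":myStore", ":customers", kids),
    (":PatHayes", ":fatherOf", (":SimonHayes", ":RobinHayes")),
]

# Both objects are the same array, extensionally.
print(internal_id(assertions[0][2]) == internal_id(assertions[1][2]))  # True
# Element order is significant, so the reversed array is a different array.
print(internal_id(kids) == internal_id(kids[::-1]))  # False
```

Hashing is just one option. Any injective, order-preserving encoding of the children's identifiers would serve the same purpose.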
However, the :customers and :fatherOf predicates can (informally speaking) impart *different* meaning to those arrays by the way that they are defined: the :customers predicate can indicate that every member of that array is a customer of :myStore, and the :fatherOf predicate can indicate that every member has :PatHayes as father. This is no different from using the number 42 in different contexts:

  :boston :temperatureFahrenheit 42 .
  :marblesInMyPocket :count 42 .

Formally speaking, the meaning of 42 is the same in both of those assertions, but it is *used* in different ways by different predicates, so informally/colloquially we might say that it has different meaning in different contexts. The exact same thing should be true of an array or a composite object.

> Which is why any new syntactic extension to RDF should be given
> a semantics as part of its normative definition, and this should
> be one that is likely to survive the pressures of how developers
> are wanting to use the new structure,

I agree that they need a well-defined meaning. But the meaning defined by the RDF standards can be minimal, because additional meaning can be defined by RDF authors.

> For example, if it is clear that some folk really want to use
> arrays to express n-ary relations, while others really want to use
> them to abbreviate conjunctions but with a closed-world assumption,
> then these two groups should be given distinct extensions to RDF
> syntax which will not be confused with each other, and each given
> a nice crisp semantics and tutorial examples, etc..

For arrays and composite objects, I don't think we need different syntaxes for all of the many ways that they may be used, because, as demonstrated above, additional meaning can be imparted by the predicates that use them. But if commonly used patterns emerge, then it *might* be worth adding syntax to support a few of them.

Thanks,
David Booth
Received on Thursday, 29 September 2022 00:41:28 UTC