Re: RDF lists/arrays and n-ary relations [was Re: OWL and RDF lists] from David Booth on 2022-10-01 (semantic-web@w3.org from October 2022)

From: David Booth <david@dbooth.org>
Date: Sat, 1 Oct 2022 16:46:06 -0400
To: semantic-web@w3.org
Cc: Pat Hayes <phayes@ihmc.us>
Message-ID: <b55952f7-10ac-8bf8-f995-e421e35aeb4e@dbooth.org>
On 9/30/22 03:27, Patrick J. Hayes wrote:
>> On Sep 28, 2022, at 7:41 PM, David Booth wrote:
>> On 9/28/22 01:42, Patrick J. Hayes wrote:
>> >> On Sep 27, 2022, at 1:32 PM, David Booth wrote:
>> >> On 9/27/22 09:58, Pierre-Antoine Champin wrote:
>> >>> lists do not only give you order, they give you "closedness": the
>> >>> first/rest ladder captures the fact that the list contains all
>> >>> these elements *and only them* (and in this particular order).
>> >>
>> >> Small but important clarification: currently RDF lists do NOT give
>> >> "closedness",
>> >
>> > Sure they do, if you use the vocabulary properly.
>>
>> But that was my point: RDF does not currently *enforce* that it
>> be used properly.  A future RDF should support arrays that are
>> impossible to malform.
> 
> That should be an extension on top of RDF, not a revised version of RDF. 
> Imposing syntactic rigor doesn't change anything about RDF itself, 
> either its syntax or its semantics. It would be an RDF app that rejects 
> RDF graphs with 'strange' (including incomplete) collection triples. The 
> rejection would be nonmonotonic, but the underlying RDF would not be. 
> All of which is fine and quite consistent with the current RDF specs, so 
> no new standard is required. Just implement the checks, document them, 
> call it a semantic extension (refer to the specs) and run with it.

Indeed, my hope is for these ease-of-use enhancements to be largely 
achieved through a higher level syntax that requires as little change as 
possible to the RDF core.

>> > :thislist rdf:first :A .
>> > :thislist rdf:rest :x1 .
>> > :x1 rdf:first :B .
>> > :x1 rdf:rest :x2 .
>> > :x2 rdf:first :C .
>> > :x2 rdf:rest rdf:nil .
>> >
>> > tells you that :thislist is exactly ( :A :B :C ), with three
>> > items. You can't express this kind of 'closedness' with the
>> > container vocabulary because it provides no way to say 'and
>> > no more'.
>> >
>> > Now, of course RDF does not impose 'proper' usage of the collection
>> > vocabulary, because it imposes hardly any syntactic restrictions
>> > of how any vocabulary is used. But you can define a semantic
>> > extension of RDF which does impose this, and indeed the specs
>> > mention this possibility explicitly.
>>
>> Right, that was my point.
> 
> But if the current specs tell you explicitly that doing this kind of 
> checking is legal, why do you feel that we need a new version of the 
> specs to do it?

I think new specs will be needed to standardize a higher-level syntax. 
Right now, using RDF is like writing assembly language.  Higher level 
programming languages make the programming job easier and less error 
prone, and that's exactly the shift I hope to see with a higher level 
syntax for RDF.  I think most of the desired enhancements can indeed be 
achieved without changing the underlying RDF model.  But some tweaks to 
the RDF model *might* be needed.
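
As a minimal sketch of the kind of check discussed above (illustrative 
Python only; the function name and the plain-tuple representation of 
triples are assumptions, not anything defined by the RDF specs), an 
application could refuse any first/rest chain that is incomplete, 
branching, or cyclic:

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def read_closed_list(triples, head):
    """Return the items of the list starting at `head`.

    Raises ValueError if any node on the chain lacks exactly one
    rdf:first and exactly one rdf:rest, or if the chain loops.
    """
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"malformed list node {node}")
        items.append(firsts[0])
        node = rests[0]
    return items


# The three-item list from the quoted example.
example = [
    (":thislist", RDF_FIRST, ":A"), (":thislist", RDF_REST, ":x1"),
    (":x1", RDF_FIRST, ":B"), (":x1", RDF_REST, ":x2"),
    (":x2", RDF_FIRST, ":C"), (":x2", RDF_REST, RDF_NIL),
]
print(read_closed_list(example, ":thislist"))
```

The rejection step is where the nonmonotonicity lives: adding a second 
rdf:rest triple to a valid graph turns an accepted list into a rejected 
one, while the underlying RDF entailment is unaffected.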

>>   More below . . .
>>
>> >> but "closedness" is definitely what we want (for the vast majority
>> >> of use cases).  That is precisely one of the reasons I and others
>> >> feel such a need for (closed) arrays.
>> >>
>> >>> . . .
>> >>>> Hugh Glaser wrote:
>> >>>> I also worry that if I assert exactly the
>> >>>> same knowledge twice, a paper could end up with two authorLists,
>> >>>> certainly if bnodes got involved.
>> >>> Indeed...  That's actually something that "lists as first class
>> >>> citizens" could help solve -- that is, if they were defined in
>> >>> such a way that two lists with exactly the same elements are
>> >>> in fact one and the same object.
>> >
>> > Whoo, wait a minute. Do you really want that kind of extensionality
>> > condition? That's not true in LISP, for example, and I am pretty
>> > sure it's not true in any programming language that uses linked
>> > lists as a data structure.
>>
>> YES, we want that kind of extensionality!  But we want it for
>> *arrays* -- not linked lists.
> 
> OK, if you insist. But these arrays are not anything like arrays or 
> vectors in most programming languages. I can have two arrays in Algol-60 
> and just about every language since which have the same elements but are 
> not identical. For example, if one of them is changed or deleted, the 
> other is unchanged.

Fully agree.

> So it might be a good idea to not call these things 'arrays', maybe? How 
> about 'sequences'?

Perhaps.  I just don't want to call them lists, because that term has 
too much baggage, being associated with RDF's current linked lists.

> (Why do you want this extensionality for 'arrays' but not for lists? 
> Just asking.)

It isn't that I *only* want extensionality for arrays, it's just that I 
am only *interested* in arrays.  Linked lists have some use cases, but 
they are vastly outnumbered by arrays.  Arrays are the pressing need.

>>  The fact that RDF only provided a
>> linked list is a historical artifact that only muddies the waters.
>>
>> The point is that a future RDF should offer arrays that have array
>> semantics -- not linked list semantics: each element should have
>> an implied consecutive integer index.  And that index should start
>> from 0, by the way -- not 1.
> 
> OK, but this is syntax, not semantics.

I assumed that having an (implied) index would have to be a part of the 
semantics.  Wouldn't it?  For example, query languages will need to be 
able to access the index, or access elements by index, so I would assume 
that the index would have to be a part of the semantics, so that 
compliant processors would all get the same results.

Could you explain how arrays (with query-accessible indexes) would work 
if the index were not defined in the semantics?
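
To illustrate what an "implied index" could look like in practice, the 
sketch below derives a 0-based index purely from an element's position 
in a first/rest chain. This is illustrative Python under assumptions: 
the function name is hypothetical, triples are plain tuples, and the 
list is assumed to be well formed and acyclic. It does not settle 
whether that derivation belongs in the semantics or the syntax; it only 
shows that every processor following the same rule would report the 
same indexes.

```python
def indexed_elements(triples, head):
    """Return [(index, element), ...] for a well-formed, acyclic list."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    result = []
    node = head
    index = 0  # 0-based, as argued for above
    while node != "rdf:nil":
        result.append((index, first[node]))
        node = rest[node]
        index += 1
    return result


fruit = [
    ("_:l0", "rdf:first", '"apples"'), ("_:l0", "rdf:rest", "_:l1"),
    ("_:l1", "rdf:first", '"bananas"'), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", '"peaches"'), ("_:l2", "rdf:rest", "rdf:nil"),
]
by_index = dict(indexed_elements(fruit, "_:l0"))
print(by_index[1])
```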

>> Software developers have to deal
>> with these things, and developers have learned over several decades
>> of experience that 0-based indexing is less programming work than
>> 1-based indexing.
>>
>> An array *could* be implemented as a linked list, but the
>> implementation is completely immaterial -- and an unfortunate
>> distraction -- because it should be hidden from the user.
> 
> I agree, let us talk at a conceptual level ignoring implementation 
> decisions.
>>
>> >> Yes, that is exactly what's needed, and it is readily attainable
>> >> if we eliminate *explicit* blank nodes.  By explicit blank
>> >> nodes, I mean blank nodes that are written like _:b42 in Turtle.
>> >> Implicit blank nodes, written with square brackets like "[ ... ]",
>> >> do not cause problems because they are guaranteed to be acyclic.
>> >
>> > Wait, wait, the non sequiturs are making me dizzy. First,
>> > extensionality (ie the condition same elements => same list)
>> > has nothing to do with blank nodes.
>>
>> Unfortunately it has a *lot* to do with blank nodes, because
>> "sameness" is not easy to determine in the presence of
>> unrestricted blank nodes.
> 
> No, it is trivial, for both 'arrays' and for linked lists using the 
> first/rest vocabulary.

Determining graph isomorphism is not trivial, so I'm not following why 
you say that "sameness" is trivial to determine in the presence of 
unrestricted blank nodes.  To my mind, "sameness" in this context means 
that the structures are isomorphic.  The reason I view "sameness" that 
way is because I want to be able to write two arrays like the following, 
and I want them to be considered the exact *same* array, because they 
have the "same" elements, each of which in turn happens to be another array:

   # Example 14
   :x :p ( ( "pear" ) ( "plum" ) ) .   # Array 14a
   :y :p ( ( "pear" ) ( "plum" ) ) .   # Array 14b

But if I write that today in Turtle it becomes triples like the following:

   # Example 15
   :x :p _:B15a .    # List 15a
   :y :p _:B15b .    # List 15b

   _:B15a rdf:first _:ba0 .
   _:B15a rdf:rest _:baC .
   _:B15b rdf:first _:bb0 .
   _:B15b rdf:rest _:bbC .
   _:ba0 rdf:first "pear" .
   _:ba0 rdf:rest rdf:nil .
   _:ba1 rdf:first "plum" .
   _:ba1 rdf:rest rdf:nil .
   _:baC rdf:first _:ba1 .
   _:baC rdf:rest rdf:nil .
   _:bb0 rdf:first "pear" .
   _:bb0 rdf:rest rdf:nil .
   _:bb1 rdf:first "plum" .
   _:bb1 rdf:rest rdf:nil .
   _:bbC rdf:first _:bb1 .
   _:bbC rdf:rest rdf:nil .

Written as in example 15, it is not nearly as obvious that blank nodes 
_:B15a and _:B15b should be considered the same object (under the 
desired array semantics).

In the general case, deciding whether two blank nodes should be 
considered the same node is the graph isomorphism problem, which is 
*not* trivial to determine when blank nodes can have cycles.  (The above 
example is acyclic, but blank node cycles can occur if explicit blank 
nodes are allowed.)  So I'm not understanding why you say that 
"sameness" is trivial.  Can you explain?  Are you using a different 
notion of "sameness" perhaps?  Or assuming that we already have a 
guarantee that the graph is acyclic?
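
For the acyclic case specifically, here is a minimal sketch (illustrative 
Python; the function name and tuple representation are assumptions) of 
deciding "sameness" for the two lists in example 15 by a plain recursive 
walk, ignoring the blank node labels entirely. It assumes well-formed 
lists with no blank node cycles and says nothing about the general case.

```python
def nested_value(triples, node):
    """Turn a first/rest structure into nested tuples, dropping bnode labels.

    Assumes every list is well formed and that there are no cycles.
    """
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}

    def walk(term):
        if term == "rdf:nil":
            return ()
        if term not in first:
            return term  # an IRI or literal, not a list node
        items = []
        while term != "rdf:nil":
            items.append(walk(first[term]))
            term = rest[term]
        return tuple(items)

    return walk(node)


example_15 = [
    ("_:B15a", "rdf:first", "_:ba0"), ("_:B15a", "rdf:rest", "_:baC"),
    ("_:B15b", "rdf:first", "_:bb0"), ("_:B15b", "rdf:rest", "_:bbC"),
    ("_:ba0", "rdf:first", '"pear"'), ("_:ba0", "rdf:rest", "rdf:nil"),
    ("_:ba1", "rdf:first", '"plum"'), ("_:ba1", "rdf:rest", "rdf:nil"),
    ("_:baC", "rdf:first", "_:ba1"), ("_:baC", "rdf:rest", "rdf:nil"),
    ("_:bb0", "rdf:first", '"pear"'), ("_:bb0", "rdf:rest", "rdf:nil"),
    ("_:bb1", "rdf:first", '"plum"'), ("_:bb1", "rdf:rest", "rdf:nil"),
    ("_:bbC", "rdf:first", "_:bb1"), ("_:bbC", "rdf:rest", "rdf:nil"),
]
same = nested_value(example_15, "_:B15a") == nested_value(example_15, "_:B15b")
print(same)
```

With a blank node cycle the recursion above would never terminate, which 
is one concrete way to see why the acyclic restriction matters.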

>>  It is the graph isomorphism
>> problem
> 
> They are both trivial subcases of that general problem.
> 
>> , as Jeremy Carroll pointed out two decades ago:
>> https://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf
>> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>>
>> That's why after 20+ years we still do not yet have a standard for
>> RDF graph canonicalization, though a W3C group is now working on it.
> 
> I know. I only narrowly escaped being drafted onto that WG :-).
>>
>> But if blank nodes are restricted to being acyclic -- i.e., no blank
>> node cycles -- then it is trivially easy to compare two graphs for
>> "sameness".
>>
>> Aidan Hogan did a wonderfully comprehensive analysis of blank nodes
>> in his paper "Everything You've Always Wanted to Know About Blank
>> Nodes": https://aidanhogan.com/docs/blank_nodes_jws.pdf
>>
>> > Second, even implicit blank nodes are still blank nodes, so how
>> > can their explicitness be important?
>>
>> Because eliminating explicit blank nodes is an easy way to guarantee
>> that the graph has no blank node cycles.  And the vast majority
>> of blank node usage can be done with implicit blank nodes.
>>
>> > Third, how did being acyclic enter into the discussion suddenly?
>>
>> I'm referring to blank node cycles.  As explained above, they are
>> the source of the problem.
>>
>> >> This means that it would be easy for tools to generate a
>> >> consistent internal identifier for them, based recursively on
>> >> their constituent elements.
>> >
>> > And those would be blank node identifiers, right?
>>
>> No.  They should be some kind of internal identifiers
>> that the tool can use to easily determine "sameness": if two objects
>> have the same internal identifier, then they denote the same object.
> 
> BnodeIDs do exactly this if by 'object' you mean the bnode itself, which 
> I think is what you need when talking about data structures as part of 
> the syntax.

By "object" I mean the entire array, recursively, so that arrays 14a and 
14b in the above example automatically have the same internal 
identifier.  Since the array identifier would be internal -- not exposed 
to users -- it doesn't really matter what the system uses for it, but 
bnode IDs are not normally used as definite identifiers like that.  They 
are normally used to represent existential variables.
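
One way a tool might produce such internal identifiers is by interning 
arrays bottom-up, so that an array's identifier depends only on its 
elements (and, recursively, on the identifiers of any nested arrays). 
The sketch below is illustrative Python; ArrayId and ArrayInterner are 
hypothetical names, arrays are modelled as Python tuples, and nothing 
here is an existing RDF API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayId:
    """Opaque internal identifier; never shown to the user."""
    n: int


class ArrayInterner:
    def __init__(self):
        self._ids = {}  # tuple of interned elements -> ArrayId

    def intern(self, value):
        """Return an ArrayId for a tuple, or the value itself otherwise."""
        if not isinstance(value, tuple):
            return value  # IRI or literal: already its own identity
        key = tuple(self.intern(v) for v in value)
        if key not in self._ids:
            self._ids[key] = ArrayId(len(self._ids))
        return self._ids[key]


interner = ArrayInterner()
array_14a = interner.intern((('"pear"',), ('"plum"',)))
array_14b = interner.intern((('"pear"',), ('"plum"',)))
print(array_14a == array_14b)
```

Because nested arrays are interned before their parents, this only works 
when the structure is acyclic, which is the restriction argued for above.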

>> Those internal identifiers should not be exposed to the user.
> 
> As a matter of user interface design and syntactic sweetness, OK, but 
> surely they play exactly the same role as a bnode inside the actual RDF, 
> so why not call them bnodes? If they are something else, then you have 
> introduced a new class of identifiers into RDF triples and the semantics 
> have to be re-written, etc., all of which fuss seems to me to be 
> unnecessary.

I agree, I don't think there's a need for any fundamentally new class of 
identifiers at the RDF model level.    I think a subclass of existing 
RDF identifiers could fit this purpose just fine.

>> > So what you are suggesting here, if I follow you, is a scheme
>> > for letting systems generate their own bnodeIDs for the innards
>> > of list structures. Fine, but this is not a modification to RDF.
>>
>> Close, but not bnode IDs, but internal identifiers; and not list
>> structures, but arrays.  If those arrays happen to be implemented
>> as list structures, that's fine but immaterial, because the
>> implementation should not be exposed to the user.
> 
> Arrays, OK, Containers, in fact, right? Though starting from 0 instead 
> of 1, and the graph is required to be sensible in the same way, eg no 
> gaps, no double-entries and the largest-numbered item in the graph is 
> the last item. (A slight scent of nonmonotonicity there, but acceptable.)

Yes.
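
A rough sketch of those "sensible graph" conditions, in illustrative 
Python: it assumes a hypothetical family of 0-based membership 
properties written ex:_0, ex:_1, ... (RDF's own container properties 
rdf:_1, rdf:_2, ... start at 1), and a hypothetical function name.

```python
import re

# Hypothetical 0-based membership properties: ex:_0, ex:_1, ...
MEMBER = re.compile(r"^ex:_(\d+)$")


def read_indexed_array(triples, subject):
    """Return the array's elements in index order, or raise ValueError.

    Rejects double entries (two values at one index) and gaps. The
    largest index present is taken to be the last item.
    """
    slots = {}
    for s, p, o in set(triples):  # a graph is a set: ignore repeats
        match = MEMBER.match(p)
        if s != subject or match is None:
            continue
        index = int(match.group(1))
        if index in slots:
            raise ValueError(f"double entry at index {index}")
        slots[index] = o
    for index in range(len(slots)):
        if index not in slots:
            raise ValueError(f"gap at index {index}")
    return [slots[i] for i in range(len(slots))]


graph = [
    (":fruit", "ex:_0", '"apples"'),
    (":fruit", "ex:_1", '"bananas"'),
    (":fruit", "ex:_2", '"peaches"'),
]
print(read_indexed_array(graph, ":fruit"))
```

The "largest index is the last item" rule is where the scent of 
nonmonotonicity comes from: adding an ex:_3 triple changes the answer.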

>>
>> >> (This is closely related to RDF canonicalization, which becomes
>> >> trivially easy without explicit blank nodes.)  Eliminating
>> >> explicit blank nodes would mean that we'd lose the convenience
>> >> of not having to allocate an IRI.  I think there are ways to
>> >> address that loss in other ways, but that's another topic.
>> >>
>> >>> But that would depart from their current interpretation, and
>> >>> not necessarily fit all use-cases,
>> >>
>> >> Agreed, but it would fit the most common use cases.  It doesn't
>> >> have to fit *all* use cases.
>> >>
>> >>> so this is not something to decide lightly. This is the kind of
>> >>> semantic rabbit hole that Pat Hayes was warning about earlier
>> >>> in this thread (if I got his point correctly).
>> >>
>> >> I hope Pat will correct me if I'm wrong, but my read of
>> >> the discussion so far is that the semantics would not be a
>> >> big problem: both arrays and composite objects can have very
>> >> straight-forward -- and very similar -- semantics.  And it's clear
>> >> to me at least that although the rdf:aboutEach functionality
>> >> could be useful in some cases, it is not what we need as the
>> >> basic array functionality.  The basic functionality that we need
>> >> is for an assertion about an array to *only* be about that array
>> >> -- not about every element in that array:
>> >>
>> >> ("apples" "bananas" "peaches") ex:length 3 .
>> >
>> > If indeed that is all you want, Pat agrees this would be trivially
>> > straightforward. Pat is however very suspicious that this is
>> > in fact not all that people want, and that they will be writing
>> > things like
>> >
>> > :PatHayes :fatherOf (:SimonHayes :RobinHayes)
>> >
>> > before the ink is dry on the specification document.
>>
>> That's okay!  There would be nothing wrong with writing that,
>> *provided* that you define the :fatherOf property to mean that
>> :PatHayes is the father of every person in that list.  It may then
>> be nonsensical to assert the following:
>>
>>  :PatHayes :fatherOf :SimonHayes .
>>
>> But even that could be perfectly fine to write if you instead
>> define the :fatherOf property to have *conditional* meaning: if
>> the object of the assertion is a person, then it asserts fathership
>> about that person.  But if the object of the assertion is an array,
>> then it asserts fathership about every person listed in that array.
> 
> That last is a cute idea. We could have a name for such properties, call 
> them distributive, giving inference patterns like
> 
> :P rdf:type :DistributiveProperty .
> :A :P (… :Bn …) .
> =>
> :A :P :Bn .
> 
> And there are of course many other possibilities, eg
> 
> :P rdf:type :InitialProperty .
> :A :P (:B0 …)
> =>
> :A :P :B0
> 
> ie just the first item.  And things like ordering, where
> 
> (:B0 … :Bn) rdf:orderedBy :P .
> =>
> :B0 :P :B1 .
> :B1 :P :B2 .
> …
> :Bn-1 :P :Bn .
> 
> eg (2 13 47 128 763 1246) rdf:orderedBy :lessThan .

Absolutely.
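
For concreteness, the three inference patterns above could be sketched 
roughly as follows. This is illustrative Python under assumptions: 
arrays are modelled as Python tuples in subject or object position, 
:DistributiveProperty and :InitialProperty are the hypothetical classes 
from the patterns above, and rdf:orderedBy is likewise hypothetical (it 
is not part of the RDF vocabulary).

```python
def infer(triples):
    """Return the triples entailed by the three array patterns above."""
    triples = set(triples)
    distributive = {s for s, p, o in triples
                    if p == "rdf:type" and o == ":DistributiveProperty"}
    initial = {s for s, p, o in triples
               if p == "rdf:type" and o == ":InitialProperty"}
    inferred = set()
    for s, p, o in triples:
        if isinstance(o, tuple) and o:
            if p in distributive:
                # :A :P (... :Bn ...)  =>  :A :P :Bn
                inferred.update((s, p, item) for item in o)
            if p in initial:
                # :A :P (:B0 ...)  =>  :A :P :B0
                inferred.add((s, p, o[0]))
        if p == "rdf:orderedBy" and isinstance(s, tuple):
            # (:B0 ... :Bn) rdf:orderedBy :P  =>  :Bi :P :Bi+1
            inferred.update((a, o, b) for a, b in zip(s, s[1:]))
    return inferred


graph = {
    (":fatherOf", "rdf:type", ":DistributiveProperty"),
    (":PatHayes", ":fatherOf", (":SimonHayes", ":RobinHayes")),
    ((2, 13, 47), "rdf:orderedBy", ":lessThan"),
}
for triple in sorted(infer(graph), key=str):
    print(triple)
```

A property that is not declared distributive or initial gets no such 
inferences: the array itself stays a plain value, which is the minimal 
semantics discussed in this thread.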

>> >> And one other comment . . .
>> >>
>> >> On 9/27/22 09:43, Pierre-Antoine Champin wrote:
>> >>> If we can design other efficient design patterns for conveying
>> >>> order and "closedness" (such as the one proposed above), I
>> >>> believe that the need for representing lists would not be as
>> >>> pressing as suggested in this thread.
>> >>
>> >> Possibly.  But software developers have been using arrays for
>> >> 60+ years, and they *expect* them.  So as a practical matter,
>> >> I think the straight-forward solution is to add proper support
>> >> for arrays, perhaps in a new higher-level RDF 2.0 syntax, to
>> >> avoid breaking any existing RDF or tools.
>> >>
>> >> As Manu Sporny put it, by not having proper array support in RDF,
>> >> we're currently "giving developers a big fat middle finger in
>> >> that area".
>> >
>> > Both you and Manu miss the central point here. RDF is not intended
>> > to be a notation for software developers to create new structures
>> > with. It is not a developer toolkit.
>>
>> That's exactly the weakness that we need to address.
> 
> I disagree. It is not a weakness, and to think it is, is a 
> misunderstanding. Chalk and cheese, etc..

I respectfully disagree on that point.  First of all, I absolutely 
understand the original purpose of RDF, and I *support* that purpose: 
"for representing information in the Web" (in a formal, readily 
machine-processable form, I might add).
http://www.w3.org/TR/rdf-concepts

But I *also* believe:

   1. RDF is also useful for *many* purposes that are *not* on the web.
      Indeed, I suspect that the largest financial investments in RDF
      have *not* been for "representing information in the Web", but have
      been for other purposes, such as large-scale data integration,
      biomedical research, etc.

   2. RDF is significantly harder to use than it could and should be.

   3. The difficulty in using RDF is severely inhibiting adoption.

   4. *All* uses of RDF would benefit from greater ease of use and
      greater adoption -- *including* the original purpose of RDF.

I do not want to abandon RDF's original purpose at all.  I want to make 
RDF easier to use by a larger community, and this means tackling 
practical issues like allowing users to have easy and obvious RDF ways 
to do things that are trivially easy for them to do in other data 
representations -- such as representing arrays and composite values.

>> > It was designed and intended to be an information exchange
>> > notation. As soon as you give it to developers and say, in effect,
>> > go ahead and play with this and build things with it, then all
>> > of its utility as an information exchange notation is lost,
>> > because whatever meaning one developer intends to express using
>> > the structures she develops will be opaque to any other developer
>> > and any user in a different development environment.
>>
>> They will only be opaque if the predicates that use them are not
>> documented.  If they are properly documented, then the meaning will
>> be clear.  This is true already, of *all* of RDF.  Adding features
>> like arrays and composite objects does not fundamentally change
>> the need to define the meaning of the predicates used.
> 
> In some large sense, I agree. But extending the logical syntax of any 
> language with a semantics - which is what is being proposed here - 
> requires that the semantic framework be extended to cover these cases. 
> You have suggested how to do that, I agree, and it is exactly the 
> 'minimalist' approach I would suggest myself. 

Good, that probably means I got it right.  :)

> But then it needs to be 
> very firmly documented, using all kinds of strict normative language, 
> that this is indeed the semantics everyone should be working with, so 
> for example if anyone wants to express that
> 
> :PatHayes :fatherOf (:SimonHayes :RobinHayes) implies :PatHayes 
> :fatherOf :SimonHayes .
> 
> then they need to find a way to make this explicit (eg see above) as the 
> semantics of RDF 'arrays' does not itself support this automatically.

Fully agree.

>> The example that you gave above would be meaningless to others if
>> the :fatherOf relation were not defined.
> 
> True, but in a different sense of 'meaning'. Inference machinery does 
> not need to be able to read human documentation, only more RDF. 

Agreed, except that you omitted one crucial thing: the inference 
machinery *also* needs to be able to read user-defined inference rules. 
We cannot expect the basic RDF machinery to already know all of the 
inference rules that anyone might want to use.

> And the 
> inference patterns for 'arrays' and 'lists' and any other new syntax 
> need to be made explicit enough to support RDF inference engines.

Fully agree, that would be ideal.  But even short of that, it would 
still be beneficial to have arrays and composite values even if the 
associated inference rules that people want to use with them are 
described in prose instead of a machine-processable rules language.

>>  And as I pointed out, it
>> can just as well be defined to make an assertion about one person
>> as it can to make an assertion about every person in a list.
>>
>> > There is a kind of universal understanding about what an RDF triple
>> > 'means': it says that a relation holds between two things, and
>> > it is therefore an assertion about a world consisting of these
>> > things with relations holding between them. This consensus of
>> > intended meaning is perhaps a bit shaky at the edges and under
>> > strain in some places, but still is kind of universally understood
>> > and accepted. But there is no such consensus AT ALL about what a
>> > list is supposed to mean, or what an array is supposed to mean,
>> > when used as part of an assertion about this common 'world' of
>> > things bound together by relations. And because there is no such
>> > consensus, as soon as these structures are given to developers to
>> > build with, what they build will have, at best, an idiosyncratic
>> > meaning private to the community in which the developer is
>> > working. At which point, the entire purpose of RDF is lost.
>>
>> Wait, let's not throw the baby out with the bath.  The meaning
>> will only be private if it is undocumented.  But as I pointed out,
>> that is already true of *all* of RDF.   You have no idea what
>> <http://dbooth.org/fribjam>
>> means unless I document its meaning.
> 
> But I don't need to know that, if I am an RDF inference engine. All I 
> need are the basic rules for processing RDF, as supported by the 
> semantics. But when the RDF syntax gets extended in new ways, I need 
> that same basic guidance for those new extensions. And all that part of 
> the 'meaning' should be provided in the normative specifications.

The basic array and composite value semantics needs to be defined in the 
normative specifications, yes.  But as you point out, that's very 
minimal.  The rest of the semantics should be provided in user-defined 
inference rules or prose that defines the predicates that use those values.

>> The purpose of RDF is *not* lost just because we need to define the
>> meaning of the predicates that use composite objects and arrays, just
>> as we need to define the meaning of everything else we use in RDF.
>>
>> To further clarify, an array such as the following (reusing the
>> Turtle syntax):
>>
>>  (:SimonHayes :RobinHayes )
>>
>> should have no intrinsic meaning beyond the fact that it is an
>> array of two elements denoted by :SimonHayes and :RobinHayes, in
>> that order.
> 
> Fine. As long as we make this (what I called the minimalist semantics, 
> above) absolutely clear, and require all these developers who are 
> clamoring for better RDF datastructures to stick to this, and not 
> accidentally introduce further assumptions about meanings into the apps 
> and software they develop, then everything will be hunky-dory. Forgive 
> me if I am a little cynical about developers having the restraint to 
> accept this kind of discipline.
> 
>>  This allows it to be the exact *same* array --
>> extensionality -- in the following two assertions:
>>
>>  :myStore :customers ( :SimonHayes :RobinHayes ) .
>>  :PatHayes :fatherOf ( :SimonHayes :RobinHayes ) .
>>
>> However, the :customers and :fatherOf predicates can (informally
>> speaking) impart *different* meaning to those arrays by the way that
>> they are defined: the :customers predicate can indicate that every
>> member of that list is a customer of :myStore, and the :fatherOf
>> predicate can indicate that every member has a father :PatHayes.
>>
>> This is no different than using the number 42 in different contexts:
>>
>>  :boston :temperatureFahrenheit 42 .
>>  :marblesInMyPocket :count 42 .
>>
>> Formally speaking, the meaning of 42 is the same in both of
>> those assertions, but it is *used* in different ways by different
>> predicates, so informally/colloquially we might say that it has
>> different meaning in different contexts.  The exact same thing
>> should be true of an array or a composite object.
> 
> As I say, fine. If you can get the world to agree.
>>
>> > Which is why any new syntactic extension to RDF should be given
>> > a semantics as part of its normative definition, and this should
>> > be one that is likely to survive the pressures of how developers
>> > are wanting to use the new structure,
>>
>> I agree that they need a well-defined meaning.  But the meaning
>> defined by the RDF standards can be minimal, because additional
>> meaning can be defined by RDF authors.
>>
>> > For example, if it is clear that some folk really want to use
>> > arrays to express n-ary relations, while others really want to use
>> > them to abbreviate conjunctions but with a closed-world assumption,
>> > then these two groups should be given distinct extensions to RDF
>> > syntax which will not be confused with each other, and each given
>> > a nice crisp semantics and tutorial examples, etc..
>>
>> For arrays and composite objects, I don't think we need different
>> syntaxes for all of the many ways that they may be used, because
>> as demonstrated above, additional meaning can be imparted by the
>> predicates that use them.
> 
> How would you do that for the two cases I sketch, above? One thinks that 
>   :A :P (:B0 … :Bn) .
> really means (changing notation here) :P(:A :B0 …  :Bn), ie that :P is a 
> n+2-ary relation. The other thinks that it means the graph
> :A :P :B0 .
> :A :P :B1 .
> …
> :A :P :Bn .
> ie a conjunction of n+1 binary assertions, together with a kind of CWA 
> about there not being any more of them.

Unless :P were defined to handle both cases, then different predicates 
should be defined for the two cases.  Otherwise one of them would be 
wrong.  If desired, ShEx or SHACL or something else could be used to 
check for this error.

> 
> Neither of these can be expressed in current RDF, of course, so each 
> would require a more drastic kind of extension to the RDF semantics. 

Agreed, but not to the core RDF semantics.  These semantic extensions 
can be defined in inference rules and/or prose, just as we currently do 
for any other semantic extensions.

> And 
> both of them have been explicitly suggested as likely intended meanings 
> in posts in this and other threads (eg see the RDF-star email archives). 
> Don't you think there might be just a faint possibility that some 
> developers might seize upon the opportunity to start using these new 
> structures in these ways?

Yes, but I see the benefit of having arrays and composite values as far 
exceeding the cost of mistakes that people might make in learning to use 
them.

> 
>>  But if commonly used patterns emerge
>> then it *might* be worth adding syntax to support a few of them.
> 
> My concern is that communities will form using a single RDF syntax in 
> these and other ways, all mutually incompatible, and then any attempt to 
> sort out the resulting mess will turn into a turf war about what RDF 
> 'arrays' REALLY mean.

I'm definitely less worried about that, because the documents that 
define the predicates should specify what they mean, so any disagreement 
should be quickly resolvable by RTFM.

Thanks,
David Booth
Received on Saturday, 1 October 2022 20:46:21 UTC