Re: RDF lists/arrays and n-ary relations [was Re: OWL and RDF lists] from Thomas Lörtsch on 2022-09-19 (semantic-web@w3.org from September 2022)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 19 Sep 2022 17:53:20 +0200
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: David Booth <david@dbooth.org>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-Id: <CCBED28F-E4E3-4DED-B8B1-FA25DD4CD90C@rat.io>
Hi Pierre-Antoine,


In principle I agree that shapes provide useful tooling to describe types of objects - in that respect they are probably all we need, although I’m no expert on shapes. With your other points, however, I don’t agree.

In my understanding shapes don’t _guarantee_ you anything. They enable you to describe what structure you intend to serve or expect to get. They describe restrictions. That is good, and a useful complement to entailment-driven approaches like OWL. However, only an application can guarantee that any RDF struct it serves obeys such restrictions and that it doesn’t ingest any structure that doesn’t. That may sound like word-mongering, but see below for why I think more is useful (maybe even necessary) and possible.

Another problem with your argument is that such descriptions still have to be added as extra triples. Either you have to declare that the range of some property is some shape, or that some specific struct, identified by some specific identifier, is of some specific shape. That is rather cumbersome, might get lost to the eye, might necessitate new properties for specific object types, etc. That’s where an annotation syntax comes in very handy as syntactic sugar. But actually it can be more than just syntactic sugar. Let’s check some examples.

My idea, and I’m not sure if anybody got it or if it’s so crazy that nobody bothered to respond when I pondered it in [0], is a bit more radical. It really defines objects that are not just shortcuts for the usual RDF triples. Also, the useful possibilities to define these objects go beyond shapes, I guess.

Let’s imagine 
- a shape X
- an annotation syntax @
- an annotation @X can be applied to most entities in RDF

For example
- X could mean List
- the object could be a list (a,b,c)
- the annotation applied to the object would be (@X a,b,c)

The meaning would be that 
- this is not syntactic sugar for an rdf:List struct
- it is a list _object_ with certain properties as described
  by the X shape, which in this case is an rdf:List shape.
  I.e. it is
  ordered, 
  starting with index 0, 
  closed world assumption (no one can add to it, it is complete)
  referentially transparent (a inside means the same as outside)

The crucial thing, however, is that this list object can be described by an ordinary rdf:List, but unlike that rdf:List it has known properties. No one can add to it. It is itself not governed by the open world assumption. Of course it can be replaced by something else, e.g. a (c,d,e) list or something totally different. So in general the OWA still applies; the change of semantics is constrained to the object itself.
Also, any RDF processor knows how to read this list. It can decode it into a first-rest ladder (or whatever it uses internally to work on rdf:Lists), work on it, find the second element, etc. It just can’t change it. Or maybe it even can change it; the crucial thing is probably that it can be sure that the list is well-formed and complete. I’m still a bit muddled about this, admittedly, but I guess this is not really hard to figure out (it just needs a little more sleep...).

Is that clear? Or confused? As I said above, I’m not totally sure. To me it’s a kind of import mechanism, but not like owl:imports, which would include the imported ontology and make it fully accessible as standard RDF. It is in a way more like a getter function: the object is encapsulated, its values can be read, the type of the object is well defined, but none of that can be changed from the outside. Again, maybe the read-only idea is ill-conceived and a well-formedness guarantee is all we need, at least most of the time.
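To make the well-formedness point concrete, here is a minimal Python sketch of the check a processor has to perform today before it can trust a first-rest ladder; an annotated list object would give that guarantee up front. It assumes triples are plain (subject, predicate, object) string tuples with compact names like rdf:first standing in for full IRIs; `decode_list` and `MalformedList` are hypothetical names, not any library’s API.

```python
# Sketch: read a first-rest ladder back into a Python list, refusing
# anything that is not a well-formed, complete RDF list.
# Assumption: triples are (s, p, o) string tuples; "rdf:first" etc. stand in for IRIs.
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


class MalformedList(ValueError):
    """Raised when a first-rest ladder is not a well-formed list."""


def decode_list(triples, head):
    """Return the members of the list starting at `head`, in order."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise MalformedList(f"cycle at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        # Exactly one rdf:first and one rdf:rest per cell; anything else
        # means a branching, truncated or otherwise broken list.
        if len(firsts) != 1 or len(rests) != 1:
            raise MalformedList(f"{node}: {len(firsts)} first / {len(rests)} rest")
        items.append(firsts[0])
        node = rests[0]
    return items


ladder = [
    ("_:l0", RDF_FIRST, ":a"), ("_:l0", RDF_REST, "_:l1"),
    ("_:l1", RDF_FIRST, ":b"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, ":c"), ("_:l2", RDF_REST, RDF_NIL),
]
print(decode_list(ladder, "_:l0"))  # prints [':a', ':b', ':c']
```

Under the open world assumption nothing stops another source from adding a second rdf:first to `_:l1`; `decode_list` then rejects the whole ladder rather than guessing which member is meant.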

Now, this idea can be applied to more than lists, but in its current form it can only be applied to syntactic features that are already there, and it can only refine their current meaning, not change it to something else (what the annotated object _means_ has to be describable in triples added to the graph). The (a,b,c) list syntax could be annotated as e.g.
    (@SEQ a,b,c), 
    (@ALT a,b,c) and 
    (@BAG a,b,c) 
with the semantics as defined in the rdfs:Container vocabulary, but also as 
    (@OWL a,b,c) 
to discriminate OWL TBox lists from other ABox lists (this would allow both OWL and non-OWL applications to benefit from the ( … ) syntactic sugar for lists). Those lists would all expand to first-rest ladders, but it’s up to syntaxes (also query and reasoning syntaxes) to hide that complexity whenever possible.
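A rough sketch of how a parser might handle the annotated list syntax, assuming exactly the (@KEYWORD a,b,c) form from the examples above; the keyword set and the function names are hypothetical. The keyword is kept as metadata while the members expand to the same first-rest ladder as an unannotated list.

```python
import re

# Hypothetical keyword set, taken from the examples in this message.
LIST_KEYWORDS = {"SEQ", "ALT", "BAG", "OWL"}
_ANNOTATED = re.compile(r"^\(@(\w+)\s+(.*)\)$")


def parse_annotated_list(text):
    """Split '(@KW x,y,z)' into ('KW', ['x', 'y', 'z'])."""
    m = _ANNOTATED.match(text.strip())
    if m is None:
        raise ValueError(f"not an annotated list: {text!r}")
    keyword, body = m.group(1), m.group(2)
    if keyword not in LIST_KEYWORDS:
        raise ValueError(f"unknown list annotation @{keyword}")
    items = [item.strip() for item in body.split(",") if item.strip()]
    return keyword, items


def expand_to_ladder(items, prefix="_:l"):
    """Expand items into rdf:first/rdf:rest triples; return (head, triples)."""
    if not items:
        return "rdf:nil", []
    triples = []
    for i, item in enumerate(items):
        node = f"{prefix}{i}"
        rest = f"{prefix}{i + 1}" if i + 1 < len(items) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return f"{prefix}0", triples


# The annotation travels alongside the ordinary ladder.
keyword, items = parse_annotated_list("(@OWL :a, :b, :c)")
head, triples = expand_to_ladder(items)
```

Since the ladder is identical for every keyword, only the returned annotation tells an application whether it is looking at, say, an OWL TBox list or an ordinary ABox list.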

Applying annotations to {…} graphs would give us a chance to finally declare their semantics in RDF, e.g. well-defined naming semantics, quotation semantics or closed world semantics. For example,
    :G1 {@NAME :a :b :c }
would declare that :G1 unambiguously names that graph and doesn’t also refer to something else.
    :G2 {@CWA :a :b :c }
would declare that this graph is complete and no more values for :b about :a are to be expected. Combinations should also be possible, like
    :G3 {@NAME,CWA :a :b :c }
but that’s already nitty-gritty. Of course the syntax has to be extensible: we could reserve non-prefixed annotations for RDF, and everybody else would have to use a colon, like
    :G4 {@:MY_SHAPE :a :b :c }
but that’s even more nitty-gritty. Obviously I haven’t yet thought this through in all detail.
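A minimal sketch of the graph annotations, assuming the comma-separated keyword form used above and the "unprefixed means reserved for RDF" rule; the reserved set here only contains the keywords mentioned in this message, and the class and function names are made up. It also shows what @CWA would change: an absent triple is answered with False instead of "unknown".

```python
# Hypothetical reserved set: the unprefixed keywords used in this message.
RESERVED = {"NAME", "CWA", "UNA", "STAR", "APP"}


def parse_annotations(spec):
    """'@NAME,CWA' -> {'NAME', 'CWA'}; prefixed names like ':MY_SHAPE' pass through."""
    result = set()
    for kw in (part.strip() for part in spec.lstrip("@").split(",")):
        if ":" in kw:
            result.add(kw)  # extension annotation, owned by whoever defines the prefix
        elif kw in RESERVED:
            result.add(kw)
        else:
            raise ValueError(f"unprefixed annotation {kw!r} is not reserved for RDF")
    return result


class AnnotatedGraph:
    """A set of triples plus the annotations given in its {@...} header."""

    def __init__(self, spec, triples):
        self.annotations = parse_annotations(spec)
        self.triples = set(triples)

    def ask(self, triple):
        """True if present; False if absent and the graph is @CWA; else None (unknown)."""
        if triple in self.triples:
            return True
        return False if "CWA" in self.annotations else None


g1 = AnnotatedGraph("@NAME", [(":a", ":b", ":c")])  # like :G1 {@NAME :a :b :c }
g2 = AnnotatedGraph("@CWA", [(":a", ":b", ":c")])   # like :G2 {@CWA :a :b :c }
```

Note that this treats the whole @CWA graph as complete, which is a simplification of "no more values for :b about :a".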

Applying this syntax to <…> IRIs would allow us to contextualize references, e.g.
    <@TIME:15thCentury https://paris.com>
would refer to Paris in the 15th century. Expanded to ordinary RDF this would be:
    _:b rdf:value <https://paris.com>; :time :15thCentury.
or to define their semantics
    :thomas :said <@UNA https://ex.com/madness>
would quote what I said, but the reference to myself would still be referentially transparent. We had a discussion in the RDF-star CG about why I think this is in certain cases more useful than the quoted triple approach of RDF-star.
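The Paris expansion above, sketched in Python under the assumption that the annotation name is simply lower-cased to form the context property (TIME becomes :time) and that only the @KEY:VALUE form is handled; bare semantic annotations like @UNA are out of scope here. `expand_reference` is a hypothetical helper.

```python
import re

# <@KEY:VALUE iri>; only the key:value form from the Paris example is handled.
_CONTEXT_REF = re.compile(r"^<@([A-Z_]+):(\S+)\s+(\S+)>$")


def expand_reference(text, node="_:b"):
    """Expand '<@KEY:VALUE iri>' into (node, triples) in plain RDF."""
    m = _CONTEXT_REF.match(text.strip())
    if m is None:
        raise ValueError(f"not a contextualized reference: {text!r}")
    key, value, iri = m.groups()
    return node, [
        (node, "rdf:value", f"<{iri}>"),         # what is referred to
        (node, f":{key.lower()}", f":{value}"),  # the context it is read in
    ]


node, triples = expand_reference("<@TIME:15thCentury https://paris.com>")
for t in triples:
    print(*t)
# prints:
# _:b rdf:value <https://paris.com>
# _:b :time :15thCentury
```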

Apropos RDF-star: let’s for a moment assume that we drop the << … >> syntax and the proposed semantics but instead introduce nested {…} - just like that. We could then do the following:
    {@STAR :a :b :c } 
is a quoted triple, mostly like << :a :b :c >> is defined per the RDF-star CG report.
    {@STAR :a :b :c . :d :e :f } 
is a quoted graph, mostly like (<< :a :b :c >>,<< :d :e :f >>) as per the RDF-star CG report plus the list hack that you used in your Lotico talk. 
I would however be inclined to define them as asserted because that’s how the {…} is already used in practice all over the place and it would be unwise to try to change that. For unasserted statements either use RDF standard reification, or add another keyword, or indeed use the <<…>> syntax.
We could also do other nice stuff, like
    {@UNA,CWA :a :b :c . :d :e :f } 
would declare that graph closed, following application logic intuitions - something that was already discussed 20 years ago as the obvious next thing to do, but apparently the process ran out of steam a little. Maybe give that its own keyword
    {@APP :a :b :c . :d :e :f } 
where everything has tightly controlled semantics.

I’ve spent the last years either fighting with you, Pierre-Antoine, in the RDF-star CG or reading hundreds of papers about how to extend the expressive capabilities of RDF. I’ve become increasingly convinced that there are just too many different options of useful semantics, each intuitive in its own way for its own application domain, for it to be possible to extend RDF with just one more formalism and have them all covered. The reservoir of different brackets is very limited. So defining an extensible annotation syntax, standardizing a few options that seem to make the most sense and then seeing what happens seems to me a much more promising approach than forcing a very rigid semantics on RDF-star that nobody will follow anyway, and that indeed already isn’t followed in practice (and that includes examples by the CG and by the CG editor's students).

Considering shapes for which there is no syntax available in RDF already - which is almost everything except lists - I could imagine nested {…} constructs as the way to go (I somehow don’t dare to call them 'nested graphs', but let’s be honest: that’s just what they are). So
    :G1 {@NAME 
        :a :b :c . 
        {@SHAPE_X :d :e :f } .
        :g :h (@SEQ :i, :k)
    }
would describe a graph with a proper name, containing some statements, one of which is a well-formed shape and the other contains a well-formed list.
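One way to model such nested, annotated graphs as a data structure, as a sketch only: it assumes nested graphs are asserted (as suggested above for @STAR), so their triples count toward the enclosing graph, and it keeps the annotated list as a single tagged term. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotatedList:
    """A list term such as (@SEQ :i, :k), kept as one tagged object."""
    annotation: str
    items: Tuple[str, ...]


@dataclass
class NestedGraph:
    """A {@...} graph whose statements are triples or further nested graphs."""
    annotations: frozenset
    statements: list = field(default_factory=list)
    name: Optional[str] = None


def asserted_triples(graph):
    """Yield every triple asserted in `graph`, descending into nested graphs."""
    for st in graph.statements:
        if isinstance(st, NestedGraph):
            yield from asserted_triples(st)
        else:
            yield st


# The :G1 example above.
g1 = NestedGraph(
    frozenset({"NAME"}),
    [
        (":a", ":b", ":c"),
        NestedGraph(frozenset({"SHAPE_X"}), [(":d", ":e", ":f")]),
        (":g", ":h", AnnotatedList("SEQ", (":i", ":k"))),
    ],
    name=":G1",
)
```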


And that’s it for today. Not reading it a third time. Hope it makes sense.
Thomas


[0] https://lists.w3.org/Archives/Public/semantic-web/2022Sep/0006.html


> On 19.09.2022 at 03:36, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> 
> Hi David,
> 
> it seems to me that RDF + Shapes + Ontology gives you all this already:
> 
> - Shapes can be used to guarantee that any node with a :disease property also has a :probability property (and vice-versa) -- and that these properties can't have multiple values.
> 
> - Ontologies can be used to guarantee that any two nodes with the same :disease and :probability values are owl:sameAs.
> 
> All your examples would then work with the standard [] syntax instead of the new @[] syntax.
> 
> 
> Note that Shapes + Ontologies can also be used for lists, constraining first/rest ladders to be well-formed. Granted, this would require
> 
> 1) to solve the problem of rdf:first/rdf:rest being not allowed in OWL A-boxes, and
> 2) to extend the SPARQL syntax to make it more convenient to query lists
> 
> but none of it, in my opinion, calls for an extension of RDF itself.
> 
>   pa
> 
> On 18/09/2022 13:20, David Booth wrote:
>> Great discussion!  It seems that lists and n-ary relations are closely related, in that one could view a list as a set of key-value pairs (or predicate-object pairs) of an n-ary relation.
>> 
>> For example, if the Turtle list syntax were used to express a built-in list object -- or more properly an *array* object -- rather than a first-rest ladder of triples, then this example:
>> 
>>   # Example 1
> >>   :dogShow :winners ( :ginger :bailey ) .
>> 
>> might be almost equivalent to:
>> 
>>   # Example 2
>>   :dogShow :winners [
>>     0 :ginger ;
>>     1 :bailey
>>   ] .
>> 
>> if integers could be used as predicates, which they can in generalized RDF. https://www.w3.org/TR/rdf11-concepts/#section-generalized-rdf
>> 
>> However, example 1 expresses a single triple, whereas example 2 expresses three triples.
>> 
>> In languages that manipulate RDF, such as SPARQL and various programming languages, it is always helpful to have ways to convert between a built-in construct and its constituent parts, and this can either be done implicitly or with explicit operators.  Implicit conversion offers more convenience, but at the price of being more error prone.  For example, if SPARQL did this conversion implicitly, the ordered list of winners from example 1 above might be obtained by:
>> 
>>   # Example 3: implicit conversion from list to set of triples
>>   SELECT ?winner ?index
>>   WHERE {
>>    :dogShow :winners [ ?index ?winner ]
>>    }
>>   ORDER BY ?index
>> 
>> On the other hand, if an explicit "@[ ... ]" operator were instead added to SPARQL, to convert a built-in list to its equivalent set of explicit triples, then the query might look like this:
>> 
>>   # Example 4: explicit conversion from list to set of triples
>>   SELECT ?winner ?index
>>   WHERE {
>>    :dogShow :winners @[ ?index ?winner ]
>>    }
>>   ORDER BY ?index
>> 
>> I'm just making up a possible syntax here for illustrative purposes. Some other syntax might be better.
>> 
>> A method should also be provided to go the other direction: convert a set of triples into the equivalent built-in object.  And although I think that sets and bags would also be useful, I think they could be readily layered on top of lists/arrays if we get proper built-in list/array support.
>> 
>> Example 2 above is strikingly similar to a commonly used idiom for encoding an n-ary relation:
>> 
>>   # Example 5
>>   :christine :diagnosis [
>>     :disease :breastCancer ;
>>     :probability 0.8
>>   ] .
>> 
>> Idioms for n-ary relations are explained in https://www.w3.org/TR/swbp-n-aryRelations/
>> 
>> This similarity that others have pointed out between lists and n-ary relations seems like good news, because it suggests that if we can figure out how to add one to RDF, we can also add the other, and both are sorely needed for convenience.  For reasons why, see:
>> https://github.com/w3c/EasierRDF/issues/74
>> https://github.com/w3c/EasierRDF/issues/20
>> 
>> Example 5 above is really a work-around for the lack of native n-ary relations in RDF.  It expresses three triples:
>> 
>>   # Example 5a -- ntriples for example 5
>>   :christine :diagnosis _:b0 .
>>   _:b0 :disease :breastCancer .
>>   _:b0 :probability 0.8 .
>> 
>> However, inspired by example 4 above, perhaps a similar syntax could be used to write an n-ary relation that would treat Christine's suspected disease and probability as a single entity participating in the :diagnosis relation:
>> 
>>   # Example 6
>>   :christine :diagnosis @[
>>     :disease :breastCancer ;
>>     :probability 0.8
>>   ] .
>> 
>> This differs from example 5 because example 6 expresses a *single* triple that connects :christine with a diagnosis object -- not 3 triples.  The order in which the diagnosis properties are listed has no effect -- they are a set:
>> 
>>   # Example 7a: property order does not matter
>>   @[ :probability 0.8 ; :disease :breastCancer ]
>>      owl:sameAs  @[ :disease :breastCancer ; :probability 0.8 ] .
>> 
>> and adding or removing a property makes it different:
>> 
>>   # Example 7b
>>   @[ :probability 0.8 ; :disease :breastCancer ]
> >>      :NOT_sameAs  @[ :disease :breastCancer ; :probability 0.8 ; :year 2022 ] .
>> 
>> Trying to specify the same property twice should be a syntax error:
>> 
>>   # Example 7c -- INVALID -- SYNTAX ERROR!
>>   :christine :diagnosis @[
>>     :disease :breastCancer ;
>>     :disease :colonCancer ;
>>     :probability 0.8
>>   ] .
>> 
>> But the following would not be a syntax error, even if it may be semantically wrong:
>> 
>>   # Example 7d
>>   :malady owl:sameAs :disease .
>>   :christine :diagnosis @[
>>     :disease :breastCancer ;
>>     :malady :colonCancer ;
>>     :probability 0.8
>>   ] .
>> 
>> And of course, these constructs could be nested as desired.
>> 
>> I think something like this could meet the need for n-ary relations in some future RDF syntax.  And based on previous comments by Pat and Anthony, it sounds like the semantics would not be a problem.
>> 
>> Thanks very much to Thomas, Pat, Anthony and others for a very helpful discussion!
>> 
>> David Booth
>> 
Received on Monday, 19 September 2022 15:53:46 UTC