Re: Proposal by Kurt from Franconi Enrico on 2024-07-25 (public-rdf-star-wg@w3.org from July 2024)

From: Franconi Enrico <franconi@inf.unibz.it>
Date: Thu, 25 Jul 2024 14:26:31 +0000
To: Thomas Lörtsch <tl@rat.io>
CC: Kurt Cagle <kurt.cagle@gmail.com>, RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-ID: <9AC77F81-C5E9-4EC2-807C-6A54E7585E6C@inf.unibz.it>
How would you deal with the graph:

:bill-clinton :husband#1 :hillary-rodham .
:42nd-potus :husband#1 :1st-female-NY-senator .
Would you reject it?
—e.
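
For readers following the thread, the two possible treatments of the graph above can be contrasted in a minimal Python sketch. It is an illustration only: triples are plain tuples, the names `SINGLETONS`, `singleton_violations` and `entailed_same_as` are hypothetical, and the second function assumes the reading argued in the quoted reply below, where a singleton property denotes a single subject/object pair, so a second use forces equalities rather than an inconsistency.

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples; names taken from the message above.
GRAPH = [
    (":bill-clinton", ":husband#1", ":hillary-rodham"),
    (":42nd-potus", ":husband#1", ":1st-female-NY-senator"),
]
SINGLETONS = {":husband#1"}  # assumption: which predicates are declared singleton


def singleton_violations(graph, singletons):
    """Syntactic check: singleton properties used with more than one distinct (s, o) pair."""
    uses = defaultdict(set)
    for s, p, o in graph:
        if p in singletons:
            uses[p].add((s, o))
    return {p: sorted(pairs) for p, pairs in uses.items() if len(pairs) > 1}


def entailed_same_as(graph, singletons):
    """Semantic reading: all subjects of one singleton property are equal, and so are all objects."""
    uses = defaultdict(list)
    for s, p, o in graph:
        if p in singletons:
            uses[p].append((s, o))
    entailed = set()
    for pairs in uses.values():
        s0, o0 = pairs[0]
        for s, o in pairs[1:]:
            if s != s0:
                entailed.add((s0, "owl:sameAs", s))
            if o != o0:
                entailed.add((o0, "owl:sameAs", o))
    return entailed
```

Under the first function the graph is flagged; under the second it is accepted and yields two owl:sameAs triples, one for the subjects and one for the objects.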

> On 25 Jul 2024, at 14:58, Franconi Enrico <franconi@inf.unibz.it> wrote:
> On 25 Jul 2024, at 12:41, Thomas Lörtsch <tl@rat.io> wrote:
>> 
>>>> :liz :married1 :richard .
> 
> (…)
> 
>>>> :erin :married1 :richard .
>>>> I have constructed a logically inconsistent statement in RDF, one that can be caught by creating an exception in OWL or SHACL that indicates that if I have such a tpn:pointer, then this should generate at a minimum a warning that you have a singleton property being used with two different subject/object pairs
>>> No!
>>> If :married1 is a singleton property, you don’t get an inconsistent graph, but you get an equality between the denotation of :erin and the denotation of :liz, namely you entail :erin owl:sameAs :liz.
>>> This is easy to see if you understand the semantics of IRIs and the semantics of singleton properties (check the original paper [1], where the semantics is explained in great detail).
>> 
>> That paper goes to some length discussing the syntactic machinery required to ensure singleton-ness of singleton properties:
>> 
>> "2.3 Enforcing the singleton-ness of property instances
>> 
>> If the property isMarriedTo#1 is asserted another triple such as BarackObama isMarriedTo#1 MichelleObama, this together with the existing assertion isMarriedTo#1 hasStart 1965-11-22 would imply the marriage date of the Obamas is 1965-11-22, which is not true. In order to avoid this, we need to ensure the singleton property isMarriedTo#1 occurs as a property in only one triple.
>> This constraint has to be enforced for all URIs of singleton property instances. Data publishers may combine their URI prefix, the generic property name and the timestamp when the instance is created into the URI of a singleton property to make it unique. However, there are still cases where two instances may share the same URI. Therefore, data publishers may employ the Universally Unique Identifier (UUID) [9], which is also supported by SPARQL and various programming languages, to ensure the singleton-ness of their property instances. The validation of this uniqueness constraint is straightforward, by counting the number of triple occurrences per singleton property. As the current RDF syntax does not allow blank nodes as properties, we do not represent singleton properties as blank nodes, although one advantage of using blank nodes in the property is providing the completeness for deduction rules [11]."
> 
> You are confused by the scope of that Section.
> It is talking about a methodology to minimise the possibility that two graphs to be merged use the same singleton property name for different cases.
> The text suggests how to enforce upfront "the singleton-ness of their property instances", namely that data publishers should choose very precise names for their singleton properties.
> The text is silent about what to do in the case of a graph with two different triples with the same singleton property, once it has established that it is indeed the same singleton property.
> If you want to respect the semantics given in the first part of the paper, in order to ensure the singleton-ness of the singleton property instances, you should generate equality between the subjects and between the objects.
> 
>> So they do indeed require a solution as Kurt outlined to support the semantics. I see no contradiction here, rather two perspectives - semantic and syntactic - that complement each other.
> 
> I don’t believe in syntax without semantics :-)
> 
> cheers
> —e.
> 
>> 
>> Best,
>> Thomas
>> 
>>> You can not have your proposal by just considering it as a syntactic preprocessing of Terrapin. It would violate the basic understanding of the *semantic* web stack principles.
>>> cheers
>>> —e.
>>> [1] https://dl.acm.org/doi/10.1145/2566486.2567973

>>>> Put another way, the => notation doesn't resolve the underlying conflict, in the case of a singleton property, but it is not intended to. A singleton property by definition can only apply to one subject/object pair.
>>>> Now, in your example,
>>>> A1 owl:sameAs A2 .   
>>>> <—>   
>>>> A1 [ B  => C1 D1 ] E .
>>>> A2 [ B  => C2 D2 ] E .
>>>> expands to:
>>>> A1 owl:sameAs A2 . #Line 1
>>>> A1 B E . # Line 2
>>>> A2 B E .  # Line 3
>>>> B C1 D1 ; C2 D2 .
>>>> If you go back to the previous assertion that I made - a singleton property can have only one distinct <S,O> , then when you parse the above Turtle, Line 2 and 3 together should generate a compilation error.
>>>> The owl:sameAs assertion in this case is not the same as actually making the two URIs the same except when you have owl:sameAs exposed as a generated inference (which, in effect, generates the permutations of all possible triples involving A1 and A2). Most systems don't use owl:sameAs for precisely that reason. However, assuming that they did, then the statements above should realistically collapse down to
>>>> A1 B E .
>>>> B C1 D1; C2 D2 .
>>>> Again, keep in mind that in the case here B is a singleton property, so B should only apply to A1 and E. Put another way, if I assert
>>>> A1 B E .
>>>> and B is a singleton, I cannot then assert
>>>> A2 B E  .
>>>> I can, however, assert:
>>>> A1 B1 E .
>>>> A2 B2 E .
>>>> B1 owl:sameAs B2 .
>>>> How do I know THAT B1 and B2 are singletons? Because I also have to add the assertions:
>>>> B1 tpn:property B .
>>>> B2 tpn:property B .
>>>> where B is a vanilla predicate.
>>>> Think of a singleton property as actually being a pointer to a property. I can name that property (which is what the => notation does), but the naming is orthogonal to the fact that the pointer itself is both unique and can only have one unique <S,O> per pointer. RDF explicitly states that if two triples have the same <S,P,O> they are the same triple, period. This is why you MUST have singleton properties. All that the named node expressions do in that regard is to make it possible to name these singleton nodes in a more readable manner.
>>>> By the way, this is about the ONLY way that I can see dealing with temporal RDF.
>>>> Consider the following:
>>>> Country:USA Country:hasPresident Person:JoeBiden .
>>>> Country:USA Country:hasPresident Person:DonaldTrump .
>>>> Country:USA Country:hasPresident Person:KamalaHarris .
>>>> All of these assertions are true, but only in the right context. If you want to determine this context, you either have to create multiple third-normal form assertions, or you have to resort to singleton properties:
>>>> Country:USA [:hp1 => tpn:property Country:hasPresident ; :start 2021; :end 2025] Person:JoeBiden .
>>>> Country:USA [:hp2 => tpn:property Country:hasPresident ; :start 2017; :end 2021] Person:DonaldTrump .
>>>> Country:USA [:hp3 => tpn:property Country:hasPresident ; :start 2025; :end 2029] Person:KamalaHarris .
>>>> In SPARQL this is trivial to resolve:
>>>> select ?president ?start ?end where {
>>>>    Country:USA ?singleton ?president .
>>>>    ?singleton tpn:property Country:hasPresident .
>>>>     ?singleton :start ?start .
>>>>     ?singleton :end ?end .
>>>>     filter(year(now()) >= ?start && year(now()) < ?end)
>>>> }
>>>> Kurt Cagle
>>>> Editor in Chief
>>>> The Cagle Report
>>>> kurt.cagle@gmail.com
>>>> 443-837-8725
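
The lookup performed by the SPARQL query above can be mirrored in a small Python sketch. Assumptions, none of which come from the thread: triples are plain tuples, `:start` and `:end` are integer years as in the example, the interval is treated as half-open, the year is passed in explicitly instead of being read from the clock, and the helper name `president_in` is hypothetical.

```python
# The three singleton-property statements from the example, expanded to plain triples.
TRIPLES = [
    ("Country:USA", ":hp1", "Person:JoeBiden"),
    (":hp1", "tpn:property", "Country:hasPresident"),
    (":hp1", ":start", 2021),
    (":hp1", ":end", 2025),
    ("Country:USA", ":hp2", "Person:DonaldTrump"),
    (":hp2", "tpn:property", "Country:hasPresident"),
    (":hp2", ":start", 2017),
    (":hp2", ":end", 2021),
    ("Country:USA", ":hp3", "Person:KamalaHarris"),
    (":hp3", "tpn:property", "Country:hasPresident"),
    (":hp3", ":start", 2025),
    (":hp3", ":end", 2029),
]


def president_in(triples, country, year):
    """Return (president, start, end) rows whose interval [start, end) contains year."""
    # Singleton properties that point back to the generic predicate.
    singletons = {
        s for s, p, o in triples
        if p == "tpn:property" and o == "Country:hasPresident"
    }
    start = {s: o for s, p, o in triples if p == ":start"}
    end = {s: o for s, p, o in triples if p == ":end"}
    return [
        (o, start[p], end[p])
        for s, p, o in triples
        if s == country
        and p in singletons
        and p in start
        and p in end
        and start[p] <= year < end[p]
    ]
```

Calling `president_in(TRIPLES, "Country:USA", 2019)` selects the `:hp2` statement, because only its interval contains 2019.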
>>>> On Tue, Jul 23, 2024 at 2:27 AM Franconi Enrico <franconi@inf.unibz.it> wrote:
>>>> Hi Kurt,
>>>> I’m still waiting for a reply to my comments from two months ago about your proposal (Word document attached), which you presented again at our last meeting.
>>>> I have seen that you have posted now your proposal in LinkedIn.
>>>> Let me rephrase the comments again, hoping you will react to them.
>>>> Named Node in the Predicate Position
>>>> Your example:
>>>> :liz [_:married1 => rdf:subPropertyOf :married ;
>>>>                   :hasInterval [ _:interval1 => :start 1964 ; :end 1974]]
>>>>    :richard .
>>>> :married1 must be a *singleton* property.
>>>> This option has been discussed and dismissed some time ago in the RDF-star WG.
>>>> This introduces owl:sameAs, leading to serious implementation problems.
>>>> Indeed, the following equivalence pattern holds:
>>>> A1 owl:sameAs A2 .   
>>>> <—>   
>>>> A1 [ B  => C1 D1 ] E .
>>>> A2 [ B  => C2 D2 ] E .
>>>> Moreover, the singleton property does not express directly the multi-edge case, since you have to name each edge of the same type with a distinct name.
>>>> From the current RDF-star baseline, the example can be written in Turtle:
>>>> << _:marriage1 | :liz :married :richard >>
>>>>  a :marriage ;
>>>>  :hasInterval [:start 1964 ; :end 1974] .
>>>> This corresponds to the following in N-Triples:
>>>> _:marriage1 rdf:reifies <<( :liz :married :richard )>> .
>>>> _:marriage1 rdf:type :marriage .
>>>> _:marriage1 :hasInterval _:interval1 .
>>>> _:interval1 :start 1964 .
>>>> _:interval1 :end 1974 .
>>>> Reifier Expression
>>>> This is just a rephrase of "option 1" (old style 1.1 reification), which we discussed and dismissed some time ago in the RDF-star WG.
>>>> It has severe drawbacks, e.g., in reconstructing the reifier back from the three reification triples.
>>>> cheers
>>>> —e.
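
The correspondence between the Turtle reifier form and the N-Triples form in the message above can be sketched in Python. This is an illustration under assumptions: it only formats strings in the notation used in the message (prefixed names included), it does not parse or validate anything, and `reifier_triples` is a hypothetical helper rather than part of any RDF library.

```python
def reifier_triples(reifier, triple, annotations):
    """Format  << reifier | s p o >> ann1 ; ann2 .  as one statement per line."""
    s, p, o = triple
    # The reifier is linked to the triple term it reifies.
    lines = [f"{reifier} rdf:reifies <<( {s} {p} {o} )>> ."]
    # Each annotation becomes an ordinary triple about the reifier.
    lines += [f"{reifier} {ap} {ao} ." for ap, ao in annotations]
    return lines


LINES = reifier_triples(
    "_:marriage1",
    (":liz", ":married", ":richard"),
    [("rdf:type", ":marriage"), (":hasInterval", "_:interval1")],
)
```

Each edge of the same type gets its own reifier, so two marriages between the same people would be two calls with different first arguments and the same triple.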
>>>>> On 24 May 2024, at 15:55, Franconi Enrico <franconi@inf.unibz.it> wrote:
>>>>> Hi Kurt,
>>>>> It seems to me that your proposal is a rephrase of various discussions we already had, and ruled out.
>>>>> Named Node in the Predicate Position: this seems to be just a rephrase of the singleton property - once you try to give semantics to it. Observe that, wrt the current status of the discussion, your proposal does not express directly the multi-edge case, since you have to name each edge of the same type with a distinct name.
>>>>> Reifier Expression: this is just a rephrase of "option 1" we discussed and ruled out some time ago. It has severe drawbacks in reconstructing the reifier back from the three reification triples.
>>>>> cheers
>>>>> —e.
>>>>> On 23 May 2024, at 19:24, Kurt Cagle <kurt.cagle@gmail.com> wrote:
>>>>> I've attached a document that covers YET ANOTHER proposal (more properly a recommendation I've made before).
>>>>> There are two issues that we seem to be rehashing here. The first is the question of reificational notation, while the second has to do with LPG harmonization. My contention is that these are different issues, though we can use similar notation for both.
>>>>> Reification
>>>>> A named reification is simply a set of statements:
>>>>> :r rdf:subject :s; rdf:predicate :p; rdf:object :o .
>>>>> This is not a triple. It is three statements about the state that a triple can be in. It does not introduce a triple into the system, and it makes no assertions about the truthiness or even, by itself, the existence of that triple. It is simply a statement about the components that a triple might have. You cannot reason with it directly, though you can use other processes (SPARQL, SHACL, etc.) to construct or verify the existence of triples for which these assertions are true. Properly speaking, the above itself should probably be qualified:
>>>>> :r rdf:subject :s; rdf:predicate :p; rdf:object :o ; a rdf:Reification .
>>>>> The notation << :r | :s :p :o >> makes the above statement more compact, but the reification can apply to any triples within a system, or none at all, regardless of the values.
>>>>> Named Node Expressions
>>>>> I propose, in the attached, that we use a similar nomenclature for what I'm terming named node expressions, to wit:
>>>>> [ ?nn | :p1 :o1 ; :p2 :o2 ]
>>>>> where ?nn is replaced by a formal (not blank) IRI.
>>>>> This is a Turtle (not RDF) syntactical amendment. The above takes what would ordinarily be a blank node and replaces it with a named node:
>>>>> For instance:
>>>>> :liz :hasMarriage [ :marriage1 | :to :Richard ; :start "1965" ; :end "1975" ].
>>>>> which expands to:
>>>>> :liz :hasMarriage :marriage1 .
>>>>> :marriage1 :to :Richard .
>>>>> :marriage1 :start "1965" .
>>>>> :marriage1 :end "1975" .
>>>>> Why is this important? Because the blank node is a pointer to a data structure, but use of the [] notation makes it impossible to reference that data structure from within Turtle. By adding in a named node as the referencing node, you gain that ability, and it is a key ability for modeling.
>>>>> For instance, I can use the expression:
>>>>> :liz :hasMarriage [ :marriage1 | :start "1965" ; :end "1975"; :to :richard ], [ :marriage2 | :start "1975" ; :end "1985"; :to :john].
>>>>> This is semantically equivalent to the JSON
>>>>> {"liz":{"hasMarriage":[{"marriage1":{"start":"1965", "end":"1975","to":"richard"}},{"marriage2":{"start":"1975", "end":"1985","to":"john"}}]}}
>>>>> The same thing can be done with both predicate-positioned named node expressions and subject-oriented ones.
>>>>> This addresses the LPG equivalency relationship, and does so without ever touching reifications.
>>>>> Note that this also highlights an important point. Blank nodes are useful because they are unique and system-assigned. However, they are not referenceable. The Turtle notation:
>>>>> :liz :hasMarriage _:b1, _:b2 .
>>>>> _:b1 :start "1965" ; :end "1975"; :to :richard .
>>>>> _:b2 :start "1975" ; :end "1985"; :to :john .
>>>>> is simply a preprocessor directive to replace the "named" nodes with anonymous IRIs in the final indexing.  You still have to make _:b1 and _:b2 unique, or the data structures disintegrate.
>>>>> Anyway, I ask the chair for time during our next meeting to discuss this proposal.
>>>>> Kurt Cagle
>>>>> Editor in Chief
>>>>> The Cagle Report
>>>>> kurt.cagle@gmail.com
>>>>> 443-837-8725
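
The expansion of a named node expression described in the message above can be sketched in Python. This is an illustration under assumptions: the `[ name | p o ; ... ]` form is passed in already split into its parts rather than parsed from Turtle text, and `expand_named_node` is a hypothetical helper, not part of any Turtle tooling.

```python
def expand_named_node(subject, predicate, name, properties):
    """Expand  subject predicate [ name | p1 o1 ; p2 o2 ]  into plain triples."""
    # The named node takes the place a blank node would occupy in the object position.
    triples = [(subject, predicate, name)]
    # Every predicate/object pair inside the brackets becomes a triple about the named node.
    triples.extend((name, p, o) for p, o in properties)
    return triples


MARRIAGE = expand_named_node(
    ":liz",
    ":hasMarriage",
    ":marriage1",
    [(":to", ":richard"), (":start", "1965"), (":end", "1975")],
)
```

Because the node has a name, other statements can refer to `:marriage1` directly, which is the ability the message argues the bare `[]` notation lacks.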
Received on Thursday, 25 July 2024 14:26:37 UTC