Re: [External] : Our approach to unasserted assertions is ambiguous and lossy [ was: Re: streamlining the baseline] from Souripriya Das on 2024-06-20 (public-rdf-star-wg@w3.org from June 2024)

From: Souripriya Das <souripriya.das@oracle.com>
Date: Thu, 20 Jun 2024 14:47:36 +0000
To: Thomas Lörtsch <tl@rat.io>, Niklas Lindström <lindstream@gmail.com>
CC: "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-ID: <CY5PR10MB6071542E8DAC09009C137544FAC82@CY5PR10MB6071.namprd10.prod.outlook.com>
Here is an N-Triples version of Thomas' example. I do not see any source of confusion, because calling the triple term <<( :Moon :madeOf :Cheese )>> a :Theory or an :IdioticClaim really has no association with it being :mentionedBy :Alice or :Carol, respectively. Those are independent assertions in these graphs.

# RDF graph G1:
:r1 rdf:annotationOf   <<( :Moon :madeOf :Cheese )>> . # t1a
:r1 a :Theory . # t1b – has nothing to do with Alice
:r1 :mentionedBy :Alice . # t1c

# RDF graph G2: (assuming that it uses :r1, the same reifier used in G1)
:Moon :madeOf :Cheese . # t2a
:r1 rdf:annotationOf   <<( :Moon :madeOf :Cheese )>> . # t2b
:r1 :mentionedBy :Bob . # t2c

# RDF Graph G1.2 obtained by merging G1 and G2:
:r1 rdf:annotationOf   <<( :Moon :madeOf :Cheese )>> . # combines t1a and t2b
:r1 a :Theory . # t1b – has nothing to do with Alice
:r1 :mentionedBy :Alice . # t1c
:Moon :madeOf :Cheese . # came from t2a
:r1 :mentionedBy :Bob . # t2c

# RDF graph G3: (assuming that it uses :r1, the same reifier used in G1)
:r1 rdf:annotationOf   <<( :Moon :madeOf :Cheese )>> . # t3a
:r1 a :IdioticClaim . # t3b – has nothing to do with Carol
:r1 :mentionedBy :Carol . # t3c

# RDF Graph G1.2.3 obtained by merging G1.2 and G3:
:r1 rdf:annotationOf   <<( :Moon :madeOf :Cheese )>> . # combines t1a and t2b with t3a
:r1 a :Theory . # t1b – has nothing to do with Alice
:r1 :mentionedBy :Alice . # t1c
:Moon :madeOf :Cheese . # came from t2a
:r1 :mentionedBy :Bob . # t2c
:r1 a :IdioticClaim . # t3b – has nothing to do with Carol
:r1 :mentionedBy :Carol . # t3c

Thanks,
Souri.

________________________________
From: Thomas Lörtsch <tl@rat.io>
Sent: Thursday, June 20, 2024 6:05 AM
To: Niklas Lindström <lindstream@gmail.com>
Cc: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
Subject: [External] : Our approach to unasserted assertions is ambiguous and lossy [ was: Re: streamlining the baseline]

[Changing the subject line to bring more attention to this issue]

Imagine a graph which contains the following triples:

    << :Moon :madeOf :Cheese >> a :Theory ;
                                :mentionedBy :Alice .

Alice doesn’t endorse that statement, but she is polite and calls it a 'theory' nonetheless. Then another pair of triples is added to the graph, maybe because new facts come in, maybe because a merge is performed between multiple graphs concerned with that topic. That new pair of triples is:

    :Moon :madeOf :Cheese .
    << :Moon :madeOf :Cheese >> :mentionedBy :Bob .

(it seems like Bob endorses that theory). The addition results in the following graph:

    :Moon :madeOf :Cheese .
    << :Moon :madeOf :Cheese >> a :Theory ;
                                :mentionedBy :Alice .
    << :Moon :madeOf :Cheese >> :mentionedBy :Bob .

Here it is no longer clear who asserted what, or who just commented on a statement without asserting (and endorsing) it. If one doesn’t know the complete change history of the graph, one wouldn’t even know whether both or neither of them asserted and endorsed the statement ':Moon :madeOf :Cheese .' (it could have been there before any annotating statements).
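
A small Python sketch may help to see the loss. It is only an illustration under assumptions of mine: graphs are modelled as sets of tuples, the QUOTED tuple is a stand-in for << :Moon :madeOf :Cheese >>, and the two "histories" are made up for the sake of the argument.

```python
ASSERTED = (":Moon", ":madeOf", ":Cheese")
# Stand-in for the quoted triple << :Moon :madeOf :Cheese >>
QUOTED = ("<<", ":Moon", ":madeOf", ":Cheese", ">>")

# History 1: Alice only annotates, Bob asserts and annotates.
alice_1 = {(QUOTED, "rdf:type", ":Theory"), (QUOTED, ":mentionedBy", ":Alice")}
bob_1 = {ASSERTED, (QUOTED, ":mentionedBy", ":Bob")}

# History 2: Alice asserts and annotates, Bob only annotates.
alice_2 = alice_1 | {ASSERTED}
bob_2 = {(QUOTED, ":mentionedBy", ":Bob")}

# Merging is modelled as set union (no blank nodes involved).
history_1 = alice_1 | bob_1
history_2 = alice_2 | bob_2

# Both histories end in exactly the same graph.
print(history_1 == history_2)
```

Two different contribution histories collapse into one and the same set of triples, so the merged graph alone cannot tell us who asserted the statement.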

It is also not possible for Carol to add another pair of triples without getting near the asserted statement that the Moon is made of cheese, as adding her two cents, err triples:

    << :Moon :madeOf :Cheese >> a :IdioticClaim ;
                                :mentionedBy :Carol .

results in the following graph:


    :Moon :madeOf :Cheese .
    << :Moon :madeOf :Cheese >> a :Theory ;
                                :mentionedBy :Alice .
    << :Moon :madeOf :Cheese >> :mentionedBy :Bob .
    << :Moon :madeOf :Cheese >> a :IdioticClaim ;
                                :mentionedBy :Carol .

I.e. Carol is getting dangerously close to the asserted claim no matter her intentions.

Knowing this we can’t even be sure anymore that Bob actually meant to assert and annotate the statement in question. Maybe he already fell into the same trap.

Plain and simple: the construct of unasserted assertions as currently specified is ambiguous to an intolerable degree. The instrument as such - being able to annotate statements without asserting them - is a rather delicate construct. The last thing it needs is ambiguity over whether a statement so annotated was asserted or not.

The only way out of this conundrum that I see is to explicitly introduce a type of unasserted term, and by extension define the most standard term type as asserted (as it was in the original proposal of RDF*/RDR) (’standard’ here meaning: the most accessible one, the one with the most straightforward syntax).
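
As a rough sketch of what an explicit distinction could buy us, consider the following Python illustration. It rests on several assumptions: the property names rdf-star:statingOf (asserted) and rdf-star:mentionOf (unasserted) are the tentative names from my original mail quoted below, the occurrence identifiers :a, :b and :c are invented for this example, and the helper agents_linked_by is hypothetical.

```python
TERM = (":Moon", ":madeOf", ":Cheese")

# Each contributor uses an own occurrence identifier (an assumption of this
# sketch) and picks the property that states the intended semantics.
alice = {
    (":a", "rdf-star:mentionOf", TERM),   # unasserted
    (":a", "rdf:type", ":Theory"),
    (":a", ":mentionedBy", ":Alice"),
}
bob = {
    (":b", "rdf-star:statingOf", TERM),   # asserted
    (":b", ":mentionedBy", ":Bob"),
}
carol = {
    (":c", "rdf-star:mentionOf", TERM),   # unasserted
    (":c", "rdf:type", ":IdioticClaim"),
    (":c", ":mentionedBy", ":Carol"),
}

merged = alice | bob | carol


def agents_linked_by(graph, prop):
    """Agents whose occurrence refers to a triple term via `prop`."""
    occurrences = {s for (s, p, o) in graph if p == prop}
    return {o for (s, p, o) in graph
            if s in occurrences and p == ":mentionedBy"}


print(agents_linked_by(merged, "rdf-star:statingOf"))
print(sorted(agents_linked_by(merged, "rdf-star:mentionOf")))
```

Under these assumptions the merged graph still shows that only Bob's occurrence is an asserting one, while Alice and Carol merely mention the statement.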


Such a more explicit approach also has a massive usability advantage (which is the reason why I started to poke into this issue again): the normal use case is to annotate assertions that are actually asserted. We have no empirical evidence for this, but the fact that CG and WG adopted a special shorthand notation as syntactic sugar to facilitate that case IMO is proof enough that this sentiment is widely shared at least among us.

To make things worse, the shorthand annotation syntax is ill-conceived, as it approaches the problem from the wrong direction. A standard design strategy would be to cater for the vast majority of use cases first and then try to ease the way for the more involved ones with some syntactic sugar. The Turtle-star syntax does the opposite: it caters for unasserted assertions first and then adds syntactic sugar for annotating asserted assertions. That is pretty bad from a usability standpoint.
However, the real nail in the coffin of the shorthand annotation syntax is the fact that the information that the syntactic sugar conveys - whether an annotated statement is also meant to be asserted - gets lost when mapping to the Turtle standard notation, or to any other notation like JSON-LD, or even N-Triples. This is intolerable, as it lures users into believing that they have expressed something (they think they unambiguously asserted AND annotated a statement in one go), whereas that crucial detail - "AND" - will actually get lost with the first transformation into another syntax. This is, plain and simple, very bad design.
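
The lossy expansion can be sketched in a few lines of Python. Again this is only an illustration: I assume the shorthand has the form `s p o {| p1 o1 |}` as in the CG report, and the functions quoted and expand_shorthand are hypothetical helpers of this sketch, not part of any parser.

```python
def quoted(triple):
    """Stand-in for the quoted-triple term << s p o >>."""
    return ("<<",) + triple + (">>",)


def expand_shorthand(triple, annotations):
    """Expand `s p o {| p1 o1 ; ... |}` into plain triples:
    the asserted triple plus one annotation triple per (p, o) pair."""
    return {triple} | {(quoted(triple), p, o) for (p, o) in annotations}


t = (":Moon", ":madeOf", ":Cheese")

# Author 1 uses the shorthand: assert AND annotate in one go.
in_one_go = expand_shorthand(t, [(":mentionedBy", ":Bob")])

# Author 2 writes two unrelated statements: a bare assertion and an
# annotation on the quoted triple.
separately = {t} | {(quoted(t), ":mentionedBy", ":Bob")}

# After expansion the two inputs are indistinguishable.
print(in_one_go == separately)
```

Once expanded, nothing records that the first author meant the assertion and the annotation to belong together.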


I’m a bit drastic in my portrayal of this issue because I’ve already mentioned it multiple times during the last months, but my concerns have been met with little interest. Of course we also have other problems to solve, but good solutions often come from clever combinations of issues. Currently we are discussing different term types (or type defining properties to the same effect). This could be a very good opportunity to also resolve this issue about annotations on asserted and unasserted statements. I hope that the WG will not hesitate much longer to tackle it. My concrete proposal is to add another syntax for unasserted assertions - see below the quote of my original mail in this thread for more detail. It is not completely worked out, but it’s at least a start.


Best,
Thomas



> On 19. Jun 2024, at 13:25, Thomas Lörtsch <tl@rat.io> wrote:
>
> Hi Niklas,
>
> thanks for looking into this!
>
> On 18 June 2024 at 13:20:35 CEST, "Niklas Lindström" <lindstream@gmail.com> wrote:
>> Hi Thomas,
>>
>> In principle, are you suggesting that an asserted triple should no
>> longer mean just "the triple is in the graph", but either that or when
>> it is encoded as a triple literal which is used as an object of a
>> triple in the graph (with some predefined predicates)?
>
> I’m not sure I understand your concern correctly. Let me try to clarify. You have to distinguish the parts where I speak about the abstract syntax from those where I sketch concrete implementations. The idea for the abstract syntax - and IIUC Pierre-Antoine had the same idea, just formulated as a variant of the CG’s TEP proposal - is to derive all different concrete semantics (asserted, unasserted, ref. transparent, ref. opaque, and/or any/some combination thereof) from one abstract type. The way to describe the desired semantics is by choosing the right property. We already have such properties in the current baseline proposal, namely :reifies and :annotationOf, just without this functionality. Instead we define two different types of abstract triple. The idea is to get away with only one type of abstract triple, and to put more functionality/information into the properties.
>
>> I.e. that,
>> AFAICS, the RDF semantics have to be extended so that each extension
>> of a property also must take into account, recursively, some other
>> property extensions, wherein an extension membership (a "statement")
>> is encoded as a literal?
>
> I'm afraid I can't follow. And please note that the literal aspect is totally optional, the proposal is just as valid with the abstract triple term (and its <<(…)>> syntax) as we have it.
>
>> And, presumably, this would similarly have to
>> be added to the definition of basic pattern matching in SPARQL?
>
> This is a question that I can mostly only speculate about. If the definition of SPARQL as "Turtle with holes" carries far enough then the different syntactic variants should make it possible to focus precisely on asserted/unasserted/transparent/opaque statements. The details might be harder to get right: how to query over all asserted statements, no matter if opaque or transparent? How to disambiguate results that stem from semantically different statements? I’m not sure if the current proposal of SPARQL-star captures such nuances (because I haven’t investigated this issue enough).
>
>> While this is not where my expertise lies, I have a hard time seeing
>> how this simplifies what an asserted triple is. Unless I'm missing
>> something, I'd rather stay on the currently agreed route where
>> assertion and reification are separate and complementary.
>
> They don’t complement each other as much as they should. E.g., can you tell for sure from the following example if Bob actually did assert < :s :p :o >?
>
>    :s :p :o .
>    << :s :p :o >> :b :Bob ;
>                   :c :Carol .
>
> Short answer: you can’t, as at some point all the available information might have been:
>
>    << :s :p :o >> :b :Bob .
>
> So no endorsement of < :s :p :o > by Bob. And only later were the following two triples added (maybe through merge, maybe because new information came in):
>
>    :s :p :o .
>    << :s :p :o >> :c :Carol .
>
> Ergo this is ambiguous. Given that use cases for unasserted assertions are often concerned with modelling information with a high degree of precision, this can’t be treated as a rare side-issue, but has to be understood as a real problem: this approach to unasserted assertions claims to solve a specific problem (namely to facilitate unasserted assertions), but it doesn’t provide a solid solution. Nonetheless it makes normal usage quite a bit harder (at least when one doesn’t or can’t use the annotations syntax).
>
> The annotation syntax provides syntactic sugar to ease authoring (and querying I suppose?), but that sugar dissolves irrevocably as soon as the data is converted to N-Triples. We talked a lot about how important N-Triples are as the format in which data is sent over the wire, especially in streaming contexts. I have to take the word of practitioners for it, but when I do I have to conclude that the way we currently handle unasserted assertions is dead in the water.
>
> I looked at the use cases today to see how many of them require unasserted assertions: not too many, I suppose. The fact that we have extra syntactic sugar to annotate asserted statements speaks for itself: the sentiment that those are the norm rather than the exception is not only my personal impression. This is another reason to not burden the standard case with an extra syntax.
>
> The normal approach is to facilitate corner cases with syntactic sugar, and let those expand to a standard form that converts all the information encapsulated in syntactic sugar into standard idioms. The current approach however makes the standard use case dependent on syntactic sugar, and loses information when converting to standard form.
>
>> The use of literals, as discussed, also suffers from the problem of
>> blank node identifiers [1], which are not global, but local to an RDF
>> document *at parse time*. I believe the CDT [2] proposes to treat each
>> literal as its own document. But this is not easily (if at all) made
>> workable for the needs of RDF-star.
>
> You may be right w.r.t. blank nodes and literals. So let's keep the "abstract term" as the basic building block to keep the discussion focused.
> However, in principle I guess it shouldn’t be too hard to define some parsing rules that determine how blank nodes in RDF literals are to be interpreted. Those rules would again be triggered via the properties that "unfold" those literals into annotated statements.
>
> Best,
> Thomas
>
>> Best regards,
>> Niklas
>>
>> [1]: <https://www.w3.org/TR/rdf11-concepts/#section-blank-nodes>
>> [2]: <https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#dfn-bnl2bn>
>>
>>
>> On Sun, Jun 16, 2024 at 10:19 PM Thomas Lörtsch <tl@rat.io> wrote:
>>>
>>> as was discussed in the semantics TF meeting last friday we could streamline the abstract syntax of the baseline proposal by deriving triple term occurrences with different semantics from only one primitive, the abstract triple term - the '<<( :s :p :o )>>' in n-triples - by defining appropriate properties, e.g. for referentially transparent occurrences
>>>
>>>    :reified_x rdf-star:reificationOf <<( :s :p :o )>> .
>>>
>>> and for referentially opaque occurrences
>>>
>>>   :annotated_y rdf-star:annotationOf <<( :s :p :o )>> .
>>>
>>> this seemed like a good idea to some people (including me) and IIUC Enrico might incorporate it into an update to the baseline proposal.
>>>
>>>
>>> for one, it occurred to me later (again, and the original idea is not mine) that we could then also replace the abstract triple term by a literal of a to be defined RDF datatype, e. g.:
>>>
>>>    :reified_x rdf-star:reificationOf ":s :p :o"^rdf:ttl .
>>>    :annotated_y rdf-star:annotationOf ":s :p :o"^rdf:ttl .
>>>
>>> it seems that this wouldn't require any change to the RDF 1.1 abstract syntax at all, which would be nice. also, as recent works by the Amazon Neptune team have shown, there is more interest in and more uses for an RDF literal datatype. it would be a generically useful addition to the RDF toolbox.
>>>
>>>
>>> secondly, i'd like us to think again about the different semantics we need. both options discussed in the current proposed baseline are unasserted, one of them is referentially transparent and the other one to some degree opaque. it bothers me a great deal that we provide no option to assert and annotate statements in one go (the same when querying). this makes statement annotation unnecessarily tedious in the majority (*) of use cases (and it introduces ambiguity when annotated statements occur both in asserted and unasserted form). we should change this and make the referentially transparent triple occurrence asserted (as it was conceived in the original RDF* proposal) whereas the referentially opaque variant may remain unasserted.
>>>
>>> we could also define all 4 permutations of asserted/unasserted and referentially transparent/opaque, e.g.
>>>
>>>   # stating - asserted, ref. transparent
>>>   ## asserting and annotating in one go
>>>   ## n-triples
>>>   :a rdf-star:statingOf ":s :p :o"^rdf:ttl ;
>>>       :some :Attribute.
>>>   ## turtle
>>>   << :s :p :o >> :some :Attribute.
>>>
>>>   # mention - unasserted, ref. transparent
>>>   ## citing, but not endorsing, without syntactic precision (eg indirect speech)
>>>   ## n-triples
>>>   :b rdf-star:mentionOf ":s :p :o"^rdf:ttl ;
>>>       :some :Attribute.
>>>   ## turtle
>>>   <<< :s :p :o >>> :some :Attribute.
>>>
>>>   # posit - asserted, ref. opaque
>>>   ## stating with exactly these terms (eg functional mapping from LPG)
>>>   ## n-triples
>>>   :c rdf-star:positingOf ":s :p :o"^rdf:ttl ;
>>>       :some :Attribute.
>>>   ## turtle
>>>   <<" :s :p :o  ">> :some :Attribute.
>>>
>>>   # quote - unasserted, ref. opaque
>>>   ## quoting, not asserting, exactly those terms (eg versioning)
>>>   ## n-triples
>>>   :d rdf-star:quotingOf ":s :p :o"^rdf:ttl ;
>>>       :some :Attribute.
>>>   ## turtle
>>>   <<<" :s :p :o ">>> :some :Attribute.
>>>
>>> naming of the variants and syntax variations are of course tentative proposals.
>>> also i've omitted the optional explicit occurrence identifiers in the turtle syntax.
>>>
>>>
>>> all 4 variants make sense, but the first one is without reasonable doubt the most important one.
>>>
>>>
>>> how precisely referential opacity is to be defined is another question. the RDF literal shouldn't  be constrained to single statements and bnodes might be treated decently by annotating their concise bounded description. alternatively the CG semantics of always transparent bnodes might be used. IRIs in ref. opaque posits and quotes would denote, but not co-denote. fully uninterpreted, syntactic opacity can be achieved via the RDF literal itself.
>>>
>>>
>>> the abstract syntax wouldn't change at all, but the semantics would be extended by formalizations of the different types of annotated terms: stating (same as RDF 1.0/1), mention, posit and quote.
>>>
>>>
>>> best,
>>> thomas
>>>
>>>
>>> (*) Andy might again denounce this as a sweeping claim, but in the absence of solid empirical data what else can we do but make good guesses.
>
Received on Thursday, 20 June 2024 14:47:57 UTC