Re: streamlining the baseline from Thomas Lörtsch on 2024-06-19 (public-rdf-star-wg@w3.org from June 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Wed, 19 Jun 2024 13:25:55 +0200
To: Niklas Lindström <lindstream@gmail.com>
Cc: public-rdf-star-wg@w3.org
Message-Id: <96FCF5F2-313B-4FBE-9F81-CE353F91DAEF@rat.io>
Hi Niklas,

thanks for looking into this! 

Am 18. Juni 2024 13:20:35 MESZ schrieb "Niklas Lindström" <lindstream@gmail.com>:
> Hi Thomas,
> 
> In principle, are you suggesting that an asserted triple should no
> longer mean just "the triple is in the graph", but either that or when
> it is encoded as a triple literal which is used as an object of a
> triple in the graph (with some predefined predicates)?

I’m not sure I understand your concern correctly, so let me try to clarify. You have to distinguish the parts where I speak about the abstract syntax from those where I sketch concrete implementations. The idea for the abstract syntax - and IIUC Pierre-Antoine had the same idea, just formulated as a variant of the CG’s TEP proposal - is to derive all the different concrete semantics (asserted, unasserted, ref. transparent, ref. opaque, and/or any combination thereof) from one abstract type. The desired semantics is selected by choosing the right property. We already have such properties in the current baseline proposal, namely :reifies and :annotationOf, just without this functionality; instead we define two different types of abstract triple. The idea is to get away with only one type of abstract triple and to put more functionality/information into the properties.
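
To make the "one abstract type, semantics selected by the property" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: triples are modelled as plain tuples, a triple term is a nested tuple in object position, and the property names are the tentative ones from the earlier proposal quoted further down, not part of any agreed vocabulary. The helper name `asserted_via_property` is made up for this example.

```python
from typing import NamedTuple


class Semantics(NamedTuple):
    asserted: bool      # does the occurrence also assert the triple?
    transparent: bool   # referentially transparent (True) or opaque (False)?


# Hypothetical property names (tentative, taken from the quoted proposal).
PROPERTY_SEMANTICS = {
    "rdf-star:statingOf": Semantics(asserted=True, transparent=True),
    "rdf-star:mentionOf": Semantics(asserted=False, transparent=True),
    "rdf-star:positingOf": Semantics(asserted=True, transparent=False),
    "rdf-star:quotingOf": Semantics(asserted=False, transparent=False),
}


def asserted_via_property(graph):
    """Collect the triple terms that some asserting property unfolds."""
    found = set()
    for _subject, predicate, obj in graph:
        semantics = PROPERTY_SEMANTICS.get(predicate)
        # Only a triple term (nested tuple) in object position can be unfolded.
        if semantics is not None and semantics.asserted and isinstance(obj, tuple):
            found.add(obj)
    return found


graph = {
    (":a", "rdf-star:statingOf", (":s", ":p", ":o")),
    (":a", ":some", ":Attribute"),
    (":d", "rdf-star:quotingOf", (":x", ":y", ":z")),
}
print(asserted_via_property(graph))
```

In this sketch only the triple term behind the stating property counts as asserted; the quoted one stays unasserted, although both use the same kind of term. All the difference sits in the lookup table for the properties.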

> I.e. that,
> AFAICS, the RDF semantics have to be extended so that each extension
> of a property also must take into account, recursively, some other
> property extensions, wherein an extension membership (a "statement")
> is encoded as a literal?

I'm afraid I can't follow. And please note that the literal aspect is totally optional, the proposal is just as valid with the abstract triple term (and its <<(…)>> syntax) as we have it.

> And, presumably, this would similarly have to
> be added to the definition of basic pattern matching in SPARQL?

This is a question that I can mostly only speculate about. If the definition of SPARQL as "Turtle with holes" carries far enough, then the different syntactic variants should make it possible to focus precisely on asserted/unasserted/transparent/opaque statements. The details might be harder to get right: how to query over all asserted statements, no matter whether opaque or transparent? How to disambiguate results that stem from semantically different statements? I’m not sure whether the current SPARQL-star proposal captures such nuances (I haven’t investigated this issue enough).

> While this is not where my expertise lies, I have a hard time seeing
> how this simplifies what an asserted triple is. Unless I'm missing
> something, I'd rather stay on the currently agreed route where
> assertion and reification are separate and complementary.

They don’t complement each other as much as they should. E.g., can you tell for sure from the following example whether Bob actually did assert < :s :p :o >?

    :s :p :o .
    << :s :p :o >> :b :Bob ;
                   :c :Carol .

Short answer: you can’t, as at some point all the available information might have been: 

    << :s :p :o >> :b :Bob .

So no endorsement of < :s :p :o > by Bob. And only later were the following two triples added (maybe through merge, maybe because new information came in):

    :s :p :o .
    << :s :p :o >> :c :Carol .

Ergo this is ambiguous. Given that use cases for unasserted assertions are often concerned with modelling information with a high degree of precision, this can’t be treated as a rare side-issue, but has to be understood as a real problem: this approach to unasserted assertions claims to solve a specific problem (namely to facilitate unasserted assertions), but it doesn’t provide a solid solution. Nonetheless it makes normal usage quite a bit harder (at least when one doesn’t or can’t use the annotation syntax).
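
The ambiguity can be shown with a small Python sketch. It rests on simplifying assumptions: a graph is a set of tuples, << :s :p :o >> is modelled as a nested tuple in subject position, and merging is plain set union. The names `history_a`, `history_b` and `merge` are illustrative only.

```python
S_P_O = (":s", ":p", ":o")

# History A: the triple was asserted when Bob's annotation was made.
history_a = [
    {S_P_O, (S_P_O, ":b", ":Bob"), (S_P_O, ":c", ":Carol")},
]

# History B: Bob's annotation came first, without any assertion;
# the assertion and Carol's annotation were merged in later.
history_b = [
    {(S_P_O, ":b", ":Bob")},
    {S_P_O, (S_P_O, ":c", ":Carol")},
]


def merge(graphs):
    """Merge graphs by set union (no blank nodes involved here)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


# Both histories end in exactly the same graph.
print(merge(history_a) == merge(history_b))
# Yet in history B the triple was not asserted when Bob annotated it.
print(S_P_O in history_b[0])
```

The first line printed is True and the second is False: the final graph is identical, so nothing in it records whether Bob's annotation ever came together with an assertion.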

The annotation syntax provides syntactic sugar to ease authoring (and querying, I suppose?), but that sugar dissolves irrevocably as soon as the data is converted to N-Triples. We talked a lot about how important N-Triples are as the format in which data is sent over the wire, especially in streaming contexts. I have to take the word of practitioners for it, but when I do I have to conclude that the way we currently handle unasserted assertions is dead in the water.

I looked at the use cases today to see how many of them require unasserted assertions: not too many, I suppose. The fact that we have extra syntactic sugar to annotate asserted statements speaks for itself: the sentiment that those are the norm rather than the exception is not only my personal impression. This is another reason to not burden the standard case with an extra syntax. 

The normal approach is to facilitate corner cases with syntactic sugar, and let those expand to a standard form that converts all the information encapsulated in syntactic sugar into standard idioms. The current approach however makes the standard use case dependent on syntactic sugar, and loses information when converting to standard form.

> The use of literals, as discussed, also suffers from the problem of
> blank node identifiers [1], which are not global, but local to an RDF
> document *at parse time*. I believe the CDT [2] proposes to treat each
> literal as its own document. But this is not easily (if at all) made
> workable for the needs of RDF-star.

You may be right w.r.t. blank nodes and literals. So let's keep the "abstract term" as the basic building block to keep the discussion focused. 
However, in principle I guess it shouldn’t be too hard to define some parsing rules that determine how blank nodes in RDF literals are to be interpreted. Those rules would again be triggered via the properties that "unfold" those literals into annotated statements. 
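
As a rough illustration of what such parsing rules might look like, the following Python sketch contrasts two hypothetical policies: every literal is its own blank node scope (which is how Niklas describes the CDT approach above), or literals share the blank node scope of the enclosing document. The policy names, the scoping-by-prefix trick and the simplified label pattern are assumptions made for this example and are not taken from any specification; the `policy` argument stands in for whatever the unfolding property would select.

```python
import re

# Simplified blank node label pattern (assumption; real grammars allow more).
BNODE_LABEL = re.compile(r"_:([A-Za-z0-9]+)")


def scope_bnodes(literal_text, literal_id, document_id, policy):
    """Qualify blank node labels in a literal according to a scoping policy."""
    if policy == "per-literal":
        scope = literal_id      # each literal is parsed as its own document
    elif policy == "per-document":
        scope = document_id     # labels are shared with the enclosing document
    else:
        raise ValueError(f"unknown policy: {policy}")
    return BNODE_LABEL.sub(lambda m: f"_:{scope}_{m.group(1)}", literal_text)


first = scope_bnodes("_:x :p :o", "lit1", "doc1", "per-literal")
second = scope_bnodes("_:x :q :r", "lit2", "doc1", "per-literal")
print(first, "|", second)       # different scopes, so _:x is not shared

shared_a = scope_bnodes("_:x :p :o", "lit1", "doc1", "per-document")
shared_b = scope_bnodes("_:x :q :r", "lit2", "doc1", "per-document")
print(shared_a, "|", shared_b)  # same scope, so _:x denotes one node
```

Under the per-literal policy the two occurrences of _:x end up as distinct nodes, under the per-document policy they coincide. Which of the two is appropriate for annotated statements is exactly the open question.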

Best, 
Thomas 

> Best regards,
> Niklas
> 
> [1]: <https://www.w3.org/TR/rdf11-concepts/#section-blank-nodes>
> [2]: <https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#dfn-bnl2bn>
> 
> 
> On Sun, Jun 16, 2024 at 10:19 PM Thomas Lörtsch <tl@rat.io> wrote:
>> 
>> as was discussed in the semantics TF meeting last friday we could streamline the abstract syntax of the baseline proposal by deriving triple term occurrences with different semantics from only one primitive, the abstract triple term - the '<<( :s :p :o )>>' in n-triples - by defining appropriate properties, e.g. for referentially transparent occurrences
>> 
>>     :reified_x rdf-star:reificationOf <<( :s :p :o )>> .
>> 
>> and for referentially opaque occurrences
>> 
>>    :annotated_y rdf-star:annotationOf <<( :s :p :o )>> .
>> 
>> this seemed like a good idea to some people (including me) and IIUC Enrico might incorporate it into an update to the baseline proposal.
>> 
>> 
>> for one, it occurred to me later (again, and the original idea is not mine) that we could then also replace the abstract triple term by a literal of a to be defined RDF datatype, e. g.:
>> 
>>     :reified_x rdf-star:reificationOf ":s :p :o"^^rdf:ttl .
>>     :annotated_y rdf-star:annotationOf ":s :p :o"^^rdf:ttl .
>> 
>> it seems that this wouldn't require any change to the RDF 1.1 abstract syntax at all, which would be nice. also, as recent works by the Amazon Neptune team have shown, there is more interest in and more uses for an RDF literal datatype. it would be a generically useful addition to the RDF toolbox.
>> 
>> 
>> secondly, i'd like us to think again about the different semantics we need. both options discussed in the current proposed baseline are unasserted, one of them is referentially transparent and the other one to some degree opaque. it bothers me a great deal that we provide no option to assert and annotate statements in one go (the same when querying). this makes statement annotation unnecessarily tedious in the majority (*) of use cases (and it introduces ambiguity when annotated statements occur both in asserted and unasserted form). we should change this and make the referentially transparent triple occurrence asserted (as it was conceived in the original RDF* proposal) whereas the referentially opaque variant may remain unasserted.
>> 
>> we could also define all 4 permutations of asserted/unasserted and referentially transparent/opaque, e.g.
>> 
>>    # stating - asserted, ref. transparent
>>    ## asserting and annotating in one go
>>    ## n-triples
>>    :a rdf-star:statingOf ":s :p :o"^^rdf:ttl ;
>>        :some :Attribute.
>>    ## turtle
>>    << :s :p :o >> :some :Attribute.
>> 
>>    # mention - unasserted, ref. transparent
>>    ## citing, but not endorsing, without syntactic precision (eg indirect speech)
>>    ## n-triples
>>    :b rdf-star:mentionOf ":s :p :o"^^rdf:ttl ;
>>        :some :Attribute.
>>    ## turtle
>>    <<< :s :p :o >>> :some :Attribute.
>> 
>>    # posit - asserted, ref. opaque
>>    ## stating with exactly these terms (eg functional mapping from LPG)
>>    ## n-triples
>>    :c rdf-star:positingOf ":s :p :o"^^rdf:ttl ;
>>        :some :Attribute.
>>    ## turtle
>>    <<" :s :p :o  ">> :some :Attribute.
>> 
>>    # quote - unasserted, ref. opaque
>>    ## quoting, not asserting, exactly those terms (eg versioning)
>>    ## n-triples
>>    :d rdf-star:quotingOf ":s :p :o"^^rdf:ttl ;
>>        :some :Attribute.
>>    ## turtle
>>    <<<" :s :p :o ">>> :some :Attribute.
>> 
>> naming of the variants and syntax variations are of course tentative proposals.
>> also i've omitted the optional explicit occurrence identifiers in the turtle syntax.
>> 
>> 
>> all 4 variants make sense, but the first one is without reasonable doubt the most important one.
>> 
>> 
>> how precisely referential opacity is to be defined is another question. the RDF literal shouldn't be constrained to single statements and bnodes might be treated decently by annotating their concise bounded description. alternatively the CG semantics of always transparent bnodes might be used. IRIs in ref. opaque posits and quotes would denote, but not co-denote. fully uninterpreted, syntactic opacity can be achieved via the RDF literal itself.
>> 
>> 
>> the abstract syntax wouldn't change at all, but the semantics would be extended by formalizations of the different types of annotated terms: stating (same as RDF 1.0/1), mention, posit and quote.
>> 
>> 
>> best,
>> thomas
>> 
>> 
>> (*) Andy might again denounce this as a sweeping claim, but in the absence of solid empiric data what else can we do but make good guesses.
Received on Wednesday, 19 June 2024 11:26:06 UTC