Re: RDF-star “baseline” document from Thomas Lörtsch on 2024-06-07 (public-rdf-star-wg@w3.org from June 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Fri, 7 Jun 2024 18:11:12 +0200
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: Niklas Lindström <lindstream@gmail.com>, public-rdf-star-wg@w3.org
Message-Id: <6C3ADD03-D27B-4417-8A0F-6B07D939A8B2@rat.io>
> On 7. Jun 2024, at 15:20, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> You appear to be proposing using a datatype as a way of achieving fully opaque triple terms.   This has, I believe, been proposed before.  

Quite a few times by me, and by Antoine Z. Maybe others, but I’m not sure. But I was pleasantly surprised to see the recent AWS work on composite datatypes take up the principal idea (for lists and maps only, but that’s not an issue).

> I'm not totally against this.

Well, that’s a start ;-)

> The other part of your proposal appears to make triple terms not be unique, so each occurrence of a triple term in an input document gives rise to a separate node in the graph, just like regular RDF reification does not produce a unique statement for a given combination of subject, predicate, and object.  I've been rather in favour of this, but it hasn't had much support from the working group.  Proposals that advocate using nodes that are connected to unique triple terms approximate this quite closely, with the advantage of being able to have the connected node being an IRI and thus being able to refer to it elsewhere.

The WG seems to agree to focus on triple terms as occurrences of the abstract triple (type). AFAICT all recent and current proposals are based on occurrences and provide identifiers to disambiguate and refer to them. Those identifiers are in most cases called 'reifier'. 
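
To make the occurrence idea concrete, here is a minimal sketch, not taken from any of the proposals. It assumes plain Python tuples stand in for triples, and every name in it (ex:r1, ex:source, the helper functions) is hypothetical. It only shows that one abstract triple can have several occurrences, each with its own reifier carrying its own annotations.

```python
# An abstract triple (the "type") is a plain (subject, predicate, object) tuple.
triple = ("ex:alice", "ex:knows", "ex:bob")

# Hypothetical reifier identifiers: each one names one occurrence of the triple.
occurrences = {
    "ex:r1": triple,
    "ex:r2": triple,
}

# Annotations attach to a reifier, not to the abstract triple itself.
annotations = {
    ("ex:r1", "ex:source", "ex:census2020"),
    ("ex:r2", "ex:source", "ex:interview7"),
}

def reifiers_of(t):
    """Return all reifiers that identify an occurrence of triple t."""
    return sorted(r for r, occ in occurrences.items() if occ == t)

def annotations_of(reifier):
    """Return the (predicate, object) pairs attached to one occurrence."""
    return sorted((p, o) for s, p, o in annotations if s == reifier)
```

Both reifiers refer to the same abstract triple, but their annotations stay separate, which is the disambiguation the identifiers are meant to provide.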

Current proposals don’t seem to bother about a way to map those triple term occurrences to good old-fashioned RDF, but like you (and at least Niklas, IIUC) I would like such a mapping to be provided. However, I’m no longer convinced that RDF standard reification is the most useful way to provide such a mapping. IMO instantiation à la singleton properties is better able to capture the use case of qualification. I assume qualification to be the primary use case in LPG, I see qualification as better fitting the axiom that "one person’s metadata is another person’s data", and I see "real" application-specific metadata being covered by RDF 1.1 named graphs.

Also, every once in a while the question pops up what reification really is, and I noted that in your recent discussion with Bryan on this very question you provided arguments about what it is not rather than about what it actually is. While I do think that I have a good enough grasp of the concept (and you certainly do), it has to be noted that it’s not necessarily a straightforward concept.

I would argue that instantiation is easier to understand, at least for developers, but not only for them. The difference between "an apple" and some specific apple (bigger, smaller, sweeter, more red than the generic apple) is a ubiquitous concept, and so is instantiation. I figure that down the road this will be helpful in resolving the thorny issues about the "right" interpretation of annotations that are sure to surface. And instantiation is at the core of the singleton property proposal, but I’m increasingly trying to avoid the term because it seems to make too many people think they already know what I'm about to say.
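
A rough sketch of the two candidate mappings, just to illustrate the contrast. Triples are plain Python tuples, the function names are made up, and ex:singletonPropertyOf is a hypothetical stand-in for whatever property would link an instance property to its generic property; none of this is a worked-out proposal.

```python
def reify(reifier, triple):
    """Standard RDF reification: four triples describing the statement.

    The described triple itself is NOT part of the output, so it is
    not asserted by this mapping.
    """
    s, p, o = triple
    return {
        (reifier, "rdf:type", "rdf:Statement"),
        (reifier, "rdf:subject", s),
        (reifier, "rdf:predicate", p),
        (reifier, "rdf:object", o),
    }

def instantiate(instance_property, triple):
    """Singleton-property style mapping (assumed vocabulary).

    The statement is asserted with a specific instance of the property,
    and that instance is linked to the generic property. Qualifications
    would then be attached to instance_property.
    """
    s, p, o = triple
    return {
        (s, instance_property, o),
        (instance_property, "ex:singletonPropertyOf", p),
    }

triple = ("ex:alice", "ex:knows", "ex:bob")
reified = reify("ex:r1", triple)
instantiated = instantiate("ex:knows_1", triple)
```

In the second mapping the qualified statement stays an ordinary subject-property-object triple, only with a more specific property, which is the "specific apple" reading of instantiation.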

Thomas


> peter
> 
> 
> On 6/7/24 07:45, Thomas Lörtsch wrote:
>>> On 6. Jun 2024, at 16:12, Niklas Lindström <lindstream@gmail.com> wrote:
>>> 
>>> I do share these concerns, as well as many of the concerns that Thomas
>>> expressed (the unasserted aside; I am not as worried about that).
>> But you can’t deny that they require a lot of effort - from users, not just from us - to meet a pretty special and niche need. If I were to argue from the charter (which in general I find a pretty uninspired approach ;-) I’d say they are out of scope.
>> But back to your proposal.
>> You go back to the very restrictive CG semantics, and derive less restrictive variants from it. As back then in the CG, I’m convinced that this approach is bound to be trampled to death in practice. It just demands too much, for reasons that are pretty obscure to normal users and use cases.
>> But consider a combination of a very mainstream approach - triple terms are asserted, referentially transparent, and tokens - and a second, very restrictive mechanism to derive all the special needs from. That second mechanism could be RDF literals, which pretty obviously are opaque and unasserted, so require very little mental effort to be interpreted correctly by unassuming users. Then either define special syntactic variants for unasserted transparent and asserted opaque semantics, or just define properties to create surrogate identifiers with these or even more obscure properties.
>> Such a design IMO would pretty squarely meet the 80/20 requirement: most things can be done easily, the rest can be done without too much effort. The CG approach missed that goal by miles, and the current baseline, although much better already, misses it too. But it’s not hard to achieve, quite to the contrary.
>> Best,
>> Thomas
>>> I know that once one has grasped the distinction, the choice of opaque
>>> or transparent is "obvious"; but history has shown that this is not
>>> easy to grasp. It's not about the "quotes", it's about the distinction
>>> between tokens and interpretations. And not even the formal syntax (g)
>>> vs. interpretation/model (I), but about exposing it in the domain of
>>> discourse; for developers and users with a wide range of backgrounds,
>>> training and assumptions. In a way, I think this proposal is good in
>>> part *because* it can show how difficult that can become.
>>> 
>>> As Andy also replied, we did talk about there being a connection
>>> between the opaque and transparent triples. But I'm not sure this
>>> baseline proposal explains how to make the connection. And I agree,
>>> there should be one (perhaps even must be, to prevent users from
>>> accidentally painting their data into a corner).
>>> 
>>> Taking as much as I could into consideration, I've written an
>>> alternative proposal, attempting to simplify this by removing
>>> transparent triples (gasp!) and then betting on it being feasible to
>>> entail transparent statements from their tokens.
>>> 
>>> I just put it at [1], *far* too late for the call today. But based on
>>> where the discussion goes, it might be up for debate on tomorrow's
>>> SemTF telecon. I know e.g. Enrico won't like it -- I'm not even sure
>>> *I* do -- but if the opaque functional "triple token" point is deemed
>>> necessary, it may be better to root everything in that; *if* it can
>>> also be "peeked into".
>>> 
>>> (Its Achilles' heel is probably the notion of a "B-function" (from
>>> Dörthe's options) to go from a literal-like triple to its
>>> interpretation. It also adds a "hop" to get to the "real reifier",
>>> using a qualifiedBy relation. I do think there is a common prior
>>> pattern to that though, so, for better or worse, it may be more
>>> recognizable... It echoes what I've seen in Wikidata, as well as
>>> "option 2: sugar+" from the "seeking consensus" table [2]. There is
>>> also no apparent need for a naming syntax with this alternative (it is
>>> neutral to that).)
>>> 
>>> All the best,
>>> Niklas
>>> 
>>> [1]: <https://github.com/w3c/rdf-star-wg/wiki/Proposal:-Triple-Tokens-Entailing-Reified-Statements>
>>> [2]: <https://htmlpreview.github.io/?https://github.com/w3c/rdf-star-wg/blob/main/docs/seeking-consensus-2024-01.html>
>>> 
>>> 
>>> On Thu, Jun 6, 2024 at 2:13 PM Peter F. Patel-Schneider
>>> <pfpschneider@gmail.com> wrote:
>>>> 
>>>> I have three concerns with this as a baseline.
>>>> 
>>>> First, it is complex, with two different kinds of triple terms.  I think that
>>>> the baseline should be a simple extension that meets the requirements of most
>>>> of the use cases.
>>>> 
>>>> Second, opaque triple terms are completely opaque, with blank nodes treated
>>>> just like IRIs.  Although there is a use case that requires opaque blank nodes
>>>> I don't see how opaque blank nodes are suitable for use cases like annotations
>>>> or provenance.
>>>> 
>>>> Third, there does not appear to be any connection between transparent and
>>>> opaque triple terms.
>>>> 
>>>> peter
>>>> 
>>>> 
>>>> On 6/3/24 17:29, Franconi Enrico wrote:
>>>>> Hi all,
>>>>> as promised, I’ve prepared a document defining the current status of RDF-star,
>>>>> according to what I understood from our latest chats.
>>>>> It is mainly a merge of the two previous documents about the two profiles.
>>>>> 
>>>>> The idea is that RDF with simple interpretations has two triple terms
>>>>> (transparent and opaque) and unrestricted syntax for them. There is no other
>>>>> added special vocabulary.
>>>>> On the other hand, RDF with RDF interpretations introduces the special
>>>>> vocabulary for reification, restricts the syntax of triple terms as usual (the
>>>>> “well formed” fragment), and specifies the functionality of the annotation in
>>>>> the reification of opaque triple terms.
>>>>> 
>>>>> You may notice that I changed rdf:annotationOf with rdf:hasAnnotation, in
>>>>> order to allow for direct literal annotation to opaque triple terms - not
>>>>> orthodox but useful I guess.
>>>>> 
>>>>> Here it is:
>>>>> https://github.com/w3c/rdf-star-wg/wiki/RDF-star-"baseline"
>>>>> 
>>>>> 
>>>>> Cheers
>>>>> —e.
>>>>> 
>>>>> 
>>>> 
>>> 
>
Received on Friday, 7 June 2024 16:11:21 UTC