Re: [External] : Re: UCR

Please use the RDF-star Working Group mailing list <public-rdf-star-wg@w3.org> for these RDF-related discussions. Thanks.
________________________________
From: Thomas Lörtsch <tl@rat.io>
Sent: Friday, June 14, 2024 5:37 AM
To: Niklas Lindström <lindstream@gmail.com>
Cc: public-rdf-star@w3.org <public-rdf-star@w3.org>
Subject: [External] : Re: UCR



> On 13. Jun 2024, at 15:55, Niklas Lindström <lindstream@gmail.com> wrote:
>
> Hi Thomas,
>
> On Wed, Jun 12, 2024 at 11:33 PM Thomas Lörtsch <tl@rat.io> wrote:
>>
>>
>>
>>> On 10. Jun 2024, at 22:53, Niklas Lindström <lindstream@gmail.com> wrote:
>> [...]
>>> To aid in this assessment, I made a short presentation (6 slides) at
>>> [1], focusing on what I think is the most pertinent question at hand:
>>> "Tokens and/or Reifiers?"
>>>
>>> Best regards,
>>> Niklas
>>>
>>> [1]: https://docs.google.com/presentation/d/e/2PACX-1vQd9lU1j4TPxluCe-cB0t7_BUpy8zAfeY_5hDlbwIyOB8wsiRqkRtSFP4AeflV5UsE4EqT-Y3_Jjx9q/pub

>>
>> Not a fan of slide decks since they are hard to comment on.
>
> To spare the list some prose for a while, I tried a more visual
> approach, primarily for perspective. I think we need to share more
> perspectives and interpretations of examples, and *retain* that. This
> isn't it; it's seeking that.
>
>
>> On slide "Use Case Categories" you broadly distinguish two categories:
>> 1. Token provenance - to which timestamps, source, and level of trust can be assigned
>> 2. Statement qualification - about detailed circumstances such as events or situations
>>
>> I agree in principle/roughly, but I think we can do better:
>>
>> A) about the statement as a whole, i.e. an entity in its own right
>> B) about the kind of relation described by the statement
>>
>> A is of course very well suited to describing provenance, but also versioning, plausibility, propositional attitude, etc. However, the crucial aspect is that it talks about the statement as a whole, as an object in its own right. Annotating that object doesn’t change the assertion it represents; it only comments on it.
>>
>> B on the other hand qualifies the relation. It may add that a "likes" relation is indeed strongly felt, that a "buys" relation was performed via electronic payment, etc. It might even go further and comment on properties of the subject and object at the time the relation existed - maybe that Paul was in a hurry when he bought the ticket - but such detail seems out of scope for this WG. However, I don’t find the notion of "events or situations" helpful to clarify the distinction between 1/A and 2/B.
>>
>>
>> What bugs me right now is that
>> - reification is well suited to represent 1/A
>> - instantiation via singleton properties is better suited to represent 2/B.
>> However, who wants to complicate things even further than they are already?!
>>
>> But, w.r.t. the discussion about what reification actually is, if occurrences are the right concept/term, etc, I think it’s important that we agree on the categorization A/B. I hope you find it clearer than 1/2, but maybe you can come up with an even better abstraction.
>
> I think your categories represent different delineations. They appear
> to deal with different specializations of statements (closer to tokens
> of "1", or sub-relationships), not necessarily the wider notion of
> reifiers ("2").

The term reifier was derived from the concept of reification, the mapping to RDF that the WG identified as the concept we are trying to implement, even if we decided not to map the triple term closely to the RDF reification vocabulary. However, we don’t seem to have a solid shared understanding in the WG of what the concept of reification actually means. To me it means creating a reference to a statement by describing it. The statement so described doesn’t even need to actually exist in the data. The air gap this creates between the description and what it describes ensures that annotating the description, referenced via the reifier, doesn’t change the described statement (or any instance/token/occurrence of it). In RDF’s case this ensures monotonicity. Also, multiple reifications don’t mess with the set semantics.
In practice this is good enough to say that some triple ":s :p :o" has been copied over from server :x on date :y, or that I like that triple, but not much more. In practice the air gap between a reifier and "the" statement it reifies causes problems when updating or merging data, and it runs counter to basic intuitions. So reification IMO is really a problematic mechanism, notwithstanding the fact that it works well with a monotonic, set-based formalism like RDF. Qualification via instantiation has some very real advantages over reification.
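
For illustration, a minimal Turtle sketch of that air gap, assuming the triple term syntax and rdf:reifies from the current baseline (the annotation properties :copiedFrom and :copiedOn are made up):

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix :    <http://example.org/> .

    :s :p :o .                                # the described triple (need not even be present)

    _:r rdf:reifies <<( :s :p :o )>> .        # _:r references the triple by describing it
    _:r :copiedFrom :x ;                      # annotating _:r comments on the triple
        :copiedOn "2024-06-13"^^xsd:date .    # without changing or re-asserting it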

> Your "A" is either the simple logical expression, or what the triple
> denotes, the abstract relationship itself. I take the RDF spec to
> imply that the former is a token, and the latter is basically a formal
> atom (or axiomatic abstraction) of binary, directed propositional
> logic. This atom is not useful as a subject of further description in
> most domains of discourse. (I'd say even literals come before them in
> the theoretical order of "useful subjects".) But *tokens of* such
> certainly are.

I’m practically always talking about occurrences/tokens - just like RDF standard reification does. Like you, and like the whole WG since last December, I consider it not useful to be able to annotate the abstract triple type.
No, the distinction I’m after is a different one: the statement as an entity in its own right - as if you drew a circle around it and referenced it via that circle - versus the statement as a relation, with its predicate as the most natural handle to refer to it (that is my B).
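
In a Turtle sketch (my A above, my B below; rdf:singletonPropertyOf is from the singleton properties proposal, not standard RDF, and the annotation properties are made up):

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix :    <http://example.org/> .

    # A: the statement as an entity in its own right (the "circle" as handle)
    _:r rdf:reifies <<( :paul :buys :ticket )>> ;
        :source :serverX .

    # B: the statement as a relation, its predicate as the handle
    :paul :buys_1 :ticket .
    :buys_1 rdf:singletonPropertyOf :buys ;
            :paymentMethod :electronic .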

> And I'd say your "B" is the notion of edge instantiation in LPGs?

That’s a popular example, yes, but not necessarily accurate: an edge annotation in an LPG may just as well refer to the relation as an entity in its own right (e.g. to record provenance). I’m not aware of any serious account of how edge annotation in LPGs is used in practice, but the published examples, the whole drive of the approach (distinguishing between primary fact and secondary detail), and its lack of interest in integration and decentralized data (which drives the entity-focused provenance use case in RDF) all tell me that qualification is probably the overwhelming usage scenario.
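
To illustrate that ambiguity with a made-up example - the same LPG edge annotation can be read either way:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix :    <http://example.org/> .

    # The LPG edge (Cypher-style, as a comment):
    #   (:Paul)-[:BUYS {via: "web", source: "serverX"}]->(:Ticket)

    :paul :buys :ticket .

    # keys read as provenance (my A): annotate a reifier
    _:r rdf:reifies <<( :paul :buys :ticket )>> ;
        :source :serverX .

    # keys read as qualification (my B): annotate an instance of the relation
    :paul :buys_1 :ticket .
    :buys_1 rdf:singletonPropertyOf :buys ;
            :via "web" .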

> I
> don't see its usefulness in RDF, due to its stricter foundation. (And
> it already has rdfs:subPropertyOf for direct sub-relationship
> specializations.)

But RDF-LPG compatibility is one of the main reasons why this WG was called into existence. There is a widely shared feeling that what RDF provides to qualify relations is not on par with what LPGs provide, and with what LPG users expect from RDF when mapping their data to it. I feel reminded of that recurring theme that "everything can be modelled with n-ary relations". Sure enough, but then why use graphs in the first place? (Or, to counter your argument about rdfs:subPropertyOf: why have standardized properties when in practice we can’t use them without also checking for their possible subproperties?) What RDF is lacking is a facility to annotate triples _easily_. That is a syntactic problem first (which RDF* claimed to solve) and a much bigger problem under the surface (which the CG and WG painfully learned, or knew already). Anyway, saying that RDF doesn’t need to cater for that aspect because it is so well founded is IMHO pretty much beside the point.
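
On the syntactic side, the annotation shorthand from the CG report (carried into the current drafts) shows what "easily" could look like:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix :    <http://example.org/> .

    :paul :buys :ticket {| :paymentMethod :electronic |} .

    # which under the current baseline expands to roughly:
    :paul :buys :ticket .
    _:r rdf:reifies <<( :paul :buys :ticket )>> ;
        :paymentMethod :electronic .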

> I do think your examples are great for the
> usefulness of reifiers, since they turn "liking" and "buying" into
> situations or events (which quite naturally cannot be restricted to
> reify only one statement).

Arrrgh, no! :)

> And of course the identity of agents
> participating in those events may be temporal specializations of their
> general, "endurant" identities. RDF can already be used to model *all*
> of that, it just has a somewhat cumbersome way of indirectly relating
> the simpler, direct statements to such more qualified states of
> affairs that reify them.

See above. This WG wouldn’t exist if people felt that what we have is enough.

> As I see it, a statement qualification doesn't have to be just a
> restriction in meaning, it can also be wider than a specialization of
> a relationship, in that it represents some kind of condition or
> circumstance in relation to it. It can thus be both more particular,
> as in concrete,

Whether additional detail is understood as restricting the view or widening it is very much in the eye of the beholder. "Qualification" in my understanding leaves that open, which is a good thing IMO :)

> and apply to more than one statement.

As I just discovered in my private, not yet shared investigations, it is indeed not trivial to extend singleton properties (currently my favored approach to qualification) to more than one statement - at least not as economically as I’d like. Of course it can always be done with one more triple.
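
To make the "one more triple" concrete, a made-up sketch: each singleton property qualifies only its own statement, so covering two statements takes a shared node:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix :    <http://example.org/> .

    :paul :buys_2  :ticket .
    :anne :sells_1 :ticket .

    :buys_2  rdf:singletonPropertyOf :buys ;  :partOf :deal1 .
    :sells_1 rdf:singletonPropertyOf :sells ; :partOf :deal1 .

    :deal1 :date "2024-06-12"^^xsd:date .     # the shared qualification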

> They are
> truth-makers of truth-bearers. But this is most certainly
> "vocabulary".

Maybe you can some day give a short account of what that truth-makers and truth-bearers vocabulary means. I never got around to reading the paper you got the inspiration from (something Enrico dropped at some point a few months ago, IIRC) and I haven’t been able to glean its meaning from your mentions ever since.

> So while I'm sure a better (and/or more approachable) abstraction can
> be made, I'm not sure your substitution keeps the delineation I'm
> seeking. (The sides of which, as I see it, the current "baseline"
> proposal attempts to cater for.)
>
>
>> Your slide "Two Kinds of 'Occurrences'" doesn’t make much sense to me, especially how you characterize a token. I think plato.stanford.edu is more helpful for defining the notion of a token. More generally, you mix use cases with structural properties and add orthogonal questions like opacity on top. I think that needs more differentiation, and separation of concerns.
>>
>> Enrico mentioned in the last SemTF meeting that there are different kinds of referential opacity:
>> - totally opaque, like a literal referring only to itself
>> - co-referentially opaque, referring to a real-world entity but suppressing co-denotation
>> - maybe various levels of co-referential opacity depending on syntactic details (e.g. whether the two integers 42 and 042 are different or not)
>> We have to discuss if we need those, all of them, or which ones…
>> We can derive them all from the abstract unasserted triple term via specific (and to be defined) properties; we can define different syntaxes to represent them (some combinations of <> and "" might do the trick), etc. But who would implement all that, and who would even bother to understand? So do we need to decide?
>
> I am also asking for which concerns there are. I think opacity is a
> rabbit hole (leading to a philosophical permathread) which is best
> tackled, when needed, as just a raw string used as a property value of
> a transparent statement "token", for the cases where quotation of
> source representation is part of the domain of discourse. I don't
> think we need core syntax for that, as use cases may differ a lot on
> details. (For one, I wouldn't be surprised if some require the actual
> prefixes used, etc, since that can be a relevant aspect of errors in
> data capture.)

I agree with a lot of this. However, we had to deal with the opposition from AWS; that is why we have opacity now. We have to talk about that openly. There is a lot of discontent in the SemTF with the opacity thing. Nobody likes it, and nobody understands why the AWS guys think that it solves their problem. It makes everything more complicated, and it is just one of several possible approaches to opacity which haven’t been discussed in detail (a discussion which I’m pretty sure the AWS guys wouldn’t be very interested in). I’m quite a fan of the "let’s define an RDF literal and then let a thousand flowers bloom to derive any kind of opacity from it" approach. However, it wouldn’t quite solve AWS’s problem.

I’m kinda looking forward to the tentatively announced simpler version of the semantics. I could imagine a solution where we say that the abstract triple term is our starting point, and it is as opaque as the CG defined it (hopefully solving the problems with bnodes, or at least relegating them to some edge cases). We would then define different ways to derive concrete instances/occurrences/tokens from it, each with its own syntax (sketched below): << … >> for the asserted transparent occurrence (much as we have it now, but asserted), and then increasingly more complex variants (with accompanying syntaxes and semantics) like unasserted transparent, asserted weakly opaque, unasserted weakly opaque, unasserted strongly opaque, etc.

The mechanism is very straightforward, but all those options are too much. So we would have to decide which ones we actually standardize and require conforming applications to implement, and which ones we maybe only hint at. Hopefully the use cases give some guidance - I haven’t checked yet, and I have my doubts, but we’ll see. Otherwise we either don’t do any of that and get in trouble with AWS again, or we do something stupid, or we discuss it until we’re sure enough to have found a solid set of possible semantics and syntaxes.
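
To show the shape of that derivation idea in a sketch (the derivation properties are made up, and the question of dedicated syntaxes is left aside):

    @prefix :    <http://example.org/> .

    # the abstract triple term, opaque as the CG defined it, as common root;
    # concrete occurrences derived from it via dedicated properties:
    _:o1 :transparentOccurrenceOf    <<( :s :p :o )>> .   # the asserted transparent case
    _:o2 :weaklyOpaqueOccurrenceOf   <<( :s :p :o )>> .
    _:o3 :stronglyOpaqueOccurrenceOf <<( :s :p :o )>> .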


> However, I do think the way both SPARQL and SHACL operate "on the
> abstract syntax itself" (a gross oversimplification, but just think of
> e.g. isIRI and isBlank in SPARQL), and the way graph names are outside
> of interpretations, are interesting factors. There *might* be a way
> to, simply enough, capture the token/statement duality of a triple
> that is accessible in the interpretation, if deemed a fundamental,
> useful capability for required use cases.
>
>> That is orthogonal to the asserted-unasserted axis.
>> It is also orthogonal to the 1/A-2/B categorization above.
>
> I agree.
>
>
>> Again, I don’t find that "situations, events or circumstances" categorization useful. There are many more things on earth, like theories, relationships, broadly agreed upon facts, the periodic system, Paul buying a ticket, etc. We will neither be able to categorize them all nor do we need to: on the abstract level of statements about statements they all behave the same. The rest is vocabulary, if one is so inclined, and not our topic.
>
> Well, I do explicitly say that they are any kind of rdfs:Resource. I
> suppose we could concretize the example categorizations: situations
> (a.k.a. states of affairs), such as "broadly agreed upon facts" and
> "the periodic system"; events, such as "Paul buying a ticket"; and
> circumstances (closely related to situations), such as "relationships"
> and "theories".
>
> Vocabulary is perhaps not our topic to develop, but usage thereof in
> RDF is surely our responsibility to cater for.

But this vocabulary of events, situations, etc., that you keep coming back to is not used yet, right? I still can’t see how it would influence the choice of modelling primitive. Maybe we don’t need to discuss that any further. I just wanted to clarify: I assume you find that vocab useful for understanding your conceptualization, but I don’t get how it would help. So I’m missing a link in your chain of thought, and I’m not sure if it’s even there or if it’s just me.

Best,
Thomas


> Best regards,
> Niklas
>
>
>
>> Best,
>> Thomas
>
