Re: Qualification and identity from thomas lörtsch on 2022-01-04 (public-rdf-star@w3.org from January 2022)

From: thomas lörtsch <tl@rat.io>
Date: Tue, 4 Jan 2022 16:05:16 +0100
To: Niklas Lindström <lindstream@gmail.com>
Cc: public-rdf-star@w3.org
Message-Id: <08AB00A0-CC1C-4A81-B98C-5D172D059AD2@rat.io>
Hi Niklas,


thank you for this very nice post! I agree with a lot of your fundamental assumptions and attitudes. Some comments inline.

> Am 29.12.2021 um 02:20 schrieb Niklas Lindström <lindstream@gmail.com>:
> 
> Hello all,
> 
> I sometimes wonder, aren't there any exdurantists [1] on this list (or people from the twicely fictional Tlön [2], perhaps)?

I’m definitely thinking about the benefits of endurants and the like (btw: do you know NdTerms [0]?). Their user-facing unwieldiness is IMO the main problem.

> Just the other week I wrote up this simple mixed essay/ontology/critique, spurred by my reading up on RDF-star and all the surrounding discussions. I called it "QuID - The Qualified Identity Ontology":
> 
>     https://w3id.org/quid/ [3]
> 
> I wasn't fully motivated to bring it up then (it isn't a finished thought product for one), but given the recent discussion on the list, I felt that perhaps it might provide some kind of perspective (or prompt someone to give me constructive feedback). I am not out to recommend this "QuID" approach over RDF-star annotations (they could be orthogonal, albeit here poised as contrasts), I just need a way to clarify on the one hand what I believe is viable regarding qualification, and on the other what I worry RDF-star is and isn't, to hopefully gather further insights here.

RDF-star as defined by the CG report is not what people expect it to be, nor what it was and is advertised to be. It is now defined as a very specific tool for a very specific task very close to the metal of RDF: keeping track of syntactic representations of triples during processing (be it reasoning or other transformations). The semantic approach chosen by the CG makes good use of the verbosity inherent to the embedded triple syntax. So it’s not a bad design per se, but it is badly positioned and - unlike the widely expected better support for reification and property graph style modelling - not something people were exactly asking for.

The closest the CG has come to an actual solution for reification and statement qualification is an informal :occurrenceOf property. This is not what anyone expected the RDF-star effort to deliver. This mismatch may very well lead to unintentionally "gaming" RDF-star by those that take the syntax for what they expect (or desire) it to mean, and the shorthand syntax and the marketing blabla of the CG make this prospect even more likely. I fear that will ruin RDF-star in practice (and the reputation of RDF semantics in the property graph community).

Your QuID text talks about both faces of RDF-star - the one that the semantics in the CG report specifies and the one that it claims to also support and that people expect - as if they were on equal footing. They are not, however. The latter is only a syntactic possibility and runs against the proposed semantics. If anything is really gamed here it is the communities - both RDF and property graphs - by the RDF-star CG.

> The motivation for it was the impression I'm getting that RDF-star appears to be set up to be somewhat *gamed*. It appears to me (as is also currently debated) that there may be some conflation of statement and "asserted but implied event" in use cases here and there, and that this may be a potential problem going forward (at worst heading towards an httpRange-14 situation but for every triple...). I wanted to contrast this with a notion I've had for some time about the limited nature of the *identities* we use in RDF, and what that implies in relation to various forms of qualification, including this "gaming" of RDF-star, specifically its annotation form (which admittedly I'm drawn to for practical reasons, to the point of wanting to game it...).

W.r.t. the problem of identification (httpRange-14) the whole semantic web is gamed. But it works nonetheless, by and large. There is some late-binding strategy as an architectural principle at work that navigates troubled semantic waters by leaving things undefined that would be too cumbersome, too difficult to define. And that is definitely a fully rational approach. Ambiguity is unavoidable in practice and has to be handled out-of-band through conventions, comments or code. The problem IMO is rather that there is no good mechanism to disambiguate identification semantics explicitly when the need arises. There’s probably also too little awareness about this tension and too little training in navigating it.

On a practical level some machinery enabling qualification per term on demand might just be what’s needed.

In that respect there’s a lot about your QuID proposal that I like but also a lot that IMO is still missing, or rather not radical enough. Some of your examples could be modeled much more intuitively if you applied qualification to the property instead of the object. Why qualify the object (or the subject, or both) of a marriage relation when it is the relation itself that is only valid during a certain period of time? Sure, that requires blank nodes in predicate position. If that's all that’s needed, then let's go for it! Or fall back to the singleton properties approach.

I like your thoughts about querying support. A well rounded solution might even let queries automagically follow quid:qualifies relations to the effect that e.g. a query for <Berkshire> will also return all qualified occurrences of <Berkshire>. 
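
A minimal sketch of such a lookup, in plain Python rather than SPARQL. Triples are modelled as tuples, the sample data and the `occurrences` helper are purely illustrative assumptions, and only one hop of quid:qualifies is followed:

```python
# Illustrative only: triples as (subject, predicate, object) tuples.
QUALIFIES = "quid:qualifies"

def occurrences(graph, term):
    """Return triples that mention `term` directly or via a node qualifying it."""
    # Nodes that stand in for `term` through a quid:qualifies link.
    stand_ins = {s for (s, p, o) in graph if p == QUALIFIES and o == term}
    targets = stand_ins | {term}
    # Skip the qualifies links themselves; keep everything touching a target.
    return [
        t for t in graph
        if t[1] != QUALIFIES and (t[0] in targets or t[2] in targets)
    ]

# Hypothetical data, invented for this sketch.
graph = [
    ("_:b1", QUALIFIES, "<Berkshire>"),
    ("_:b1", ":validUntil", "1974"),
    ("<TownA>", ":locatedIn", "_:b1"),
    ("<TownB>", ":locatedIn", "<Berkshire>"),
    ("<TownC>", ":locatedIn", "<Oxfordshire>"),
]

hits = occurrences(graph, "<Berkshire>")
```

Here `hits` contains the direct mention by `<TownB>` as well as the two triples that go through the qualifying node `_:b1`, but not the `<TownC>` triple.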

> I also have some concerns that this pursuit of reification (becoming "triples within triples") might make the simple substrate of triples in RDF much harder to grasp. But I may be wrong there.
> 
> Perhaps disqualifying me from the fully rational, I am actually fairly comfortable with ambiguity and broken semantics (though not necessarily with conflation); and if this "gaming" I am worried about is considered sound and as intended, I will probably continue to pursue it in some fashion (I've already gone down that route [4], if only to avoid inventing something.) It might be that annotations are "mixed quotations" [5], and the use/mention distinction cannot be readily applied. That might be a semantic bog in the making of course, but I suppose any applied semantics eventually strays from the picture and breaks cohesion. I put my graphs under names and certainly don't trust the giant global graph of the semantic web to be cohesive (one stray owl:sameAs and the entire castle comes tumbling down). It's all just maps, skewed, with varying symbols, granularities and abstractions. They're not beyond interlinking, even ontology-wise, and that's workable enough. I'm not a fan of entropy, and endless complexities (still, alas, I find myself an agent of it, time and again), but that's the game of nature, and just striving to navigate it may be enough to get by.

So far complexity is hidden in out-of-band mechanisms, in code and comments and conventions. Pulling those goings-on inside the semantic web will of course increase the complexity of the RDF machinery but I think it’s worth it. The tensions between the abstract ideal of the global graph and the multitude of practical constraints, or between the triple as the only harbinger of truth and the need for more elaborate constructs, are so high that IMO they are unsustainable.

> I do think clarification and harmonization of (or dare I wish, even unification of) what is happening here on the quoted triple level with named graphs as a vehicle for triples and provenance would be the most valuable route towards standardization. And I do miss a clear stance on qualification (rdf:value has been around long enough without catching on, so something's amiss). As these are the papers and inks we're all working with, and we're now about to get a new kind of colored marker to work with, to clarify how these map making pieces are to be grasped together would be wise (lest we get lost in the map making process and forget to navigate our realities).


Of course I’ve got some ideas of my own:

I’m all for generalized RDF. Blank nodes in predicate position would allow us to qualify properties with data about the relation itself: start and end date of a marriage belong here, not to the nodes (Burton and Taylor in our running example) nor to the whole statement. Qualifications should always be as targeted as possible.

I like your analogy to rdf:lists. What if we treated _all_ IRI references as shortcuts for a blank node with a quid:qualifies relation to that IRI and added some sugar to our syntaxes and query engines to support this shortcut? That might get us a long way towards keeping the simple (simplistic) triple notation, preserving triple-based semantics and reasoning mechanisms and yet achieve the usability benefit that property graphs provide.
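
To make the shortcut idea concrete, here is a rough sketch in Python of the expansion such sugar might stand for. The `expand` and `collapse` helpers and the `_:qN` blank node labels are assumptions made for illustration, not part of QuID or any RDF syntax:

```python
from itertools import count

QUALIFIES = "quid:qualifies"

def expand(triples):
    """Replace every term by a fresh blank node that qualifies that term."""
    fresh = (f"_:q{i}" for i in count(1))
    out = []
    for s, p, o in triples:
        bs, bp, bo = next(fresh), next(fresh), next(fresh)
        out.append((bs, bp, bo))  # generalized RDF: blank node as predicate
        out.extend([(bs, QUALIFIES, s), (bp, QUALIFIES, p), (bo, QUALIFIES, o)])
    return out

def collapse(triples):
    """Fold qualifying nodes back into the terms they qualify."""
    target = {s: o for (s, p, o) in triples if p == QUALIFIES}
    return [
        (target.get(s, s), target.get(p, p), target.get(o, o))
        for (s, p, o) in triples
        if p != QUALIFIES
    ]

plain = [(":Burton", ":marriedTo", ":Taylor")]
expanded = expand(plain)
round_trip = collapse(expanded)
```

Note that `collapse` is only lossless while the blank nodes carry nothing but their quid:qualifies link; as soon as a node has further qualifiers, folding it back would attach them to the unqualified term.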

But how to annotate whole statements? The RDF-star approach IMO has too many downsides. Statement identification per statement id, as for example in RDF/XML and RDF standard reification, also isn’t ideal. I’d like to treat statements as singleton graphs and consequently statement qualification as a special case of graph annotation. I’ve never seen a good reason why single and multiple triples should be understood and handled as categorically different.

That would require named graphs to have some formal semantics. Given that named graphs are the only grouping device that RDF provides this semantics should be as unambitious as possible: the name identifies the graph (no disambiguation between use/mention here) and the graph is referentially transparent. This semantics also reflects the established practice in SPARQL.
Any other, more involved semantics would have to be declared explicitly (e.g. with qualifiers on the graph name).
A nesting syntax like in N3 (but referentially transparent, and with a naming facility added) would be nice.

The httpRange-14 problem lurks everywhere in RDF. We somehow got by so far without solving it but I think we should implement a mechanism to disambiguate identification semantics on demand, on any level: nodes, statements, graphs. That should solve a lot of problems. It could be realized as a qualifying property like :identificationSemantics with possible values :denotation|:indication (or :use|:mention or :documented|:document).
That would allow us to easily differentiate annotations on statements or graphs (as documents in their own right) from annotations on what they refer to, and annotations on documents themselves (represented by their IRIs) from annotations on the meaning they convey.
More on the implications on the semantics below.

We need more design patterns. A symmetric relation like marriage is actually not a very good fit for a directed graph. Good old tables, encoded as n-ary relations can do a very good job at describing complex objects. However, if we had a solid and sound statement annotation facility we could combine both idioms:
- let the marriage be described in an n-ary relation in all gory detail, easily extendable to multiple marriages of the same couple and what have you.
- let 'shortcut' triples describe the essential facts and link them to the complete description via a statement annotation.
Best of both worlds, IMHO. But it would require a way to soundly address those shortcutting triples.
Added benefit for symmetric relations: describe both directions via 'shortcut' triples that via qualifiers on the statement as a whole 'link back' to the complete description of the marriage as an n-ary relation.
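
The combination could be sketched roughly as follows. This is Python with triples as tuples; the property names :partner and :describedBy, the node `_:m1` and the statement ids are all hypothetical, chosen only for this illustration:

```python
def shortcuts(relation_node, partners, predicate, id_prefix):
    """Derive both 'shortcut' triples for a symmetric n-ary relation.

    Returns a dict of identified statements plus the annotations that
    link each statement back to the full n-ary description.
    """
    a, b = partners
    statements = {
        f"{id_prefix}_fwd": (a, predicate, b),
        f"{id_prefix}_rev": (b, predicate, a),
    }
    links = [(sid, ":describedBy", relation_node) for sid in statements]
    return statements, links

# The complete description as an n-ary relation (hypothetical vocabulary).
nary = [
    ("_:m1", ":partner", ":Burton"),
    ("_:m1", ":partner", ":Taylor"),
    ("_:m1", ":start", 1964),
    ("_:m1", ":end", 1974),
]

statements, links = shortcuts("_:m1", (":Burton", ":Taylor"), ":marriedTo", "id_1")
```

Both directions of the marriage are then available as simple triples, and each carries a statement-level pointer to `_:m1`, where the dates live.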

Multisets are only problematic if the statement identifier identifies a type. The verbosity of the RDF-star embedded triple plus the need for an extra statement to define the occurrence makes this approach rather unattractive for statement annotation. We need another syntactic device: either statement identifiers or singleton graphs (the latter we do of course have already, the former we have in RDF/XML - albeit both without standardized semantics). 
Multisets like the marriages between Burton and Taylor however are best covered by qualified properties as they actually hinge on the relation between the two nodes, not on the statement as a whole:
    :Burton _:p :Taylor .
    _:p quid:qualifies :marriedTo; 
        :start 1964;
        :end 1974 .
Another version, with some more detail:
    _:s _:p _:o .
    _:s quid:qualifies :Burton ;
        :firstName :Richard .
    _:p quid:qualifies :marriedTo; 
          :start 1964;
          :end 1974 .
    _:o quid:qualifies :Taylor ;
        :nickname "Liz" .
This can rightfully be accused of being decidedly blank node heavy but OTOH the traditional ways of modelling complex objects in RDF need a lot of blank nodes too and, more importantly, this approach here can easily be made more usable by hiding it behind a surface syntax. I’m leaning towards statement identifiers, with additional fragment identifiers:
    :Burton :marriedTo :Taylor id_1 .
    id_1#subject :firstName :Richard .
    id_1#predicate :start 1964; :end 1974 .
    id_1#object :nickname "Liz" .
This preserves the basic relation in an easy to write/read/query triple and keeps all refining detail nearby.
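
How such a surface syntax might be translated back into the blank node form can be sketched in a few lines of Python. The `desugar` function and the blank node naming scheme `_:<id>_<position>` are assumptions made for the sake of the example:

```python
QUALIFIES = "quid:qualifies"
POSITIONS = ("subject", "predicate", "object")

def desugar(statements, annotations):
    """Expand identified statements and id#position annotations into
    the blank-node-per-term form."""
    out = []
    for sid, terms in statements.items():
        # One blank node per position, named after the statement id.
        nodes = {pos: f"_:{sid}_{pos}" for pos in POSITIONS}
        out.append(tuple(nodes[pos] for pos in POSITIONS))
        for pos, term in zip(POSITIONS, terms):
            out.append((nodes[pos], QUALIFIES, term))
    for ref, pred, obj in annotations:
        sid, pos = ref.split("#", 1)
        out.append((f"_:{sid}_{pos}", pred, obj))
    return out

statements = {"id_1": (":Burton", ":marriedTo", ":Taylor")}
annotations = [
    ("id_1#subject", ":firstName", ":Richard"),
    ("id_1#predicate", ":start", 1964),
    ("id_1#predicate", ":end", 1974),
    ("id_1#object", ":nickname", "Liz"),
]

triples = desugar(statements, annotations)
```

The result has the same shape as the blank node heavy version above: one triple of blank nodes, three quid:qualifies links and the four qualifiers.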

Annotating the whole statement is syntactically straightforward with an identifier (which may be a singleton graph name, the subject of an RDF standard reification quadlet or some syntactic device as in RDF/XML):
    id_1 :y :Z .
but defining the meaning of this annotation is a little trickier, including ensuring that the triple remains the only harbinger of truth. 

Following the definition for graph semantics above, a reference to id_1 identifies what id_1 means.

To ensure that the simple triple is the only harbinger of truth a qualification of the meaning of a statement has to be understood as (and under the hood expanded/translated to) annotations/qualifications on all the terms it is constituted of, analogously for graphs. 
Annotating a statement as a document in its own right with e.g. provenance information should not be passed on to the nodes it is comprised of. Again unambiguous identification is key.
Qualifying a statement reference as having indication semantics would then look like
    _:t quid:qualifies id_1 ;
        :identificationSemantics :indication ;
        :src :Wikipedia .
which is of course quite verbose. Re-using the fragment identifier syntax from above however one could write
    id_1#indicationSemantics :src :Wikipedia .
to state the source of a statement but
    id_1#denotationSemantics :x :Y .
to declare qualifications that apply to each term in the statement. I have to admit that I’m not able to come up with a good example for a case where all three terms are equally qualified but I’m sure such cases exist.
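
The difference between the two readings can be sketched as a small expansion step, here in Python. The function name and the reuse of the id#position references are assumptions for illustration; nothing here is defined by RDF-star or QuID:

```python
POSITIONS = ("subject", "predicate", "object")

def expand_annotation(ref, pred, obj):
    """Expand an annotation on id#indicationSemantics or id#denotationSemantics."""
    sid, mode = ref.split("#", 1)
    if mode == "indicationSemantics":
        # About the statement as a document: stays on the identifier itself.
        return [(sid, pred, obj)]
    if mode == "denotationSemantics":
        # About the meaning: distributed over every term of the statement.
        return [(f"{sid}#{pos}", pred, obj) for pos in POSITIONS]
    raise ValueError(f"unknown identification semantics: {mode}")

provenance = expand_annotation("id_1#indicationSemantics", ":src", ":Wikipedia")
qualified = expand_annotation("id_1#denotationSemantics", ":x", ":Y")
```

The provenance annotation stays a single triple about id_1, while the denotation annotation turns into one qualification per term, so the plain triple remains the only carrier of truth.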

I’m not sure that this is all: what about the need to speak about a statement or graph as an expression of meaning in its own right (which might be different from "adding up" the meaning of all the parts it is composed of)? Well, we’re not limited to just two graph naming semantics but things might get out of hand…


Best,
Thomas


[0] https://arxiv.org/abs/1709.04970


> Sincerely,
> Niklas
> 
> [1]: https://en.wikipedia.org/wiki/Perdurantism
> [2]: https://en.wikipedia.org/wiki/Tl%C3%B6n,_Uqbar,_Orbis_Tertius
> [3]: Snapshot reference of the article code for future readers of this list: https://raw.githubusercontent.com/niklasl/quid/565a15b4263398d9d20b36e2c2af5b204430d6a2/index.html
> [4]: https://github.com/niklasl/ldtvm/blob/master/examples/Spec.md#qualified-relations-as-reifications
> [5]: https://plato.stanford.edu/entries/quotation/#MixeQuot
>
Received on Tuesday, 4 January 2022 15:05:37 UTC