Re: RDF star and LPGs from Niklas Lindström on 2024-07-19 (public-rdf-star-wg@w3.org from July 2024)

From: Niklas Lindström <lindstream@gmail.com>
Date: Fri, 19 Jul 2024 15:53:51 +0200
To: Gregory Williams <greg@evilfunhouse.com>
Cc: RDF-star WG <public-rdf-star-wg@w3.org>, Franconi Enrico <franconi@inf.unibz.it>
Message-ID: <CADjV5jd1SMed-789VXARRcd9hOxWXZ4OHiSrE1RJGJBGaOwZ9A@mail.gmail.com>

Hi Gregory,

Thank you for explaining your concerns! I think that you are right
that the example would be better with the assertions explicitly
spelled out; preferably using the annotation form:

    :a1 :TRANSACTION :a2 {| # :e1
      :amount 1000 ;
      :currency "gbp" ;
      :date "2002-09-24Z"^^xsd:date |} .

    :a2 :TRANSACTION :a3 {| # :e2
      :amount 900 ;
      :currency "gbp" ;
      :date "2002-10-03Z"^^xsd:date |} .

(Pardon the comments for "naming"; just pending resolution of [1].)


## Difference between RDF and LPGs

What you're touching upon is a significant data model difference
between RDF and LPGs. LPG edges/relationships conceptually
*instantiate* "relationship types". As Neo4j defines it:
"Relationships must have a type (one type) to define (classify) what
type of relationship they are" ([2]). Kurt also touched upon that
yesterday, and it has been mentioned before (e.g. what I wrote in
[3]).

But in RDF Concepts [4] (non-normative section): "Asserting an RDF
triple says that some relationship, indicated by the predicate, holds
between the resources denoted by the subject and object.". And
normatively, in semantics [5], there is the set of properties ("IP")
in an *interpretation*, and "a mapping IEXT from IP" where "IEXT(x),
called the extension of x" which "is a set of pairs which identify the
arguments for which the property is true, that is, a binary relational
extension". This is fundamentally a different notion. (And IMHO both a
more formal and simpler notion; but I have a long-term RDF bias.)
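
To make the contrast concrete, here is a minimal sketch in plain Python
(no RDF or LPG library; the dictionaries, the edge ids and the variable
names are all illustrative assumptions). It models IEXT as a set of
pairs, and LPG edges as individual instances of a relationship type:

```python
# Illustrative only: plain Python stand-ins, not an RDF or LPG library.

# RDF view: IEXT maps a property to a *set* of (subject, object) pairs.
iext = {":TRANSACTION": set()}
iext[":TRANSACTION"].add((":a1", ":a2"))
iext[":TRANSACTION"].add((":a1", ":a2"))  # same pair again: no new member

# LPG view: every edge is its own instance of a relationship type.
# The ids "e1" and "e1b" are hypothetical edge identifiers.
lpg_edges = []
lpg_edges.append({"id": "e1", "type": "TRANSACTION", "from": "a1", "to": "a2"})
lpg_edges.append({"id": "e1b", "type": "TRANSACTION", "from": "a1", "to": "a2"})

rdf_count = len(iext[":TRANSACTION"])  # the pair is either in the set or not
lpg_count = len(lpg_edges)             # two distinct edges between a1 and a2
print(rdf_count, lpg_count)
```

In the RDF view the pair simply is or is not in the extension; in the
LPG view each edge has an identity of its own.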


## Reifiers

What triple terms enable is, as I would like to define it, the ability
to directly (atomically, or as a fixed point) *reference a
relationship* (as in *denoting* it, so extending the definition in
[6]). Exactly like how an IRI denotes a resource (in `IR`, the set of
resources in the interpretation). That relationship would then be the
membership of that pair of resources in the set of properties; i.e. a
logical, abstract expression. And reifiers would be defined as
anything which is a concretization *of some kind* of such RDF
relationships, encoded like `:e1 rdf:reifies <<(:a1 :TRANSACTION
:a2)>>`. This enables us to further refine that with more statements
about what `:e1` denotes.

So a reifier is something more concrete, tied to one or more abstract
relationship(s). This simple relationship holds if it is also
asserted. Some reifiers may "naturally" expect this simple
relationship to hold, but the simple relationship does not depend upon
its reification. In LPGs, this is not the case, since as per above,
there are no "simple relationships" there. Every one is an individual
occurrence thereof (with its own identity). And such are among the
possible uses for reifiers. So I do think Enrico has a valid point in
that you might for the sake of consistency be better off mapping LPG
data to RDF always using reifiers. But I may disagree in practice,
since not all LPG relationships are qualified; so asserting the
simpler relationships makes sense to me. Thus, to *bridge* the
differences, conceptually and in concrete syntaxes, the annotation
form should have a central place.
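
As a rough illustration of what the annotation form bundles together,
here is a sketch in plain Python where a graph is a set of tuples and a
triple term is a nested tuple. The helper names `annotate` and
`is_asserted` are my own assumptions, not part of any RDF library or of
the drafts:

```python
RDF_REIFIES = "rdf:reifies"

def annotate(graph, triple, reifier, annotations):
    """Assert `triple`, link `reifier` to its triple term, add annotations."""
    graph.add(triple)                          # the simple relationship
    graph.add((reifier, RDF_REIFIES, triple))  # triple term in object position
    for predicate, value in annotations.items():
        graph.add((reifier, predicate, value))

def is_asserted(graph, reifier):
    """True if every triple term reified by `reifier` is also in the graph."""
    terms = [o for s, p, o in graph if s == reifier and p == RDF_REIFIES]
    return bool(terms) and all(t in graph for t in terms)

graph = set()

# Annotation form: the simple relationship is asserted *and* reified.
annotate(graph, (":a1", ":TRANSACTION", ":a2"), ":e1",
         {":amount": 1000, ":currency": "gbp"})

# Reification only: the simple relationship is referenced, not asserted.
graph.add((":e2", RDF_REIFIES, (":a2", ":TRANSACTION", ":a3")))

print(is_asserted(graph, ":e1"), is_asserted(graph, ":e2"))
```

The simple triple stays queryable on its own for `:e1`, while `:e2`
only refers to a relationship that the graph does not claim.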


## Separate Assertion

I think you and Thomas worry that some reifiers would be "incomplete"
if the asserted triples are "forgotten" (or "disappear")? I can see
that, but would like to see a concrete case of such RDF Source [7]
editing scenarios that do not also suffer from other problems such as
"dangling" blank nodes after "unlinking" them, "unravelled" RDF lists,
etc. I'd argue that concrete RDF representations, high-level libraries
and tooling are quite enough to handle this. (Again, this may be RDF
bias on my part.)

I suspect you would be more comfortable if annotated triples were
somehow not possible to "remove" (from a source, yielding a new graph,
e.g. with a SPARQL `DELETE WHERE`) unless also removing a reification
thereof (some or all)? But I don't think e.g. deciding to remove a
triple from a graph source without considering contextual information
is something RDF needs to "cover for" in a built-in fashion. One might
argue the other way around too, since if a triple is deleted but the
reifier is left, it at least is no longer asserted. (Well, unless
entailed anyway; but if you're using OWL, you *really* need a higher
level view of the consequences). Then you "only" need to go "garbage
collect" those "dangling" reifiers (presumably also every use of them
as subjects or objects?). But only those that don't make sense in the
domain.
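
A minimal sketch of such a "garbage collection" pass, again in plain
Python over a set of tuples. The function names and the `keep` callback
(standing in for whatever domain rule decides that a dangling reifier
still makes sense) are hypothetical:

```python
RDF_REIFIES = "rdf:reifies"

def dangling_reifiers(graph):
    """Reifiers whose reified triple term is not asserted in `graph`."""
    return {s for s, p, o in graph if p == RDF_REIFIES and o not in graph}

def collect(graph, keep=lambda reifier: False):
    """Drop every triple using a dangling reifier as subject or object,
    unless `keep` (a hypothetical domain rule) spares that reifier."""
    doomed = {r for r in dangling_reifiers(graph) if not keep(r)}
    return {t for t in graph if t[0] not in doomed and t[2] not in doomed}

t1 = (":a1", ":TRANSACTION", ":a2")
t2 = (":a2", ":TRANSACTION", ":a3")
graph = {
    t1,
    (":e1", RDF_REIFIES, t1), (":e1", ":amount", 1000),
    (":e2", RDF_REIFIES, t2), (":e2", ":amount", 900),  # t2 itself was deleted
}

cleaned = collect(graph)                             # removes all of :e2
spared = collect(graph, keep=lambda r: r == ":e2")   # domain says: keep it
```

Whether `:e2` belongs in `doomed` is exactly the domain-level decision;
the sketch only shows that the bookkeeping itself is small.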

I think concrete cases are needed here. The status of Wikidata snaks
is one such case, where a rank of "normal" and "preferred" both are
(AFAICS) accompanied by the asserted simple triple, but "deprecated"
are left "dangling"; with qualifiers and references intact. I also
believe that Thomas is asking for signalling this directly with the
relationship from the reifier to the triple term (e.g. using an
rdf:implies instead of the "neutral" rdf:reifies). I currently believe
that there are a variety of possible different domain rules for what
constitutes the necessary dependency of a certain kind of reifier to
an asserted triple, and that this is not needed in RDF itself. But we
can explore this further.


## Detailed Modelling Notes

Here follows how I might approach this case, and how reifiers would
play a part in making choices. Skip this part if the explanation above
is enough. (First off a style detail: I would probably call the simple
relationships here something like `:transferTo`, to follow a fairly
common practice in RDF. Of course, the domain experts would have the
final call.)

Prior to being lured to the "dark side of overly simplistic designs",
I would also conclude that this requires a more complex model, and
define N-ary entities of type `:Transaction` directly, like:

    :t1 a :Transaction ;
      :from :a1 ;
      :to :a2 ;
      :amount 1000 ;
      :currency :GBP ;
      :date "2002-09-24Z"^^xsd:date .

I'd still argue that this option is important to consider, of course
in discussion about what makes sense with maintainers and client users
of the data. If in general, they will expect the simple binary
relationship `:transferTo` (in querying, when adding data, etc.), and
the details are wholly secondary, I might still first check if the
systems are expected to support full OWL reasoning capability. If so I
would simply define `:transferTo owl:propertyChainAxiom ( [
owl:inverseOf :from ] :to )` and be done; since the querying would
work with both the full N-ary above and with the simplistic binary
relation (which would be entailed). But today I'd wager that the
answer is usually "no" or "not yet". In RDF 1.1 that then requires a
choice with consequences: require the full N-ary form everywhere, or
maintain both, with neither conceptual nor syntactic support.
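
For readers who want to see what that property chain would entail, here
is a hand-rolled sketch in plain Python. It is not an OWL reasoner; the
function name `entail_transfer_to` is an assumption, and it only covers
this one chain (inverse of `:from`, followed by `:to`):

```python
def entail_transfer_to(graph):
    """Add (x, ':transferTo', z) for each node y with y :from x and y :to z."""
    senders = {(s, o) for s, p, o in graph if p == ":from"}
    receivers = {(s, o) for s, p, o in graph if p == ":to"}
    derived = {(x, ":transferTo", z)
               for y, x in senders
               for y2, z in receivers if y == y2}
    return graph | derived

graph = {
    (":t1", "rdf:type", ":Transaction"),
    (":t1", ":from", ":a1"),
    (":t1", ":to", ":a2"),
    (":t1", ":amount", 1000),
}

closed = entail_transfer_to(graph)
print((":a1", ":transferTo", ":a2") in closed)
```

Without a reasoner, something like this has to be run (and kept in
sync) by the application, which is the "maintain both" cost.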

But if RDF 1.2 gains this reifier feature, you can start with simple
relationships and then reify them as the needs grow. I've explained
that in more detail in [8]; the gist of it for the above case is:
`:e1` and `:t1` are perfectly possible to model as the same kind of
entity (a `:Transaction`), and could even in a future integration
scenario (when OWL may be back on the table) be declared as the same.

(From this perspective I can also see how Enrico finds the reifier
itself enough, in this particular case; since the information is all
there. But I don't think it's "use case complete" in that users
probably expect to be able to query for the simple relationship.)


Again, this is all my own stance here; and grounds for my motivation
to go forward with the current option.

Best regards,
Niklas

[1]: <>
[2]: <https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/>
[3]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Apr/0196.html>
[4]: <https://www.w3.org/TR/rdf12-concepts/#resources-and-statements>
[5]: <https://www.w3.org/TR/rdf12-semantics/#simple>
[6]: <https://www.w3.org/TR/rdf12-concepts/#dfn-denote>
[7]: <https://www.w3.org/TR/rdf12-concepts/#dfn-rdf-source>
[8]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0038.html>

On Fri, Jul 19, 2024 at 2:54 AM Gregory Williams <greg@evilfunhouse.com> wrote:
>
> I mentioned in the call today that I had concerns about the "RDF star and LPGs” document[1] that was linked on IRC. I didn’t find any previous mention of that document in the mailing list archives, so unsure if it has been discussed previously (perhaps on a previous call?). Overall, I'm skeptical of the mapping presented in the document, and especially as the basis for making arguments that the current baseline proposal is compatible with LPG use-cases that have been previously discussed.
>
> (I may be making assumptions about the mapping here, but since it is presented in the form of a single, simple example, I guess that’s to be expected. Similarly, I may be reading a lot into the "<—>” mapping indicator used between the example LPG and RDF data; please let me know if I’ve misunderstood the intention.)
>
>
>
> I think on its face the RDF-LPG mapping presented does not preserve the intended semantics. To me, the intent of the LPG syntax is pretty clearly to assert that there are two transactions that occurred. They exist!
>
> ```
> CREATE (a1)-[:TRANSACTION {amount: 1000, currency: "gbp", date: "2002-09-24Z"}]->(a2)
> CREATE (a2)-[:TRANSACTION {amount: 900, currency: "gbp", date: "2002-10-03Z"}]->(a3)
> ```
>
> But the mapped RDF does *not* define any transactions. There are no `:TRANSACTION` triples in the graph – that predicate only exists inside triple terms. So maybe they are hypothetical transactions?
>
> ```
> << :e1 | :a1 :TRANSACTION :a2 >> ;
>   :amount 1000 ;
>   :currency "gbp" ;
>   :date "2002-09-24Z"^^xsd:date .
>
> << :e2 | :a2 :TRANSACTION :a3 >> ;
>   :amount 900 ;
>   :currency "gbp" ;
>   :date "2002-10-03Z"^^xsd:date .
> ```
>
> Furthermore, the mapping does not appear to be a bijection, and cannot be round-tripped. What happens to those triple terms if you try to map back to LPG? Do they disappear, as they are not asserted in the RDF graph? If, instead, they are mapped back into actual `:TRANSACTION` edges, how would that be different from mapping similar RDF that also asserted the `:TRANSACTION` triples in the graph (in addition to the triple terms)? I think this starts to touch on the point that Thomas has been making about asserted vs. unasserted assertions and the difficulty of modeling that difference in our current formalization.
>
> Under "OBSERVATIONS":
>
> > Reifiers denote edge identifiers.
>
> I think this suggests that you'd end up with an empty graph if you tried to map RDF without any reifiers back to LPG (or at least a graph with no edges). That doesn't seem viable, as all existing RDF data prior to rdf-star would be entirely inaccessible based on this mapping. I am nervous about any one-way mapping, as it has implications for the sorts of systems you can build and the use-cases that they can address. And I’m similarly nervous about basing support for the baseline proposal on this document as an assurance that we are now able to capture LPG use cases. The mapping is just too casual, and leaves out far too many specifics to be useful in understanding if it can practically address LPG use-cases.
>
> thanks,
> .greg
>
> [1] https://github.com/w3c/rdf-star-wg/wiki/RDF-star-and-LPGs
>
Received on Friday, 19 July 2024 13:54:22 UTC