- From: Niklas Lindström <lindstream@gmail.com>
- Date: Fri, 19 Jul 2024 15:53:51 +0200
- To: Gregory Williams <greg@evilfunhouse.com>
- Cc: RDF-star WG <public-rdf-star-wg@w3.org>, Franconi Enrico <franconi@inf.unibz.it>

Hi Gregory,

Thank you for explaining your concerns! I think that you are right that the example would be better with the assertions explicitly spelled out; preferably using the annotation form:

    :a1 :TRANSACTION :a2 {| # :e1
        :amount 1000 ;
        :currency "gbp" ;
        :date "2002-09-24Z"^^xsd:date
    |} .

    :a2 :TRANSACTION :a3 {| # :e2
        :amount 900 ;
        :currency "gbp" ;
        :date "2002-10-03Z"^^xsd:date
    |} .

(Pardon the comments for "naming"; just pending resolution of [1].)

## Difference between RDF and LPGs

What you're touching upon is a significant data model difference between RDF and LPGs. LPG edges/relationships conceptually *instantiate* "relationship types". As Neo4j defines it: "Relationships must have a type (one type) to define (classify) what type of relationship they are" ([2]). Kurt also touched upon that yesterday, and it has been mentioned before (e.g. what I wrote in [3]).

But in RDF Concepts [4] (non-normative section): "Asserting an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object." And normatively, in semantics [5], there is the set of properties ("IP") in an *interpretation*, and "a mapping IEXT from IP" where "IEXT(x), called the extension of x" which "is a set of pairs which identify the arguments for which the property is true, that is, a binary relational extension".

This is fundamentally a different notion. (And IMHO both a more formal and simpler notion; but I have a long-term RDF bias.)

## Reifiers

What triple terms enable is, as I would like to define it, the ability to directly (atomically, or as a fixed point) *reference a relationship* (as in *denoting* it, so extending the definition in [6]). Exactly like how an IRI denotes a resource (in `IR`, the set of resources in the interpretation). That relationship would then be the membership of that pair of resources in the extension of the property; i.e. a logical, abstract expression.

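As a minimal sketch of that contrast (plain Python, with every name made up for illustration; nothing here is taken from the RDF specs or from Neo4j), the RDF reading treats a relationship as membership of a pair in a set, while the LPG reading treats each edge as an individual with its own identity:

```python
# Illustrative sketch only: the names below are hypothetical and are not
# taken from RDF Semantics or from any LPG product.
from dataclasses import dataclass, field

# RDF view: IEXT maps each property to a set of (subject, object) pairs.
# A triple is true when its pair is a member of that set; asserting the
# same triple twice changes nothing, since a set has no duplicates.
IEXT: dict[str, set[tuple[str, str]]] = {}


def assert_triple(s: str, p: str, o: str) -> None:
    IEXT.setdefault(p, set()).add((s, o))


def holds(s: str, p: str, o: str) -> bool:
    return (s, o) in IEXT.get(p, set())


# LPG view: every edge is an individual with its own identity, exactly
# one type, and optional properties.
@dataclass
class Edge:
    edge_id: str
    type: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


assert_triple("a1", "TRANSACTION", "a2")
assert_triple("a1", "TRANSACTION", "a2")  # same pair, still one member

# Two distinct edges between the same nodes (the second is hypothetical).
lpg_edges = [
    Edge("e1", "TRANSACTION", "a1", "a2", {"amount": 1000}),
    Edge("e1b", "TRANSACTION", "a1", "a2", {"amount": 250}),
]
```

In this picture, two LPG edges between the same nodes stay distinct, whereas the RDF extension holds the pair only once; a reifier is what would let RDF talk about each occurrence separately.
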
And reifiers would be defined as anything which is a concretization *of some kind* of such RDF relationships, encoded like `:e1 rdf:reifies <<(:a1 :TRANSACTION :a2)>>`. This enables us to further refine that with more statements about what `:e1` denotes.

So a reifier is something more concrete, tied to one or more abstract relationship(s). This simple relationship holds if it is also asserted. Some reifiers may "naturally" expect this simple relationship to hold, but the simple relationship does not depend upon its reification.

In LPGs, this is not the case, since as per above, there are no "simple relationships" there. Every one is an individual occurrence thereof (with its own identity). And such are among the possible uses for reifiers.

So I do think Enrico has a valid point in that you might for the sake of consistency be better off mapping LPG data to RDF always using reifiers. But I may disagree in practice, since not all LPG relationships are qualified; so asserting the simpler relationships makes sense to me.

Thus, to *bridge* the differences, conceptually and in concrete syntaxes, the annotation form should have a central place.

## Separate Assertion

I think you and Thomas worry that some reifiers would be "incomplete" if the asserted triples are "forgotten" (or "disappear")? I can see that, but would like to see a concrete case of such RDF Source [7] editing scenarios that do not also suffer from other problems such as "dangling" blank nodes after "unlinking" them, "unravelled" RDF lists, etc. I'd argue that concrete RDF representations, high-level libraries and tooling are quite enough to handle this. (Again, this may be RDF bias on my part.)

I suspect you would be more comfortable if annotated triples were somehow not possible to "remove" (from a source, yielding a new graph, e.g. with a SPARQL `DELETE WHERE`) unless also removing a reification thereof (some or all)? But I don't think e.g. deciding to remove a triple from a graph source without considering contextual information is something RDF needs to "cover for" in a built-in fashion.

One might argue the other way around too, since if a triple is deleted but the reifier is left, it at least is no longer asserted. (Well, unless entailed anyway; but if you're using OWL, you *really* need a higher-level view of the consequences.) Then you "only" need to go "garbage collect" those "dangling" reifiers (presumably also every use of them as subjects or objects?). But only those that don't make sense in the domain.

I think concrete cases are needed here. The status of Wikidata snaks is one such case, where a rank of "normal" and "preferred" both are (AFAICS) accompanied by the asserted simple triple, but "deprecated" are left "dangling"; with qualifiers and references intact.

I also believe that Thomas is asking for signalling this directly with the relationship from the reifier to the triple term (e.g. using an rdf:implies instead of the "neutral" rdf:reifies). I currently believe that there are a variety of possible different domain rules for what constitutes the necessary dependency of a certain kind of reifier to an asserted triple, and that this is not needed in RDF itself. But we can explore this further.

## Detailed Modelling Notes

Here follows how I might approach this case, and how reifiers would play a part in making choices. Skip this part if the explanation above is enough.

(First off a style detail: I would probably call the simple relationships here something like `:transferTo`, to follow a fairly common practice in RDF. Of course, the domain experts would have the final call.)

Prior to being lured to the "dark side of overly simplistic designs", I would also conclude that this requires a more complex model, and define N-ary entities of type `:Transaction` directly, like:

    :t1 a :Transaction ;
        :from :a1 ;
        :to :a2 ;
        :amount 1000 ;
        :currency :GBP ;
        :date "2002-09-24Z"^^xsd:date .

I'd still argue that this option is important to consider, of course in discussion about what makes sense with maintainers and client users of the data.

If in general, they will expect the simple binary relationship `:transferTo` (in querying, when adding data, etc.), and the details are wholly secondary, I might still first check if the systems are expected to support full OWL reasoning capability. If so I would simply define `:transferTo owl:propertyChainAxiom ( [ owl:inverseOf :from ] :to )` and be done; since the querying would work with both the full N-ary above and with the simplistic binary relation (which would be entailed).

But today I'd wager that the answer is usually "no" or "not yet". In RDF 1.1 that then requires a choice with consequences: require the full N-ary form everywhere, or maintain both, with neither conceptual nor syntactic support.

But if RDF 1.2 gains this reifier feature, you can start with simple relationships and then reify them as the needs grow. I've explained that in more detail in [8]; the gist of it for the above case is: `:e1` and `:t1` are perfectly possible to model as the same kind of entity (a `:Transaction`), and could even in a future integration scenario (when OWL may be back on the table) be declared as the same.

(From this perspective I can also see how Enrico finds the reifier itself enough, in this particular case; since the information is all there. But I don't think it's "use case complete" in that users probably expect to be able to query for the simple relationship.)

Again, this is all my own stance here; and grounds for my motivation to go forward with the current option.

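To make the property chain idea above concrete, here is a minimal sketch of what the entailment amounts to. It assumes plain Python sets standing in for an RDF graph, and `derive_transfer_to` is a made-up helper, not the API of any OWL reasoner; it only mimics the effect of composing the inverse of `:from` with `:to`.

```python
# Illustrative sketch only: plain Python sets stand in for an RDF graph,
# and derive_transfer_to is a hypothetical helper, not a reasoner API.

# N-ary data: each transaction node has one :from and one :to account.
from_edges = {("t1", "a1"), ("t2", "a2")}  # (transaction, account) for :from
to_edges = {("t1", "a2"), ("t2", "a3")}    # (transaction, account) for :to


def derive_transfer_to(from_edges, to_edges):
    """Compose inverse(:from) with :to, as the property chain would."""
    derived = set()
    for t_from, src in from_edges:
        for t_to, dst in to_edges:
            if t_from == t_to:  # both steps pass through the same transaction
                derived.add((src, dst))
    return derived


# Binary :transferTo pairs that the chain would entail from the N-ary data.
transfer_to = derive_transfer_to(from_edges, to_edges)
```

With such entailment available, a query for the simple `:transferTo` relationship would succeed against purely N-ary data; without it, the simple triples have to be asserted (and maintained) explicitly.
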
Best regards,
Niklas

[1]: <>
[2]: <https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/>
[3]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Apr/0196.html>
[4]: <https://www.w3.org/TR/rdf12-concepts/#resources-and-statements>
[5]: <https://www.w3.org/TR/rdf12-semantics/#simple>
[6]: <https://www.w3.org/TR/rdf12-concepts/#dfn-denote>
[7]: <https://www.w3.org/TR/rdf12-concepts/#dfn-rdf-source>
[8]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0038.html>

On Fri, Jul 19, 2024 at 2:54 AM Gregory Williams <greg@evilfunhouse.com> wrote:
>
> I mentioned in the call today that I had concerns about the "RDF star and LPGs" document[1] that was linked on IRC. I didn’t find any previous mention of that document in the mailing list archives, so unsure if it has been discussed previously (perhaps on a previous call?). Overall, I'm skeptical of the mapping presented in the document, and especially as the basis for making arguments that the current baseline proposal is compatible with LPG use-cases that have been previously discussed.
>
> (I may be making assumptions about the mapping here, but since it is presented in the form of a single, simple example, I guess that’s to be expected. Similarly, I may be reading a lot into the "<—>" mapping indicator used between the example LPG and RDF data; please let me know if I’ve misunderstood the intention.)
>
> I think on its face the RDF-LPG mapping presented does not preserve the intended semantics. To me, the intent of the LPG syntax is pretty clearly to assert that there are two transactions that occurred. They exist!
>
> ```
> CREATE (a1)-[:TRANSACTION {amount: 1000, currency: "gbp", date: "2002-09-24Z"}]->(a2)
> CREATE (a2)-[:TRANSACTION {amount: 900, currency: "gbp", date: "2002-10-03Z"}]->(a3)
> ```
>
> But the mapped RDF does *not* define any transactions. There are no `:TRANSACTION` triples in the graph – that predicate only exists inside triple terms. So maybe they are hypothetical transactions?
>
> ```
> << :e1 | :a1 :TRANSACTION :a2 >> ;
>     :amount 1000 ;
>     :currency "gbp" ;
>     :date "2002-09-24Z"^^xsd:date .
>
> << :e2 | :a2 :TRANSACTION :a3 >> ;
>     :amount 900 ;
>     :currency "gbp" ;
>     :date "2002-10-03Z"^^xsd:date .
> ```
>
> Furthermore, the mapping does not appear to be a bijection, and cannot be round-tripped. What happens to those triple terms if you try to map back to LPG? Do they disappear, as they are not asserted in the RDF graph? If, instead, they are mapped back into actual `:TRANSACTION` edges, how would that be different from mapping similar RDF that also asserted the `:TRANSACTION` triples in the graph (in addition to the triple terms)? I think this starts to touch on the point that Thomas has been making about asserted vs. unasserted assertions and the difficulty of modeling that difference in our current formalization.
>
> Under "OBSERVATIONS":
>
> > Reifiers denote edge identifiers.
>
> I think this suggests that you'd end up with an empty graph if you tried to map RDF without any reifiers back to LPG (or at least a graph with no edges). That doesn't seem viable, as all existing RDF data prior to rdf-star would be entirely inaccessible based on this mapping. I am nervous about any one-way mapping, as it has implications for the sorts of systems you can build and the use-cases that they can address. And I’m similarly nervous about basing support for the baseline proposal on this document as an assurance that we are now able to capture LPG use cases. The mapping is just too casual, and leaves out far too many specifics to be useful in understanding if it can practically address LPG use-cases.
>
> thanks,
> .greg
>
> [1] https://github.com/w3c/rdf-star-wg/wiki/RDF-star-and-LPGs
>
Received on Friday, 19 July 2024 13:54:22 UTC