Authoring graphs - Re: labelled property graphs vs -star extension of RDFn vs -star extension of named graphs

> On 8. Dec 2023, at 15:42, Lassila, Ora <ora@amazon.com> wrote:
> 
> This example is similar to the ones we used in our OneGraph papers. We found no way to accommodate the "LPG way" and the "RDF way" simultaneously, without breaking RDF semantics. That's why OneGraph treats LPGs, RDF-star (as understood at the time we wrote those papers) and RDF as "lower-dimensional projections" of its graph model (that is, some information loss is possible).
> 
> I can imagine a scenario where we have two named graphs, with one :Liz :spouse :Dick triple in each, and thus those triples each get their own "edge properties". But I am reluctant to hijack named graphs for something like this, given that there are many (likely conflicting) uses of named graphs out there already. Given that named graphs have been around for quite some time already, and with no particular semantics, it would seem foolish (if not arrogant) for us to now propose one specific use for them.


The idea is to *add* one use, not to invalidate any existing ones. I have not yet seen any proof that this is impossible, only concerns, which we should investigate, but I’m optimistic. I mean: what happened to "anyone can say anything about anything"? On the other hand, thought through to the end, your argument in effect invalidates any use of named graphs outside the administrative domain. Let me explain...

For the sake of the argument I’m considering a dichotomy of use cases: administration and authoring. In practice both use named graphs. Administrative use cases can rely on out-of-band mechanisms, e.g. application code, to define semantics. Authoring use cases can go the usual road of good-enough conventions and practical fixes, e.g. the GRAPH keyword in SPARQL, which defines very solidly the naming semantics in use.

In this WG there is a strong tendency to dismiss any use of named graphs as either out of scope (not administrative) or unsound (not well defined) or untypical (not triples), and those views go well together, but they all ignore that named graphs are indeed used, and needed, for authoring, and in reasonably sound ways. 

Authoring with triples alone is considered too hard by many people: people want a way to express complex relationships more easily. LPG came along and provided primitives that many people find very attractive: primary relations vs secondary attributes, relations on objects vs relations between objects, contextualized tokens instead of just types. And our precious, mighty triple can’t handle any of that sufficiently well.
RDF without (named) graphs can’t even add some structure to a huge graph. RDF lacks a dedicated mechanism to define boundaries in a graph: the only options are algorithmic ones like CBD (Concise Bounded Description), or named graphs. The LPG modelling idiom of objects and their relations reflects that need: it groups statements into objects.
A grouping mechanism is also indispensable in the authoring domain to easily add annotations to more than one statement - provenance for example is not just an administration issue as the use cases nicely illustrate.

In that sense "it would seem foolish (if not arrogant) for us to now propose one specific use for them" indeed! But considering named graphs off limits for authoring purposes like those described above in effect does just that: the argument goes both ways. Either both uses can co-exist (which I assume they can) or we’ll have to make a decision that is pretty hard on a lot of people: tell them to move their data to something else, something new, instead of named graphs, if they want to model data, not just administer it.

If we do indeed find, or say, or just FUDle that administration and authoring can’t peacefully coexist and use the same named graph mechanism, then we have to introduce a new one, but we would also have to change existing systems: we would discourage use of named graphs for anything other than out-of-band administrative uses and try to nudge authoring uses of named graphs towards the new mechanism.

Still, we might come to the conclusion that it's acceptable and strategically wise to swallow the bitter pill and standardize the two domains apart. But then it’s not enough to add a better kind of reification. Instead a new formalism for authoring with graphs has to be introduced (graph terms, graph literals, graph whatever) which is firmly placed in the realm of authoring, has sound (but not overly permissive) semantics and blends in nicely syntactically.

In Dydra’s opinion (and Dydra’s customers work a lot with named graphs, and use them for all kinds of purposes) it is perfectly possible for different use cases from both domains, authoring and administration, to co-exist using the named graph mechanism of RDF. And if you have serious ACL requirements, you don’t implement those on the basis of a single RDF dataset alone anyway.

However, Dydra can just as well implement a new graph term|literal|whatever mechanism on the basis of named graphs, so in a way Dydra is not biased at all.

The decision is really on another level: 

 Do we want to break existing usage 
 by discouraging use of named graphs 
 for anything else than administrative 
 use cases?

Do we really want to go there, because of fears and concerns? Or is that a land grab by administrators, unfairly putting the authors at a disadvantage? Well, if we find that we have to do it, then we have to provide an alternative graph formalism, nested (ha!) inside named graphs. If we don’t, then named graphs are our best option to base an annotation device on, because they unify the interface for inseparable concerns: annotating single or multiple triples. And they provide the most solid multi-edge support: direct reference from annotation to token.
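
To make the multi-edge point concrete, here is a minimal Python sketch of how Peter's :liz :spouse :dick example (quoted below) plays out under three readings. It is an illustration only, not part of any proposal: the function names, the per-occurrence graph names (:g1, :g2) and the convention of hanging annotations on the graph name are all assumptions made up for this sketch.

```python
# Illustrative sketch only: function names and the per-occurrence graph
# names (:g1, :g2) are assumptions, not part of any proposal.

TRIPLE = (":liz", ":spouse", ":dick")

# Peter's two annotated statements as (triple, annotations) pairs.
STATEMENTS = [
    (TRIPLE, {":start": 1964, ":end": 1974}),
    (TRIPLE, {":start": 1975, ":end": 1976}),
]

# The BGP from Peter's mail: one :liz :spouse :dick with these annotations.
QUERY = {":start": 1964, ":end": 1976}


def rdf_star_cg(statements):
    """Type reading: all annotations pile up on the one triple."""
    merged = {}
    for triple, props in statements:
        merged.setdefault(triple, set()).update(props.items())
    return merged


def lpg(statements):
    """Token reading: every statement is a separate edge with its own properties."""
    return [(triple, dict(props)) for triple, props in statements]


def named_graphs(statements):
    """One assumed convention: one graph per occurrence, annotations on the graph name."""
    return {
        f":g{i}": {"triples": {triple}, "annotations": dict(props)}
        for i, (triple, props) in enumerate(statements, start=1)
    }


def matches_type(merged, triple, query):
    """True if every queried annotation is attached to the (single) triple."""
    return set(query.items()) <= merged.get(triple, set())


def matches_token(tokens, triple, query):
    """True if some single token of the triple carries all queried annotations."""
    return any(t == triple and query.items() <= props.items() for t, props in tokens)


def tokens_of(dataset):
    """Flatten a dataset into (triple, annotations) tokens, one per graph occurrence."""
    return [(t, g["annotations"]) for g in dataset.values() for t in g["triples"]]


if __name__ == "__main__":
    print("RDF-star (CG):", matches_type(rdf_star_cg(STATEMENTS), TRIPLE, QUERY))
    print("LPG:          ", matches_token(lpg(STATEMENTS), TRIPLE, QUERY))
    print("named graphs: ", matches_token(tokens_of(named_graphs(STATEMENTS)), TRIPLE, QUERY))
```

Under these assumptions the mixed-up BGP (start 1964, end 1976) matches only in the type reading. With one graph per occurrence, the named graph reading keeps the two marriages apart, just as the LPG reading does.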

Best,
Thomas



> My preference, for this WG, would be to stay away from named graphs altogether. My $0.02, not wearing my chair hat.
> 
> Ora
> 
> -- 
> Dr. Ora Lassila 
> Principal Technologist, Amazon Neptune 
> 
> 
> 
> 
> On 12/8/23, 9:18 AM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com <mailto:pfpschneider@gmail.com>> wrote:
> 
> At the teleconference yesterday I mentioned that there could be user-visible
> differences between different views of how to proceed, even when there is some
> consensus that different views are essentially the same.
> 
> 
> Here is one example of a user-visible divergence. Consider the following
> input, written in the community group syntax.
> 
> 
> :liz :spouse :dick {| :start 1964; :end 1974 |} .
> :liz :spouse :dick {| :start 1975; :end 1976 |} .
> 
> 
> In the community graph version of RDF-star this results in one asserted triple
> with subject :liz that is the subject of four triples. In SPARQL-star, the BGP
> 
> 
> :liz :spouse :dick {| :start 1964; :end 1976 |} .
> 
> 
> would match against a graph constructed from this input.
> 
> 
> In labelled property graphs this would appear to result in two asserted
> triples with subject :liz, each with two property-value pairs. The above BGP
> would not match.
> 
> 
> So there is a decided visible difference between the community graph version
> of RDF-star and labelled property graphs.
> 
> 
> If I am correct in reading the (sparse) information available about RDFn, a
> -star extension of RDFn would conform to the community group reading. So
> there would be noticeable differences between an extended RDFn and labelled
> property graphs.
> 
> 
> I am not aware of any proposal for using named graphs that says what the above
> would result in there, so it is unclear which side a named graphs version of
> -star would fit into.
> 
> 
> peter
> 
> 
> 
> 
> 

Received on Saturday, 9 December 2023 11:44:41 UTC