Re: proposal: a reifier should reify only one "thing" from Thomas Lörtsch on 2024-04-24 (public-rdf-star-wg@w3.org from April 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Wed, 24 Apr 2024 23:08:03 +0200
To: Gregory Williams <greg@evilfunhouse.com>
Cc: Franconi Enrico <franconi@inf.unibz.it>, Pierre-Antoine Champin <pierre-antoine@w3.org>, "Sasaki, Felix" <felix.sasaki@sap.com>, RDF-star WG <public-rdf-star-wg@w3.org>
Message-Id: <99AB9570-CD7F-4727-8D70-F5DA817FFDB5@rat.io>
Hi Greg,

sorry for the late and somewhat hasty response!


> On 19. Apr 2024, at 17:44, Gregory Williams <greg@evilfunhouse.com> wrote:
> 
> 
> 
>> On Apr 18, 2024, at 9:00 AM, Thomas Lörtsch <tl@rat.io> wrote:
>> 
>>> On 18. Apr 2024, at 17:41, Gregory Williams <greg@evilfunhouse.com> wrote:
>>> 
>>> We all come from diverse backgrounds. I’m not sure if “us” was meant to mean WG members, or the RDF community/users, or something else, but I’d suggest we not make claims about what “most” people are or are not confused by here.
>> 
>> Sure, speaking for "us" is always a bit difficult in heterogeneous groups (and the WG is a heterogeneous group, not to mention the RDF community/users), but I share the sentiment that there was at least a lot of surprise about the position Ora brought forward. "Some" of "us" would definitely not like to constrain the mechanism just for the sake of not irritating prospective LPG converts.
> 
> Understood. But I have a slightly different take – it’s not just about “not irritating LPG converts.” It’s about leveraging real value to be found in increasing interop between LPG and SPARQL. The charter specifically calls out that the early RDF* work did this, and I continue to think we should be striving to maintain that.

RDF/LPG interop is certainly the big driver of this endeavour, but it is still an RDF project. If the proposed mechanism has a straightforward interpretation as a grouping mechanism in RDF and in no way impedes the LPG use case, whereas constraining it to singletons requires strange contortions on the RDF side, then I really see no reason why we shouldn’t reap the benefits of this lucky opportunity.

>>> I’m not sure I’d describe myself as being confused by (most of) your proposal, but I do think it addresses use-cases which I myself have never encountered, it adds complexity for implementations,
>> 
>> How?
> 
> I think this is partly my fault for poor choice of language. I’ll admit that if I implemented the current proposal as pure triples, with the rdf:reifies data as just another triple in a simple graph, then it’s probably no more complex an implementation to do many-to-many than it would be for many-to-one.

Right. It is natural RDF.
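
To make that concrete, here is a minimal sketch in plain Python (no RDF library involved; the tuple layout, the ex: names and the two helper functions are made up for illustration and are not part of any proposal text). It treats rdf:reifies statements as ordinary entries in a set of triples, so the same lookup code serves the many-to-one and the many-to-many case:

```python
# Illustrative only: triples are plain tuples, and a reified triple is
# represented as a nested tuple in the object position of rdf:reifies.
REIFIES = "rdf:reifies"

t1 = ("ex:alice", "ex:knows", "ex:bob")
t2 = ("ex:alice", "ex:worksWith", "ex:bob")

graph = {
    ("ex:r1", REIFIES, t1),
    ("ex:r1", REIFIES, t2),   # one reifier, two triples
    ("ex:r2", REIFIES, t1),   # one triple, two reifiers
    ("ex:r1", "ex:source", "ex:survey2024"),
}


def reified_by(graph, reifier):
    """Return all triples that the given reifier reifies."""
    return {o for (s, p, o) in graph if s == reifier and p == REIFIES}


def reifiers_of(graph, triple):
    """Return all reifiers that reify the given triple."""
    return {s for (s, p, o) in graph if p == REIFIES and o == triple}
```

Nothing in either helper depends on how many triples a reifier points to, which is all that "natural RDF" is meant to say here.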

> However, if you’re starting out with the goal of having a system that can support both LPG and SPARQL over a shared data model and storage system (as in AWS’s proposed OneGraph), or if you’ve got a system where triples/quads are stored in something like a relational table (conceptually, not necessarily at the implementation level), then you may very well already have identifiers for each edge/triple/row. What the many-to-many proposal does for these systems is block off a possible implementation approach that would use the existing identifiers as the reifiers. The “added complexity” I mentioned is really about having to add to these systems something on top of the existing identifiers. This extra system of many-to-many identifiers would then introduce requirements for extra storage/indexing and extra joins to support something that otherwise might be nearly already implemented.

I guess that RDBMS-based systems have such identifiers, whereas native triple/quad stores tend not to have them (exceptions notwithstanding). Souri from Oracle has no problem with many-to-many reifications, quite the contrary. More on implementation below.
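
A rough sketch of the two storage layouts Greg describes, again in plain Python. Everything in it (the dictionaries, the names, the two functions) is hypothetical and not modelled on OneGraph or any other real system; it only shows where the extra mapping table and the extra join would come in:

```python
# Hypothetical storage layouts, not taken from any real product.

# Layout A: each row already has an id, and that id doubles as the reifier.
edges = {
    1: ("ex:alice", "ex:knows", "ex:bob"),
    2: ("ex:alice", "ex:knows", "ex:bob"),  # a parallel edge gets its own id
}
annotations = {1: {"ex:since": "2019"}, 2: {"ex:since": "2023"}}


def annotated_edges_a():
    """One lookup per edge: the edge id is also the annotation key."""
    return [(edges[i], annotations.get(i, {})) for i in sorted(edges)]


# Layout B: reifiers are separate from row ids, linked by a mapping table.
reifies = [("ex:r1", 1), ("ex:r1", 2), ("ex:r2", 1)]  # many-to-many
reifier_annotations = {
    "ex:r1": {"ex:source": "ex:survey"},
    "ex:r2": {"ex:since": "2019"},
}


def annotated_edges_b():
    """Extra join: edge id -> mapping table -> reifier -> annotations."""
    out = []
    for i in sorted(edges):
        for reifier, edge_id in reifies:
            if edge_id == i:
                out.append((edges[i], reifier, reifier_annotations.get(reifier, {})))
    return out
```

Layout A is the shortcut that a many-to-many design rules out; Layout B is what a store without per-edge identifiers would presumably need in either case.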

Ora and Brian Thompson both maintain that implementation is not the problem. Of course they don’t really explain _what_ in their view the problem is...

>> I bet you happily use named graphs for grouping ;-) Well, maybe, maybe not, but if you accept that named graphs have no semantics and should not be used for anything else than application-specific concerns, then how do you group things?
> 
> I’d group things with application-level modeling. Not everything has to be built *into* RDF.

We don’t have to build it in, the many-to-many construct is already there - it came naturally with option 3 (the discussion from Jan/Feb). We’d have to put strange and possibly brittle constraints into place and find sensible answers to the questions that Enrico outlined to forbid its use.

> There are many ways to do this sort of grouping *on top of* RDF, and I’m nervous about the WG being well ahead of any practical experience with the current proposal.

I do understand that, but any decision can be the wrong one.

> If we get something wrong (

> if it’s hard to understand for users,

That is the claim made by Ora and others, but it’s largely unsubstantiated by any evidence, and many in the WG don’t share the intuition.

> or if it makes interop impossible,

It certainly doesn’t make it impossible. It may even make it easier, see my recent mail about an LPG/RDF interop use case that benefits from X-to-many reification.

> or if it makes efficient implementation much more difficult

Checking for well-formedness criteria certainly makes implementation more difficult. Ora and Brian Thompson were asked repeatedly for more detail about implementation issues and didn’t provide any. Souri from Oracle called X-to-many reifications a non-issue, and Dydra (which I represent in the WG) has no issue with them. No other implementor brought forward any concerns, even after Brian Thompson from Amazon explicitly called for it. From all that I reckon that there is no implementation issue.
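
For illustration, this is roughly what such a well-formedness check could look like, assuming the constraint is "each reifier reifies at most one triple" and that triples are plain Python tuples. The function name and the data are invented for this sketch:

```python
from collections import defaultdict

REIFIES = "rdf:reifies"


def reifiers_violating_singleton(triples):
    """Return the reifiers that reify more than one distinct triple."""
    targets = defaultdict(set)
    for s, p, o in triples:
        if p == REIFIES:
            targets[s].add(o)
    return {r for r, ts in targets.items() if len(ts) > 1}


# Made-up sample data: ex:r1 reifies two triples, ex:r2 only one.
t1 = ("ex:alice", "ex:knows", "ex:bob")
t2 = ("ex:alice", "ex:worksWith", "ex:bob")
sample = [
    ("ex:r1", REIFIES, t1),
    ("ex:r1", REIFIES, t2),
    ("ex:r2", REIFIES, t1),
    ("ex:r2", "ex:since", "2019"),
]

violations = reifiers_violating_singleton(sample)
```

The check itself is small, but it has to see all rdf:reifies statements for a reifier before it can decide, so it would have to run on every load or merge, and somebody has to specify what happens when it fails.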

> ), there’s no going back once we put it into the spec. There’s a fair amount of experience with the RDF* CG work that we’re standing on top of; it seems clear we’re going to go beyond what the work did (e.g. to support “parallel edges”/“multiple reifiers”), but the farther we go beyond the existing experience, the more risk there is that we get something wrong.

We can get it wrong either way. We can, however, decide to give this issue more time to investigate: modelling, mapping to LPG, querying, implementation, maybe hitting some unforeseen problems. The RDF* CG wasted a lot of time, years even, by belittling and ignoring questions about parallel edges (which is another name for occurrences). Accepting that, and that those years are irrevocably lost, might help us develop a more reasonable sense of how much time we still need. We have spent years on an unworkable proposal; we have only spent a few months on the much more practical occurrence/multi-edge based approach. It is certainly okay to need a few more weeks, or even months, to get this right.

>> And don’t you agree that grouping stuff by attributes is one of the most basic KR activities there is? That’s my use case, and to me it’s an absolute no-brainer, but if you don’t need it, don’t use it - completely fine by me! But formulating extra restrictions, performing operations at the heart of RDF, just to disable a potentially very (or maybe just marginally) useful feature? I’m really irritated by this fervor. I just can’t imagine who could actually get hurt, and how, if we leave this option open. Reminder: the annotation syntax doesn’t support it, so casual users won’t even notice (which I could agree to be a good thing, given the LPG part of the target audience).

Best,
Thomas
> 
> thanks,
> greg
Received on Wednesday, 24 April 2024 21:08:13 UTC