Re: Implementation issues [was Re: Future-proof modelling] from James Anderson on 2023-01-23 (public-rdf-star-wg@w3.org from January 2023)

From: James Anderson <anderson.james.1955@gmail.com>
Date: Mon, 23 Jan 2023 17:19:57 +0100
To: public-rdf-star-wg@w3.org
Message-Id: <75FC8388-B79A-4F72-80D5-930D31B9B865@gmail.com>
good afternoon;

> On 23. Jan 2023, at 16:50, Thomas Lörtsch <tl@rat.io> wrote:
> 
> Hi Andy,
> 
>> On 22. Jan 2023, at 14:24, Andy Seaborne <andy@apache.org> wrote:
>> 
>> On 20/01/2023 23:27, Pierre-Antoine Champin wrote:
>>> Dear Souri,
>> ...
>>> - it adds a "4th column" to every triple. IIUC, you seem to assume that all implementations already deal with some form of triple identifier, and that all we need is to expose it to the user. But I am not sure that all implementations have such an internal identifier (I am actually pretty sure that some don't).
>> 
>> Many systems, for persistent storage, intern RDF terms and that implicitly gives a triple identifier as a concatenation of SPO. But they work on quads so it is meaningless. And may change at a whim.
>> 
>> If in-memory, triples have value-equality semantics, so two pointers to two different areas of memory can represent the same triple: even if the terms are intern'ed, the triples themselves are not the same reference. No single triple identifier.
> 
> I have to say that I was (and maybe still am) operating under the same assumption as Souri: most systems support statement identifiers anyway. Maybe not all, but many. And not all systems support quoted triples. So where does that lead us?
> 
> Some systems seem to (plan to) implement RDF-star via statement identifiers. IIRC Holger Knublauch made a remark to that effect and Stardog does it already. Other systems that offer RDF and LPG support seem to use the fourth column to store statement identifiers for LPG and named graph identifiers for RDF. RDBMS based solutions like Oracle and Virtuoso could be expected to not have much trouble implementing a statement level identifier. AllegroGraph already provides statement identifiers in addition to a named graph identifier, similarly Dydra (IIRC). I spoke to you and Ruben Taelman about adding statement IDs to Jena and Comunica respectively at the Berlin Graph WS 2019 and my impression was that you both wouldn’t see a big problem supporting them. Tools like KGTK implement triples plus identifier, no named graphs. MillenniumDB supports identifiers IIRC. If I understand your remark above correctly, Jena would probably implement identifiers based on quoted triples rather than the other way round. Not sure about QLever but given their approach I guess they would belong to the same camp as Jena - OTOH they are considered as a replacement for Blazegraph at Wikidata, so they must support statement annotation (but maybe instead of graphs?). I don’t know about GraphDB and rdf4j.

perhaps it would be useful were someone to conduct a detailed and conclusive survey about the approaches which implementations are taking to statement identity.
the statement, above, with respect to dydra does not agree with my understanding of the implementation.

> 
> In summary it seems that there is no very strong leaning to one side or the other. Some solutions support quoted triples, some statement identifiers, some support no named graphs but statement identifiers, some both, some either or. Given the complexity of the problem it seems wise to just not take implementation issues into account. They are solvable.

approaches which depend on a particular optimization to make them effective are not.
if one restricts the notion of statement identity such that it is determined by internal term identifiers, it becomes a non-starter.
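
as an illustration only, here is a minimal python sketch of one way an identifier derived from interned term identifiers can fail to be stable. it follows the description quoted above (a triple identifier as "a concatenation of SPO" which "may change at a whim"). the names TermDictionary and derived_statement_id are hypothetical and do not describe any particular implementation.

```python
class TermDictionary:
    """Hypothetical interning table: maps each RDF term to a small integer."""

    def __init__(self):
        self._ids = {}

    def intern(self, term: str) -> int:
        # Assign the next free integer the first time a term is seen;
        # return the existing integer on every later lookup.
        return self._ids.setdefault(term, len(self._ids))


def derived_statement_id(store: TermDictionary, triple):
    """Assumed scheme: the statement id is the tuple of interned S, P, O ids."""
    return tuple(store.intern(term) for term in triple)


a = (":alice", ":knows", ":bob")
b = (":bob", ":knows", ":carol")

# Two stores load the same two triples, but in a different order.
store_1 = TermDictionary()
ids_1 = {t: derived_statement_id(store_1, t) for t in (a, b)}

store_2 = TermDictionary()
ids_2 = {t: derived_statement_id(store_2, t) for t in (b, a)}

# The same triple receives a different derived identifier in each store,
# so the identifier is only meaningful inside one store instance.
print(ids_1[a], ids_2[a])
```

in this sketch the identifier depends on load order, so it cannot be exchanged between stores or relied upon after a reload.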

> Usability and semantics are hard enough and important enough to discuss them in their own right and find a solution based on those discussions. There will be no free lunch for everybody. An extra level of expressivity doesn’t come for free.
> ...

---
james anderson | james@dydra.com | https://dydra.com
Received on Monday, 23 January 2023 16:20:23 UTC