Implementation issues [was Re: Future-proof modelling]

Hi Andy,

> On 22. Jan 2023, at 14:24, Andy Seaborne <andy@apache.org> wrote:
> 
> On 20/01/2023 23:27, Pierre-Antoine Champin wrote:
>> Dear Souri,
> ...
>> - it adds a "4th column" to every triple. IIUC, you seem to assume that all implementations already deal with some form of triple identifier, and all we need is to expose it to the user. But I am not sure that all implementations have such an internal identifier (I am actually pretty sure that some don't).
> 
> Many systems, for persistent storage, intern RDF terms and that implicitly gives a triple identifier as a concatenation of SPO. But they work on quads so it is meaningless. And may change at a whim.
> 
> If in-memory, triples have value-equality-semantics, so two pointers to two different areas of memory can represent the same triple; even if the terms are intern'ed, the triples are not the same reference. No single triple identifier.

I have to say that I was (and maybe still am) operating under the same assumption as Souri: most systems support statement identifiers anyway. Maybe not all, but many. And not all systems support quoted triples. So where does that lead us?

Some systems seem to (plan to) implement RDF-star via statement identifiers. IIRC Holger Knublauch made a remark to that effect, and Stardog does it already. Other systems that offer both RDF and LPG support seem to use the fourth column to store statement identifiers for LPG and named graph identifiers for RDF. RDBMS-based solutions like Oracle and Virtuoso could be expected to not have much trouble implementing a statement-level identifier. AllegroGraph already provides statement identifiers in addition to a named graph identifier, similarly Dydra (IIRC). I spoke to you and Ruben Taelman about adding statement IDs to Jena and Comunica respectively at the Berlin Graph WS 2019, and my impression was that neither of you saw a big problem in supporting them. Tools like KGTK implement triples plus identifier, no named graphs. MillenniumDB supports identifiers IIRC. If I understand your remark above correctly, Jena would probably implement identifiers based on quoted triples rather than the other way round. Not sure about QLever, but given their approach I guess they would belong to the same camp as Jena. OTOH they are being considered as a replacement for Blazegraph at Wikidata, so they must support statement annotation (but maybe instead of graphs?). I don’t know about GraphDB and RDF4J.
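
To make the two uses of the fourth column concrete, here is a minimal sketch in Python. It is a toy illustration only, not how any of the systems named above actually work: the class QuadTable and all of its method names are hypothetical, and the ("graph", ...) / ("stmt", ...) tagging is just an assumption I make to keep the two uses apart in one table.

```python
from itertools import count


class QuadTable:
    """Toy table of (s, p, o, fourth) rows over interned RDF terms (hypothetical)."""

    def __init__(self):
        self._term_ids = {}             # term string -> interned integer
        self._statement_ids = count(1)  # source of fresh statement identifiers
        self.rows = []

    def intern(self, term):
        # Each distinct term gets one integer; repeated calls return the same one.
        return self._term_ids.setdefault(term, len(self._term_ids))

    def spo_key(self, s, p, o):
        # The "implicit identifier": just the concatenation of interned S, P, O.
        return (self.intern(s), self.intern(p), self.intern(o))

    def add_in_graph(self, s, p, o, graph):
        # RDF-style use: the fourth column names a graph.
        row = self.spo_key(s, p, o) + (("graph", self.intern(graph)),)
        self.rows.append(row)
        return row

    def add_with_statement_id(self, s, p, o):
        # LPG-style use: the fourth column is a fresh per-statement identifier.
        row = self.spo_key(s, p, o) + (("stmt", next(self._statement_ids)),)
        self.rows.append(row)
        return row


table = QuadTable()
r1 = table.add_with_statement_id(":alice", ":knows", ":bob")
r2 = table.add_with_statement_id(":alice", ":knows", ":bob")
g1 = table.add_in_graph(":alice", ":knows", ":bob", ":g1")

# All three rows share one SPO key, so the key alone cannot tell them apart.
same_spo = r1[:3] == r2[:3] == g1[:3]
# Only the explicit fourth column distinguishes the two occurrences.
distinct_ids = r1[3] != r2[3]
print(same_spo, distinct_ids)
```

The SPO key is the same for every occurrence of the triple, which is why it cannot serve as a statement identifier on its own; whatever goes into the fourth column decides whether the row is addressed per graph or per statement.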

In summary, it seems that there is no very strong leaning to one side or the other. Some solutions support quoted triples, some statement identifiers, some support statement identifiers but no named graphs, some both, some either one or the other. Given the complexity of the problem it seems wise to just not take implementation issues into account. They are solvable. Usability and semantics are hard enough and important enough to discuss them in their own right and find a solution based on those discussions. There will be no free lunch for everybody. An extra level of expressivity doesn’t come for free.


Named graphs are an interesting aspect: they have no semantics and are not part of the RDF core. They are designed as a database administrator's tool. I was trying to discuss how they could be provided with semantics - e.g. via conventions for sound graph naming and a dedicated vocabulary - but you, for example, were rather unwilling to discuss any approach in that direction. I’m convinced that it is possible to define such mechanisms with modest effort, but I’m increasingly thinking that they might indeed be left without semantics, as a tool to optimize applications and organize out-of-band issues like administrative details. Maybe even as a tool to collect recurring statement annotations (no matter if ID-based or via quoted triples) in an application-oriented way. But in any case outside the formal semantics, just mapping to them if possible, as an application aspect, an implementation issue.
However, this also means that in the discussion about statement annotation they shouldn’t play any role; they should be assumed to not exist.


Best,
Thomas

Received on Monday, 23 January 2023 15:51:18 UTC