Re: Against the notion of reification well-formed graph (i.e., atomicity) from Thomas Lörtsch on 2024-01-25 (public-rdf-star-wg@w3.org from January 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Thu, 25 Jan 2024 13:27:06 +0100
To: Andy Seaborne <andy@apache.org>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Message-Id: <5556CF5F-D718-414A-80AA-98D8B6A0C15D@rat.io>
> On 25. Jan 2024, at 12:22, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> The answer to all these questions comes from the fundamental idea that the proposal just adds syntactic sugar.  So just do the expansion to triples and work from there, just as you would for any of the syntactic sugar already in Turtle.

Right. It should, for example, be specified under which conditions an unasserted reification of a statement counts. That’s just part of the specification work we have to do.

> Whether syntactic sugar is the correct approach is a separate question.

See below.

> peter
> 
> On 1/25/24 06:08, Andy Seaborne wrote:
>> On 23/01/2024 12:08, Thomas Lörtsch wrote:
>>> 
>>> 
>>>> On 23. Jan 2024, at 12:50, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>> 
>>>> On 1/23/24 06:30, Thomas Lörtsch wrote:
>>>>>> On 23. Jan 2024, at 12:22, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>>>> 
>>>> [..]
>>>>>> 
>>>>>> What the proposal does talk about is RDF reifications, nodes in an RDF graph that are subjects of rdf:subject, rdf:predicate, or rdf:object triples.  The well-formedness requirement states that an RDF graph is ill-formed if it has a node that is the subject of a triple with any of these predicates and is not the subject of exactly
>>>>> Shouldn’t this be changed to *at least*? See my prior mail in response to Dörthe.
>>>>>> one triple with each of these predicates.  No bijection between triples and anything is either mentioned or implied.  The notion of well-formedness is completely syntactic.
>>>> 
>>>> [...]
>>>> 
>>>> The proposal is *exactly*.  Changing to *at least* could make it harder to optimize RDF reifications in implementations.
>>>> 
>>>> As far as I can tell, multiple subjects, predicates, or objects is more difficult to optimize than missing subjects, predicates, or objects, but I haven't implemented an RDF triple store that optimizes RDF reifications.
>> Are there any today that make specific optimization for reification?

In comparisons of different approaches to metamodelling, Virtuoso regularly performs equally well for RDF standard reification and named graphs, and very well overall. I can’t read the source code though, so I can’t tell you how they do it. Row-level identifiers plus a bit indicating assertedness would be my guess, since it’s based on an RDBMS.

>>> But what does it *mean*? Optimizations should only be applied after we know that it means what we want it to mean.
>> Agreed.
>> We can start with our goals. "bloat" has been used in two senses : "visual bloat" and "size bloat".

You’re forgetting "term bloat"…

>> Is the WG addressing the size bloat issue?

IMO it can’t be addressed in N-Triples, as N-Triples is by definition a strictly triple-based serialization, with pretty atomic terms (language tags on literals being the acceptable exception). Its whole purpose is to ease processing by eliminating shortcuts. Everything you add to that - especially new term types that combine already defined atomic term types into more complex term types, e.g. triple terms - breaks this simplicity and straightforwardness.

B.t.w. "streaming" is an argument that has been brought forward a lot. Can you point me to any halfway concise treatise of the problems and practices of streaming RDF data? I’d like to understand how solving problems with reification by means of a triple term would relate to other issues. Would it be a decisive breakthrough, or more like a drop in the bucket?
Because my hunch is that it’s rather the latter. And where does it end? What about list terms? What about CBD terms? Or even graph terms?


I’ve been peeking into your draft proposal that you mentioned to Felix the other day, at
https://github.com/afs/rdf-star-notes/blob/main/reif-atoms.md
You give a list of 7 problems with RDF reification. Some of them (problems 4, 5 and 6) would be handled by a notion of well-formedness. Problem 1 would be solved by the proposed annotation syntax. Problems 3 and 7, especially blank nodes split over multiple graphs when breaking a big graph into files of a more manageable size, are not specific to reification but a general problem.
That leaves problem 2, verbosity in N-Triples, and that just comes with the terrain. There sometimes are more or less verbose ways to represent a complex type in straight triples - RDF Collections are much worse than RDF Containers - but RDF standard reification is not too bad in that respect. 
You mention a reification atom <<(s p o)>> as a possible addition to N-Triples. That seems like a slight variation of N-Triples-Star to me, and I’m not fundamentally opposed if it helps and doesn’t rely on a new term type. My question is: does it really help? And would you also add list atoms, CBD atoms, graph atoms?
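
To make the *exactly* vs. *at least* question a bit more concrete, here is a minimal Python sketch of the purely syntactic well-formedness check as Peter words it further up in this thread. The function name, the representation of a graph as plain (s, p, o) string tuples and the example.org IRIs are assumptions made for illustration only; none of this is taken from the proposal text.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIF_PREDICATES = (RDF + "subject", RDF + "predicate", RDF + "object")


def ill_formed_reifiers(triples, mode="exactly"):
    """Return the nodes that violate the well-formedness rule.

    A node is checked only if it is the subject of at least one
    rdf:subject, rdf:predicate or rdf:object triple.
    mode="exactly":  it needs exactly one triple per predicate.
    mode="at_least": it needs one or more triples per predicate.
    """
    if mode not in ("exactly", "at_least"):
        raise ValueError("mode must be 'exactly' or 'at_least'")
    # counts[node][predicate] = number of distinct triples in the graph
    counts = defaultdict(lambda: dict.fromkeys(REIF_PREDICATES, 0))
    for s, p, o in set(triples):  # an RDF graph is a set of triples
        if p in REIF_PREDICATES:
            counts[s][p] += 1
    bad = set()
    for node, per_predicate in counts.items():
        for n in per_predicate.values():
            if n == 0 or (mode == "exactly" and n > 1):
                bad.add(node)
    return bad


# Hypothetical example data, not from the proposal.
EX = "http://example.org/"
graph = [
    # _:r1 has one subject, one predicate, one object
    ("_:r1", RDF + "subject", EX + "s"),
    ("_:r1", RDF + "predicate", EX + "p"),
    ("_:r1", RDF + "object", EX + "o"),
    # _:r2 has two rdf:object triples
    ("_:r2", RDF + "subject", EX + "s"),
    ("_:r2", RDF + "predicate", EX + "p"),
    ("_:r2", RDF + "object", EX + "o1"),
    ("_:r2", RDF + "object", EX + "o2"),
    # _:r3 has no rdf:object triple
    ("_:r3", RDF + "subject", EX + "s"),
    ("_:r3", RDF + "predicate", EX + "p"),
]

print(sorted(ill_formed_reifiers(graph, "exactly")))   # ['_:r2', '_:r3']
print(sorted(ill_formed_reifiers(graph, "at_least")))  # ['_:r3']
```

Under this reading the two variants differ only in how they treat a node like _:r2 with several objects; a node with a missing part is ill-formed either way.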

Thomas


>> Optimization is not just storage space (and the choices there change over the space of a few years at the moment) - it's also preserving the outcome of queries.
>> What does SELECT (count(*) AS ?C) { ?s ?p ?o } return?
>> or any query with a ?p.

>>     Andy
>>> 
>>> I just realized that saying *at least* makes an implicit assumption about different terms in object position referring to the same entity in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That may be way beyond what we want to fix, and insofar saying *exactly* might be the safer and more restrained definition.
>>> Still it introduces a hint of opacity that I’m not happy with.
>>> 
>>> Thomas
>>> 
>>>> peter
>>>> 
>>> 
>>> 
>
Received on Thursday, 25 January 2024 12:27:18 UTC