Anecdote/implementation note Re: Against the notion of reification well-formed graph (i.e., atomicity) from Jerven Tjalling Bolleman on 2024-01-23 (public-rdf-star-wg@w3.org from January 2024)

From: Jerven Tjalling Bolleman <jerven.bolleman@sib.swiss>
Date: Tue, 23 Jan 2024 14:07:10 +0100
To: public-rdf-star-wg@w3.org
Message-ID: <5cfdd039-656a-4fed-aa39-150ef2df9108@sib.swiss>
Hi All,

Comment from the peanut gallery, please weigh it as such:

A question regarding "wellformedness" comes up in the RDF/XML spec, in
the reification shorthand.

When rdf:ID is used on a property element, it introduces a reification
quad. However, the rdf:ID attribute value must be unique within a single
RDF/XML document.
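To make the "reification quad" concrete: per the RDF/XML reification rules, an rdf:ID on a property element yields the triple itself plus four triples about a node named by that ID (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The sketch below assumes a fragment-free base IRI and simply appends "#" plus the ID instead of doing full IRI resolution; the function name `reify` is hypothetical. Reusing one rdf:ID for two different property elements would give one node two sets of these values, which is why uniqueness matters.

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def reify(base: str, rdf_id: str, t: Triple) -> list[Triple]:
    """Return the triple plus the four reification triples an rdf:ID generates.

    Simplification (assumption): the reification node IRI is base + "#" + rdf_id,
    with no general IRI resolution.
    """
    r = f"{base}#{rdf_id}"
    return [
        t,  # the property element's own triple is still asserted
        Triple(r, RDF + "type", RDF + "Statement"),
        Triple(r, RDF + "subject", t.s),
        Triple(r, RDF + "predicate", t.p),
        Triple(r, RDF + "object", t.o),
    ]
```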

UniProt, which is distributed as RDF/XML (on FTP, AWS Open Data, etc.),
ignores this and happily duplicates rdf:ID attribute values, normally
without issue for UniProt users. Note that in the UniProt case there are
no rdf:Statement instances with multiple rdf:subject, rdf:predicate, or
rdf:object values, unless these are inferred by OWL reasoning after
ingestion.

The "normally without issue" holds because it is very expensive to
check for duplicate rdf:ID values, so most tools stop checking after
some number X of rdf:ID values have been seen. This means the duplicate
rdf:ID issue only triggers problems when the duplicates are close to the
beginning of one of the UniProt RDF files. While that has happened in
the past, we have only ever received one complaint about it. So in
practice, the unique rdf:ID requirement is a soft constraint.
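A minimal sketch of the bounded checking described above, assuming a checker that remembers at most `limit` distinct rdf:ID values (the class name and `limit` parameter are hypothetical, not taken from any particular parser). It shows why only duplicates near the start of a file get caught: once the set is full, later IDs are never stored, so their repeats pass silently.

```python
class BoundedIdChecker:
    """Hypothetical rdf:ID uniqueness check that stops remembering IDs
    after `limit` distinct values, trading completeness for bounded memory."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: set[str] = set()

    def check(self, rdf_id: str) -> bool:
        """Return False if rdf_id is a detected duplicate, True otherwise."""
        if rdf_id in self.seen:
            return False  # duplicate among the remembered IDs
        if len(self.seen) < self.limit:
            self.seen.add(rdf_id)  # only the first `limit` IDs are tracked
        return True
```

For example, with `limit=2` and the ID sequence `a, b, a, c, d, c`, the repeated `a` is reported, but the repeated `c` is not, because `c` arrived after the checker stopped recording.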


A note regarding optimizations in a store for multiple subjects,
predicates, or objects for one named triple.

Taking the approach of a QUAD(S id, P id, O id, G id) table and a
VALUES(I id, V value) table, we add one table, REIFICATION(T id, S id,
P id, O id), and either do or do not put a unique constraint on
REIFICATION.T.
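A sketch of the three-table layout using Python's built-in sqlite3 module, with the unique constraint on REIFICATION.T switchable. Integer ids and TEXT values are assumptions, and the VALUES table is renamed `term_values` because VALUES is an SQL keyword; the helper names are hypothetical. With the constraint, a second row for the same name is rejected; without it, the store simply holds several S/P/O rows for one T.

```python
import sqlite3


def make_store(unique_t: bool) -> sqlite3.Connection:
    """Create QUAD, VALUES (as term_values) and REIFICATION tables in memory."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE quad (s INTEGER, p INTEGER, o INTEGER, g INTEGER)")
    con.execute("CREATE TABLE term_values (i INTEGER PRIMARY KEY, v TEXT)")
    # The design choice discussed: UNIQUE on REIFICATION.T or not.
    t_col = "t INTEGER UNIQUE" if unique_t else "t INTEGER"
    con.execute(
        f"CREATE TABLE reification ({t_col}, s INTEGER, p INTEGER, o INTEGER)"
    )
    return con


def add_reification(con: sqlite3.Connection, t: int, s: int, p: int, o: int) -> bool:
    """Insert one REIFICATION row; return False if UNIQUE(t) rejects it."""
    try:
        con.execute("INSERT INTO reification VALUES (?, ?, ?, ?)", (t, s, p, o))
        return True
    except sqlite3.IntegrityError:
        return False
```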

The second option, QUAD(S id, P id, O id, G id, T id), is also possible
and also allows duplicate T values (T points to the name of the triple,
not into the VALUES table). We already need to deal with multiple T
values for any SPOG combination.
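A sketch of the five-column variant, again with sqlite3. It assumes each row records that triple (S, P, O) in graph G carries the name T; the sample rows and function names are hypothetical. The two queries separate the case a store must already handle (several names for one SPOG) from the "ill-formed" case (one name reused for different triples).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE quad (s INTEGER, p INTEGER, o INTEGER, g INTEGER, t INTEGER)")
con.executemany(
    "INSERT INTO quad VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 3, 0, 10),  # triple (1, 2, 3) in graph 0, named 10
        (1, 2, 3, 0, 11),  # same triple, second name: several T for one SPOG
        (4, 2, 3, 0, 10),  # name 10 reused for another triple: duplicate T
    ],
)


def names_with_multiple_triples(con: sqlite3.Connection) -> list[int]:
    """Names (T) attached to more than one distinct (s, p, o)."""
    rows = con.execute(
        "SELECT t FROM (SELECT DISTINCT t, s, p, o FROM quad) "
        "GROUP BY t HAVING COUNT(*) > 1 ORDER BY t"
    ).fetchall()
    return [t for (t,) in rows]


def triples_with_multiple_names(con: sqlite3.Connection) -> list[tuple[int, int, int, int]]:
    """SPOG combinations that carry more than one distinct name (T)."""
    return con.execute(
        "SELECT s, p, o, g FROM quad GROUP BY s, p, o, g "
        "HAVING COUNT(DISTINCT t) > 1 ORDER BY s, p, o, g"
    ).fetchall()
```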

This makes me feel that being able to deal with "ill-formed" cases
won't be too bad for implementers, especially considering that they will
be relatively rare, as most data will be provided with syntactic
shorthands that prevent "ill-formed" cases.
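The well-formedness condition under discussion in the quoted thread says that a node which is the subject of any rdf:subject, rdf:predicate, or rdf:object triple must be the subject of exactly one triple with each of these predicates (Thomas suggests "at least one" instead). A minimal checker for both readings, treating a graph as a set of (s, p, o) string tuples; the function name and the use of prefixed names as plain strings are simplifications for illustration.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIF_PREDICATES = (RDF + "subject", RDF + "predicate", RDF + "object")


def ill_formed_nodes(triples, exactly=True):
    """Return reification nodes violating the condition.

    exactly=True: each node needs exactly one value per predicate.
    exactly=False: each node needs at least one value per predicate.
    """
    counts = defaultdict(lambda: dict.fromkeys(REIF_PREDICATES, 0))
    for s, p, o in set(triples):  # an RDF graph is a set of triples
        if p in REIF_PREDICATES:
            counts[s][p] += 1
    bad = set()
    for node, c in counts.items():
        missing = any(n == 0 for n in c.values())
        multiple = any(n > 1 for n in c.values())
        if missing or (exactly and multiple):
            bad.add(node)
    return bad


example = {
    (":s1", RDF + "subject", ":b"), (":s1", RDF + "predicate", ":p"),
    (":s1", RDF + "object", ":o"),
    (":s2", RDF + "subject", ":b"), (":s2", RDF + "subject", ":c"),
    (":s2", RDF + "predicate", ":p"), (":s2", RDF + "object", ":o"),
    # Peter's example below: ":a rdf:subject :b, :c ."
    (":a", RDF + "subject", ":b"), (":a", RDF + "subject", ":c"),
}
```

Here `ill_formed_nodes(example)` flags `:s2` (two subjects) and `:a` (two subjects, no predicate or object), while the "at least" reading, `ill_formed_nodes(example, exactly=False)`, flags only `:a`.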

Regards,
Jerven




On 1/23/24 13:30, Doerthe Arndt wrote:
> Dear Peter,
> 
> Just to be sure (I am still forming an opinion): your proposal would be to add a definition of "wellformedness" (name might change) for RDF graphs which is simply a property they may or may not have. Then, you envision that there are some applications which expect wellformed graphs as input and complain/warn in case of malformed input?  Or should wellformedness be a global requirement for anyone sharing an RDF graph?
> 
> Kind regards,
> Dörthe
> 
>> On 23.01.2024 at 13:17, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>>
>>
>> On 1/23/24 07:08, Thomas Lörtsch wrote:
>>>> On 23. Jan 2024, at 12:50, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>>
>>>> On 1/23/24 06:30, Thomas Lörtsch wrote:
>>>>>> On 23. Jan 2024, at 12:22, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>>>>>
>>>> [..]
>>>>>>
>>>>>> What the proposal does talk about is RDF reifications, nodes in an RDF graph that are subjects of rdf:subject, rdf:predicate, or rdf:object triples.  The well-formedness requirement states that an RDF graph is ill-formed if it has a node that is the subject of a triple with any of these predicates and is not the subject of exactly
>>>>> Shouldn’t this be changed to *at least*? See my prior mail in response to Dörthe.
>>>>>> one triple with each of these predicates.  No bijection between triples and anything is either mentioned or implied.  The notion of well-formedness is completely syntactic.
>>>>
>>>> [...]
>>>>
>>>> The proposal is *exactly*.  Changing to *at least* could make it harder to optimize RDF reifications in implementations.
>>>>
>>>> As far as I can tell, multiple subjects, predicates, or objects is more difficult to optimize than missing subjects, predicates, or objects, but I haven't implemented an RDF triple store that optimizes RDF reifications.
>>> But what does it *mean*? Optimizations should only be applied after we know that it means what we want it to mean.
>>> I just realized that saying *at least* makes an implicit assumption about different terms in object position referring to the same entity in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That may be way beyond what we want to fix, and insofar saying *exactly* might be the safer and more restrained definition.
>>> Still, it introduces a hint of opacity that I’m not happy with.
>>> Thomas
>>>> peter
>>
>> Not so.  The proposal makes no changes to the semantics of RDF.   (So far. There might be a semantic extension that does but I don't think that any change to the semantics, even if there is a semantic extension, is necessary.)
>>
>> So the formal RDF meaning of an RDF graph like
>>
>> :a rdf:subject :b, :c .
>>
>> is unchanged.
>>
>> Of course, users can add whatever intended meaning they want.  As far as I know, there is nothing in any RDF document that argues against users creating their own semantic extensions based on what RDF graphs mean for them.  There is not even any prohibition against users creating their own completely different meaning for RDF graphs.
>>
>> peter
>>
>>
>
Received on Tuesday, 23 January 2024 13:07:19 UTC