Re: Anecdote/implemention note Re: Against the notion of reification well-formed graph (i.e., atomicity) from Andy Seaborne on 2024-01-23 (public-rdf-star-wg@w3.org from January 2024)

From: Andy Seaborne <andy@apache.org>
Date: Tue, 23 Jan 2024 17:04:45 +0000
To: public-rdf-star-wg@w3.org
Message-ID: <5581a851-ffe7-4d65-8140-eaf923114702@apache.org>

On 23/01/2024 13:07, Jerven Tjalling Bolleman wrote:
> Hi All,
> 
> Comment from the peanut gallery, please assign value as such:
> 
> A question regarding "wellformedness" comes up in the RDF/XML spec in the 
> reification shorthand.
> 
> Using rdf:ID on a property element introduces a reification 
> quad. However, the rdf:ID attribute value must be unique within a single 
> RDF/XML document.
> 
> UniProt, which is distributed as RDF/XML (on FTP, AWS open-data etc.), 
> ignores this and happily duplicates rdf:ID attribute values, normally 
> without issue for the UniProt users. Now, in the UniProt example there 
> are no cases of any rdf:Statement with multiple rdf:subject, 
> rdf:predicate, or rdf:object values unless inferred by OWL reasoning 
> after ingestion.
> 
> The "normally without issue" holds because it is very expensive to 
> check for duplicate rdf:ID values, so most tools stop checking after X 
> number of rdf:ID values have been found. This means the duplicate 
> rdf:ID issue only triggers problems when the duplicates are close to 
> the beginning of one of the UniProt RDF files. While that has happened 
> in the past, we only ever received one complaint about it. So in 
> practice this unique rdf:ID idea is a soft constraint.

Yes - I've encountered exactly that situation.
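
The "stop checking after X values" behaviour described above can be sketched roughly as follows. This is only an illustration, not the logic of any particular RDF/XML parser; the function name `find_duplicate_id` and the `limit` parameter are hypothetical.

```python
def find_duplicate_id(ids, limit=None):
    """Return the first duplicated rdf:ID value, or None if none is noticed.

    `limit` (hypothetical) caps how many distinct IDs are remembered.
    Once the cap is reached, checking stops, so later duplicates pass
    unnoticed, which is what makes uniqueness a soft constraint.
    """
    seen = set()
    for value in ids:
        if value in seen:
            return value
        if limit is not None and len(seen) >= limit:
            return None  # give up: memory budget for the check is spent
        seen.add(value)
    return None
```

With no limit every duplicate is found, at the cost of remembering every ID in the file. With a small limit only duplicates near the start of the file are reported.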

> A note regarding optimizations in a store for multiple subjects, 
> predicates, or objects for one named triple.
> 
> Taking the approach of a QUAD(S id, P id, O id, G id) and VALUES(I id, V 
> value) table, we add one table REIFICATION(T id, S id, P id, O id) and we 
> either have a unique constraint on REIFICATION.T or not.
> 
> A second layout, QUAD(S id, P id, O id, G id, T id), is also possible and 
> also allows for duplicate T (T points to the name of the triple not in the 
> VALUES table). We already need to deal with multiple T for any SPOG 
> combination.
> 
> This makes me feel that being able to deal with "illformed" options 
> won't be too bad for implementers, considering that this will be 
> relatively rare, as most data will be provided with syntactic shorthands 
> that prevent "illformed" options.

Here is some prior work:

https://www.hpl.hp.com/techreports/2003/HPL-2003-266.pdf

     Andy

> 
> Regards,
> Jerven
> 
> 
> 
> 
> On 1/23/24 13:30, Doerthe Arndt wrote:
>> Dear Peter,
>>
>> Just to be sure (I am still forming an opinion): your proposal would 
>> be to add a definition of "wellformedness" (name might change) for  
>> RDF graphs which is simply a property they may or may not have. Then, 
>> you envision that there are some applications which expect wellformed 
>> graphs as input and complain/warn in case of malformed input?  Or 
>> should wellformedness be a global requirement for anyone sharing an 
>> RDF graph?
>>
>> Kind regards,
>> Dörthe
>>
>>> Am 23.01.2024 um 13:17 schrieb Peter F. Patel-Schneider 
>>> <pfpschneider@gmail.com>:
>>>
>>>
>>>
>>> On 1/23/24 07:08, Thomas Lörtsch wrote:
>>>>> On 23. Jan 2024, at 12:50, Peter F. Patel-Schneider 
>>>>> <pfpschneider@gmail.com> wrote:
>>>>>
>>>>> On 1/23/24 06:30, Thomas Lörtsch wrote:
>>>>>>> On 23. Jan 2024, at 12:22, Peter F. Patel-Schneider 
>>>>>>> <pfpschneider@gmail.com> wrote:
>>>>>>>
>>>>> [..]
>>>>>>>
>>>>>>> What the proposal does talk about is RDF reifications, nodes in 
>>>>>>> an RDF graph that are subjects of rdf:subject, rdf:predicate, or 
>>>>>>> rdf:object triples.  The well-formedness requirement states that 
>>>>>>> an RDF graph is ill-formed if it has a node that is the subject 
>>>>>>> of a triple with any of these predicates and is not the subject 
>>>>>>> of exactly
>>>>>> Shouldn’t this be changed to *at least*? See my prior mail in 
>>>>>> response to Dörthe.
>>>>>>> one triple with each of these predicates.  No bijection between 
>>>>>>> triples and anything is either mentioned or implied.  The notion 
>>>>>>> of well-formedness is completely syntactic.
>>>>>
>>>>> [...]
>>>>>
>>>>> The proposal is *exactly*.  Changing to *at least* could make it 
>>>>> harder to optimize RDF reifications in implementations.
>>>>>
>>>>> As far as I can tell, multiple subjects, predicates, or objects is 
>>>>> more difficult to optimize than missing subjects, predicates, or 
>>>>> objects, but I haven't implemented an RDF triple store that 
>>>>> optimizes RDF reifications.
>>>> But what does it *mean*? Optimizations should only be applied after 
>>>> we know that it means what we want it to mean.
>>>> I just realized that saying *at least* makes an implicit assumption 
>>>> about different terms in object position referring to the same entity 
>>>> in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That 
>>>> may be way beyond what we want to fix, and insofar saying *exactly* 
>>>> might be the safer and more restrained definition.
>>>> Still it introduces a hint of opacity that I’m not happy with.
>>>> Thomas
>>>>> peter
>>>
>>> Not so.  The proposal makes no changes to the semantics of RDF.   (So 
>>> far. There might be a semantic extension that does but I don't think 
>>> that any change to the semantics, even if there is a semantic 
>>> extension, is necessary.)
>>>
>>> So the formal RDF meaning of an RDF graph like
>>>
>>> :a rdf:subject :b, :c .
>>>
>>> is unchanged.
>>>
>>> Of course, users can add whatever intended meaning they want.  As far 
>>> as I know, there is nothing in any RDF document that argues against 
>>> users creating their own semantic extensions based on what RDF graphs 
>>> mean for them.  There is not even any prohibition against users 
>>> creating their own completely different meaning for RDF graphs.
>>>
>>> peter
>>>
>>>
>>
>
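
The purely syntactic well-formedness condition quoted in this thread (a node that is the subject of any rdf:subject, rdf:predicate, or rdf:object triple must be the subject of exactly one triple with each of these predicates) can be sketched as a simple count over the graph. This is an illustration only: triples are modelled as plain string tuples, and the function name `ill_formed_nodes` is hypothetical.

```python
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIFICATION_PREDICATES = (RDF + "subject", RDF + "predicate", RDF + "object")


def ill_formed_nodes(triples):
    """Return the nodes that violate the *exactly one of each* condition."""
    counts = Counter()
    # A graph is a set of triples, so repeated triples count once.
    for s, p, o in set(triples):
        if p in REIFICATION_PREDICATES:
            counts[(s, p)] += 1
    nodes = {s for s, _ in counts}
    return {
        n for n in nodes
        if any(counts[(n, p)] != 1 for p in REIFICATION_PREDICATES)
    }
```

For the example graph `:a rdf:subject :b, :c .` the node `:a` is reported, both because it has two rdf:subject values and because it has no rdf:predicate or rdf:object. The check does not touch the semantics of the graph.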
Received on Tuesday, 23 January 2024 17:04:53 UTC