- From: Andy Seaborne <andy@apache.org>
- Date: Tue, 23 Jan 2024 17:04:45 +0000
- To: public-rdf-star-wg@w3.org
On 23/01/2024 13:07, Jerven Tjalling Bolleman wrote:
> Hi All,
>
> Comment from the peanut gallery, please assign value as such:
>
> A question regarding "wellformedness" comes up in the RDF/XML spec in the
> reification shorthand.
>
> When using rdf:ID on a property element it introduces a reification
> quad. However the rdf:ID attribute value must be unique in a single
> RDF/XML document.
>
> UniProt, distributed as RDF/XML (on FTP, AWS open-data etc.), ignores
> this and happily duplicates rdf:ID attribute values. Normally without
> issue for the UniProt users. Now, in the UniProt example there are no
> cases of any rdf:Statement with multiple rdf:subject, rdf:predicate,
> rdf:object values unless inferred by owl reasoning after ingestion.
>
> The "normally without issue" holds because it is very expensive to
> check for duplicate rdf:ID values. So most tools stop checking after X
> number of rdf:ID values have been found. This means this duplicate
> rdf:ID issue only triggers problems when they are close to the beginning
> of one of the UniProt rdf files. While that has happened in the past, we
> only ever received one complaint about this. So in practice this unique
> rdf:ID idea is a soft constraint.
Yes - I've encountered exactly that situation.
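
To illustrate why duplicates late in a file go unnoticed, here is a minimal sketch of a capped duplicate check. The function name and the `limit` parameter are hypothetical, and real parsers may bound the check differently (for example by memory rather than by count).

```python
def find_duplicate_ids(ids, limit):
    """Report duplicate rdf:ID values among the first `limit` values only.

    `limit` stands in for the "X" mentioned above; it is an assumed
    parameter, not the setting of any particular parser.
    """
    seen = set()
    duplicates = []
    for count, value in enumerate(ids):
        if count >= limit:
            # Checking stops here; later duplicates are never noticed.
            break
        if value in seen:
            duplicates.append(value)
        else:
            seen.add(value)
    return duplicates


ids = ["a", "b", "a", "c", "c"]
early_only = find_duplicate_ids(ids, limit=3)
everything = find_duplicate_ids(ids, limit=10)
```

With `limit=3` only the early duplicate `"a"` is reported; the later duplicate `"c"` is found only when the limit covers the whole input.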
> A note regarding optimizations in a store for multiple subjects,
> predicates, and objects for one named triple.
>
> Taking the approach of a QUAD(S id, P id, O id, G id) and VALUES(I id, V
> value) table we add one table REIFICATION(T id, S id, P id, O id) and we
> either have a unique constraint on REIFICATION.T or not.
>
> A second option, QUAD(S id, P id, O id, G id, T id), is also possible and also
> allows for duplicate T (T points to the name of the triple not in the
> VALUES table). We already need to deal with multiple T for any SPOG
> combination.
>
> This makes me feel that being able to deal with "illformed" options
> won't be too bad for implementers, considering that this will be
> relatively rare, as most data will be provided with syntactic shorthands
> that prevent "illformed" options.
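
The first table layout described above could be sketched as follows, using Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types, the helper names `make_store` and `add_reification`, and the table name `term_values` (renamed because VALUES is an SQL keyword) are not from the original mail.

```python
import sqlite3


def make_store(unique_t):
    """Create an in-memory store; `unique_t` toggles the constraint on REIFICATION.T."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE term_values (i INTEGER PRIMARY KEY, v TEXT NOT NULL);
        CREATE TABLE quad (s INTEGER, p INTEGER, o INTEGER, g INTEGER);
        CREATE TABLE reification (t INTEGER, s INTEGER, p INTEGER, o INTEGER);
        """
    )
    if unique_t:
        # With this index each T can name at most one (S, P, O) combination.
        conn.execute("CREATE UNIQUE INDEX reification_t ON reification (t)")
    return conn


def add_reification(conn, t, s, p, o):
    """Insert a row; return False if the unique constraint rejects it."""
    try:
        conn.execute("INSERT INTO reification VALUES (?, ?, ?, ?)", (t, s, p, o))
        return True
    except sqlite3.IntegrityError:
        return False


strict = make_store(unique_t=True)
loose = make_store(unique_t=False)
# The same T (1) is given two different subjects (2 and 5).
strict_results = [add_reification(strict, 1, 2, 3, 4), add_reification(strict, 1, 5, 3, 4)]
loose_results = [add_reification(loose, 1, 2, 3, 4), add_reification(loose, 1, 5, 3, 4)]
```

In this sketch the only difference between accepting and rejecting a second subject for the same T is the one unique index.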
Here is some prior work:
https://www.hpl.hp.com/techreports/2003/HPL-2003-266.pdf
Andy
>
> Regards,
> Jerven
>
>
>
>
> On 1/23/24 13:30, Doerthe Arndt wrote:
>> Dear Peter,
>>
>> Just to be sure (I am still forming an opinion): your proposal would
>> be to add a definition of "wellformedness" (name might change) for
>> RDF graphs which is simply a property they may or may not have. Then,
>> you envision that there are some applications which expect wellformed
>> graphs as input and complain/warn in case of malformed input? Or
>> should wellformedness be a global requirement for anyone sharing an
>> RDF graph?
>>
>> Kind regards,
>> Dörthe
>>
>>> Am 23.01.2024 um 13:17 schrieb Peter F. Patel-Schneider
>>> <pfpschneider@gmail.com>:
>>>
>>>
>>>
>>> On 1/23/24 07:08, Thomas Lörtsch wrote:
>>>>> On 23. Jan 2024, at 12:50, Peter F. Patel-Schneider
>>>>> <pfpschneider@gmail.com> wrote:
>>>>>
>>>>> On 1/23/24 06:30, Thomas Lörtsch wrote:
>>>>>>> On 23. Jan 2024, at 12:22, Peter F. Patel-Schneider
>>>>>>> <pfpschneider@gmail.com> wrote:
>>>>>>>
>>>>> [..]
>>>>>>>
>>>>>>> What the proposal does talk about is RDF reifications, nodes in
>>>>>>> an RDF graph that are subjects of rdf:subject, rdf:predicate, or
>>>>>>> rdf:object triples. The well-formedness requirement states that
>>>>>>> an RDF graph is ill-formed if it has a node that is the subject
>>>>>>> of a triple with any of these predicates and is not the subject
>>>>>>> of exactly
>>>>>> Shouldn’t this be changed to *at least*? See my prior mail in
>>>>>> response to Dörthe.
>>>>>>> one triple with each of these predicates. No bijection between
>>>>>>> triples and anything is either mentioned or implied. The notion
>>>>>>> of well-formedness is completely syntactic.
>>>>>
>>>>> [...]
>>>>>
>>>>> The proposal is *exactly*. Changing to *at least* could make it
>>>>> harder to optimize RDF reifications in implementations.
>>>>>
>>>>> As far as I can tell, multiple subjects, predicates, or objects is
>>>>> more difficult to optimize than missing subjects, predicates, or
>>>>> objects, but I haven't implemented an RDF triple store that
>>>>> optimizes RDF reifications.
>>>> But what does it *mean*? Optimizations should only be applied after
>>>> we know that it means what we want it to mean.
>>>> I just realized that saying *at least* makes an implicit assumption
>>>> about different terms in object position referring to the same entity
>>>> in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That
>>>> may be way beyond what we want to fix, and insofar saying *exactly*
>>>> might be the safer and more restrained definition.
>>>> Still it introduces a hint of opacity that I’m not happy with.
>>>> Thomas
>>>>> peter
>>>
>>> Not so. The proposal makes no changes to the semantics of RDF. (So
>>> far. There might be a semantic extension that does but I don't think
>>> that any change to the semantics, even if there is a semantic
>>> extension, is necessary.)
>>>
>>> So the formal RDF meaning of an RDF graph like
>>>
>>> :a rdf:subject :b, :c .
>>>
>>> is unchanged.
>>>
>>> Of course, users can add whatever intended meaning they want. As far
>>> as I know, there is nothing in any RDF document that argues against
>>> users creating their own semantic extensions based on what RDF graphs
>>> mean for them. There is not even any prohibition against users
>>> creating their own completely different meaning for RDF graphs.
>>>
>>> peter
>>>
>>>
>>
>
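
The syntactic condition quoted in the thread (a node that is the subject of any rdf:subject, rdf:predicate, or rdf:object triple must be the subject of exactly one triple with each of them) could be checked roughly as below. This is a sketch under assumptions: triples are modelled as plain tuples of strings, and the function name `ill_formed_nodes` is hypothetical.

```python
from collections import defaultdict

REIFICATION_PREDICATES = ("rdf:subject", "rdf:predicate", "rdf:object")


def ill_formed_nodes(triples):
    """Return the nodes that violate the 'exactly one of each' condition."""
    # For every node used as a reification subject, count each predicate.
    counts = defaultdict(lambda: dict.fromkeys(REIFICATION_PREDICATES, 0))
    for s, p, o in triples:
        if p in REIFICATION_PREDICATES:
            counts[s][p] += 1
    return sorted(
        node
        for node, per_predicate in counts.items()
        if any(n != 1 for n in per_predicate.values())
    )


# The example graph from the thread: :a rdf:subject :b, :c .
example = [(":a", "rdf:subject", ":b"), (":a", "rdf:subject", ":c")]
complete = [
    (":r", "rdf:subject", ":s"),
    (":r", "rdf:predicate", ":p"),
    (":r", "rdf:object", ":o"),
]
example_result = ill_formed_nodes(example)
complete_result = ill_formed_nodes(complete)
```

The check only looks at the triples themselves, in line with the point made in the thread that the notion is completely syntactic.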
Received on Tuesday, 23 January 2024 17:04:53 UTC