- From: Andy Seaborne <andy@apache.org>
- Date: Tue, 23 Jan 2024 17:04:45 +0000
- To: public-rdf-star-wg@w3.org
On 23/01/2024 13:07, Jerven Tjalling Bolleman wrote:
> Hi All,
>
> Comment from the peanut gallery, please assign value as such:
>
> A question regarding "wellformedness" comes up in the RDF/XML spec in
> the reification shorthand.
>
> When using rdf:ID on a property element it introduces a reification
> quad. However the rdf:ID attribute value must be unique in a single
> RDF/XML document.
>
> UniProt, which is distributed as RDF/XML (on FTP, AWS open-data etc.),
> ignores this and happily duplicates rdf:ID attribute values. Normally
> without issue for the UniProt users. Now, in the UniProt example there
> are no cases of any rdf:Statement with multiple rdf:subject,
> rdf:predicate, rdf:object values unless inferred by owl reasoning
> after ingestion.
>
> The "normally" without issues is true because it is very expensive to
> check for duplicate rdf:ID values. So most tools stop checking after X
> number of rdf:ID values have been found. This means this duplicate
> rdf:ID issue only triggers problems when the duplicates are close to
> the beginning of one of the UniProt rdf files. While that has happened
> in the past, we only ever received one complaint about this. So in
> practice this unique rdf:ID idea is a soft constraint.

Yes - I've encountered exactly that situation.

> A Note regarding optimizations in store for multiple subjects,
> predicates, objects for one named triple.
>
> Taking the approach of a QUAD(S id, P id, O id, G id) and VALUES(I id, V
> value) table we add one table REIFICATION(T id, S id, P id, O id) and we
> either have a unique constraint on REIFICATION.T or not.
>
> The second QUAD(S id, P id, O id, G id, T id) is also possible and also
> allows for duplicate T (T points to the name of the triple not in the
> VALUES table). We already need to deal with multiple T for any SPOG
> combination.
>
> This makes me feel that being able to deal with "illformed" options
> won't be too bad for implementers. Considering that this will be
> relatively rare as most data will be provided with syntactic shorthands
> that prevent "illformed" options.

Here is some prior work:

https://www.hpl.hp.com/techreports/2003/HPL-2003-266.pdf

    Andy

>
> Regards,
> Jerven
>
>
> On 1/23/24 13:30, Doerthe Arndt wrote:
>> Dear Peter,
>>
>> Just to be sure (I am still forming an opinion): your proposal would
>> be to add a definition of "wellformedness" (name might change) for
>> RDF graphs which is simply a property they may or may not have. Then,
>> you envision that there are some applications which expect wellformed
>> graphs as input and complain/warn in case of malformed input? Or
>> should wellformedness be a global requirement for anyone sharing an
>> RDF graph?
>>
>> Kind regards,
>> Dörthe
>>
>>> On 23.01.2024 at 13:17, Peter F. Patel-Schneider
>>> <pfpschneider@gmail.com> wrote:
>>>
>>> On 1/23/24 07:08, Thomas Lörtsch wrote:
>>>>> On 23. Jan 2024, at 12:50, Peter F. Patel-Schneider
>>>>> <pfpschneider@gmail.com> wrote:
>>>>>
>>>>> On 1/23/24 06:30, Thomas Lörtsch wrote:
>>>>>>> On 23. Jan 2024, at 12:22, Peter F. Patel-Schneider
>>>>>>> <pfpschneider@gmail.com> wrote:
>>>>>>>
>>>>> [..]
>>>>>>>
>>>>>>> What the proposal does talk about is RDF reifications, nodes in
>>>>>>> an RDF graph that are subjects of rdf:subject, rdf:predicate, or
>>>>>>> rdf:object triples. The well-formedness requirement states that
>>>>>>> an RDF graph is ill-formed if it has a node that is the subject
>>>>>>> of a triple with any of these predicates and is not the subject
>>>>>>> of exactly
>>>>>> Shouldn’t this be changed to *at least*? See my prior mail in
>>>>>> response to Dörthe.
>>>>>>> one triple with each of these predicates. No bijection between
>>>>>>> triples and anything is either mentioned or implied. The notion
>>>>>>> of well-formedness is completely syntactic.
>>>>>
>>>>> [...]
>>>>>
>>>>> The proposal is *exactly*. Changing to *at least* could make it
>>>>> harder to optimize RDF reifications in implementations.
>>>>>
>>>>> As far as I can tell, multiple subjects, predicates, or objects are
>>>>> more difficult to optimize than missing subjects, predicates, or
>>>>> objects, but I haven't implemented an RDF triple store that
>>>>> optimizes RDF reifications.
>>>> But what does it *mean*? Optimizations should only be applied after
>>>> we know that it means what we want it to mean.
>>>> I just realized that saying *at least* makes an implicit assumption
>>>> about different terms in object position referring to the same entity
>>>> in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That
>>>> may be way beyond what we want to fix, and insofar saying *exactly*
>>>> might be the safer and more restrained definition.
>>>> Still it introduces a hint of opacity that I’m not happy with.
>>>> Thomas
>>>>> peter
>>>
>>> Not so. The proposal makes no changes to the semantics of RDF. (So
>>> far. There might be a semantic extension that does but I don't think
>>> that any change to the semantics, even if there is a semantic
>>> extension, is necessary.)
>>>
>>> So the formal RDF meaning of an RDF graph like
>>>
>>> :a rdf:subject :b, :c .
>>>
>>> is unchanged.
>>>
>>> Of course, users can add whatever intended meaning they want. As far
>>> as I know, there is nothing in any RDF document that argues against
>>> users creating their own semantic extensions based on what RDF graphs
>>> mean for them. There is not even any prohibition against users
>>> creating their own completely different meaning for RDF graphs.
>>>
>>> peter
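
For illustration only, the purely syntactic "exactly one" condition quoted above can be sketched in a few lines of Python. This is a minimal sketch and not part of the proposal text: the representation of a graph as (subject, predicate, object) string tuples and the function names ill_formed_reifiers and is_well_formed are assumptions made for the example.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIFICATION_PREDICATES = (RDF + "subject", RDF + "predicate", RDF + "object")


def ill_formed_reifiers(triples):
    """Map each offending node to its per-predicate triple counts.

    A node offends if it is the subject of at least one rdf:subject,
    rdf:predicate, or rdf:object triple but is not the subject of
    exactly one triple with each of those three predicates.
    (Hypothetical helper; triples are plain string tuples.)
    """
    counts = defaultdict(lambda: dict.fromkeys(REIFICATION_PREDICATES, 0))
    # An RDF graph is a set of triples, so repeated input tuples count once.
    for s, p, o in set(triples):
        if p in REIFICATION_PREDICATES:
            counts[s][p] += 1
    return {
        node: per_pred
        for node, per_pred in counts.items()
        if any(n != 1 for n in per_pred.values())
    }


def is_well_formed(triples):
    """True if no node violates the 'exactly one of each' condition."""
    return not ill_formed_reifiers(triples)


# The graph from the thread: ":a rdf:subject :b, :c ."
example = [
    (":a", RDF + "subject", ":b"),
    (":a", RDF + "subject", ":c"),
]
print(is_well_formed(example))
```

Under this reading the example graph is ill-formed twice over: :a has two rdf:subject triples and no rdf:predicate or rdf:object triple. Replacing the test `n != 1` with `n < 1` would give the *at least* variant discussed in the thread.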
Received on Tuesday, 23 January 2024 17:04:53 UTC