Re: Future-Proofing for Transition to Multi-Edge caused by Data Arrival from Thomas Lörtsch on 2023-01-30 (public-rdf-star-wg@w3.org from January 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 30 Jan 2023 01:18:09 +0100
To: Franconi Enrico <franconi@inf.unibz.it>
Cc: Timothée Haudebourg <timothee.haudebourg@spruceid.com>, "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-Id: <70382D34-A339-47F2-B40D-5935073ECC2D@rat.io>
The example you give is akin to the "Singleton Property" approach [0], which does indeed have its merits. Why it didn’t garner widespread adoption, and e.g. didn’t find more support at the W3C Graph Workshop in Berlin in 2019, remains a not too well explored question.

It has some cons: 
- it needs fully indexed property columns, which some triple stores omit for optimization
- it adds one more join to queries, which is always bad
- it messes with vocabulary terms, properties more specifically, which from a usability perspective is not advisable.

It has some pros too:
- it has sound semantics (creating instances (or subtypes, depending on which paper you consult) without endangering set-semantics or monotonicity) and so would have RDFn if mapped to Singleton Properties (which is straightforward, as they both work on occurrences)
- it doesn’t need a new node type
- it doesn’t need to educate users on sound modelling, like RDF-star (good luck with that), because it doesn’t encourage unsound modelling (through perceived, and advertised, simplicity) in the first place.
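
To make the trade-offs in the two lists above concrete, here is a minimal sketch in Python. It is not taken from [0]. It only models a graph as a set of tuples and assumes hypothetical names throughout (`:singletonPropertyOf`, `:servedAs45`, `:servedAs47`, and the helpers `add` and `occurrences`), reusing the presidents data from the thread quoted below. It illustrates that a repeated plain triple collapses under set semantics, that per-occurrence properties keep both occurrences apart, and that reading them back costs an extra join.

```python
# An RDF graph is a *set* of triples; here each triple is a plain tuple.
graph = set()

def add(s, p, o):
    graph.add((s, p, o))

# Plain RDF: asserting the same triple twice still leaves one triple.
add(":Trump", ":servedAs", ":President")
add(":Trump", ":servedAs", ":President")
plain_count = len(graph)

# Singleton Property style: every occurrence gets its own property,
# linked back to the generic one. All names are hypothetical.
graph.clear()
for n in (45, 47):
    prop = f":servedAs{n}"
    add(":Trump", prop, ":President")
    add(prop, ":singletonPropertyOf", ":servedAs")
    add(prop, ":index", n)

def occurrences(generic, obj):
    """Return (subject, index) pairs for `generic`, sorted by index."""
    # Extra join: first resolve which properties stand for `generic` ...
    singles = {s for (s, p, o) in graph
               if p == ":singletonPropertyOf" and o == generic}
    rows = []
    # ... then match the data triples and look up each occurrence's index.
    for (s, p, o) in graph:
        if p in singles and o == obj:
            idx = next(o2 for (s2, p2, o2) in graph
                       if s2 == p and p2 == ":index")
            rows.append((s, idx))
    return sorted(rows, key=lambda row: row[1])

result = occurrences(":servedAs", ":President")
print(plain_count, result)
```

The lookup over `singles` is where the additional join and the need for an indexed property column show up. In exchange, nothing beyond ordinary triples is required.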

> On 26. Jan 2023, at 17:55, Franconi Enrico <franconi@inf.unibz.it> wrote:
> 
> Totally agree. "Multi-edges" can be covered in this way.

It beats me how "more complicated, but possible" can be interpreted as "better". Maybe that depends on the interpretation of "huge changes" (below) - I don’t see them, at least not bigger than adding the quoted triple node type. But I see a lot of benefits for users and better interoperability because of more predictable modelling, which is what IMO should count most.

One thing should be clear: _everything_ can be represented with n-ary relations in plain RDF - that is not the question. The question must be: does an approach that claims to facilitate a popular need, like in this case statement annotation, actually meet demand and practices, is it easy, is it intuitive and do laypersons' intuitions and the underlying logical mechanisms match. In RDF-star they don't, neither in syntax (occurrences) nor in semantics (referential opacity). No handwaving references to "prudent approach", "encouraging sound modelling" and "Transparency Enabling Properties" will solve those problems. Since when do people care for the advice of some logicians? The best a semantics can hope for is to formalize common practice, pave the cow paths and gently nudge stray users in the right direction.

The same companies that adopted RDF* early on and that proponents of the CG report like to cite as evidence for support of their approach, those same companies will be the first to ignore the proposed semantics and implement an "intuitive" version of ad-hoc occurrences (just like blank nodes are treated not as existentials but as nominals in most applications) and default referential transparency "because that’s what users expect". Wait for it.

RDF-star sure looks simple, but the problem it claims to solve isn’t. It’s like buying a cheap tool for a hard task: it will break, people will get hurt and in the end nothing will be achieved. RDF*/star is a failure. RDFn on the other hand shows real promise for the annotation use case, provenance and LPG alike. The semantics can be fixed to RDF-conformant, monotonic and set-abiding sub-statements. An RDF literal datatype can do the rest and enable those niche use cases that the RDF-star semantics is optimized for (including modalities). That would be a complete, easy and prudent approach.

Best,
Thomas


[0] Nguyen, Vinh, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification? Making statements about statements using singleton property." Proceedings of the 23rd international conference on World wide web. 2014.


> —e.
> 
>> On 26 Jan 2023, at 17:48, Timothée Haudebourg <timothee.haudebourg@spruceid.com> wrote:
>> 
>> I'm sorry in advance if I missed the point of this example, but I feel like we are having the same conversation again and again here. There are countless examples where named triples, or multisets of triples, could make the design of these kinds of models a little easier. But it is never impossible to do with current RDF. Again with this example, we can use Greg's solution with multiple :index values. If you really want to have a single value per :index, it is my understanding that you can use sub-properties as such (I never used SPARQL so I hope it's correct):
>> 
>> :Trump :servedAs45 :President .
>> :servedAs45 rdfs:subPropertyOf :servedAs .
>> :servedAs45 :index 45 .
>> 
>> :Trump :servedAs47 :President .
>> :servedAs47 rdfs:subPropertyOf :servedAs .
>> :servedAs47 :index 47 .
>> 
>> SELECT ?x {
>>   ?x ?servedAs :President .
>>   ?servedAs rdfs:subPropertyOf :servedAs .
>>   ?servedAs :index ?idx
>> } ORDER BY ?idx
>> 
>> Yes, it is always a little bit more verbose than what we would get with named triples, but then it is never impossible to do without. I think giving more and more variants of the same problem won't get more convincing than that, unless we can show that the impact on the semantics and implementations of RDF and SPARQL is small enough that it is still worth going in this direction. From my current understanding of RDF and SPARQL, it seems to me that the changes would be huge, for little benefit.
>> 
>> By the way, using sub properties like that, you can still have 
>> 
>> :Bob :knows << :Bush :servedAs :President >> .
>> 
>> Here Bob knows that Bush served as president, but doesn't know how many times, nor the details that are specified in the graph. I don't know, it's not really relevant but I find it nice.
>> 
>> 
>> -- 
>> Timothée
>> 
>> 
>> On 26/01/2023 at 16:55, Gregory Williams wrote:
>>>> On Jan 26, 2023, at 6:06 AM, Souripriya Das <souripriya.das@oracle.com> wrote:
>>>> 
>>>> Initial Data (no shortcuts, to avoid hiding important details):
>>>> ------------------------------------------------------------------------------
>>>>     :Bush :servedAs :President .
>>>>     << :Bush :servedAs :President >> :index 43 .
>>>>     :Obama :servedAs :President .
>>>>     << :Obama :servedAs :President >> :index 44 .
>>>>     :Trump :servedAs :President .
>>>>     << :Trump :servedAs :President >> :index 45 .
>>>>     :Biden :servedAs :President .
>>>>     << :Biden :servedAs :President >> :index 46 .
>>> 
>>> …
>>> 
>>>> Let's say that in 2024 Trump wins again. How will the data get represented in RDF-star? Since RDF-star, like RDF, has a "no duplicate triples" constraint, the data creator has to extend the schema to introduce a new structure that involves a new property, :occurrenceOf. Since the above query was written without expecting the new structure, it has to be modified as well.
>>> 
>>> 
>>> Souri –
>>> 
>>> I’m not sure I understand the issue here. Won’t the original query work just fine if the input data contains an extra index value?
>>> 
>>> << :Trump :servedAs :President >> :index 45, 47 .
>>> 
>>> =>
>>> 
>>> :Bush
>>> :Obama
>>> :Trump
>>> :Biden
>>> :Trump
>>> 
>>> Thanks,
>>> Greg
>>> 
>
Received on Monday, 30 January 2023 00:18:35 UTC