Re: Future-Proofing for Transition to Multi-Edge caused by Data Arrival from Thomas Lörtsch on 2023-02-03 (public-rdf-star-wg@w3.org from February 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Sat, 4 Feb 2023 00:49:49 +0100
To: Timothée Haudebourg <timothee.haudebourg@spruceid.com>
Cc: Franconi Enrico <franconi@inf.unibz.it>, "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-Id: <B820D6E1-A28F-4F04-BFA9-E160F2BBF412@rat.io>
Hi Timothée,

> On 2. Feb 2023, at 23:56, Timothée Haudebourg <timothee.haudebourg@spruceid.com> wrote:
> 
> > It beats me how "more complicated, but possible" can be interpreted as "better".
> 
> For my part I didn't mean it is better. But as you stated: "_everything_ can be represented with n-ary relations in plain RDF", so what's the point of adding complexity to a model that is already powerful enough? That's also why I would prefer building quoted triples with reification.

Well, RDF standard reification never caught on because it is so verbose. Property Graphs are very popular because they make annotating statements so easy. So syntax does indeed matter a lot. N-ary relations are not only poorly standardized (there are different flavours and one can never know which one is applied), they also require re-modelling when a leaf becomes a subtree. Statement annotations can spare that effort, as Souri demonstrated. I find that an extremely compelling feature. IMO the usability of graphs is actually not all that good: they provide a lot of expressive freedom when authoring but very little guidance in usage and retrieval. An idiom that keeps the freedom but improves guidance is IMO a very welcome addition.

Both the singleton property approach and Souri's RDFn are fundamentally different from RDF standard reification in that they are attached to specific statement occurrences. 
RDF standard reification doesn’t establish a connection between a specific triple and its reification. The reification rather describes an occurrence of the triple. It doesn’t represent "X" itself but rather something like "someone said 'X'".
Singleton Properties and RDFn do establish that connection and although it may seem like a small detail it has huge consequences. Most importantly they do indeed _qualify_ a specific statement, they create a sub-statement, a more detailed version of the original (i.e. un-annotated) statement. Whereas RDFn provides only syntax, the Singleton Property proposal also defines a semantics very much like the one you propose below. Well, actually I know at least three papers by Nguyen proposing three different semantics for singleton properties, but that only concerns the mapping to RDF/S vocabulary: is the singleton property an instance or a subproperty or something entirely new. The gist however is always the same.

> > It has some cons:
> >  - it needs fully indexed property columns, which some triple stores omit for optimization
> >  - it adds one more join to queries which is always bad
> >  - it messes with vocabulary terms, properties more specifically, which from a usability perspective is not advisable.
> 
> I completely agree that those are some legitimate concerns. If this is a recurring pattern, you need to be able to optimize it. I think I was confused because I don't quite see how it is directly related to RDF-star. 

RDF-star is such a strange creation… well, I’ve been advocating a more thorough handling of occurrences for years in the CG and lately Pierre-Antoine seems to have started to agree. 
The Singleton Property approach could directly be mapped to concrete occurrences, RDFn would provide the most efficient syntax, RDF-star with referentially transparent semantics describes the type of the un-annotated statement and RDF literals record such a statement type in syntactic fidelity. 
That’s the complete picture, it doesn’t cut any corners and is still easy to understand and relatively easy to implement. If that is too much then the first thing that I would throw out is the RDF-star embedded triple I’m afraid, because it has the least practical use on its own and the most potential for misunderstandings. 

> For me the solution would come from standardizing Singleton Properties (for this specific case). 

They can really be used for every case. It is much easier to construct supertypes on the fly by querying for all subtypes matching a specific criterion (e.g. all annotated triples that share the same SPO, without their annotations), than it is to create subtypes in hindsight by re-modelling existing data, turning leaves into subtrees.
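
As a rough illustration of constructing supertypes on the fly, here is a minimal Python sketch. It assumes each annotated occurrence is stored under its own singleton property and that a lookup from singleton property to original property is available; the names `base_property`, `occurrences` and `supertypes`, and the example data, are made up for illustration.

```python
from collections import defaultdict

# Made-up data: each singleton property maps to the property it specializes.
base_property = {":knows#1": ":knows", ":knows#2": ":knows"}

# Annotated occurrences, each stated with its own singleton property.
occurrences = [
    (":alice", ":knows#1", ":bob"),
    (":alice", ":knows#2", ":bob"),
]

def supertypes(triples, base):
    """Group occurrences by the un-annotated (S, P, O) they specialize."""
    groups = defaultdict(list)
    for s, p, o in triples:
        # Fall back to p itself when it is not a singleton property.
        groups[(s, base.get(p, p), o)].append(p)
    return dict(groups)

print(supertypes(occurrences, base_property))
# {(':alice', ':knows', ':bob'): [':knows#1', ':knows#2']}
```

The grouping needs no change to the stored data, which is the point: the supertype is derived at query time.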

> If there is an agreed upon way of using Singleton Properties in this case, then we can build optimizations, and even some syntactic sugar. Here is how I would do it:
> 
> Define, for each property `P` and non-negative integer `i`, a new IRI <https://w3c.org/indexed/P/i> such that:

I would not go for such a centralized namespace. Also, the 'P' would have to be percent encoded as it is an IRI on its own: not pretty.
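
To show what the percent encoding would look like in practice, here is a minimal Python sketch. It assumes the property IRI is embedded as a single path segment with every reserved character encoded; the helper name `indexed_iri` and the example property IRI are hypothetical.

```python
from urllib.parse import quote

def indexed_iri(prop_iri: str, index: int) -> str:
    # Hypothetical helper: embed the property IRI as one path segment,
    # so every reserved character, including "/" and ":", is percent-encoded.
    encoded = quote(prop_iri, safe="")
    return f"https://w3c.org/indexed/{encoded}/{index}"

print(indexed_iri("http://example.org/knows", 1))
# https://w3c.org/indexed/http%3A%2F%2Fexample.org%2Fknows/1
```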

> ```
> <https://w3c.org/indexed/P/i> a rdf:Property;
>    rdfs:subPropertyOf P;
>    rdfs:propertyIndex i.
> ```
> And for all `S` and `O`, the triple `S P O` entails `S <https://w3c.org/indexed/P/0> O` (the default index is 0).

Yes. And b.t.w. that solves all issues around set-semantics and monotonicity. An instantiation approach that declares the singleton property to be an instance of the original property's type is however also possible. I guess it depends on whether one expects further subproperties of subproperties or not. Instantiating instances is of course also possible but generally not the way inheritance is modeled.

> Since `S <https://w3c.org/indexed/P/0> O` also entails `S P O` (by `rdfs:subPropertyOf`) then `<https://w3c.org/indexed/P/0>` is equivalent to `P`. By construction you can always reason on <https://w3c.org/indexed/P/?> instead of P. Therefore in practice, if your implementation is aware of this semantics, for each triple you can just add a single new column in your database for the `propertyIndex` part of the property and skip the rest. No fully indexed property needed, no additional join query.

Hahaaa, but only IF you have that extra column - but in principle you are of course right. I wasn’t involved in discussions about the Singleton Property and its implementation. "Evaluation of Metadata Representations in RDF stores" by Frey et al. gives the best overview that I’m aware of. I don’t know why nobody seems to have proposed such an extra column. Maybe if Nguyen herself had pushed that aspect. But it’s certainly not too late!
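
A minimal Python sketch of the extra-column idea, assuming an in-memory table stands in for the store: the property index is kept in its own column next to subject, property and object, so a query on `P` simply ignores it and needs no additional join. The `Row` type, the `match` function and the data are made up for illustration and do not describe any existing triple store.

```python
from typing import NamedTuple, Optional

class Row(NamedTuple):
    s: str
    p: str
    index: int  # the propertyIndex column; 0 stands for the plain triple
    o: str

rows = [
    Row(":alice", ":knows", 1, ":bob"),
    Row(":alice", ":knows", 2, ":bob"),
    Row(":alice", ":likes", 0, ":carol"),
]

def match(table, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None, index: Optional[int] = None):
    """Scan with optional filters; leaving index unset matches every occurrence."""
    return [r for r in table
            if (s is None or r.s == s)
            and (p is None or r.p == p)
            and (o is None or r.o == o)
            and (index is None or r.index == index)]

# All occurrences of :knows, regardless of index.
print(len(match(rows, p=":knows")))
# One specific occurrence, selected by its index.
print(match(rows, p=":knows", index=2))
```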

So much for today…

Best,
Thomas

> Now we can add syntactic sugar over it. For instance:
>   - `S P O [i]` for `S <https://w3c.org/indexed/P/i> O`, and
>   - `S P O`     for `S <https://w3c.org/indexed/P/0> O`.
> 
> That's different from RDF-star though, maybe some RDF-i or something? But it could be combined with RDF-star to add annotations for instance:
> ```
> <<S P O [1]>> :label "One".
> <<S P O [2]>> :label "Two".
> ```
> 
> Using my related proposition on how to desugar quoted triples we would end up with:
> ```
> <https://w3c.org/reify/S/https:%2F%2Fw3c.org%2Findexed%2FP%2F1/O> a rdf:Statement;
>   rdf:subject S;
>   rdf:predicate <https://w3c.org/indexed/P/1>;
>   rdf:object O;
>   :label "One".
>   
> <https://w3c.org/indexed/P/1> a rdf:Property;
>   rdfs:subPropertyOf P;
>   rdfs:propertyIndex 1.
>   
> <https://w3c.org/reify/S/https:%2F%2Fw3c.org%2Findexed%2FP%2F2/O> a rdf:Statement;
>   rdf:subject S;
>   rdf:predicate <https://w3c.org/indexed/P/2>;
>   rdf:object O;
>   :label "Two".
>   
> <https://w3c.org/indexed/P/2> a rdf:Property;
>   rdfs:subPropertyOf P;
>   rdfs:propertyIndex 2.
> ```
> 
> I'll admit some IRIs are not human friendly. But only if you desugar everything! The syntactic sugar is powerful, concise, predictable enough to implement a clever optimization, optional for implementors, and all of that without the need to touch the core semantics of RDF.
> 
> Best,
> -- 
> Timothée
Received on Friday, 3 February 2023 23:50:07 UTC