Re: The singleton property option from Thompson, Bryan on 2024-04-30 (public-rdf-star-wg@w3.org from April 2024)

From: Thompson, Bryan <bryant@amazon.com>
Date: Tue, 30 Apr 2024 19:46:51 +0000
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>, "Williams, Gregory" <ngregwil@amazon.com>, "Schmidt, Michael" <schmdtm@amazon.com>
Message-ID: <429abfd1e8294dc281b23e898392ca13@amazon.com>
Your proposal would require two statements on top of the original SPO statement before you could begin to make assertions about the original SPO statement?


Anything based on the singleton property approach will have quite an impact on database statistics.  The number of used predicates would jump from millions (for open linked data) to the cardinality of the statements about which statements are being made (e.g., billions, 10s of billions, etc.).  @Williams, Gregory or @Schmidt, Michael can comment on this, but this certainly places a new burden on common techniques for extracting statistics from a graph.


Note that there is really no reason to rely on the P position in your proposal.  You could use S since it already allows blank nodes.  You then hang the Subject of the original asserted SPO on the statement about that unique subject. (Or you could use O, which might be kinder for database statistics since they tend to focus on SP* analysis.)


_:si :statementInstanceHasSubject :s .
_:si :p :o .
:s :p :o.
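
As a rough illustration of the statistics concern, here is a minimal Python sketch, not a benchmark. The helper names (encode_singleton_property, encode_subject_node, distinct_predicates) and the toy :knows data are assumptions made up for this example. The first encoding uses one fresh blank predicate per annotated statement; the second follows the three triples above, with the blank node in the S position.

```python
from itertools import count

def encode_singleton_property(statements):
    """Hypothetical helper: one fresh blank predicate per annotated statement."""
    ids = count()
    triples = []
    for s, p, o in statements:
        pi = f"_:p{next(ids)}"
        triples += [(pi, "rdfs:subPropertyOf", p), (s, pi, o), (s, p, o)]
    return triples

def encode_subject_node(statements):
    """Hypothetical helper: one fresh blank subject per annotated statement."""
    ids = count()
    triples = []
    for s, p, o in statements:
        si = f"_:s{next(ids)}"
        triples += [(si, ":statementInstanceHasSubject", s), (si, p, o), (s, p, o)]
    return triples

def distinct_predicates(triples):
    """The set of terms that occur in the predicate position."""
    return {p for _, p, _ in triples}

# Toy data (assumption): 1000 annotated statements sharing a single predicate.
statements = [(f":n{i}", ":knows", f":n{i + 1}") for i in range(1000)]
sp = distinct_predicates(encode_singleton_property(statements))
sn = distinct_predicates(encode_subject_node(statements))
print(len(sp), len(sn))
```

In this toy setup the predicate count under the first encoding grows with the number of annotated statements, while under the second it stays fixed at the original predicates plus one.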


I have been impressed in the past with the space and time overhead which arises out of various modeling decisions around possible statements about statements treatments.  I would recommend carefully considering that impact.  Another 2 triples makes a huge difference when all statements carry annotations, as they do in some domains.  For example, consider the relatively common case in which you have a graph consisting of a topology and edge weights.  This is very common - lots of graphs are simply edges and their weights.  As I understand it, your proposal would have 3 times the data volume to model the topology (some set of edges) in a manner which would permit associating edge weights with the edges in that topology.  And the database would need to chase a long chain to obtain those edge weights in a correct manner:  :s :p :o. => :s _:pi :o => _:pi rdfs:subPropertyOf :p . => _:pi :hasWeight 1.0.  The cost of chasing that chain would make applications relying on edge weights very expensive in both time and space.  I can't see that as being responsive to such use cases.  To be efficient, there needs to be a close association between an edge and the properties of that edge.  Their resolution needs to be very efficient.
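
The chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory subject index, the helper names (index_by_subject, encode_weighted_edge, edge_weight) and the :linksTo data are invented for the example, and counting index reads is a crude stand-in for what a real store would pay.

```python
from collections import defaultdict

def index_by_subject(triples):
    """Hypothetical subject index: subject -> list of (predicate, object)."""
    idx = defaultdict(list)
    for s, p, o in triples:
        idx[s].append((p, o))
    return idx

def encode_weighted_edge(s, p, o, weight, n):
    """Three topology triples plus one weight triple per edge."""
    pi = f"_:p{n}"
    return [
        (s, p, o),
        (s, pi, o),
        (pi, "rdfs:subPropertyOf", p),
        (pi, ":hasWeight", weight),
    ]

def edge_weight(idx, s, p, o):
    """Follow s -> blank predicate -> subPropertyOf p -> :hasWeight."""
    lookups = 1  # read the triples whose subject is s
    for pi, obj in idx[s]:
        if obj != o or not pi.startswith("_:"):
            continue
        lookups += 1  # read the triples whose subject is the blank predicate
        props = dict(idx[pi])
        if props.get("rdfs:subPropertyOf") == p:
            return props.get(":hasWeight"), lookups
    return None, lookups

# Toy graph (assumption): two weighted edges.
edges = [(":a", ":linksTo", ":b", 1.0), (":b", ":linksTo", ":c", 0.5)]
triples = [t for n, (s, p, o, w) in enumerate(edges)
           for t in encode_weighted_edge(s, p, o, w, n)]
idx = index_by_subject(triples)
topology = [t for t in triples if t[1] != ":hasWeight"]
weight, lookups = edge_weight(idx, ":a", ":linksTo", ":b")
print(len(topology) / len(edges), weight, lookups)
```

Even in this small sketch the topology takes three triples per edge, and resolving a weight needs a second index read keyed on the blank predicate.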


Also note that this singleton property proposal would not support alignment in the data (interoperability in the data) with LPG edge properties.  So it would fail to offer a unification path for the common use cases of RDF and LPG.


Thanks,

Bryan

________________________________
From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Sent: Tuesday, April 30, 2024 10:40:18 AM
To: public-rdf-star-wg@w3.org
Subject: RE: [EXTERNAL] The singleton property option

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



I think that this is far too strong.   The singleton property approach has
problems, but not to this extent.

For any statement that does not require annotation, the singleton property
approach does not require any changes at all, i.e.,  just use
:s :p :o .

For a statement that does require annotation, the singleton property requires
two or three triples, one to make the blank node a subproperty of the desired
property, one to state the relationship using the blank node, and, if the RDF
system does not implement RDFS semantics, one to make the statement using the
regular property, i.e.,
_:pi rdfs:subPropertyOf :p .
:s _:pi :o .
:s :p :o.
The added storage for this might be less than that needed for efficient
processing of quoted triples, particularly if the third statement is not needed.
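
A small Python sketch of the two-versus-three triple cases, offered as an illustration only. The function names (encode, rdfs7_once) are assumptions; the inference step is a single application of the RDFS rule rdfs7 (if a is a subproperty of b and x a y holds, then x b y holds), not a full RDFS reasoner.

```python
def encode(s, p, o, n, rdfs_entailment):
    """Hypothetical helper: two triples, or three without RDFS entailment."""
    pi = f"_:p{n}"
    triples = {(pi, "rdfs:subPropertyOf", p), (s, pi, o)}
    if not rdfs_entailment:
        # The store will not infer it, so state the plain triple explicitly.
        triples.add((s, p, o))
    return triples

def rdfs7_once(triples):
    """Apply rdfs7 once: (a subPropertyOf b), (x a y) => (x b y)."""
    sub = {(a, b) for a, q, b in triples if q == "rdfs:subPropertyOf"}
    inferred = {(x, b, y) for x, q, y in triples for a, b in sub if q == a}
    return triples | inferred

two = encode(":s", ":p", ":o", 0, rdfs_entailment=True)
three = encode(":s", ":p", ":o", 0, rdfs_entailment=False)
print(len(two), len(three), (":s", ":p", ":o") in rdfs7_once(two))
```

With entailment available the plain triple is derived rather than stored, which is the case where the third statement is not needed.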

There is no need to change modelling if the statement is annotated after the fact.

peter



On 4/30/24 12:26, Thompson, Bryan wrote:
> The singleton property approach undermines the direct use of predicates in
> statements and forces a second hop for any use case to determine the actual
> predicate used.  It also requires that the "statement" is modeled differently
> in advance, thus increasing the space requirements even if no statements about
> statements are used.
>
>
> This is not efficient.
>
>
> Effectively, the singleton property model says that the RDF triple is wrong.
> It says that you should model using (S ID O) and then model the predicate and
> other information as statements about that ID.  This is not the RDF model.
>
>
> The approach with Statements about Statements should IMHO be built on (S P O
> ID).  That is, there is a unique identifier for the SPO and you make
> statements about that statement ID.
>
>
> Bryan
>
> ------------------------------------------------------------------------------
> *From:* Thomas Lörtsch <tl@rat.io>
> *Sent:* Tuesday, April 30, 2024 12:02:21 AM
> *To:* public-rdf-star-wg@w3.org; Thompson, Bryan; Niklas Lindström; RDF-star
> Working Group
> *Subject:* RE: [EXTERNAL] The singleton property option
>
>
> Bryan,
>
> Niklas combines the RDF-star syntax with the semantics of Singleton
> Properties. AFAIK no implementations of or papers on Singleton Properties have
> done that. This combination doesn't even require an index on properties.
>
> This combination is nearer to the original RDR approach than anything else
> discussed by CG and WG. It is IMO a very neat idea and deserves a closer look.
>
> Thomas
>
>
>
> Am 29. April 2024 19:06:37 MESZ schrieb "Thompson, Bryan" <bryant@amazon.com>:
>
>     The singleton property approach has many downsides and is pragmatically
>     unworkable.  There is a good reason people are not happy with this approach.
>
>
>     Bryan
>
>     ------------------------------------------------------------------------------
>     *From:* Niklas Lindström <lindstream@gmail.com>
>     *Sent:* Friday, April 26, 2024 2:08:41 PM
>     *To:* RDF-star Working Group
>     *Subject:* [EXTERNAL] The singleton property option
>
>
>
>     For completeness (and perhaps to widen the perspective), here is the
>     singleton property option I briefly mentioned on the semantics call
>     (and alluded to in [1]). Also see [2] for the original; this is just a
>     quick strawman adaptation for the benefit of the LPG perspective.
>
>     It extends RDF 1.1 differently; no triple terms, no opacity, just:
>
>     1. Allow bnodes as predicates (blank predicates).
>     2. Define rdf:singletonPropertyOf for linking those to the property
>     they represent instances/occurrences/edges of.
>
>     3. Well-formedness conditions:
>     3.1 Bnode predicates are only to be used once; with one s and o
>     (similar to list cons nodes, which are "single purposed").
>     3.2 The rdf:singletonPropertyOf is semantically functional (exactly
>     like rdf:first and rdf:rest).
>
>     4. For optimization, implementations can put triples with blank
>     predicates in a dedicated table (using edgename as unique key),
>     relying on well-formedness for cohesion. Such a table is completed in
>     two steps: 1) the singleton assertion inserts s and o for edgename; 2)
>     the rdf:singletonPropertyOf assertion inserts p for edgename. If
>     well-formedness is broken, all optimization bets are off. Perhaps a
>     dedicated skolemization scheme can be employed for some more control
>     and/or "unstarring".
>
>     5. RDF-star syntax obviously needs no naming syntax; naming these
>     would break well-formedness.
>     6. What these *mean* of course needs a good definition (property
>     specializations, edge type instances or similar). Are they asserted?
>     Sure. Do they assert something using their rdf:singletonPropertyOf
>     property as predicate? No. (Could they? Well, they can be declared
>     ("inline") to *also* be subPropertyOf the same property, and through
>     entailment that would happen.)
>     7. Reifiers become a usage pattern (informative) as suggested from the
>     property edge perspective. Any desired :reifiedBy or :partOf relation
>     can link predicate singletons to one or more "reifiers".
>
>     Basic example:
>
>          << :s :p :o >> :source <stream662be7ba> ;
>              :timestampMills 1714153402 .
>
>     Expands to:
>
>          :s _:e1 :o .
>          _:e1 rdf:singletonPropertyOf :p ;
>              :source <stream662be7ba> ;
>              :timestampMills 1714153402 .
>
>     Annotation syntax:
>
>          :s :p :o {| :reifiedBy <#reifier> |} .
>
>     Expands to:
>
>          :s :p :o .
>          :s _:e1 :o .
>          _:e1 rdf:singletonPropertyOf :p ;
>            :reifiedBy <#reifier> .
>
>     Possible singleton property entailment?:
>
>          _:e1 a rdf:SingletonProperty;
>              rdf:subject :s ;
>              rdf:predicate :p ;
>              rdf:object :o .
>
>     Will entailment break well-formedness if (accidentally?) *put back*
>     into a regular graph? Of course, just as RDF lists are "broken"
>     whenever that happens (as in look terrible when serialized, make no
>     sense when queried, etc.).
>
>     Best regards,
>     Niklas
>
>     [1]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Apr/0158.html>
>     [2]: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/>
>
Received on Tuesday, 30 April 2024 19:46:57 UTC