Re: The singleton property option from Peter F. Patel-Schneider on 2024-05-02 (public-rdf-star-wg@w3.org from May 2024)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 2 May 2024 10:00:59 -0400
To: public-rdf-star-wg@w3.org
Message-ID: <530f30bb-5460-4ede-bc08-f76b6ec33cd6@gmail.com>
The singleton property approach has benefits and downsides.  The quoted triple 
approach has benefits and downsides.

One very big advantage of the singleton property approach is that it is 
(barely) possible to use it with any RDF system, even RDF systems that have no 
optimizations.  A big disadvantage of the quoted triple approach is that it 
requires new syntax, new semantics, and new implementations.
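
To make the "any RDF system" point concrete, here is a rough Python sketch of the expansion described further down this thread (two or three triples per annotated statement).  The function name, the tuple representation, and the string terms are my own illustrative assumptions, not part of any proposal.  Note also that the `_:pi` blank-node notation follows the thread; a stock RDF 1.1 system only accepts IRIs in predicate position, so there one would pass a freshly minted IRI instead.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"


def singleton_triples(s, p, o, annotations, pi, rdfs_entailment=False):
    """Encode `s p o` plus annotations as plain triples (hypothetical helper).

    `pi` is the singleton property term, e.g. "_:pi1" in the thread's
    notation.  `annotations` is an iterable of (property, value) pairs.
    """
    triples = [
        (pi, RDFS_SUBPROPERTY_OF, p),  # tie the singleton to its parent property
        (s, pi, o),                    # state the relationship via the singleton
    ]
    if not rdfs_entailment:
        # Without RDFS entailment the plain triple has to be stored explicitly.
        triples.append((s, p, o))
    # Annotations hang off the singleton property itself.
    triples.extend((pi, ap, ao) for ap, ao in annotations)
    return triples


if __name__ == "__main__":
    for t in singleton_triples(":s", ":p", ":o", [(":hasWeight", 1.0)], "_:pi1"):
        print(t)
```

Everything produced is an ordinary triple, which is why no new syntax or semantics is needed; the cost is the extra one or two triples per annotated statement.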

One cannot successfully argue that the singleton property approach is 
inherently worse than the quoted triple approach just because it may require 
more triples.  RDF implementations can be tuned to the singleton property 
approach, providing special data structures for singleton properties and 
special code to optimize SPARQL queries for the singleton property approach.

One possible way to do this is to use a special approach for singleton 
properties where the internal name of the blank node encodes the parent 
property.  This could result in minimal or even no storage overhead for 
singleton properties.  Of course the implementation effort to make this 
completely transparent would be significant, but then so is the effort to make 
a performant implementation of quoted triples.

I note that in this approach the singleton property triples would look very 
much like multiple edges, i.e., this could be considered to be a 
space-efficient implementation of RDFn.
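
A minimal sketch of what such an encoding might look like, in Python.  Everything here (the class name, the 32-bit split, the in-memory dictionaries) is an assumption made for illustration; it is not taken from any existing store.  The internal id of a singleton property packs the id of its parent property into the high bits, so the parent is recovered by a shift and no subPropertyOf triple needs to be stored.

```python
SEQ_BITS = 32  # assumed width of the per-parent sequence number


class SingletonStore:
    """Toy store where a singleton's internal id encodes its parent property."""

    def __init__(self):
        self._prop_ids = {}    # property name -> small integer id
        self._prop_names = []  # integer id -> property name
        self._next_seq = {}    # parent id -> last sequence number handed out
        self.edges = {}        # singleton id -> (subject, object)
        self.annotations = {}  # singleton id -> dict of annotations

    def _intern(self, p):
        if p not in self._prop_ids:
            self._prop_ids[p] = len(self._prop_names)
            self._prop_names.append(p)
        return self._prop_ids[p]

    def add_edge(self, s, p, o, **annotations):
        """Add one annotated edge and return its singleton id."""
        pid = self._intern(p)
        seq = self._next_seq.get(pid, 0) + 1
        self._next_seq[pid] = seq
        sid = (pid << SEQ_BITS) | seq  # parent id in the high bits
        self.edges[sid] = (s, o)
        self.annotations[sid] = dict(annotations)
        return sid

    def parent(self, sid):
        """Recover the parent property from the id alone."""
        return self._prop_names[sid >> SEQ_BITS]

    def match(self, s, p, o):
        """Yield the annotations of every edge matching `s p o`."""
        pid = self._prop_ids.get(p)
        if pid is None:
            return
        for sid, (es, eo) in self.edges.items():
            if sid >> SEQ_BITS == pid and (es, eo) == (s, o):
                yield self.annotations[sid]


if __name__ == "__main__":
    store = SingletonStore()
    store.add_edge(":s", ":p", ":o", weight=1.0)
    store.add_edge(":s", ":p", ":o", weight=2.0)  # a second, distinct edge
    print(list(store.match(":s", ":p", ":o")))
```

Adding the same :s :p :o twice yields two distinct ids with the same parent, which is the multiple-edge behaviour mentioned above.  The linear scan in `match` is only there to keep the sketch short; a real implementation would index the edges.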

peter


On 4/30/24 15:46, Thompson, Bryan wrote:
> Your proposal would require two statements on top of the original SPO 
> statement before you should begin to make assertions about the original SPO 
> statement?
> 
> 
> Anything based on the singleton property approach will have quite an impact on 
> database statistics.  The number of used predicates would jump from millions 
> (for open linked data) to the cardinality of the statements about which 
> statements are being made (e.g., billions, 10s of billions, etc.). @Williams, 
> Gregory or @Schmidt, Michael can comment on this, but this certainly places a 
> new burden on common techniques for extracting statistics from a graph.
> 
> 
> Note that there is really no reason to rely on the P position in your 
> proposal.  You could use S since it already allows blank nodes.  You then hang 
> the Subject of the original asserted SPO on the statement about that unique 
> subject. (Or you could use O, which might be kinder for database statistics 
> since they tend to focus on SP* analysis.)
> 
> 
> _:si :statementInstanceHasSubject :s .
> _:si :p :o .
> :s :p :o.
> 
> 
> I have been impressed in the past with the space and time overhead which 
> arises out of various modeling decisions around possible treatments of 
> statements about statements.  I would recommend carefully considering that impact.  
> Another 2 triples makes a huge difference when all statements carry 
> annotations, as they do in some domains.  For example, consider the relatively 
> common case in which you have a graph consisting of a topology and edge 
> weights.  This is very common - lots of graphs are simply edges and their 
> weights.  As I understand it, your proposal would have 3 times the data volume 
> to model the topology (some set of edges) in a manner which would permit 
> associating edge weights with the edges in that topology.  And the database 
> would need to chase a long chain to obtain those edge weights in a correct 
> manner: :s :p :o. => :s _:pi :o => _:pi rdfs:subPropertyOf :p . => _:pi 
> :hasWeight 1.0.  The cost of chasing that chain would make applications 
> relying on edge weights very expensive in both time and space.  I can't see 
> that as being responsive to such use cases.  To be efficient, there needs to 
> be a close association between an edge and the properties of that edge.  Their 
> resolution needs to be very efficient.
> 
> 
> Also note that this singleton property proposal would not support alignment in 
> the data (interoperability in the data) with LPG edge properties.  So it would 
> fail to offer a unification path for the common use cases of RDF and LPG.
> 
> 
> Thanks,
> 
> Bryan
> 
> ------------------------------------------------------------------------------
> *From:* Peter F. Patel-Schneider <pfpschneider@gmail.com>
> *Sent:* Tuesday, April 30, 2024 10:40:18 AM
> *To:* public-rdf-star-wg@w3.org
> *Subject:* RE: [EXTERNAL] The singleton property option
> 
> 
> 
> I think that this is far too strong.   The singleton property approach has
> problems, but not to this extent.
> 
> For any statement that does not require annotation, the singleton property
> approach does not require any changes at all, i.e.,  just use
> :s :p :o .
> 
> For a statement that does require annotation, the singleton property requires
> two or three triples, one to make the blank node a subproperty of the desired
> property, one to state the relationship using the blank node, and, if the RDF
> system does not implement RDFS semantics, one to make the statement using the
> regular property, i.e.,
> _:pi rdfs:subPropertyOf :p .
> :s _:pi :o .
> :s :p :o.
> The added storage for this might be less than that needed for efficient
> processing of quoted triples, particularly if the third statement is not needed.
> 
> There is no need to change modelling if the statement is annotated after the fact.
> 
> peter
> 
> 
> 
> On 4/30/24 12:26, Thompson, Bryan wrote:
>> The singleton property approach undermines the direct use of predicates in
>> statements and forces a second hop for any use case to determine the actual
>> predicate used.  It also requires that the "statement" is modeled differently
>> in advance, thus increasing the space requirements even if no statements about
>> statements are used.
>>
>>
>> This is not efficient.
>>
>>
>> Effectively, the singleton property model says that the RDF triple is wrong.
>> It says that you should model using (S ID O) and then model the predicate and
>> other information as statements about that ID.  This is not the RDF model.
>>
>>
>> The approach with Statements about Statements should IMHO be built on (S P O
>> ID).  That is, there is a unique identifier for the SPO and you make
>> statements about that statement ID.
>>
>>
>> Bryan
>>
>> ------------------------------------------------------------------------------
>> *From:* Thomas Lörtsch <tl@rat.io>
>> *Sent:* Tuesday, April 30, 2024 12:02:21 AM
>> *To:* public-rdf-star-wg@w3.org; Thompson, Bryan; Niklas Lindström; RDF-star
>> Working Group
>> *Subject:* RE: [EXTERNAL] The singleton property option
>>
>>
>>
>> Bryan,
>>
>> Niklas combines the RDF-star syntax with the semantics of Singleton
>> Properties. AFAIK no implementations of or papers on Singleton Properties have
>> done that. This combination doesn't even require an index on properties.
>>
>> This combination is nearer to the original RDR approach than anything else
>> discussed by CG and WG. It is IMO a very neat idea and deserves a closer look.
>>
>> Thomas
>>
>>
>>
>> Am 29. April 2024 19:06:37 MESZ schrieb "Thompson, Bryan" <bryant@amazon.com>:
>>
>>     The singleton property approach has many downsides and is pragmatically
>>     unworkable.  There is a good reason people are not happy with this approach.
>>
>>
>>     Bryan
>>
>>     ------------------------------------------------------------------------------
>>     *From:* Niklas Lindström <lindstream@gmail.com>
>>     *Sent:* Friday, April 26, 2024 2:08:41 PM
>>     *To:* RDF-star Working Group
>>     *Subject:* [EXTERNAL] The singleton property option
>>
>>
>>
>>     For completeness (and perhaps to widen the perspective), here is the
>>     singleton property option I briefly mentioned on the semantics call
>>     (and alluded to in [1]). Also see [2] for the original; this is just a
>>     quick strawman adaptation for the benefit of the LPG perspective.
>>
>>     It extends RDF 1.1 differently; no triple terms, no opacity, just:
>>
>>     1. Allow bnodes as predicates (blank predicates).
>>     2. Define rdf:singletonPropertyOf for linking those to the property
>>     they represent instances/occurrences/edges of.
>>
>>     3. Well-formedness conditions:
>>     3.1 Bnode predicates are only to be used once; with one s and o
>>     (similar to list cons nodes, which are "single purposed").
>>     3.2 The rdf:singletonPropertyOf is semantically functional (exactly
>>     like rdf:first and rdf:rest).
>>
>>     4. For optimization, implementations can put triples with blank
>>     predicates in a dedicated table (using edgename as unique key),
>>     relying on well-formedness for cohesion. Such a table is completed in
>>     two steps: 1) the singleton assertion inserts s and o for edgename; 2)
>>     the rdf:singletonPropertyOf assertion inserts p for edgename. If
>>     well-formedness is broken, all optimization bets are off. Perhaps a
>>     dedicated skolemization scheme can be employed for some more control
>>     and/or "unstarring".
>>
>>     5. RDF-star syntax obviously needs no naming syntax; naming these
>>     would break well-formedness.
>>     6. What these *mean* of course needs a good definition (property
>>     specializations, edge type instances or similar). Are they asserted?
>>     Sure. Do they assert something using their rdf:singletonPropertyOf
>>     property as predicate? No. (Could they? Well, they can be declared
>>     ("inline") to *also* be subPropertyOf the same property, and through
>>     entailment that would happen.)
>>     7. Reifiers become a usage pattern (informative) as suggested from the
>>     property edge perspective. Any desired :reifiedBy or :partOf relation
>>     can link predicate singletons to one or more "reifiers".
>>
>>     Basic example:
>>
>>          << :s :p :o >> :source <stream662be7ba> ;
>>              :timestampMills 1714153402 .
>>
>>     Expands to:
>>
>>          :s _:e1 :o .
>>          _:e1 rdf:singletonPropertyOf :p ;
>>              :source <stream662be7ba> ;
>>              :timestampMills 1714153402 .
>>
>>     Annotation syntax:
>>
>>          :s :p :o {| :reifiedBy <#reifier> |} .
>>
>>     Expands to:
>>
>>          :s :p :o .
>>          :s _:e1 :o .
>>          _:e1 rdf:singletonPropertyOf :p ;
>>            :reifiedBy <#reifier> .
>>
>>     Possible singleton property entailment?:
>>
>>          _:e1 a rdf:SingletonProperty;
>>              rdf:subject :s ;
>>              rdf:predicate :p ;
>>              rdf:object :o .
>>
>>     Will entailment break well-formedness if (accidentally?) *put back*
>>     into a regular graph? Of course, just as RDF lists are "broken"
>>     whenever that happens (as in look terrible when serialized, make no
>>     sense when queried, etc.).
>>
>>     Best regards,
>>     Niklas
>>
>>     [1]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Apr/0158.html>
>>     [2]: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/>
>>
>
Received on Thursday, 2 May 2024 14:01:04 UTC