Re: Naming triples

Hi Eric,

Thanks for your input and example!

(The following is my take on where the WG stands in relation to your
example. As it also reflects my own opinion within the WG, others may
disagree.)

What you are doing is a perfectly valid approach. But it has a generally
major drawback: you can no longer put those quads in a named graph. Thus
you have to choose between using the fourth position to name triples (or
sets of them) and using it to partition a dataset into administrative units.

*If* we could relate graphs, say by stating that one is "owned" by another,
then it *might* be doable. But we spent the latter part of 2023 exploring
ways to do that, and it was deemed impractical because 1) named graphs lack
semantics (they are *outside* of the formal interpretation); 2) adding
semantics and options here without stepping on existing uses of named
graphs, particularly for security and access control, would be complex, if
possible at all; and 3) even if it were possible, uptake in quad stores
could take years. It would also 4) go far beyond the charter of this
working group. That does not mean that we don't have a responsibility to
avoid adding more confusion to the mix, though; whatever is added should be
explainable *in relation to* practices for named graphs.

We have also basically agreed that the use cases we must cater to are all
about *some kind* of occurrences
(reifications/qualifications/tokenizations) of triples. (That is essential
for mapping to LPGs, for instance, whose edges are decidedly "instances of
an edge type", or "multisets", in some descriptions thereof.) So we won't
(if all goes well, IMHO) allow triple terms themselves as subjects, since
triples are the atomic, logical axioms with which we model a particular
domain of discourse, and they do not denote any particular underlying
circumstance. Such circumstances (which can be anything, including but not
limited to the statement tokens of classical reification) are always
"reifying" them; so we call those "reifiers".

Therefore, at the beginning of 2024, we tentatively agreed that e.g. this:

    << _:Person__1 ex:hasPhoneNumber _:11111111111 >> pr:source _:DHS .

would be shorthand for this (where the reifier is a blank node unless named):

    _:b rdf:reifies <<( _:Person__1 ex:hasPhoneNumber _:11111111111 )>> .
    _:b pr:source _:DHS .

Here the triple term itself is only allowed in the object position, and is
in some ways a "literal-like" three-tuple. (The <<( ... )>> syntax
distinguishes triple terms from the << ... >> sugar, which has been kept to
minimize changes at this stage and to stay compatible with the RDF-star CG
syntax.)
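To make the expansion concrete, here is a minimal Python sketch, for
illustration only. The names (`TripleTerm`, `desugar_reified`), the
plain-string representation of terms and the `_:b0`, `_:b1`, ... labelling
are hypothetical, not taken from any spec or RDF library; it assumes, as
above, that the reifier is a fresh blank node unless one is named, and that
the reified triple itself is *not* asserted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class TripleTerm:
    """A triple term <<( s p o )>>, used only in object position."""
    s: str
    p: str
    o: str

    def __str__(self) -> str:
        return f"<<( {self.s} {self.p} {self.o} )>>"


Triple = Tuple[str, str, Union[str, TripleTerm]]


def desugar_reified(
    triple: Tuple[str, str, str],
    annotations: Iterable[Tuple[str, str]],
    fresh: Callable[[], str],
    reifier: Optional[str] = None,
) -> List[Triple]:
    """Expand `<< s p o >> p1 o1 ; p2 o2 .` into an rdf:reifies link plus
    annotation triples. The triple (s, p, o) is only mentioned, not asserted."""
    r = reifier if reifier is not None else fresh()  # blank node unless named
    out: List[Triple] = [(r, "rdf:reifies", TripleTerm(*triple))]
    out.extend((r, ap, ao) for ap, ao in annotations)
    return out


# Hypothetical fresh blank node labels: _:b0, _:b1, ...
fresh = (f"_:b{i}" for i in count()).__next__

for t in desugar_reified(
    ("_:Person__1", "ex:hasPhoneNumber", "_:11111111111"),
    [("pr:source", "_:DHS")],
    fresh,
):
    print(" ".join(str(x) for x in t), ".")
# _:b0 rdf:reifies <<( _:Person__1 ex:hasPhoneNumber _:11111111111 )>> .
# _:b0 pr:source _:DHS .
```

Passing `reifier="ex:r1"` instead would name the reifier, and no fresh
blank node would be allocated.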

There is also the additional annotation sugar:

    _:Person__1 ex:hasPhoneNumber _:11111111111 {| pr:source _:DHS |} .

Which means the above plus the assertion itself (the triple being in the
graph), i.e.:

    _:Person__1 ex:hasPhoneNumber _:11111111111 .
    _:b rdf:reifies <<( _:Person__1 ex:hasPhoneNumber _:11111111111 )>> .
    _:b pr:source _:DHS .
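The only difference from the plain reification sugar is that the triple is
also asserted. A minimal, self-contained Python sketch of that expansion
follows; as before, `TripleTerm`, `desugar_annotation` and the blank node
labelling are hypothetical names chosen for illustration, not any spec's or
library's API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class TripleTerm:
    """A triple term <<( s p o )>>, used only in object position."""
    s: str
    p: str
    o: str

    def __str__(self) -> str:
        return f"<<( {self.s} {self.p} {self.o} )>>"


Triple = Tuple[str, str, Union[str, TripleTerm]]


def desugar_annotation(
    triple: Tuple[str, str, str],
    annotations: Iterable[Tuple[str, str]],
    fresh: Callable[[], str],
    reifier: Optional[str] = None,
) -> List[Triple]:
    """Expand `s p o {| p1 o1 |} .` into: the asserted triple, an rdf:reifies
    link from a reifier to its triple term, and the annotations on that reifier."""
    s, p, o = triple
    r = reifier if reifier is not None else fresh()  # blank node unless named
    out: List[Triple] = [(s, p, o)]  # the assertion itself
    out.append((r, "rdf:reifies", TripleTerm(s, p, o)))
    out.extend((r, ap, ao) for ap, ao in annotations)
    return out


# Hypothetical fresh blank node labels: _:b0, _:b1, ...
fresh = (f"_:b{i}" for i in count()).__next__

for t in desugar_annotation(
    ("_:Person__1", "ex:hasPhoneNumber", "_:11111111111"),
    [("pr:source", "_:DHS")],
    fresh,
):
    print(" ".join(str(x) for x in t), ".")
# _:Person__1 ex:hasPhoneNumber _:11111111111 .
# _:b0 rdf:reifies <<( _:Person__1 ex:hasPhoneNumber _:11111111111 )>> .
# _:b0 pr:source _:DHS .
```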

But this agreement has not held. We are now debating whether the fact that
it allows more variations, such as the following, is a good thing:

    _:b rdf:reifies <<( _:Person__1 ex:hasAddress _:a )>> .
    _:b rdf:reifies <<( _:a :streetAddress "SomeStreet 1" )>> .
    _:b rdf:reifies <<( _:a :city <SomeCity> )>> .

    _:b pr:clearance  _:UNCLASSIFIED ;
        pr:source  _:DHS ;
        pr:likelihood 0.8 ;
        pr:dataSet _:someDataSet ;
        pr:sourceRecord _:PERSON ;
        pr:sourceRecordID 329 .
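One way to see what a shared reifier buys you: if you look up the
provenance of any one of the three triples by following rdf:reifies back to
its reifiers, each of them picks up the same six annotations. A minimal
Python sketch of that lookup, assuming plain string terms and a
hypothetical `annotations_for` helper (not any library's API); note that,
as in the example, the three triples themselves are not asserted here.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class TripleTerm:
    """A triple term <<( s p o )>>, used only in object position."""
    s: str
    p: str
    o: str


Triple = Tuple[str, str, Union[str, TripleTerm]]

# The example above, with every term written as a plain string.
graph: List[Triple] = [
    ("_:b", "rdf:reifies", TripleTerm("_:Person__1", "ex:hasAddress", "_:a")),
    ("_:b", "rdf:reifies", TripleTerm("_:a", ":streetAddress", '"SomeStreet 1"')),
    ("_:b", "rdf:reifies", TripleTerm("_:a", ":city", "<SomeCity>")),
    ("_:b", "pr:clearance", "_:UNCLASSIFIED"),
    ("_:b", "pr:source", "_:DHS"),
    ("_:b", "pr:likelihood", "0.8"),
    ("_:b", "pr:dataSet", "_:someDataSet"),
    ("_:b", "pr:sourceRecord", "_:PERSON"),
    ("_:b", "pr:sourceRecordID", "329"),
]


def annotations_for(graph: List[Triple], s: str, p: str, o: str) -> List[Tuple[str, Union[str, TripleTerm]]]:
    """Return (predicate, object) pairs of every reifier that reifies <<( s p o )>>,
    in graph order, excluding the rdf:reifies links themselves."""
    target = TripleTerm(s, p, o)
    reifiers = {r for r, pred, obj in graph if pred == "rdf:reifies" and obj == target}
    return [(pred, obj) for r, pred, obj in graph
            if r in reifiers and pred != "rdf:reifies"]


for pair in annotations_for(graph, "_:a", ":city", "<SomeCity>"):
    print(pair)
```

With one reifier per triple, the same provenance would instead have to be
repeated (or linked) three times.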

Or whether it is, instead, a problem. Some have deemed it a problem since
LPGs cannot represent it, and possibly since classical reification was
decidedly one reifier per statement. But the important question is whether
or not it *makes sense*. In general reification (e.g. in UML and some forms
of N-ary relations) and in philosophy (see "truth-makers" [1] [2]) it
appears to. A third objection is that this blurs the line between
reification and named graphs.

But IMHO your example actually illustrates why we perhaps *should* "blur"
that line. Since named graphs, as noted above, cannot simultaneously be used
for recording detailed sub-graph provenance *and* remain outside of the
interpretation, as they are today, reifiers of multiple triples give us an
opportunity to address this shortcoming. (Named graphs would then
effectively be used mostly for dataset management, such as in quad stores
and LDP; and such a mechanical, "opaque" treatment of them is in fact
critical for access control, e.g. secured private graphs.)

Another, related, contentious issue is whether triple terms should
themselves be "opaque" or not (or whether the same triple could be either
opaque or transparent, as is *in a way* the case with graphs: since they
are *outside* of the semantics, you can decide whether to interpret them or
just treat them as sets of triples). It is still unclear whether this
opacity is *needed* (to avoid entailment) or a nice-to-have (if the latter,
you could instead, for instance, record in a reifier representing a
statement token exactly what syntactic form was used, but using your own
terms, not any particular syntactic sugar or datatype semantics thereof).

Again, thanks for your input! If you have further practical use cases for
triple provenance, please share what you can. And if these also give you an
opinion on the "reifier for multiple triples" and "opacity" questions,
please let us know.

(Aside: Please note that there is an issue with using pipes for naming the
reifier [3], at least for the annotation form; so don't rely on it yet
other than for discussion.)

Best regards,
Niklas

[1]: <https://www.researchgate.net/publication/325995356_Reification_and_Truthmaking_Patterns>
[2]: <https://plato.stanford.edu/entries/truthmakers/>
[3]: <https://github.com/w3c/rdf-star-wg/issues/116>


On Tue, Jun 18, 2024 at 8:46 AM Peterson, Eric L. [US-US]
<Eric.L.Peterson@leidos.com> wrote:

> I live happily without RDF named graphs by constructing provenance
> networks which are much richer than RDF graphs.  And they are optimizable.
> I claim we should put the optimization burden on the SPARQL query tool
> before we complicate RDF - like named graphs did and like the RDF-star
> proposal threatens to do.
>
> Please consider quads, my friends.
>
> Just add a new SPARQL term called *EDGE*.  It would be completely
> synonymous with *GRAPH*.
>
> The fourth member of the quad is a terrible thing to waste.
>
>
>
> ------------------------------
> *From:* Peterson, Eric L. [US-US] <Eric.L.Peterson@leidos.com>
> *Sent:* Monday, June 17, 2024 3:26 PM
> *To:* public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
> *Subject:* Re: Naming triples
>
> I have been naming my triples for many years.
>
> I just use the fourth member of the quad for the triple name (URI):
>
> ------------------------------
> *From:* Peterson, Eric L. [US-US] <Eric.L.Peterson@leidos.com>
> *Sent:* Monday, June 17, 2024 3:04 PM
> *To:* public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
> *Subject:* Naming triples
>
> Hi folks;
>
> Thanks for working on getting us something better than triple reification
> for edge metadata!
>
> Would I be justified in being very disappointed in a spec that didn't
> allow me to name triple terms?
>
> For simplicity below, I didn't model this example the way I would at
> work.  But look at all the notational duplication.  Can we have a spec that
> allows the naming of triple terms and the subsequent referencing of the
> name in place of a triple term?
>
> I'm very new to RDF-star/SPARQL-star.  Please forgive me if I've missed
> some way around this issue.
>
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:clearance _:UNCLASSIFIED .
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:source _:DHS .
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:likelihood 0.8 .
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:dataSet _:someDataSet .
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:sourceRecord _:PERSON .
> <<_:Person__1 ex:hasPhoneNumber _:11111111111>> pr:sourceRecordID 329 .
>
>
> Thanks!
>
> -Eric
>

Received on Tuesday, 18 June 2024 10:08:03 UTC