Re: [External] : Re: RDF star and LPGs from Souripriya Das on 2024-08-01 (public-rdf-star-wg@w3.org from August 2024)

From: Souripriya Das <souripriya.das@oracle.com>
Date: Thu, 1 Aug 2024 14:22:41 +0000
To: "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-ID: <CY5PR10MB6071EF6B5CE744A851FC5C51FAB22@CY5PR10MB6071.namprd10.prod.outlook.com>
After giving it some more thought, I felt that the auto-generation of ids for individual s-p-o triples that I was looking for is difficult to accommodate in a standards specification, and I decided not to pursue that idea.

So, at this point, among all the ideas that are being discussed, the ones that appeal to me can be summarized as follows:

  1. An RDF graph in RDF1.1 is a set of <s,p,o> triples, but in RDF1.2, it may include a set of <s,p,o,id> 4-tuples as well.
  2. Additionally, each such 4-tuple (present in an RDF graph) is either "asserted" or "reified" (and never both at the same time). So, the general form (of a 4-tuple) is: :id (rdf:asserts | rdf:reifies) <<( :s :p :o )>> .
  3. The two sets – set of triples and set of 4-tuples – are independent of each other. Specifically, presence or absence of the 4-tuple <s,p,o,id>, whether "asserted" or "reified", does not in any way determine the presence or absence of an <s,p,o> triple in the RDF graph (and vice-versa).
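
The three points above can be read as a small data model. The following is only a sketch under stated assumptions: the class name Graph, the field names, and the choice to reject a conflicting insert with an exception are all hypothetical and not taken from any specification or from this thread.

```python
from dataclasses import dataclass, field

ASSERTS = "rdf:asserts"
REIFIES = "rdf:reifies"


@dataclass
class Graph:
    # Point 1: plain <s,p,o> triples and <s,p,o,id> 4-tuples live side by side.
    triples: set = field(default_factory=set)
    # Maps (id, s, p, o) to ASSERTS or REIFIES (hypothetical representation).
    quads: dict = field(default_factory=dict)

    def add_triple(self, s, p, o):
        self.triples.add((s, p, o))

    def add_quad(self, id_, mode, s, p, o):
        if mode not in (ASSERTS, REIFIES):
            raise ValueError(f"unknown mode: {mode}")
        key = (id_, s, p, o)
        existing = self.quads.get(key)
        # Point 2: a 4-tuple is asserted or reified, never both.
        if existing is not None and existing != mode:
            raise ValueError(f"{key} is already related by {existing}")
        # Point 3: adding a 4-tuple never touches the set of plain triples.
        self.quads[key] = mode


g = Graph()
g.add_quad(":r1", REIFIES, ":s", ":p", ":o")
print((":s", ":p", ":o") in g.triples)  # the plain triple was not added
```

Keeping the two collections separate is what makes point 3 hold by construction in this sketch.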

Thanks,
Souri.
________________________________
From: Souripriya Das <souripriya.das@oracle.com>
Sent: Wednesday, July 31, 2024 10:42 AM
To: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
Subject: Re: [External] : Re: RDF star and LPGs

This is a response to Gregg Kellogg's reply [3] (which somehow had an unrelated subject for the message) to my last message [2].

Hi Gregg,

A) Regarding "superfluous":
======================
Here is the essence of the approach that I was trying to illustrate using the examples in my last email [1]: An RDF graph in RDF1.1 is a set of <s,p,o> triples, but in RDF1.2 it is a set of <s,p,o,id> 4-tuples. Additionally, each such 4-tuple (present in an RDF graph) is either "asserted" or "reified" (and never both at the same time). So, the general form is: :id (rdf:asserts | rdf:reifies) <<( :s :p :o )>> .

[Gregg]
I don’t see how associating an identifier with each triple helps. The many-to-one use case has multiple reifiers/identifiers for a given triple, and of course, many triples exist within the graph without being explicitly reified, making the identifier superfluous.
[Souri]

The id generated by foo(s,p,o) only needs to exist logically because it can be computed when needed. Only when something is said about a never-before-reified triple will a practical system store the generated id. Looking at Example 2 of my earlier email [1]:
NOTE: The two forms :s :p :o . and foo(s,p,o) rdf:asserts <<( :s :p :o )>> . are equivalent in RDF1.2.
Example 2:
=========
    RDF1.1 => :s :p :o .
    LPG => (s) -[:p]-> (o) with edge id computed using value of foo(s,p,o).
    RDF1.2 => foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
Due to the equivalence mentioned there, when an :s :p :o triple is inserted into a practical system, it will normally store it simply as => :s :p :o . – NOT the full form => foo(s,p,o) rdf:asserts <<( :s :p :o )>> . However, when something is said about this triple (in RDF1.2), its id will be materialized (and used wherever this triple is referenced). So, if out of 500 s-p-o triples only 5 are referred to in other triples, only those 5 will have their identifiers materialized and stored.
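
A minimal sketch of this lazy materialization, under assumptions: the thread does not define foo(), so a SHA-256 hash over the three terms stands in for it here, and the Store class, its fields, and the "urn:triple:" prefix are hypothetical.

```python
import hashlib


def foo(s, p, o):
    """Hypothetical stand-in for foo(s,p,o): a deterministic id derived from the terms."""
    digest = hashlib.sha256("\x00".join((s, p, o)).encode("utf-8")).hexdigest()
    return "urn:triple:" + digest[:16]


class Store:
    def __init__(self):
        self.triples = set()      # stored simply as (s, p, o)
        self.materialized = {}    # id -> (s, p, o), only for triples that are referred to
        self.annotations = set()  # (id, prop, val)

    def insert(self, s, p, o):
        # No id is stored at insert time; it can be recomputed with foo() when needed.
        self.triples.add((s, p, o))

    def annotate(self, triple, prop, val):
        if triple not in self.triples:
            raise KeyError(triple)
        id_ = foo(*triple)
        self.materialized[id_] = triple  # the id is materialized on first reference
        self.annotations.add((id_, prop, val))
        return id_


store = Store()
for i in range(500):
    store.insert(f":s{i}", ":p", ":o")
for i in range(5):
    store.annotate((f":s{i}", ":p", ":o"), ":prop", ":val")
print(len(store.triples), len(store.materialized))
```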

B) Regarding "many-to-one use case has multiple reifiers/identifiers for a given triple":
===========
Not sure I understand the point you are raising here. Here is a slight variation of Example 7 (see [1], but here :r1 rdf:asserts instead of rdf:reifies) that shows a many-to-one situation: both foo(s,p,o) and :r1 assert the same triple structure and :r2 reifies it. It is easy to see that even many-to-many can be supported.
Example 7:
=========
...
  RDF1.2 =>
     foo(s,p,o) rdf:asserts <<( :s :p :o )>> . # likely will be stored as :s :p :o . because nobody is referring to it (yet)
     :r1 rdf:asserts <<( :s :p :o )>> .
     :r1 :prop :val .
     :r2 rdf:reifies <<( :s :p :o )>> .
     :r2 :prop2 :val2 .

C) Regarding "A foo(s,p,o) function to generate an identifier suffers from the very problem that RDF Dataset Canonicalization addresses":
===================
RDF1.1 is simple because it uses only a single kind of artifact – triples. IMO, RDF1.2 should retain that simplicity by using only a single kind of artifact – 4-tuples. Trying to have two kinds of artifacts – "type" triples and "occurrence" of "type" triples – takes away that simplicity.

The use of a foo(s,p,o) function to generate the id for an (RDF1.1-style) triple is suggested only for backward compatibility (e.g., bringing RDF1.1 data into RDF1.2) and to avoid, whenever possible, burdening the data creators with the task of creating an id for such a 4-tuple. The id in every 4-tuple, whether explicitly created by user or generated as foo(s,p,o), can be used as subject or object in another 4-tuple if needed, as long as no direct (e.g., <id, id, p, o> ) or indirect (e.g., <id1, id2, p1, o1>, <id2, id1, p2, o2>) circularity is involved.
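
The circularity condition can be checked mechanically. The sketch below assumes 4-tuples are given as (id, s, p, o) and that an id "depends on" another id when it appears as that 4-tuple's subject or object; the function name has_circularity is hypothetical.

```python
def has_circularity(quads):
    """Return True if any id refers to itself, directly or through other ids.

    quads: iterable of (id, s, p, o) 4-tuples (hypothetical representation).
    """
    quads = list(quads)
    ids = {q[0] for q in quads}
    # deps[id] = ids used as subject or object of the 4-tuple(s) carrying that id
    deps = {i: set() for i in ids}
    for id_, s, _p, o in quads:
        deps[id_].update(t for t in (s, o) if t in ids)

    state = {}  # id -> "visiting" | "done"

    def visit(node):
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True  # reached a node that is still on the current path
        state[node] = "visiting"
        if any(visit(n) for n in deps[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(i) for i in ids)


direct = [("id", "id", "p", "o")]
indirect = [("id1", "id2", "p1", "o1"), ("id2", "id1", "p2", "o2")]
nested_ok = [("id1", "s", "p", "o"), ("id2", "id1", "p2", "o2")]
print(has_circularity(direct), has_circularity(indirect), has_circularity(nested_ok))
```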

[Gregg]
A foo(s,p,o) function to generate an identifier suffers from the very problem that RDF Dataset Canonicalization addresses, because blank nodes are not sufficiently stable to use to create a unique identifier. There may be two triples with the same subject and predicate but different blank node objects, which you can’t really distinguish from each other without examining the context of other triples using the same blank node.
[Souri]

Not sure I fully understood your point, but let us consider this sample data in N-Triples:
    foo(:s,:p,_:b1) rdf:asserts <<( :s :p _:b1 )>> .
    foo(:s,:p,_:b2) rdf:asserts <<( :s :p _:b2 )>> .

The two generated identifiers will indeed be different because they have different arguments for the foo() function call. If, due to some canonicalization, we figure out that _:b1 owl:sameAs _:b2, that has nothing to do with the id generation that was done based on the N-Triples input. Of course, if we want to obtain the same id after canonicalization that unifies _:b1 and _:b2 to, say, _:b, we could delete the two affected original triples and re-insert them (which would collapse them into a single triple => foo(:s,:p,_:b) rdf:asserts <<( :s :p _:b )>> .).
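
To make the blank-node case concrete, here is a sketch that again assumes a hash-based stand-in for foo() (the thread leaves foo() unspecified) and represents the unification of _:b1 and _:b2 as a plain renaming table, which is a simplification of what canonicalization actually does.

```python
import hashlib


def foo(s, p, o):
    """Hypothetical stand-in for foo(s,p,o), computed from the term labels as written."""
    digest = hashlib.sha256("\x00".join((s, p, o)).encode("utf-8")).hexdigest()
    return "urn:triple:" + digest[:16]


triples = {(":s", ":p", "_:b1"), (":s", ":p", "_:b2")}
ids_before = {foo(*t) for t in triples}  # different arguments, so different ids

# Assumed outcome of some later step that decides _:b1 and _:b2 are the same node.
unify = {"_:b1": "_:b", "_:b2": "_:b"}

# Delete and re-insert: rewriting the terms collapses the two triples into one.
rewritten = {tuple(unify.get(term, term) for term in t) for t in triples}
ids_after = {foo(*t) for t in rewritten}

print(len(ids_before), len(rewritten), len(ids_after))
```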

Thanks,
Souri.

[1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0152.html (Souri Das, Sun, 28 Jul 2024 16:28:04 +0000)
[2] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0155.html (Souri Das, Tue, 30 Jul 2024 20:14:58 +0000)
[3] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0156.html (Gregg Kellogg, Tue, 30 Jul 2024 15:43:27 -0700)

________________________________
From: Souripriya Das <souripriya.das@oracle.com>
Sent: Tuesday, July 30, 2024 4:14 PM
To: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
Subject: Re: [External] : Re: RDF star and LPGs

Here is the essence of the approach that I was trying to illustrate using the examples in my last email [1]: An RDF graph in RDF1.1 is a set of <s,p,o> triples, but in RDF1.2 it is a set of <s,p,o,id> 4-tuples. Additionally, each such 4-tuple (present in an RDF graph) is either "asserted" or "reified" (and never both at the same time). So, the general form is: :id (rdf:asserts | rdf:reifies) <<( :s :p :o )>> .

RDF1.1 is simple because it uses only a single kind of artifact – triples. IMO, RDF1.2 should retain that simplicity by using only a single kind of artifact – 4-tuples. Trying to have two kinds of artifacts – "type" triples and "occurrence" of "type" triples – takes away that simplicity.

The use of a foo(s,p,o) function to generate the id for an (RDF1.1-style) triple is suggested only for backward compatibility (e.g., bringing RDF1.1 data into RDF1.2) and to avoid, whenever possible, burdening the data creators with the task of creating an id for such a 4-tuple. The id in every 4-tuple, whether explicitly created by user or generated as foo(s,p,o), can be used as subject or object in another 4-tuple if needed, as long as no direct (e.g., <id, id, p, o> ) or indirect (e.g., <id1, id2, p1, o1>, <id2, id1, p2, o2>) circularity is involved.

Regarding conversion between LPG and RDF1.2: LPG, although popular, is far less expressive than RDF1.2. It is sufficient for us to ensure that every LPG graph can be easily modeled as an RDF graph in RDF1.2. The opposite conversion – RDF1.2 to LPG –  is not always going to be straightforward because LPG has several restrictions: 1) it does not allow an edge to connect a vertex and an edge, or connect two edges; 2) it does not support unasserted (reified) edges; and 3) multiple edges in an LPG cannot share the same edge-id.
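
The three LPG restrictions can be expressed as a simple check over a set of 4-tuples. This is a sketch only: the (id, mode, s, p, o) representation and the function name lpg_obstacles are hypothetical, and literal-valued annotations on an id (which would correspond to LPG edge properties) are deliberately left out of scope.

```python
ASSERTS = "rdf:asserts"
REIFIES = "rdf:reifies"


def lpg_obstacles(quads):
    """List reasons why the given 4-tuples do not map directly to LPG edges.

    quads: iterable of (id, mode, s, p, o), each treated as a candidate edge.
    """
    quads = list(quads)
    ids = {q[0] for q in quads}
    triples_by_id = {}
    problems = []
    for id_, mode, s, p, o in quads:
        # Restriction 1: an edge may not start or end at another edge.
        if s in ids or o in ids:
            problems.append(f"{id_}: edge endpoint is itself an edge")
        # Restriction 2: LPG has no unasserted (reified) edges.
        if mode == REIFIES:
            problems.append(f"{id_}: unasserted (reified) edge")
        triples_by_id.setdefault(id_, set()).add((s, p, o))
    # Restriction 3: one edge id cannot be shared by several edges.
    for id_, triples in sorted(triples_by_id.items()):
        if len(triples) > 1:
            problems.append(f"{id_}: id shared by {len(triples)} edges")
    return problems


ok = [(":e1", ASSERTS, ":a", ":knows", ":b")]
bad = ok + [
    (":e2", ASSERTS, ":e1", ":source", ":doc1"),
    (":e3", REIFIES, ":a", ":likes", ":b"),
    (":e4", ASSERTS, ":a", ":p", ":b"),
    (":e4", ASSERTS, ":a", ":q", ":b"),
]
for line in lpg_obstacles(bad):
    print(line)
```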

Thanks,
Souri.

[1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0152.html
________________________________
From: Thomas Lörtsch <tl@rat.io>
Sent: Tuesday, July 30, 2024 6:47 AM
To: Souripriya Das <souripriya.das@oracle.com>; Gregory Williams <greg@evilfunhouse.com>
Cc: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
Subject: Re: [External] : Re: RDF star and LPGs

Hi Souri, Greg,

I think the discussion so far shows that everything *can* be done, but probably not everything *should* be done, at least not as a default arrangement. I take it that every edge in LPG has its own identifier, even if it’s not annotated. So LPG is multiset all the way. RDF on the other hand is based on sets, and when we introduce an annotation mechanism which quite necessarily ventures into multiset territory, we should take care not to overdo it. After all, those annotations are triples too and nobody expects them to be heavily annotated as well (it might/can/should_be_possible_to happen, with nested annotations, but it will probably not be the norm).


Souri,

your use of foo(s,p,o) has a problem: the current baseline proposal talks about occurrences all the way, and my recent proposal [2] is no different - neither for asserted/stated nor for reified/described statements. The only thing that can meaningfully be identified by a function of (s,p,o) is the asserted triple without annotations - but does that need an identifier? Rather not, at least not on the RDF side. So the foo(s,p,o) might be useful to generate edge identifiers for mapping an (unannotated) RDF triple to LPG, but why should we define the way that LPG edge identifiers are created? Whenever we need to create an identifier on the RDF side, it will be for an occurrence (again, either asserted/stated or reified/described) and I guess the existing mechanisms to mint blank nodes do that just fine.

OTOH, depending on how I read your mail, I might get the impression that you advocate to turn all triples into triple+id’s, but I don’t think that’s a good idea - see the first paragraph.
Also, IIUC, on the LPG side we do not only have edges (which each have an id, no matter if attributed or not) but also attributes to edges and nodes, which have no identifiers and could easily be mapped to standard RDF triples. Sorry if I’m stating the obvious here, but to me it wasn’t, and it changes the big picture quite a bit...

To map LPG to RDF it seems like a good idea to map all edges, annotated or not, to an annotated triple in RDF, the annotation(s) being comprised of at least the edge identifier.
LPG attributes however would be mapped to standard RDF triples.
That RDF-constructed-from-LPG data should be easy to map back to LPG.
So roundtripping when starting from LPG doesn’t seem to pose many problems w.r.t. edge id’s (I’m of course glossing over all the potential problems with datatypes, because I know next to nothing about that).
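
A rough sketch of that LPG-to-RDF direction, under assumptions: the input shape (dicts of nodes and edges), the function name lpg_to_rdf, and the use of the ':e rdf:asserts <<( ... )>>' form from Souri's Example 1 to carry the edge identifier are illustrative choices, not something the proposals prescribe. All attribute values are emitted as plain string literals, which sidesteps the datatype questions mentioned above.

```python
def lpg_to_rdf(nodes, edges):
    """Map a tiny LPG to RDF-like statement lines.

    nodes: {node_id: {attribute: value}}
    edges: {edge_id: (source, label, target, {attribute: value})}
    """
    lines = []
    # Node attributes become standard triples.
    for node_id, attrs in nodes.items():
        for key, value in sorted(attrs.items()):
            lines.append(f':{node_id} :{key} "{value}" .')
    for edge_id, (src, label, dst, attrs) in edges.items():
        # Every edge, annotated or not, becomes a statement carrying the edge id.
        lines.append(f":{edge_id} rdf:asserts <<( :{src} :{label} :{dst} )>> .")
        # Edge attributes become standard triples about the edge id.
        for key, value in sorted(attrs.items()):
            lines.append(f':{edge_id} :{key} "{value}" .')
    return lines


nodes = {"alice": {"name": "Alice"}, "bob": {}}
edges = {"e1": ("alice", "knows", "bob", {"since": "2020"})}
for line in lpg_to_rdf(nodes, edges):
    print(line)
```

Because every edge keeps its identifier and attributes stay attached to either a node or an edge id, the output retains enough structure to be mapped back to the original LPG.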

However, the situation is more intricate when going the other way and starting from RDF. To map native RDF to LPG either nothing has to be done - welcome to RDF triple soup ;-) - or the RDF has to be modeled in a way that easily translates into primary objects, primary relations between those primary objects, and secondary attributes on primary objects and their relations. So either the RDF is analyzed and design patterns, shapes and rules are employed to extract those primary objects and their relations, or the RDF data is modelled in an LPG conformant way, with annotated relations representing the main knowledge structure and simple triples being interpreted as mere attributes. That would indeed require that triples which are not attributed but are considered to be part of the primary knowledge structure, get annotated with at least a hint that they are "primary". That would then close the circle to the first paragraph above and define when a triple, although not annotated, should be represented by an occurrence (stated/asserted).


Greg,

the baseline proposal is too deficient to be a useful basis for this discussion so I’ll concentrate on my proposal from last Tuesday [2]. As I said above we should try not to mess with the set semantics of RDF more than necessary. I’m not completely sure yet how to do that, but in my proposal I sketched the following approach: stated triple terms do entail the triple they describe (and state), i.e. when you write

    _:a1 rdf12:states <<( :s :p :o )>>

then ':s :p :o' becomes part of the graph. However, how that entailment is put into practice is another question.

A) One solution is to add the triple to the store, i.e. the above '_:a1 rdf12:states <<( :s :p :o )>>' is equivalent to

    _:a1 rdf12:states <<( :s :p :o )>>
    :s :p :o .

That can be called entailment, but it can also be called "macro", as it can be reduced to a very simple and basically syntactic operation.
This approach misses the distinction between an asserted but not annotated triple and an annotated triple of the same type. I guess that is not a problem for people used to RDF, and I *suspect* it won't become one in the future, but it is a problem for LPG users. However, I made a proposal above in the part directed at Souri, to annotate such triples as "annotated non-annotations" (the oxymoron raising its ugly head again ;-) as e.g. "primary". Any simple label would do and that should solve that problem at least on the LPG side.
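
The "macro" reading of solution A can be sketched as a purely syntactic expansion. The representation is an assumption: statements are Python tuples (s, p, o), and a triple term in object position is a nested tuple. The function name expand_stated is hypothetical, and nested triple terms are not expanded recursively.

```python
STATES = "rdf12:states"


def expand_stated(graph):
    """Return the graph plus the triple described by every rdf12:states statement.

    graph: set of (s, p, o) tuples; a triple term in object position is itself
    an (s, p, o) tuple.
    """
    expanded = set(graph)
    for _s, p, o in graph:
        if p == STATES and isinstance(o, tuple):
            expanded.add(o)  # the stated triple becomes part of the graph
    return expanded


graph = {("_:a1", STATES, (":s", ":p", ":o"))}
for statement in sorted(expand_stated(graph), key=repr):
    print(statement)
```

After expansion the plain triple is indistinguishable from one that was asserted without any annotation, which is the loss of distinction described above.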

B) Another solution is to employ a combination of techniques:
- make sure that SPARQL queries stated terms just as well as simple triples
- let upper levels of the semantic web stack know via an axiom that the stated term entails the triple.
- support stated and described terms with different types of syntactic sugar. The  annotation syntax would be a good representation of the stated terms.

AFAICT solution B has the edge w.r.t. expressivity, as it never mixes stated terms with simple triples. However it asks quite a lot from implementations, and it is not backwards compatible, as mappings to RDF 1.0/1 will have to introduce the simple triple nonetheless, as in solution A.


To conclude, I think we can make the bridge pretty solid if we push RDF as in B. If we stay with a more down-to-earth version of RDF as in A we can at least make it solid enough to be usable, and pretty well usable in combination with some simple modelling techniques that will probably be needed anyway given the profound differences of how both languages approach modelling.


Some more complete examples that contain say two or three attributed objects and some relations - attributed or not - between them might be helpful.


Best,
Thomas



> On 28. Jul 2024, at 18:28, Souripriya Das <souripriya.das@oracle.com> wrote:
>
> Gregory Williams said in an earlier message [1]: " ...to get a good mapping, you’d have to enforce the use of reifiers on every triple, whether or not there are any annotations attached to the reifier." I agree, but, in the following, I would use the term "identifier" (or, id) instead of "reifier" to cover asserting as well.
>
> I also agree with the theme of Thomas' arguments in his last several messages, including [2] , where he talks about the need for supporting both "stated" and "described" triples. In the examples below, I have used rdf:asserts instead of rdf12:states and rdf:reifies instead of rdf12:describes.
>
> Since every edge in LPG is concrete (asserted) and each has a unique id, we could convert an edge in LPG to an RDF1.2 triple as follows:
> Example 1:
> =========
>     LPG => (s) -[:p]-> (o), with edge-id e
>     RDF1.2 => :e rdf:asserts <<( :s :p :o )>> .
>
> In general, I'd posit that every RDF1.1 (asserted) triple and every edge in LPG can be modeled in RDF1.2 as an rdf:asserts triple with an auto-generated resource as its id. So, below I assume that there is a function "foo" that maps each (s,p,o) to a unique id (and it is a one-to-one correspondence between identifiers and triple-terms). As to how to make foo() invisible to the users, I am leaving out that aspect from here.
>
> RDF1.1 supports asserted triples only. Converting them to LPG and RDF1.2 can be done as follows:
> NOTE: The two forms :s :p :o . and foo(s,p,o) rdf:asserts <<( :s :p :o )>> . are equivalent in RDF1.2.
> Example 2:
> =========
>     RDF1.1 => :s :p :o .
>     LPG => (s) -[:p]-> (o) with edge id computed using value of foo(s,p,o).
>     RDF1.2 => foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
>
> Constraint in RDF1.2: The same "id" and triple-term pair cannot be related by both rdf:asserts as well as rdf:reifies at the same time. So, the following combination is invalid.
> Example 3: invalid combination
> ==========================
>     foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
>     foo(s,p,o) rdf:reifies <<( :s :p :o )>> .
>
> SPARQL results for a given query remain unchanged:
> =========
> - The following original SPARQL1.1 query pattern { ?s ?p ?o }  will still return the same results even when executed against the RDF1.2 data (by internally converting the pattern to foo(?s,?p,?o) rdf:asserts <<( ?s ?p ?o )>> ).
> - A new SPARQL query pattern, { ?e rdf:asserts <<( ?s ?p ?o )>> }, will, for each solution, return the same values for ?s, ?p, and ?o as does the above query, but will return a value for ?e as well.
>
> Roundtrip RDF1.1 -> RDF1.2 -> RDF1.1 works as follows:
> Example 5:
> =========
>     RDF1.2 => foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
>     RDF1.1 => :s :p :o .
>
> What if annotations got added in the RDF1.2 version, as shown below? Can we convert such expanded content to an equivalent representation in RDF1.1? Yes, but only with the use of reification.
> Example 6:
> =========
>   RDF1.2 =>
>     foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
>     foo(s,p,o)  :prop :val .
>   RDF1.1 =>
>      :s :p :o .
>      foo(s,p,o) a rdf:Statement; rdf:subject :s ; rdf:predicate :p ; rdf:object :o .
>      foo(s,p,o) :prop :val .
>
> Another interesting example of RDF1.1 to RDF1.2 conversion:
> Example 7:
> =========
>   RDF1.1 =>
>     :s :p :o .
>     :r1 a rdf:Statement; rdf:subject :s ; rdf:predicate :p ; rdf:object :o .
>     :r1 :prop :val .
>     :r2 a rdf:Statement; rdf:subject :s ; rdf:predicate :p ; rdf:object :o .
>     :r2 :prop2 :val2 .
>   RDF1.2 =>
>      foo(s,p,o) rdf:asserts <<( :s :p :o )>> .
>      :r1 rdf:reifies <<( :s :p :o )>> .
>      :r1 :prop :val .
>      :r2 rdf:reifies <<( :s :p :o )>> .
>      :r2 :prop2 :val2 .
>
> Thanks,
> Souri.
>
> [1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0112.html (Gregory Williams, Wed, 24 Jul 2024 21:17:03 -0700)
> [2] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0109.html (Thomas, Tue, 23 Jul 2024 14:24:45 +0200)
>
> From: Andy Seaborne <andy@apache.org>
> Sent: Thursday, July 25, 2024 10:52 AM
> To: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
> Subject: [External] : Re: RDF star and LPGs
>
>
>
> On 25/07/2024 05:17, Gregory Williams wrote:
>
> The more I look at this, the more I think that to get a good mapping, you’d have to enforce the use of reifiers on every triple, whether or not there are any annotations attached to the reifier. Triples without corresponding reifiers would simply be invisible in LPG data. And reifiers without corresponding asserted triples would have to be invisible as well.
>
> thanks,
> .greg
>
>
> Let's refine exactly what we're trying for.
>
> I don't think the WG objective is to define a systematic or universal mapping from LPG to RDF. For any given data, a mapping may take into account the LPG context, such as written documentation about the data, to give a mapping that captures the usage and intent of the LPG data. There may even be several mappings if the mapping enriches the data.
>
> We should show that mappings from a specific LPG data domain to publishable RDF are possible.
>
> Secondly, automatic round-tripping is not a requirement. A reverse mapping from RDF to LPG can be based on an understanding of the forward mapping, taking into account unreified triples that are generated in the forward direction. It would be nice if there is a unique/canonical/preferred reverse mapping; it's not a requirement though.
>
> A complete solution to LPG<->RDF is a working group in its own right.
>
>     Andy
Received on Thursday, 1 August 2024 14:22:51 UTC