Re: different kinds of occurrences (Re: "Multi-Edge Support in RDFn" slides) from souripriya.das@oracle.com on 2022-12-15 (public-rdf-star-wg@w3.org from December 2022)

From: <souripriya.das@oracle.com>
Date: Thu, 15 Dec 2022 09:00:38 -0500
To: public-rdf-star-wg@w3.org
Message-ID: <16f06ac6-c6f3-0685-00dc-2f2991a0e0cb@oracle.com>
Hi Pierre-Antoine,

You have provided great examples showing sets of properties for 
different "kinds of occurrence" of the same s-p-o. Using an extra triple 
with a property equivalent to :occursAs or :occurrenceOf is always an 
option for the data creator. Named triples (with explicit naming), on 
the other hand, provide an additional option that is often more compact 
and efficient, and that avoids any need to rewrite pre-existing queries 
as the data evolves. Below I show two ways of writing your example in 
RDFn. At the end, I also show, with an example, the benefits of using 
RDFn when the data may involve multi-edges.

*RDFn: shorter version*: (after reducing two-hop paths like :mention/:in 
to single edges like :mentionIn)

      << :alice :workingFor :acme >> | (:m1, :m2, :c1, :c2, :s1, :s2) .

      :m1 :mentionIn <file1.nt> ; :mentionLine 12.
      :m2 :mentionIn <file2.nt> ; :mentionLine 34.

      :c1 :claimBy :alice ;
          :claimAt "2022-11-10T12:34:56Z"^^xsd:dateTimeStamp.
      :c2 :claimBy :charlie ;
          :claimAt "2022-12-09T01:23:45Z"^^xsd:dateTimeStamp.

      :s1 :situationStartDate "2020-01-02"^^xsd:date ;
          :situationEndDate "2021-03-04"^^xsd:date.
      :s2 :situationStartDate "2022-05-06"^^xsd:date.

*RDFn: longer version*: (this is similar in structure to the :mention, 
:claim, and :situation property use in your formulation but uses 
rdf:type instead)

      << :alice :workingFor :acme >> | (:m1, :m2, :c1, :c2, :s1, :s2) .

      :m1 a :mention ; :in <file1.nt> ; :line 12.
      :m2 a :mention ; :in <file2.nt> ; :line 34.

      :c1 a :claim ; :by :alice ;
          :at "2022-11-10T12:34:56Z"^^xsd:dateTimeStamp.
      :c2 a :claim ; :by :charlie ;
          :at "2022-12-09T01:23:45Z"^^xsd:dateTimeStamp.

      :s1 a :situation ; :startDate "2020-01-02"^^xsd:date ;
          :endDate "2021-03-04"^^xsd:date.
      :s2 a :situation ; :startDate "2022-05-06"^^xsd:date.

*Coming back to the main benefit of using named triples (supporting 
multi-edge)*, it may be understood as follows. When creating data 
involving a given property, say :p:

  * one does not need to know whether :p will ever be involved in a
    multi-edge -- even if it is, we avoid changing how the data is
    structured (which prevents query invalidation even as :p goes
    from no-multi-edge to multi-edge)
  * if we already have the data but :p is involved in a multi-edge only
    in small numbers compared to the overall data size, the impact of
    multi-edge presence is minimal on both data size and query complexity
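
To make the first point concrete, here is a minimal Python sketch. It is 
not RDFn and not any existing library: NamedTripleStore and its methods 
are hypothetical names I made up for illustration, and the toy model 
(each s-p-o maps to the names of its occurrences) is only an assumption 
about how one might picture named triples in memory. It reuses the 
:alice :workingFor :acme data with names :s1 and :s2.

```python
from collections import defaultdict


class NamedTripleStore:
    """Toy model (hypothetical, not an RDFn implementation):
    every (s, p, o) maps to the names of its occurrences."""

    def __init__(self):
        self._names = defaultdict(list)  # (s, p, o) -> [name, ...]
        self._props = defaultdict(dict)  # name -> {property: value}

    def add(self, s, p, o, name, **props):
        """Record one named occurrence of (s, p, o) with its properties."""
        self._names[(s, p, o)].append(name)
        self._props[name].update(props)

    def subjects(self, p, o):
        """Distinct subjects with at least one occurrence of (s, p, o)."""
        return sorted({s for (s, p2, o2) in self._names if (p2, o2) == (p, o)})

    def occurrences(self, s, p, o):
        """Names and properties of every occurrence of (s, p, o)."""
        return [(n, dict(self._props[n])) for n in self._names.get((s, p, o), [])]


store = NamedTripleStore()
store.add(":alice", ":workingFor", ":acme", ":s1",
          startDate="2020-01-02", endDate="2021-03-04")
before = store.subjects(":workingFor", ":acme")

# :workingFor becomes a multi-edge: same s-p-o, second name.
store.add(":alice", ":workingFor", ":acme", ":s2", startDate="2022-05-06")
after = store.subjects(":workingFor", ":acme")

# The "who works for :acme" lookup is written the same way and gives
# the same answer before and after the multi-edge appears.
print(before == after)
print(len(store.occurrences(":alice", ":workingFor", ":acme")))
```

The lookup in subjects() never mentions the names, so it does not change 
when a second occurrence is added; only code that cares about individual 
occurrences needs to look at them.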

Example: representing the complete data for US Presidents, where only 
one (Grover Cleveland) among the 45 past US Presidents has served two 
non-consecutive terms.

RDFn: needs only (45 + 1=) 46 triples, not counting the :startYear and 
:endYear triples

     :Washington :servedAs :POTUS {| :startYear 1789; :endYear 1797 |}
     :Adams :servedAs :POTUS {| :startYear 1797; :endYear 1801 |}
     ...
     :Cleveland :servedAs :POTUS ( :term1, :term2 ) .
     :term1 :startYear 1885; :endYear 1889 .
     :term2 :startYear 1893; :endYear 1897 .
     ...
     :Trump :servedAs :POTUS {| :startYear 2017; :endYear 2021 |}

If we use the :occursAs or :occurrenceOf form to represent the same 
facts (in a uniform structure), it would take (45 * 2=) 90 triples, not 
counting the :startYear and :endYear triples.
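
The arithmetic can be restated as a small Python sketch. The function 
names are made up for illustration, and the counting model simply takes 
the figures in this message at face value (one :servedAs triple per 
president plus one per additional term in RDFn; two triples per 
president in the uniform :occursAs form). It is not a general result 
about either representation.

```python
def rdfn_triples(subjects: int, extra_occurrences: int) -> int:
    # One named triple per subject, plus one per additional occurrence
    # (here: Cleveland's second term).
    return subjects + extra_occurrences


def uniform_occurrence_triples(subjects: int) -> int:
    # The message's count for the uniform :occursAs / :occurrenceOf form:
    # two triples per subject.
    return 2 * subjects


presidents = 45  # figure used in the message
rdfn_count = rdfn_triples(presidents, extra_occurrences=1)
uniform_count = uniform_occurrence_triples(presidents)
print(rdfn_count, uniform_count)
```

With these inputs the two counts are 46 and 90, matching the numbers 
above; the gap grows with the number of subjects, not with the number 
of multi-edges.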

Thanks,
Souri.

 >>Hi Souri,

 >>Another remark about the presentation:

 >>in slide 6, you point out that RDF-star requires an extra predicate
 >>to link the triple to the IDs of its occurrences (:occursAs in
 >>your example), while in RDFn there is no need for such an extra
 >>predicate.

 >>I would argue that this extra predicate is actually desirable:

 >>multiple "occurrences" of the same triple can be used to model a
 >>large range of /different/ things. E.g.

 >>     # the same triple being mentioned in different sources:

 >>     << :alice :workingFor :acme >> :mention :m1, :m2.
 >>     :m1 :in <file1.nt> ; :line 12.
 >>     :m2 :in <file2.nt> ; :line 34.

 >>     # the same claim being made by different people:

 >>     << :alice :workingFor :acme >> :claim :c1, :c2.
 >>     :c1 :by :alice ; :at "2022-11-10T12:34:56Z"^^xsd:dateTimeStamp.
 >>     :c2 :by :charlie ; :at "2022-12-09T01:23:45Z"^^xsd:dateTimeStamp.

 >>     # the same situation happening at different times:

 >>     << :alice :workingFor :acme >> :situation :s1, :s2.
 >>     :s1 :startDate "2020-01-02"^^xsd:date ;
 >>         :endDate "2021-03-04"^^xsd:date.
 >>     :s2 :startDate "2022-05-06"^^xsd:date.

 >>so the extra predicate is important to explicitly indicate what
 >>"kind" of occurrence we are talking about.

 >>   pa
Received on Thursday, 15 December 2022 14:01:01 UTC