Future-Proofing for Transition to Multi-Edge caused by Data Arrival

[I wrote this up as plain text because it is easier to discuss over email than a PDF slide deck.]

Consider the data for post-2000 US presidential terms. We will show below that even if the arrival of new data causes a transition to multi-edge (more than one edge with the same subject, predicate, and object), pre-existing queries remain valid with RDFn/SPARQLn, unlike with RDF-star/SPARQL-star.

The main point is that RDF-star's lack of an explicit naming capability forces a change in the schema used to store the data, even though the arriving data uses pre-existing properties with the same domains and ranges as before. The two approaches handle the transition to multi-edge differently: 1) RDF-star associates a new id (similar to an explicit name) with the original triple via a new :occurrenceOf triple, thus changing the storage structure, whereas 2) RDFn creates a new named triple that has a new unique name and whose triple component is a copy of the original triple.

RDF-star:
=======

Initial Data (no shortcuts, to avoid hiding important details):
------------------------------------------------------------------------------
    :Bush :servedAs :President .
    << :Bush :servedAs :President >> :index 43 .
    :Obama :servedAs :President .
    << :Obama :servedAs :President >> :index 44 .
    :Trump :servedAs :President .
    << :Trump :servedAs :President >> :index 45 .
    :Biden :servedAs :President .
    << :Biden :servedAs :President >> :index 46 .

Initial SPARQL-star query to retrieve the presidents, ordered by the presidential index:
-----------------------------------------------------------------------------------------------------------------
    SELECT ?x {
      ?x :servedAs :President .
      << ?x :servedAs :President >> :index ?idx
    } ORDER BY ?idx

The expected output is:
------------------------------
    :Bush
    :Obama
    :Trump
    :Biden

Let's say that in 2024 Trump wins again. How will the data be represented in RDF-star? Since RDF-star, like RDF, has a "no duplicate triples" constraint, the data creator has to extend the schema with a new structure that involves a new property, :occurrenceOf. Since the above query was written without anticipating the new structure, it has to be modified as well.
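
A minimal sketch in Python of what the "no duplicate triples" constraint means here, assuming a toy model in which a graph is a set of triples and annotations are keyed by the quoted triple. The names graph, annotations, and assert_with_index are hypothetical and not part of any RDF-star API.

```python
# Toy model (an assumption, not an RDF-star implementation):
# a graph is a set of triples; annotations hang off the quoted triple itself.
graph = set()
annotations = {}  # quoted triple -> set of (property, value) pairs


def assert_with_index(s, p, o, idx):
    """Assert a triple and annotate the quoted triple with :index."""
    triple = (s, p, o)
    graph.add(triple)  # a no-op when the triple is already present
    annotations.setdefault(triple, set()).add((":index", idx))


assert_with_index(":Trump", ":servedAs", ":President", 45)
assert_with_index(":Trump", ":servedAs", ":President", 47)

trump = (":Trump", ":servedAs", ":President")
print(len(graph))                  # 1
print(sorted(annotations[trump]))  # [(':index', 45), (':index', 47)]
```

In this toy model the second assertion does not create a second edge: there is still one triple, and both :index values attach to the same quoted triple, so the two terms are not separately identifiable.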

Final RDF-star Data (after Trump gets a second term) would include the following extra triples (:trump47 is used as the new occurrence id):
------------------------------------------------------------------------
[Note: A variation, not shown here, would be to add two :occurrenceOf triples with, say, :trump45 and :trump47 as the new occurrence ids, and so on. In that case the new SPARQL-star query below would need to use the OPTIONAL clause instead of the UNION-based formulation shown below.]

    :trump47 :occurrenceOf << :Trump :servedAs :President >> .
    :trump47 :index 47 .

Final SPARQL-star Query:
-----------------------------------
    SELECT ?x {
      { ?x :servedAs :President .
        << ?x :servedAs :President >> :index ?idx }
      UNION
      { ?occ :occurrenceOf << ?x :servedAs :President >> .
        ?occ :index ?idx
      }
    } ORDER BY ?idx

The new expected output is:
-------------------------------------
    :Bush
    :Obama
    :Trump
    :Biden
    :Trump
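
The effect of the schema change on the two queries can be sketched in Python. This is a toy in-memory model under stated assumptions, not a SPARQL-star engine: the data mirrors the RDF-star listings above, while the Python names (triples, index_of_triple, occurrence_of, index_of_occurrence, initial_query, final_query) are hypothetical.

```python
SERVED = (":servedAs", ":President")
presidents = [":Bush", ":Obama", ":Trump", ":Biden"]

# Initial data: asserted triples, plus :index annotations keyed by the quoted triple.
triples = {(p, *SERVED) for p in presidents}
index_of_triple = {(p, *SERVED): i for p, i in zip(presidents, (43, 44, 45, 46))}

# Extra data after the second term: occurrence id -> quoted triple, and its :index.
occurrence_of = {":trump47": (":Trump", *SERVED)}
index_of_occurrence = {":trump47": 47}


def initial_query():
    """Mimics the initial query: asserted triple joined with its :index."""
    rows = [(index_of_triple[t], t[0]) for t in triples if t in index_of_triple]
    return [x for _, x in sorted(rows)]


def final_query():
    """Mimics the UNION query: the initial branch plus the :occurrenceOf branch."""
    rows = [(index_of_triple[t], t[0]) for t in triples if t in index_of_triple]
    rows += [(index_of_occurrence[occ], t[0])
             for occ, t in occurrence_of.items() if occ in index_of_occurrence]
    return [x for _, x in sorted(rows)]


print(initial_query())  # [':Bush', ':Obama', ':Trump', ':Biden']
print(final_query())    # [':Bush', ':Obama', ':Trump', ':Biden', ':Trump']
```

Run against the final data, the unmodified initial query still returns only four rows, because the second term is reachable only through the new :occurrenceOf structure.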

RDFn:
====

Initial Data (no shortcuts, to avoid hiding important details):
-----------------------------------------------------------------------------
[Note: the implicit names are shown as placeholders of the form rdfn:..., which are scoped to the input dataset only.]

    :Bush :servedAs :President | rdfn:id1 .
    rdfn:id1 :index 43 .
    :Obama :servedAs :President | rdfn:id2 .
    rdfn:id2 :index 44 .
    :Trump :servedAs :President | rdfn:id3 .
    rdfn:id3 :index 45 .
    :Biden :servedAs :President | rdfn:id4 .
    rdfn:id4 :index 46 .

Initial SPARQLn Query:
------------------------------
    SELECT ?x {
      ?x :servedAs :President | ?n .
      ?n :index ?idx
    } ORDER BY ?idx

Final RDFn Data (after Trump gets a second term) would include the following extra triples (:trump47 is used as the new explicit name):
-------------------------------------------------------------------

    :Trump :servedAs :President | :trump47 .
    :trump47 :index 47 .

Final SPARQLn Query:
------------------------------
    <same as initial SPARQLn query>
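
The same point can be sketched in Python for the RDFn case. This is a toy model under stated assumptions, not an RDFn/SPARQLn implementation: each named triple is represented as a (subject, predicate, object, name) tuple, annotations are keyed by the name, and the Python names (query, initial, final, index_initial, index_final) are hypothetical.

```python
def query(named_triples, index_of):
    """Mimics the SPARQLn query: match '?x :servedAs :President | ?n', join on ?n :index ?idx."""
    rows = [(index_of[n], s)
            for (s, p, o, n) in named_triples
            if p == ":servedAs" and o == ":President" and n in index_of]
    return [x for _, x in sorted(rows)]


# Initial data: one named triple per term, annotations keyed by the triple's name.
initial = [
    (":Bush", ":servedAs", ":President", "rdfn:id1"),
    (":Obama", ":servedAs", ":President", "rdfn:id2"),
    (":Trump", ":servedAs", ":President", "rdfn:id3"),
    (":Biden", ":servedAs", ":President", "rdfn:id4"),
]
index_initial = {"rdfn:id1": 43, "rdfn:id2": 44, "rdfn:id3": 45, "rdfn:id4": 46}

# Final data: a copy of the same triple under a new explicit name.
final = initial + [(":Trump", ":servedAs", ":President", ":trump47")]
index_final = {**index_initial, ":trump47": 47}

print(query(initial, index_initial))  # [':Bush', ':Obama', ':Trump', ':Biden']
print(query(final, index_final))      # [':Bush', ':Obama', ':Trump', ':Biden', ':Trump']
```

Here the one unchanged query function covers both datasets, since the second term is just another named triple with its own name.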

My goal here is to show that there is a future-proofing problem for multi-edge transitions, and that we should try to address it if possible, because doing so is important for avoiding costly query rewrites and possible application downtime.

Thanks,
Souri.

Received on Thursday, 26 January 2023 14:06:31 UTC