Re: [External] : Future-proof modelling from Franconi Enrico on 2023-01-23 (public-rdf-star-wg@w3.org from January 2023)

From: Franconi Enrico <franconi@inf.unibz.it>
Date: Mon, 23 Jan 2023 17:57:42 +0000
To: Souripriya Das <souripriya.das@oracle.com>
CC: Pierre-Antoine Champin <pierre-antoine@w3.org>, RDF-star WG <public-rdf-star-wg@w3.org>
Message-ID: <E39ECF34-EE1E-452A-9C16-34CEFAD28B7F@inf.unibz.it>

Well, I guess that my argument nonetheless applies exactly to your Rich&Liz example.
—e.

On 23 Jan 2023, at 17:11, Souripriya Das <souripriya.das@oracle.com> wrote:


Sorry, by saying "Contrast the above ..." I probably created enough scope for confusion. The paragraph from my email that Enrico has cited was a follow-up to my comments (in the immediately preceding paragraph) about "Example 2", not "Example 1". -- Souri.

________________________________
From: Pierre-Antoine Champin
Sent: Monday, January 23, 2023 11:08 AM
To: Franconi Enrico; Souripriya Das
Cc: RDF-star WG
Subject: Re: [External] : Future-proof modelling



On 23/01/2023 16:08, Franconi Enrico wrote:
Hi Souri,

Contrast the above with the type of situation I was trying to illustrate in my slides. There, the new data that is coming in is using the same properties with the same respective domains and ranges as before, but its arrival has caused the occurrence counts to go from one to greater-than-one for some of the properties (e.g., suppose that we just found out that :Taylor :married :Burton a second time -- something that never happened to the :married property before this). This addition became a reality in the world being modeled -- the data architect has no control over this. Such changes can and should be handled as seamlessly as possible -- pre-existing queries should retain their validity despite those changes. This is where named triples -- with support for both implicit and explicit names -- would come in handy.

I guess that the Example 1 by Pierre-Antoine is exactly about this.
In the relational model, you would have a table Person with a column Name (PK), a column Married (with FK to Person.Name), and, say, a column Address.
You start by stating that the tuples <"Richard", "Liz", "Addr1"> and <"Liz", "Richard", "Addr2"> are in the table.
You then realise that there are two distinct occurrences of the marriage, and therefore you have to change the schema by deleting the column Married from Person and adding a table Marriage with two FK attributes referencing Person, plus additional attributes identifying each distinct marriage (e.g., the date of the wedding and/or the marriage period).
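
A minimal sketch of the schema change described above, using Python's built-in sqlite3 module. The names PersonV2 and WeddingYear, and the year values, are illustrative assumptions and not part of the original example; the new table is created alongside the old one rather than altering it in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original schema: at most one spouse per person, stored as a column.
conn.execute(
    "CREATE TABLE Person ("
    " Name TEXT PRIMARY KEY,"
    " Married TEXT REFERENCES Person(Name),"
    " Address TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [("Richard", "Liz", "Addr1"), ("Liz", "Richard", "Addr2")],
)

# A second marriage cannot be recorded here: Name is the primary key,
# so a second row for Richard is rejected.
try:
    conn.execute("INSERT INTO Person VALUES ('Richard', 'Liz', 'Addr1')")
    second_row_accepted = True
except sqlite3.IntegrityError:
    second_row_accepted = False

# Revised schema: Married is dropped from the person table, and each
# marriage becomes a tuple identified by an extra attribute
# (WeddingYear is a hypothetical choice of identifying attribute).
conn.execute("CREATE TABLE PersonV2 (Name TEXT PRIMARY KEY, Address TEXT)")
conn.execute(
    "CREATE TABLE Marriage ("
    " Spouse1 TEXT REFERENCES PersonV2(Name),"
    " Spouse2 TEXT REFERENCES PersonV2(Name),"
    " WeddingYear INTEGER,"
    " PRIMARY KEY (Spouse1, Spouse2, WeddingYear))"
)
conn.execute("INSERT INTO PersonV2 SELECT Name, Address FROM Person")
conn.executemany(
    "INSERT INTO Marriage VALUES (?, ?, ?)",
    [("Richard", "Liz", 1964), ("Richard", "Liz", 1975)],  # illustrative values
)

marriages = conn.execute(
    "SELECT Spouse1, Spouse2, WeddingYear FROM Marriage ORDER BY WeddingYear"
).fetchall()
print(second_row_accepted, marriages)
```

Any query written against the Married column has to be rewritten against the Marriage table after this change, which is the kind of breakage the thread is discussing.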
Of course, you could have had a Marriage table from the very beginning, and used a "surrogate key" of this table to identify the distinct marriages, but this is possible in the relational model because it allows modelling n-ary relations with n>2, not because it allows multisets (bags). Bags are meaningful in the relational model ONLY if the origin of the multiplicity is known (e.g., the bag of salaries of each person in the original table, obtained by a projection from a bag-free relation of persons and their salaries). A table T with repeating tuples given a priori is semantically indistinguishable from (SELECT DISTINCT * FROM T) in the relational model: you cannot tell which "hidden" attribute would disambiguate the identity of the tuples (multiple marriages? multiple addresses? etc.). In general, it is well known that bags destroy the basic principle of the relational model as a modelling language, which associates an identity with tuples in 5th normal form, like the above.
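
The two kinds of bag contrasted above can be sketched with Python's built-in sqlite3 module. The Employee table and its rows are hypothetical, introduced only to illustrate a bag whose origin is known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Case 1: a table T whose repeated tuple is given a priori.
conn.execute("CREATE TABLE T (Name TEXT, Married TEXT, Address TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("Richard", "Liz", "Addr1"), ("Richard", "Liz", "Addr1")],
)
bag = conn.execute("SELECT * FROM T").fetchall()
distinct = conn.execute("SELECT DISTINCT * FROM T").fetchall()

# The two rows agree on every attribute, so nothing in the data says
# whether the repetition stands for two marriages, two addresses,
# or something else.
rows_are_identical = bag[0] == bag[1]

# Case 2: a bag whose origin is known. Employee is bag-free (Name is a key);
# projecting onto Salary yields repeated values, each traceable to a person.
conn.execute("CREATE TABLE Employee (Name TEXT PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Ann", 100), ("Bob", 100), ("Cy", 120)],
)
salaries = conn.execute(
    "SELECT Salary FROM Employee ORDER BY Salary"
).fetchall()

print(len(bag), distinct, rows_are_identical, salaries)
```

In the second case the multiplicity of a salary can be explained by going back to the Employee rows; in the first case there is nothing to go back to.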
And I don't believe we should assume in RDF that a graph can be a priori a bag of triples.
—e.

what Enrico said! :)

Received on Monday, 23 January 2023 17:57:57 UTC