Future-proof modelling from Pierre-Antoine Champin on 2023-01-20 (public-rdf-star-wg@w3.org from January 2023)

From: Pierre-Antoine Champin <pierre-antoine@w3.org>
Date: Sat, 21 Jan 2023 00:27:42 +0100
To: Souripriya Das <souripriya.das@oracle.com>
Cc: RDF-star WG <public-rdf-star-wg@w3.org>
Message-ID: <ebdbdff5-42b5-f91c-fc8c-4ad24c570e1a@w3.org>

Dear Souri,

I wanted to react to your presentation during the RDF-star call 
yesterday, especially about the "future proof modelling" argument.

Consider the following two examples:

Example 1: consider a relational data model with a table Person and a 
table Company. The table Person contains a column "worksFor", that is a 
foreign key to Company. At some point, we need to represent the fact 
that a given person works for two different companies at the same time. 
Currently, this requires changing the model (replacing the column 
Person.worksFor by a new table WorksFor with 2 foreign keys to Person 
and Company).
Following your logic at the extreme, this would be an argument to extend 
the relational model to allow multiple values in a column, so that this 
use-case could be accommodated without changing the original model.
This would make the relational model much more complex, and would probably 
not be worth it.
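
To make the remodelling in Example 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the example; the sample rows (Alice, Acme, Globex) are made up for illustration, and a real migration would also drop the old Person.worksFor column afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original model: each person has at most one employer (a foreign-key column).
conn.executescript("""
    CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        worksFor INTEGER REFERENCES Company(id)
    );
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Person VALUES (1, 'Alice', 1);
""")

# Remodelling: a new table WorksFor with two foreign keys replaces the column.
conn.executescript("""
    CREATE TABLE WorksFor (
        person INTEGER NOT NULL REFERENCES Person(id),
        company INTEGER NOT NULL REFERENCES Company(id),
        PRIMARY KEY (person, company)
    );
    -- Carry the existing data over from the old column.
    INSERT INTO WorksFor (person, company)
        SELECT id, worksFor FROM Person WHERE worksFor IS NOT NULL;
    -- Alice can now work for a second company at the same time.
    INSERT INTO WorksFor VALUES (1, 2);
""")

employers = [row[0] for row in conn.execute(
    "SELECT c.name FROM WorksFor w JOIN Company c ON c.id = w.company "
    "WHERE w.person = 1 ORDER BY c.name")]
print(employers)
```

The point of the sketch is only that the change touches the schema and every query that used Person.worksFor, which is the cost being weighed against a more complex relational model.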

Example 2: consider an RDF graph where a property :postalAddress has 
domain :Person and range xsd:string. This is all very well, until 
someone wants to describe addresses themselves (separate their different 
"fields", link them to an entity of type city rather than a city name, 
add geo-coordinates to an address...). This would require a change in 
the model, where :postalAddress now points to an IRI or blank node, which 
would carry the original string in its rdf:value property, but could carry 
additional properties as well.
Following your logic at the extreme, someone could argue in favor of 
allowing string literals in the subject position, so that they could add 
properties to the "address string" without changing the original model.
This would be a very bad idea, because it would be conflating strings 
with the addresses that they represent.
(NB: my point here is not to say that "literals as subjects" is a bad 
idea per se, but that this would be a bad solution to this particular 
problem)
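
The remodelling in Example 2 can be sketched as follows. This is only an illustration in plain Python, with a graph represented as a set of (subject, predicate, object) tuples of strings; the helper remodel_addresses, the blank node labels, and the names :alice, :city and :Springfield are hypothetical and not part of any proposal.

```python
RDF_VALUE = "rdf:value"
POSTAL_ADDRESS = ":postalAddress"


def remodel_addresses(graph):
    """Replace each string-valued :postalAddress by a blank node
    that carries the original string in its rdf:value property."""
    result = set()
    counter = 0
    for s, p, o in sorted(graph):
        # In this sketch, literals are the strings that start with a quote.
        if p == POSTAL_ADDRESS and o.startswith('"'):
            counter += 1
            node = f"_:addr{counter}"
            result.add((s, p, node))
            result.add((node, RDF_VALUE, o))
        else:
            result.add((s, p, o))
    return result


before = {(":alice", POSTAL_ADDRESS, '"1 Main Street, Springfield"')}
after = remodel_addresses(before)

# The address is now a node of its own, so it can carry more properties.
after.add(("_:addr1", ":city", ":Springfield"))
```

After the change, the extra properties are attached to the address node rather than to the string, which keeps the address and its textual form distinct.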

My point here is that remodeling cannot always be avoided -- or that 
avoiding it would overly complicate the model (example 1), or lead to 
even worse modelling (example 2).

So yes, we should strive to make the user's life easier. But we must 
keep in mind that this is a trade-off. The cure is sometimes worse than 
the disease.

RDFn makes the inner model more complex (à la Example 1 above):

- it adds a "4th column" to every triple. IIUC, you seem to assume that 
all implementations already deal with some form of triple identifier, and 
that all we need is to expose it to the user. But I am not sure that all 
implementations have such an internal identifier (I am actually pretty 
sure that some don't).

- somehow, it turns graphs, which are currently sets of triples, into 
multisets of triples. And multisets are tricky. What happens for example 
when you merge two graphs containing an identical triple? Is it the same 
triple? Two triples with different "default" identifiers? What happens 
when you use SPARQL UPDATE to remove a triple? Do you remove only one of 
them or all of them? Can of worms ahead...
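
The set versus multiset questions above can be illustrated with a small Python sketch. It does not reflect how RDFn or any implementation actually behaves; it only assumes that a graph is a plain set today and models the multiset reading with collections.Counter. The triples are made up.

```python
from collections import Counter

t = (":alice", ":worksFor", ":acme")
g1 = {t}
g2 = {t, (":bob", ":worksFor", ":acme")}

# Graphs as sets: merging is a union, the shared triple appears once.
merged_set = g1 | g2

# Graphs as multisets: the shared triple is now counted twice.
merged_bag = Counter(g1) + Counter(g2)

# Removing the triple: one occurrence, or all of them?
after_remove_one = merged_bag - Counter([t])
after_remove_all = Counter(
    {triple: n for triple, n in merged_bag.items() if triple != t}
)

print(len(merged_set), merged_bag[t], after_remove_one[t], after_remove_all[t])
```

With sets, both the merge and the removal have a single obvious answer; with multisets, each of them needs a decision that would have to be specified.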

   pa

Received on Friday, 20 January 2023 23:27:45 UTC