Re: RDF-star use cases from Amazon Neptune from Pierre-Antoine Champin on 2021-12-07 (public-rdf-star@w3.org from December 2021)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Tue, 7 Dec 2021 14:26:43 +0100
To: "Lassila, Ora" <ora@amazon.com>, David Booth <david@dbooth.org>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>, Peter Patel-Schneider <pfpschneider@gmail.com>
Message-ID: <042b5c4b-e1ca-101a-567a-50511b6c62c4@ercim.eu>

David, Ora, Peter,

this is a grouped answer to your replies to my previous email [1].

I got slightly carried away when I wrote "RDF is a logic". What I really 
meant was "RDF has an underlying semantics". As Peter pointed out, the 
fact that this semantics is formally captured by the model theoretic 
semantics [2] is not really relevant here. The "base" semantics is 
presented in 'RDF 1.1 Concepts and Abstract Syntax', in particular §1.2 
and §1.3 [3].

I see this underlying semantics as a kind of contract
- between a person using RDF and the RDF "machinery" that Ora mentioned, and
- between an agent publishing RDF data and an agent consuming that data.

I agree with David that this contract is not a "get-out-of-jail-free" 
card: you always have to carefully check external data you are 
consuming. First, because mistakes happen and some published data may 
not comply with the contract. Second, because the contract is minimal; 
the full semantics of the data is "distributed" across the RDF base 
semantics, the vocabulary-specific semantics, and some out-of-band 
knowledge that is shared (or assumed to be shared).

Still, this base semantics impacts what we consider good, not-so-good, or 
bad modelling in RDF. More specifically, a modelling can be deemed bad 
if it breaks the expectations of the RDF "machinery" or of other RDF 
users. Consider the property ex:lengthInChars, taking an IRI as its 
subject, and the number of characters in that IRI as its value:

     <http://champin.net/#pa> a schema:Person ;
         schema:name "Pierre-Antoine" ;
         owl:sameAs
             <http://data.semanticweb.org/person/pierre-antoine-champin> ;
         ex:lengthInChars 22 .

I hope we agree that this is bad RDF modelling, because it violates the 
expectation that an IRI always denotes the same thing [3]: in the first 
three statements, it denotes a person, while in the fourth statement, it 
denotes itself.

This expectation is semantic -- nothing in the abstract syntax is 
violated by the example above. And this semantic expectation is baked 
into a large part of the RDF machinery, which therefore may backfire by 
producing unexpected/undesirable results. E.g. SPARQL with some 
entailment regime, or an OWL reasoner, will produce

     <http://data.semanticweb.org/person/pierre-antoine-champin>
         ex:lengthInChars 22 .
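
The inference above can be sketched in plain Python, without any RDF 
library. This is only an illustration under simplifying assumptions: 
triples are plain tuples, and the helper same_as_closure is a 
hypothetical, heavily reduced stand-in for what an owl:sameAs-aware 
reasoner does (it only copies statements from one subject to its 
aliases).

```python
# Simplified illustration only: a real OWL reasoner or SPARQL entailment
# regime does much more than this hypothetical helper.
SAME_AS = "owl:sameAs"
PA = "http://champin.net/#pa"
PA_SWDF = "http://data.semanticweb.org/person/pierre-antoine-champin"

# The four statements of the example, as (subject, predicate, object) tuples.
graph = {
    (PA, "rdf:type", "schema:Person"),
    (PA, "schema:name", "Pierre-Antoine"),
    (PA, SAME_AS, PA_SWDF),
    (PA, "ex:lengthInChars", 22),
}


def same_as_closure(triples):
    """Copy each statement about a subject to every subject it is owl:sameAs with."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        aliases = {(s, o) for (s, p, o) in inferred if p == SAME_AS}
        aliases |= {(o, s) for (s, o) in aliases}  # owl:sameAs is symmetric
        for (a, b) in aliases:
            for (s, p, o) in list(inferred):
                if s == a and p != SAME_AS and (b, p, o) not in inferred:
                    inferred.add((b, p, o))
                    changed = True
    return inferred


entailed = same_as_closure(graph)

# The machinery happily derives a length for the second IRI ...
print((PA_SWDF, "ex:lengthInChars", 22) in entailed)
# ... although that value only matches the first IRI string.
print(len(PA), len(PA_SWDF))
```

The derived statement is a correct consequence of the input under the 
base semantics; what is wrong is the modelling that treated the IRI as 
denoting itself.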

On the other hand, the LPG machinery has far fewer expectations about 
the meaning of the data it processes. This leaves the design space much 
more open, which is a good thing in some situations, as Ora pointed out. 
But this also makes interoperability harder. RDF, with its base 
semantics, makes interoperability easier -- even if it does not 
magically provide full interoperability for free, of course.

   pa

[1] https://www.w3.org/mid/90bc155e-4b86-23a8-a99b-81c0aa5c86a6@ercim.eu

[2] https://www.w3.org/TR/rdf11-mt/

[3] https://www.w3.org/TR/rdf11-concepts/#resources-and-statements



PS: here is another example of bad RDF modelling, which does not involve 
reasoning in the classical sense. Consider the property 
ex:measuredHeightInCm, which expects an object of type xsd:decimal, and 
whose documentation mandates that the lexical form should include all 
and only significant digits. Therefore

     ex:monalisa ex:measuredHeightInCm "77.0"^^xsd:decimal.

and

     ex:monalisa ex:measuredHeightInCm "77.00"^^xsd:decimal.

would be considered to have different meanings.

Again, from a purely syntactical point of view, there is nothing wrong 
with this modelling, because these two graphs are indeed different (in 
the abstract syntax).

However, this breaks the expectation that literals are interpreted 
according to the shared definition of their datatype (in this case, 
[4]). Some implementations may do some under-the-hood normalization of 
the literals, and conflate the two graphs above.
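
A minimal sketch of that mismatch, again in plain Python with only the 
standard library. The tuple representation of literals and the helper 
names same_term and same_value are assumptions made for this 
illustration, not part of any RDF API; Python's Decimal is used here as 
an approximation of the xsd:decimal value space.

```python
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# A literal modelled as (lexical form, datatype IRI); hypothetical representation.
height_a = ("77.0", XSD_DECIMAL)
height_b = ("77.00", XSD_DECIMAL)


def same_term(x, y):
    """Abstract-syntax view: literals are equal only if lexical form and datatype match."""
    return x == y


def same_value(x, y):
    """Datatype view: both lexical forms are mapped to values, then compared."""
    if x[1] != XSD_DECIMAL or y[1] != XSD_DECIMAL:
        raise ValueError("this sketch only handles xsd:decimal literals")
    return Decimal(x[0]) == Decimal(y[0])


print(same_term(height_a, height_b))   # different terms in the abstract syntax
print(same_value(height_a, height_b))  # same number according to the datatype
```

An implementation that stores or compares literals by value sees a 
single statement here, so the "significant digits" convention does not 
survive.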

[4] https://www.w3.org/TR/xmlschema-2/#decimal



On 06/12/2021 11:58, Lassila, Ora wrote:
> I think David makes some valid points here.
>
> A couple of observations:
>
> 1) The "dependence on semantics" is an interesting issue. For sure I have built applications that very much depended on RDF's semantics (and used a reasoner). To not do so would mean to build everything from scratch, which is not what I prefer to do. So even if I technically may "control all data" in my application, it is *much* easier for me to rely on RDF than not.
>
> 2) Related to #1, the RDF vs. LPG question, even if you reduce it to the question of "different graph representations", has to take into account the fact that there is a lot of "machinery" that comes with RDF that you would end up building yourself if you used LPGs. This does not necessarily reflect negatively on LPGs, since it may suit you just fine to build whatever mechanisms and machinery you need. I see Neptune customer use cases where one or the other approach makes more sense. If I do want mechanisms like those that RDF offers (including reasoning and well-defined semantics), I prefer to take what RDF gives me rather than rolling my own. I (and many other folks) went through a lot of effort and pain to get RDF where it is now.
>
> 3) The question on how to nest RDF-star statements to represent the right semantics (in my marriage use case) is important. How you nest (that is, in which order) makes a difference, because whoever is going to query that representation has to know it. In general, I stand by my characterization of "awkward", since I want to take into account the "user experience" of querying. The ability to query a representation is dependent on whether there is a likelihood that you manage to write a correct query. This in addition to semantic correctness: if the query does not give you the answers you need, what good is the query? Also note that even the nesting in my example did not fully solve the problem at hand.
>
> Regards,
>
> Ora
>
>
> On 12/5/21, 7:00 PM, "David Booth" <david@dbooth.org> wrote:
>
>      >> On 12/3/21 6:31 AM, Pierre-Antoine Champin wrote:
>      >>> In my view, the impedance mismatch
>      >>> between RDF and PGs is not due to some arbitrary restriction on the
>      >>> RDF model. It is due to the fact that RDF is a logic, that can be
>      >>> represented as a graph, while PG is a graph data model, without any
>      >>> semantic commitment.
>
>      I respectfully but very much disagree.  I see RDF being used to solve
>      problems, just like PGs.  And although I like RDF's grounding in
>      semantics, I have never seen an RDF application that truly depended on
>      that semantic grounding.  Consider this:
>
>        - For an application in which you control all of the data, clearly
>      your application does not depend on RDF's semantics, because your
>      application could just as well CHOOSE to apply RDF's semantics.
>
>        - And for an application in which you do NOT control all of the data
>      -- I'm thinking here primarily of Linked Data applications -- do you
>      really think that those applications would not work if the data
>      producers had published PGs for you to consume instead of RDF (and your
>      application used PGs)?   Personally, I seriously doubt it.
>
>      Even with RDF's grounding in a standard semantics, every application
>      developer who uses RDF from other sources needs to look carefully at
>      that external data in advance to see if its semantics matches the needs
>      of the application.  Otherwise the application will likely produce
>      garbage output.  In other words, even though RDF itself has a standard
>      semantic grounding, that grounding is no get-out-of-jail-free card to
>      bypass the need to apply application-specific semantics.
>
>      I have always viewed the most significant differences between RDF and
>      PGs as being purely practical choices of graph representation.  But
>      maybe this is just a difference in perception?
>
>      Best wishes,
>      David Booth
>
>
Received on Tuesday, 7 December 2021 13:26:50 UTC