owl:sameAs/referential opacity Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from Jerven Tjalling Bolleman on 2020-10-28 (public-rdf-star@w3.org from October 2020)

From: Jerven Tjalling Bolleman <Jerven.Bolleman@sib.swiss>
Date: Wed, 28 Oct 2020 18:32:35 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Cc: public-rdf-star@w3.org
Message-ID: <550154fbec404f1097ac2520ae923f5e@imap.sib.swiss>
Hi Pierre-Antoine,

Splitting my reply: the first part, regarding syntactic sugar, goes into
a different e-mail.

Maybe we can discuss referential opacity in the face of owl:sameAs as a
single topic. I think this is a tricky thing, and well worth considering
dropping.

Your example

<< dbr:Paris dbo:populationTotal 2229621 >> :assertedBy
  <http://dbpedia.org> .
<< geo:2229621 gn:population 2138551 >> :assertedBy
  <http://geonames.org> .
dbr:Paris owl:sameAs geo:2229621 .

to me entails nothing more than

<< geo:2229621 dbo:populationTotal 2229621 >> :assertedBy
  <http://dbpedia.org> .
<< dbr:Paris gn:population 2138551 >> :assertedBy
  <http://geonames.org> .

which seems reasonable to me, as Paris is Paris ;)
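The referentially transparent reading above can be sketched in a few lines. This is a minimal illustration, not how any particular reasoner works; it assumes terms are plain strings, an embedded triple is a nested 3-tuple used as a term, and owl:sameAs is applied naively by substituting one term for the other everywhere, including inside embedded triples.

```python
def substitute(term, old, new):
    """Replace `old` by `new` in a term, recursing into embedded triples."""
    if isinstance(term, tuple):
        return tuple(substitute(t, old, new) for t in term)
    return new if term == old else term


def same_as_closure(graph):
    """Naive fixpoint: for every (a, owl:sameAs, b), add a copy of each triple
    with a replaced by b, and a copy with b replaced by a."""
    graph = set(graph)
    while True:
        pairs = [(s, o) for (s, p, o) in graph if p == "owl:sameAs"]
        new = set()
        for a, b in pairs:
            for t in graph:
                new.add(substitute(t, a, b))
                new.add(substitute(t, b, a))
        if new <= graph:
            return graph
        graph |= new


g = {
    (("dbr:Paris", "dbo:populationTotal", "2229621"), ":assertedBy", "<http://dbpedia.org>"),
    (("geo:2229621", "gn:population", "2138551"), ":assertedBy", "<http://geonames.org>"),
    ("dbr:Paris", "owl:sameAs", "geo:2229621"),
}
closed = same_as_closure(g)
```

The closed graph then contains exactly the two swapped statements listed above. Note that substitute replaces every occurrence of a term at once; a complete treatment would also derive mixed variants, which does not matter for this example.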

The clarkKent/superman example always fails for me, because to me they
are not the same thing but two alter egos of Kal-El, i.e. :clarkKent is
not owl:sameAs :superman.

Or alternatively, once :louisLane infers/finds out that :clarkKent
owl:sameAs :superman, she does believe :clarkKent :can :fly, in which
case it is fine not to have opacity.

However, there are still ways to get this behaviour without making
referential opacity a requirement. For example, in GraphDB by Ontotext,
inferred statements are in a different named graph, which on the
SPARQLstar side allows asking, on demand, whether referential opacity
is desired or not.
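The idea of keeping inferred statements in their own named graph can be sketched as follows. This is a simplified model, not GraphDB's actual mechanism: the graph names are hypothetical, quads are plain tuples, and embedded triples are nested tuples. A query that leaves out the inferred graph gets opaque answers; one that includes it gets transparent answers.

```python
ASSERTED = "urn:example:asserted"  # hypothetical graph names, not GraphDB's own
INFERRED = "urn:example:inferred"


def store(asserted, inferred):
    """Build a quad set (s, p, o, graph) from asserted and inferred triples."""
    return ({(s, p, o, ASSERTED) for (s, p, o) in asserted}
            | {(s, p, o, INFERRED) for (s, p, o) in inferred})


def match(quads, s=None, p=None, o=None, include_inferred=True):
    """Triple-pattern match where None is a wildcard. Excluding the inferred
    graph behaves like referential opacity for sameAs-derived statements."""
    graphs = {ASSERTED, INFERRED} if include_inferred else {ASSERTED}
    return {
        (qs, qp, qo)
        for (qs, qp, qo, g) in quads
        if g in graphs
        and (s is None or qs == s)
        and (p is None or qp == p)
        and (o is None or qo == o)
    }


quads = store(
    asserted={
        (":louisLane", ":believes", (":superman", ":can", ":fly")),
        (":clarkKent", "owl:sameAs", ":superman"),
    },
    # What a sameAs-aware reasoner would add, kept in a separate graph.
    inferred={(":louisLane", ":believes", (":clarkKent", ":can", ":fly"))},
)
```

Asking about :louisLane's beliefs with include_inferred=True returns both statements; with include_inferred=False only the one about :superman comes back.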

Another way to think about it: if users don't want the consequences of
OWL reasoning, maybe they shouldn't use OWL reasoners ;) Because to me,
wanting referential opacity in the face of reasoners is not a given.
For example, take a reasoner in a police-detective frame of mind:

<< :thief :stole :diamond >> a :criminalAct .
then
:butler owl:sameAs :thief .
inferring
<< :butler :stole :diamond >> a :criminalAct .
seems reasonable to me.


Regards,
Jerven

On 2020-10-28 17:36, Pierre-Antoine Champin wrote:
> Jerven,
> 
> thanks for your feedback.
> On 28/10/2020 12:29, Jerven Bolleman wrote:
> 
>> Hi All,
>> 
>> Yes, it can be defined as syntactic sugar, and IMO it should be,
>> considering that in the beginning RDF* was defined in terms of RDF
>> reification.
> 
> To be precise, RDF* was defined with an abstract syntax of its own,
> but no specific semantics (instead, a projection to RDF standard
> reification).
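To make "a projection to RDF standard reification" concrete, here is a minimal sketch. It assumes triples are 3-tuples with embedded triples as nested tuples, and that each occurrence of an embedded triple gets its own fresh blank node described with rdf:type rdf:Statement, rdf:subject, rdf:predicate and rdf:object. This is one possible mapping for illustration, not necessarily the exact one from the original RDF* papers.

```python
import itertools


def reify(graph):
    """Project RDF*-style triples (embedded triples as nested 3-tuples) onto
    standard RDF reification, minting one blank node per embedded triple."""
    ids = itertools.count()
    out = []

    def term(t):
        if not isinstance(t, tuple):
            return t
        node = f"_:r{next(ids)}"
        s, p, o = t
        out.append((node, "rdf:type", "rdf:Statement"))
        out.append((node, "rdf:subject", term(s)))  # recurses for nested triples
        out.append((node, "rdf:predicate", p))
        out.append((node, "rdf:object", term(o)))
        return node

    for s, p, o in graph:
        out.append((term(s), p, term(o)))
    return out


plain = reify([((":bob", ":age", "42"), ":source", ":alice")])
```

Minting a fresh node per occurrence means two mentions of the same embedded triple end up as two unrelated reification resources, whereas in RDF* an embedded triple is a single term determined by its components; this is part of why the projection alone does not carry a semantics.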
> 
> What motivated us to propose a dedicated semantics was, mostly
> 
> * that standard reification has no specific semantics in standard RDF,
> and
> 
> * to provision for referential opacity (more on this below).
> 
> I'll discuss the second item below (as you mention it later in your
> message).
> 
>> And it can be implemented that way, e.g. as in the POC done for rdflib.
>> 
>> Which brings us back to the question: what do we want to achieve with
>> RDFstar? I think most of us want to have reification without the
>> hassle of writing out a quad all the time, plus the possibility of
>> optimizations at the storage/query-engine level.
>> 
>> A solution for triples made only of IRIs and/or literals is simply to
>> generate an implicit IRI for them.
>> 
>> For triples with a blank node, I think the simplest is to generate an
>> implicit new blank node for them (which would be skolemized in some
>> form in any triple store anyway).
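The two rules above (an implicit IRI for ground triples, a fresh blank node otherwise) can be sketched as follows. The urn:triple: scheme and the SHA-256 digest are hypothetical choices for illustration, not part of any spec; terms are assumed to be already serialized N-Triples-style, so joining them with spaces is unambiguous.

```python
import hashlib
import itertools

_fresh = itertools.count()


def is_bnode(term):
    return term.startswith("_:")


def triple_id(s, p, o):
    """Identifier for the embedded triple (s p o), terms in N-Triples syntax."""
    if any(is_bnode(t) for t in (s, p, o)):
        # Contains a blank node: mint a new, unrelated blank node every time.
        return f"_:t{next(_fresh)}"
    # Ground triple: the same triple always yields the same IRI.
    digest = hashlib.sha256(f"{s} {p} {o} .".encode("utf-8")).hexdigest()
    return f"urn:triple:{digest}"  # hypothetical IRI scheme
```

A digest keeps the IRI short but cannot be decoded back into the triple; the "long IRI" approach discussed later in the thread makes the opposite trade-off.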
>> 
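>> SPARQL- and RDF-syntax-wise this would be simple. API-wise this would
>> also be easy in Python rdflib: we basically get two new subclasses,
>> of URIRef and of BNode.
>> 
>> from rdflib import BNode, URIRef
>> 
>> class TripleUriRef(URIRef): ...  # embedded triple of IRIs/literals only
>> 
>> class TripleBnode(BNode): ...  # embedded triple containing a blank node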
>> 
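>> In Java (RDF4J) it would probably be something like:
>> 
>> import org.eclipse.rdf4j.model.BNode;
>> import org.eclipse.rdf4j.model.IRI;
>> 
>> // Marker for a value that stands for an embedded triple.
>> interface Triple {
>> }
>> // Embedded triple made only of IRIs and/or literals.
>> interface TripleIRI extends IRI, Triple {
>> }
>> // Embedded triple containing at least one blank node.
>> interface TripleBNode extends BNode, Triple {
>> }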
>> 
>> Semantics stay the same; we only get a new syntax for reification.
>> The question about PG vs SA mode stays unanswered by doing this.
> 
> I agree that this strategy would probably work if it wasn't for
> referential opacity.
> 
> However, you would still need to specify a precise semantics for
> rdf:subject, rdf:predicate and rdf:object, to specify how to handle
> corner cases such as:
> 
>     :alice :says [ rdf:subject :bob ].
> 
>     :alice :says [ rdf:subject :bob, :charlie;  rdf:predicate :age;  rdf:object 42 ].
> 
>     :alice :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
>     :bob :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
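Whether a reification description pins down exactly one triple can be checked mechanically. A minimal sketch, assuming each bracketed description is given as a dict from property (rdf:subject, rdf:predicate, rdf:object) to a set of values. It only shows which cases are ill-formed; how a semantics should interpret them is precisely the open question and is not answered here.

```python
def as_single_triple(desc):
    """Return (s, p, o) if the description has exactly one value for each of
    rdf:subject, rdf:predicate and rdf:object; otherwise return None."""
    values = []
    for prop in ("rdf:subject", "rdf:predicate", "rdf:object"):
        vals = desc.get(prop, set())
        if len(vals) != 1:
            return None  # missing or ambiguous component
        values.append(next(iter(vals)))
    return tuple(values)


partial = {"rdf:subject": {":bob"}}
two_subjects = {"rdf:subject": {":bob", ":charlie"},
                "rdf:predicate": {":age"}, "rdf:object": {"42"}}
well_formed = {"rdf:subject": {":bob"},
               "rdf:predicate": {":age"}, "rdf:object": {"42"}}
```

The first two descriptions yield None. The last pair (what :alice says and what :bob says) both yield the same triple, which leaves the semantics to decide whether the two blank nodes denote the same statement.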
> 
>> For me as an user, what benefits would a different semantics for
>> RDFstar bring?
> 
> Calling it a "different semantics" is slightly unfair; it is an
> _extension_ of the standard semantics.
> 
> In any case, you need to extend the semantics, either of the new
> abstract syntax (the approach we chose), or of the specific vocabulary
> (here, rdf:subject and co.) used to "encode" RDF* into RDF's abstract
> syntax.
> 
> Granted, the second option would be less disruptive. But it assumes
> that referential opacity is not a problem (again, see below).
> 
>> Going the syntactic-sugar route, we don't need a specification for
>> RDFstar, just for TurtleStar, RDF/XMLstar, SPARQLstar etc., which we
>> would need anyway.
>> 
>> The problem then becomes that the lack of a WG means we don't have a
>> good way to determine consensus and actually record a decision.
>  The idea was to mimic a WG process as much as possible, to help
> things move forward (with a report, github issues to track discussions
> and decisions, and weekly calls).
> 
>> Still, mapping RDFstar in terms of RDF reification leaves open the
>> issue of referential opacity. I think this is a red herring: the
>> superman problem is a data-modelling issue, which should not
>> complicate our entire tech stack. I.e. :superman owl:sameAs
>> :clarkKent is a faulty assertion, and that fault is what makes it
>> impossible to correctly express what :louislane believes.
> 
> owl:sameAs is a can of worms of its own. If you are not convinced by
> the superman problem, what about:
> 
>     << dbr:Paris dbo:populationTotal 2229621 >> :assertedBy <http://dbpedia.org> .
>     << geo:2229621 gn:population 2138551 >> :assertedBy <http://geonames.org> .
>     dbr:Paris owl:sameAs geo:2229621 .
> 
> Arguably, we don't want to infer that <http://geonames.org> [3] says
> anything about dbr:Paris (that is, if we want to preserve *which
> terms* were used by each source, not only *which resource* they are
> describing).
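The two readings can be contrasted as two equality tests on embedded triples. A minimal sketch, assuming canon is a hypothetical dict mapping each IRI to a chosen representative of its owl:sameAs class: opaque comparison looks at the terms as written, transparent comparison at the resources they denote.

```python
def canonical(term, canon):
    """Map a term (or an embedded triple, recursively) to sameAs representatives."""
    if isinstance(term, tuple):
        return tuple(canonical(t, canon) for t in term)
    return canon.get(term, term)


def same_opaque(t1, t2):
    """Referentially opaque: the same terms must have been used."""
    return t1 == t2


def same_transparent(t1, t2, canon):
    """Referentially transparent: the same resources must be denoted."""
    return canonical(t1, canon) == canonical(t2, canon)


canon = {"geo:2229621": "dbr:Paris"}  # hypothetical sameAs representatives
geonames_said = ("geo:2229621", "gn:population", "2138551")
about_paris = ("dbr:Paris", "gn:population", "2138551")
```

Under the opaque test, what geonames asserted is not a statement about dbr:Paris; under the transparent test it is, which is exactly the inference Pierre-Antoine wants to avoid.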
> 
>   best
> 
>> Regards,
>> Jerven
>> 
>> PS. Regarding the PG or SA mode, I am a fan of going for PG, given
>> the UniProt experience with RDF/XML rdf:ID, which is a PG syntax.
>> rdf:ID being a PG syntax matters for the internal code we need to
>> generate and read our RDF/XML. Not having this kind of sugar in
>> other syntaxes is why we still preferably ship RDF/XML for UniProt.
>> 
>> PSS. About 15% of UniProt triples are part of reification quads.
>> Being able to get these out of the quad table would be nice,
>> especially for reducing the joins needed to use them, and given how
>> badly these quads fit into current indexes.
>> 
>> On 10/28/20 9:57 AM, Pierre-Antoine Champin wrote:
>> Holger,
>> 
>> (I did what should have been done a long time ago: rename this
>> subthread to something more relevant)
>> 
>> On 27/10/2020 23:31, Holger Knublauch wrote:
>> On 10/28/2020 1:53 AM, Pierre-Antoine Champin wrote:
>> Holger,
>> 
>> Now I'm confused. This thread (which should have been renamed a long
>> time ago) is, in my understanding, about Martynas' question raised
>> here
>> <https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com>
>> [1]
>> 
>> Does RDF* need new semantics at all?
>> While I believe the answer is "yes", I concede that answering "no" to
>> that question would be convenient, because it would mean that
>> existing implementations of RDF could handle RDF* at the syntactic
>> level only, i.e. parse Turtle* and store it as standard RDF triples.
>> 
>> In your examples below, however, you propose to extend existing
>> implementations -- which defeats the purpose of fitting RDF* into
>> standard RDF semantics...
> 
> The current RDF* draft requires introducing a 4th term type, "RDF*
> triples":
> 
>> IRIs <https://www.w3.org/TR/rdf11-concepts/#dfn-iri> [4], literals
>> <https://www.w3.org/TR/rdf11-concepts/#dfn-literal> [5], blank nodes
>> <https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node> [6] and RDF*
>> triples <https://w3c.github.io/rdf-star/#dfn-triple> [7] are
>> collectively known as RDF* terms.
> 
>  Correct.
> 
> In my understanding, introducing a new subclass of Node (TripleNode)
> was the implementation counterpart of this extension of the abstract
> syntax, but it seems that I was misunderstanding.
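The "4th term type" can be pictured with plain Python dataclasses. The class names are illustrative only and are not rdflib's or RDF4J's API: a Triple is itself a Term, so it can occupy subject or object position, and two embedded triples are equal exactly when their components are equal.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Triple:
    """An embedded triple used as a term (the 4th kind of RDF* term)."""
    subject: "Term"
    predicate: IRI
    obj: "Term"


Term = Union[IRI, BNode, Literal, Triple]
```

Because the dataclasses are frozen, embedded triples are hashable and compare by value, so they can be stored in sets and used as dictionary keys like any other term.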
> 
>> The approaches based on (long) URIs avoid this and therefore are
>> likely much less concerning w.r.t. existing implementations. A
>> syntactic mapping means that existing APIs can represent these
>> triples as normal URI nodes. From our own experience moving to this
>> design was not disruptive (although some users have raised concerns
>> about exposing the ugly long URIs in unexpected places such as
>> exporting them to plain Turtle).
>> 
>> Having said this, *some* implementations may represent these triple
>> nodes differently, e.g. using the internal data structure I outlined
>> below. This avoids ever storing these long URIs, makes it arguably
>> easier to index and search over them, and probably keeps the issues
>> with changing bnode identifiers at bay. But this is an
>> implementation detail to me.
>  Agreed, but conversely, implementing the current draft using long
> IRIs can be considered an implementation detail...
> 
> I think we also agree that the aim of the spec is to be as clear and
> simple as possible, and that implementations may depart from it and use
> their own internal models (for various reasons: backward
> compatibility, optimization...) as long as they behave according to
> the spec.
> 
>> The additional semantics of interpreting these special URI nodes
>> differently would be local to the RDF* specs and would not require
>> adaptations to existing specs.
>  Again, agreed: this would be much less disruptive to the entire
> ecosystem, and much less work for us in writing this document ;-).
> 
> But again, I think that blank nodes in embedded triples make this
> approach very hard. Defining the correct behaviour of "long IRIs" when
> they actually represent embedded blank nodes, if at all possible, would
> be extremely cumbersome (as opposed to the clear and simple way that
> the spec should aim for). That being said, any PR proving me wrong is
> welcome.
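Both the "long IRI" approach and the blank-node difficulty can be seen in a small sketch. The urn:x-triple: prefix and the percent-encoding via urllib.parse.quote are illustrative assumptions only, not the encoding used by any implementation mentioned in this thread. Ground triples round-trip cleanly, but a triple mentioning a blank node gets an IRI that depends on the blank node's local label, which is only meaningful inside one document and may change when the data is re-parsed.

```python
from urllib.parse import quote, unquote

PREFIX = "urn:x-triple:"  # hypothetical prefix


def encode(s, p, o):
    """Pack three N-Triples-style terms into one IRI. safe='' also
    percent-encodes '/', so '/' can separate the three parts."""
    return PREFIX + "/".join(quote(t, safe="") for t in (s, p, o))


def decode(iri):
    """Inverse of encode()."""
    s, p, o = (unquote(part) for part in iri[len(PREFIX):].split("/"))
    return s, p, o


ground = encode("<http://ex/bob>", "<http://ex/age>", '"42"')
# The same blank node, labelled differently by two parsers:
b0 = encode("_:b0", "<http://ex/age>", '"42"')
b7 = encode("_:genid7", "<http://ex/age>", '"42"')
```

Here b0 and b7 differ although they may describe the same embedded triple, and once a label is baked into an IRI the blank node has in effect become a global name, which is the kind of behaviour that would need careful specification.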
> 
>> Holger
> 
> 
> 
> Links:
> ------
> [1]
> https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com
> [2] http://dbpedia.org
> [3] http://geonames.org
> [4] https://www.w3.org/TR/rdf11-concepts/#dfn-iri
> [5] https://www.w3.org/TR/rdf11-concepts/#dfn-literal
> [6] https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node
> [7] https://w3c.github.io/rdf-star/#dfn-triple

-- 
Jerven Tjalling Bolleman
SIB | Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet - 1211 Geneva 4
t: +41 22 379 58 85 - f: +41 22 379 58 58
Jerven.Bolleman@sib.swiss - http://www.sib.swiss
Received on Wednesday, 28 October 2020 17:33:06 UTC