Re: owl:sameAs/referential opacity Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from thomas lörtsch on 2020-10-29 (public-rdf-star@w3.org from October 2020)

From: thomas lörtsch <tl@rat.io>
Date: Thu, 29 Oct 2020 01:14:20 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Cc: Jerven.Bolleman@sib.swiss, public-rdf-star@w3.org
Message-Id: <1B709C35-8B9E-4B5F-98C3-138B720A7947@rat.io>
Pierre-Antoine,

I sympathize with the goal of referential opacity, but it seems you have to work hard against the mechanics of RDF to achieve something within RDF that RDF by its very nature not only does not support but actually aims to get past.

Another question is whether the need for referential opacity isn’t rather special and therefore better realized through a special mechanism than as the guiding design principle. Provenance is only one use case for statement annotation, and even then the demands are usually not as extreme as those your semantics tries to support.
The argument that it’s easier to add functionality than to take it away later sounds good and true but maybe it’s not when the system within which you are working - RDF - is already based on another paradigm.

Why not go another route and document the original statement as a string - withdrawn from all greedy reasoners - when the need arises, and in all other cases just live with the unavoidable unhelpful entailment now and then? That’s actually not my idea but yours, below ;-)
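
A minimal sketch of that route, in plain Python with triples as tuples of strings. The :statedAs property, the substitute() helper and the way literals are marked are all made up for illustration; none of it comes from the RDF* draft or from any library. The point is only that a naive owl:sameAs substitution rewrites IRIs but cannot reach inside a string literal:

```python
# Triples are plain tuples of strings; literals are marked by a leading double quote.
# ":statedAs" is a hypothetical property, used here only for illustration.
SAME_AS = "owl:sameAs"


def is_literal(term):
    return term.startswith('"')


def substitute(triples):
    """One round of naive owl:sameAs substitution over IRI terms (never literals)."""
    pairs = {(s, o) for s, p, o in triples if p == SAME_AS}
    pairs |= {(b, a) for a, b in pairs}  # owl:sameAs is symmetric
    out = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for a, b in pairs:
            if s == a:
                out.add((b, p, o))
            if o == a and not is_literal(o):
                out.add((s, p, b))
    return out


graph = {
    ("dbr:Paris", "dbo:populationTotal", '"2229621"'),
    ("_:stmt", ":assertedBy", "<http://dbpedia.org>"),
    # The original wording, kept as an opaque string.
    ("_:stmt", ":statedAs", '"dbr:Paris dbo:populationTotal 2229621"'),
    ("dbr:Paris", "owl:sameAs", "geo:2229621"),
}

entailed = substitute(graph)
```

The asserted fact gets "smushed" onto geo:2229621 as usual, while the string recording what DBpedia literally said stays exactly as it was written.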


> On 28. Oct 2020, at 19:31, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:
> 
> Jerven,
> 
> On 28/10/2020 18:32, Jerven Tjalling Bolleman wrote:
>> Hi Pierre-Antoine,
>> 
>> Splitting the reply, especially the first part into a different e-mail regarding syntactic sugar.
>> 
>> Maybe talk about referential opacity in face of owl:sameAs as a single topic.
>> I think this is a tricky thing, and well worth considering dropping.
> Agreed
>> 
>> Your examples
>> 
>> << dbr:Paris dbo:populationTotal 2229621>> :assertedBy
>>  <http://dbpedia.org> .
>> << geo:2229621 gn:population 2138551>> :assertedBy
>>  <http://geonames.org> .
>>   dbr:Paris owl:sameAs geo:2229621.
>> 
>> implies to me no more than
>> 
>> << geo:2229621 dbo:populationTotal 2229621>> :assertedBy
>>  <http://dbpedia.org> .
>> << dbr:Paris gn:population 2138551>> :assertedBy
>>  <http://geonames.org> .
>> 
>> which to me seems a reasonable thing as Paris is Paris ;)
> The point is : you assume that my triple above means
> 
>    (1) DBpedia says that Paris has a population of 2229621.
> 
> while I want it to mean
> 
>     (2) DBPedia says: “Paris has a population of 2229621”.

!

> From (1), we can infer
> 
>     (3) DBPedia says that the capital of France has a population of 2229621.
> 
> because we know that Paris is the capital of France, so the two cited facts are equivalent.
> 
> but from (2), you can NOT infer
> 
>     (4) DBPedia says "the capital of France has a population of 2229621"
> 
> because those are not the terms used by DBPedia. The two statements are different (even if they describe equivalent facts).
> 
> Referential opacity is more flexible, because from the statement we can find the fact (by just interpreting the terms); the opposite is not true, because from a fact, we can not find the precise terms that were used to state it.
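
The asymmetry described here can be sketched in a few lines of plain Python. The DENOTES table is a hypothetical interpretation, invented for this example: going from a statement to its fact is a simple lookup per term, while going back from the fact gives several candidate statements with no way of telling which one was actually made.

```python
from itertools import product

# Hypothetical interpretation: several terms may denote the same resource.
DENOTES = {
    "dbr:Paris": "Paris",
    "geo:2229621": "Paris",
    "dbo:populationTotal": "population",
    "2229621": 2229621,
}


def fact_of(statement):
    """Interpret each term of a statement (a triple of terms) to get the fact."""
    return tuple(DENOTES[term] for term in statement)


def statements_for(fact):
    """All statements over the known terms that describe the given fact."""
    choices = [
        sorted(term for term, resource in DENOTES.items() if resource == part)
        for part in fact
    ]
    return list(product(*choices))


said = ("dbr:Paris", "dbo:populationTotal", "2229621")
fact = fact_of(said)                # exactly one fact per statement
candidates = statements_for(fact)   # possibly many statements per fact
```

With this table, candidates contains both the dbr:Paris and the geo:2229621 variant, so the fact alone does not tell us which terms the source used.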
> 
>> 
>> The clarkKent/superman example always fails for me, because to me
>> they are not the same thing but two alter egos of Kal-El, i.e.
>> :clarkKent is not the sameAs :superman.
> I see your point, but I would not go as far as saying that one way of modeling it is correct, and the other is wrong.
> 
> Am I *really* the same person when I play with my kids, when I teach my students, and when I discuss Semantic Web philosophy? In that respect, is it correct or not to state
> 
>     <http://champin.net/#pa> owl:sameAs <http://orcid.org/0000-0001-7046-4474>
> 
> in my FOAF profile?
> 
> To sum it up: granted, the Superman example is maybe not perfect, because it could obviously be modeled otherwise (or, say, more obviously than other examples).
> 
> But I think the proposed modelling is valid, and so the problem it depicts is real.
> 
>> 
>> Or alternatively, once
>> :louisLane infers/finds out :clarkKent owl:sameAs :superman, she does believe
>> :clarkKent :can :fly .
> Yes, but the whole point is: we don't want to assume that she knows it, even if WE know it.
>> In which case it is fine to not have opacity.
>> 
>> However, there are still ways around this, to get this behaviour
>> without referential opacity being a requirement. For example, in GraphDB by Ontotext, inferred statements are in a different named graph, which on the SPARQLstar side allows asking on demand whether referential opacity is desired or not.
> That's an implementation detail.
> 
> RDF(S) semantics makes no distinction between "stated triples" and "inferred triples". So unless we change the semantics of RDF (!),

!!

Thomas


> the proposed strategy does not work in general (only with some specific implementations).

>> Another way to think about it: if users don't want the consequences
>> of OWL reasoning, maybe they shouldn't use OWL reasoners ;) Because to me
>> wanting to have referential opacity in the face of reasoners is not a given.
>> For example, suppose the reasoner is in a police detective frame of mind.
>> 
>> << :thief :stole :diamond >> a :criminalAct .
>> then
>> :butler owl:sameAs :thief .
>> inferring
>> << :butler :stole :diamond >> a :criminalAct .
>> seems reasonable to me .
> Again, fact vs. statements.
> 
>   best
> 
>> 
>> 
>> Regards,
>> Jerven
>> 
>> 
>> 
>> 
>> 
>> 
>> On 2020-10-28 17:36, Pierre-Antoine Champin wrote:
>>> Jerven,
>>> 
>>> thanks for your feedback.
>>> On 28/10/2020 12:29, Jerven Bolleman wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> Yes it can be defined as syntactic sugar, and IMO it should.
>>>> Considering in the beginning that RDF* was defined in terms of RDF
>>>> reification.
>>> 
>>> To be precise, RDF* was defined with an abstract syntax of its own,
>>> but no specific semantics (instead, a projection to RDF standard
>>> reification).
>>> 
>>> What motivated us to propose a dedicated semantics was, mostly
>>> 
>>> * that standard reification has no specific semantics in standard RDF,
>>> and
>>> 
>>> * to provision for referential opacity (more on this below).
>>> 
>>> I'll discuss the second item below (as you mention it in the following
>>> of your message).
>>> 
>>>> And it can be implemented that way e.g. as POC done for rdflib.
>>>> 
>>>> Which gets us back to the question: what do we want to achieve with
>>>> RDFstar? I think most of us want to have reification without the
>>>> hassle of writing out a quad all the time. Plus having optimizations
>>>> possible at the storage level/query engine.
>>>> 
>>>> A solution for the triples consisting only of IRIs and/or literals is
>>>> to simply generate an implicit IRI for them.
>>>> 
>>>> For the triples with a blank node I think the simplest is to
>>>> generate an implicit new blank node for them. (Which would be
>>>> skolemized in some form in any triple store anyway).
>>>> 
>>>> SPARQL and RDF syntax wise this would be simple. API wise this would
>>>> also be easy in python rdflib. We have two new subclasses of URIRef
>>>> and BNode, basically.
>>>> 
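>>>> class TripleUriRef(URIRef):
>>>>     pass  # implicit IRI standing for a triple of IRIs/literals
>>>> 
>>>> class TripleBnode(BNode):
>>>>     pass  # implicit blank node standing for a triple with a blank node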
>>>> 
>>>> in java rdf4j it would probably be something like
>>>> 
>>>> interface Triple{
>>>> }
>>>> interface TripleIRI extends IRI, Triple {
>>>> }
>>>> interface TripleBNode extends BNode, Triple {
>>>> }
>>>> 
>>>> Semantics stay the same, we only get a new syntax for reification.
>>>> Question about PG|SA stays unanswered by doing this.
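
To make the "implicit IRI" idea concrete, here is a minimal sketch in plain Python (no rdflib, triples as tuples of strings). The urn:example:triple: prefix, the hashing scheme and the helper names triple_iri() and desugar() are assumptions of mine, not anything proposed in this thread; blank nodes in the embedded triple are not handled at all.

```python
import hashlib

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def triple_iri(s, p, o):
    """Derive a deterministic IRI for a triple of IRIs/literals.

    Hypothetical scheme: hash the space-joined terms. A real scheme would
    need an unambiguous serialization of the three terms.
    """
    digest = hashlib.sha256(" ".join((s, p, o)).encode("utf-8")).hexdigest()
    return "urn:example:triple:" + digest


def desugar(embedded, p, o):
    """Expand '<< s p o >> p o' into the standard reification quad plus the annotation."""
    es, ep, eo = embedded
    node = triple_iri(es, ep, eo)
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", es),
        (node, RDF + "predicate", ep),
        (node, RDF + "object", eo),
        (node, p, o),
    ]


paris = ("dbr:Paris", "dbo:populationTotal", "2229621")
expanded = desugar(paris, ":assertedBy", "<http://dbpedia.org>")
```

Because the IRI is derived from the triple itself, two annotations of the same embedded triple end up on the same node, which is one possible reading of the proposal and not necessarily the intended one.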
>>> 
>>> I agree that this strategy would probably work if it wasn't for
>>> referential opacity.
>>> 
>>> However, you would still need to specify a precise semantics for
>>> rdf:subject, rdf:predicate and rdf:object, to specify how to handle
>>> corner cases such as:
>>> 
>>>     :alice :says [ rdf:subject :bob ].
>>> 
>>>     :alice :says [ rdf:subject :bob, :charlie;  rdf:predicate :age;  rdf:object 42 ].
>>> 
>>>     :alice :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
>>>     :bob :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
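
A small sketch of how such shapes could at least be detected, in plain Python with triples as tuples of strings and rdf:object in the object position. The classify() helper and its three labels are hypothetical; the sketch deliberately says nothing about what the "incomplete" and "ambiguous" cases should mean, which is exactly the semantics that would have to be specified.

```python
from collections import defaultdict

PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")


def classify(triples):
    """Label each reification node as 'complete', 'incomplete' or 'ambiguous'."""
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in PARTS:
            values[s][p].add(o)
    result = {}
    for node, by_part in values.items():
        counts = [len(by_part.get(part, ())) for part in PARTS]
        if any(c > 1 for c in counts):
            result[node] = "ambiguous"    # e.g. two rdf:subject values
        elif any(c == 0 for c in counts):
            result[node] = "incomplete"   # e.g. only rdf:subject given
        else:
            result[node] = "complete"
    return result


triples = [
    ("_:a", "rdf:subject", ":bob"),
    ("_:b", "rdf:subject", ":bob"),
    ("_:b", "rdf:subject", ":charlie"),
    ("_:b", "rdf:predicate", ":age"),
    ("_:b", "rdf:object", "42"),
    ("_:c", "rdf:subject", ":bob"),
    ("_:c", "rdf:predicate", ":age"),
    ("_:c", "rdf:object", "42"),
]
labels = classify(triples)
```

The last corner case, two distinct nodes that both reify the same triple, passes this check as two "complete" nodes; whether they should be treated as the same statement is again left open.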
>>> 
>>>> For me as an user, what benefits would a different semantics for
>>>> RDFstar bring?
>>> 
>>> Calling it a "different semantics" is slightly unfair; it is an
>>> _extension_ of the standard semantics.
>>> 
>>> In any case, you need to extend the semantics, either of the new
>>> abstract syntax (the approach we chose),
>>> 
>>> or of the specific vocabulary (here, rdf:subject and co.) used to
>>> "encode" RDF* into RDF's abstract syntax.
>>> 
>>> Granted, the second option would be less disruptive. But it assumes
>>> that referential opacity is not a problem (again, see below).
>>> 
>>>> Going the syntactic sugar route we don't need a specification for
>>>> RDFstar, just for TurtleStar, RDF/XMLstar and SPARQLstar etc.
>>>> Which we would need anyway.
>>>> 
>>>> The problem here becomes, lack of a WG means we don't have a good
>>>> way to determine consensus and actually record a decision.
>>>  The idea was to mimic a WG process as much as possible, to help
>>> things move forward (with a report, github issues to track discussions
>>> and decisions, and weekly calls).
>>> 
>>>> Still mapping RDFstar in terms of RDF reification leaves open the
>>>> issue of referential opacity. I think that this is a red herring, I
>>>> think the superman problem is an issue with datamodelling, which
>>>> should not complicate our entire tech stack. i.e the :superman
>>>> owl:sameAs :clarkKent is a faulty assertion and that fault leads to
>>>> the impossibility to correctly express what :louislane believes.
>>> 
>>> owl:sameAs is a can of worms of its own. If you are not convinced by
>>> the superman problem, what about:
>>> 
>>>     << dbr:Paris dbo:populationTotal 2229621>> :assertedBy <http://dbpedia.org> .
>>>     << geo:2229621 gn:population 2138551>> :assertedBy <http://geonames.org> .
>>>     dbr:Paris owl:sameAs geo:2229621.
>>> 
>>> Arguably, we don't want to infer that <http://geonames.org> [3] says
>>> anything about dbr:Paris (that is, if we want to preserve *which
>>> terms* were used by each source, not only *which resource* they are
>>> describing).
>>> 
>>>   best
>>> 
>>>> Regards,
>>>> Jerven
>>>> 
>>>> PS. Regarding the PG or SA mode I am a fan of going for PG, given
>>>> the UniProt experience with RDF/XML rdf:ID  which is a PG syntax.
>>>> rdf:ID being a PG syntax is important for our internal code being
>>>> needed to generate and read our rdf/xml. Not having this kind of
>>>> sugar in other syntaxes is why we are still preferably shipping
>>>> rdf/xml for UniProt.
>>>> 
>>>> PPS. About 15% of UniProt triples are reification quads. Being able
>>>> to get these out of the quad table would be nice. Especially the
>>>> consequences for reducing the joins to use them and how badly these
>>>> quads fit into current indexes.
>>>> 
>>>> On 10/28/20 9:57 AM, Pierre-Antoine Champin wrote:
>>>> Holger,
>>>> 
>>>> (I did what should have been done a long time ago: rename this
>>>> subthread to something more relevant)
>>>> 
>>>> On 27/10/2020 23:31, Holger Knublauch wrote:
>>>> On 10/28/2020 1:53 AM, Pierre-Antoine Champin wrote:
>>>> Holger,
>>>> 
>>>> Now I'm confused. This thread (which should have been renamed a long
>>>> time ago) is, in my understanding, about Martynas' question raised
>>>> here
>>>> <https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com> [1]
>>>> 
>>>> Does RDF* need new semantics at all?
>>>> While I believe the answer is "yes", I concede that answering "no" to
>>>> that question would be convenient, because it would mean that existing
>>>> implementations of RDF could handle RDF* at the syntactical level only,
>>>> i.e. parse Turtle* and store it as standard RDF triples.
>>>> 
>>>> In your examples below, however, you propose to extend existing
>>>> implementations -- which defeats the purpose of fitting RDF* into
>>>> standard RDF semantics...
>>> 
>>> The current RDF* draft requires introducing a 4th term type "RDF*
>>> triples":
>>> 
>>>> IRIs <https://www.w3.org/TR/rdf11-concepts/#dfn-iri> [4], literals
>>> <https://www.w3.org/TR/rdf11-concepts/#dfn-literal> [5], blank nodes
>>> <https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node> [6] and RDF*
>>> triples <https://w3c.github.io/rdf-star/#dfn-triple> [7] are
>>> collectively known as RDF* terms.
>>> 
>>>  Correct.
>>> 
>>> In my understanding, introducing a new subclass of Node (TripleNode)
>>> was the implementation counterpart of this extension of the abstract
>>> syntax, but it seems that I was misunderstanding.
>>> 
>>>> The approaches based on (long) URIs avoid this and therefore are
>>>> likely much less concerning w.r.t. existing implementations. A
>>>> syntactic mapping means that existing APIs can represent these
>>>> triples as normal URI nodes. From our own experience moving to this
>>>> design was not disruptive (although some users have raised concerns
>>>> about exposing the ugly long URIs in unexpected places such as
>>>> exporting them to plain Turtle).
>>>> 
>>>> Having said this, *some* implementations may represent these triple
>>>> nodes differently, e.g. using the internal data structure I outlined
>>>> below. This avoids ever storing these long URIs, makes it arguably
>>>> easier to index and search over them, and probably keeps the issues
>>>> with changing bnode identifiers at bay. But this is an
>>>> implementation detail to me.
>>>  Agreed, but conversely, implementing the current draft using long
>>> IRIs can be considered an implementation detail...
>>> 
>>> I think we also agree that the aim of the spec is to be as clear and
>>> simple as possible, and that implementations may use their own
>>> different internal models (for various reasons: backward
>>> compatibility, optimization...) as long as they behave according to
>>> the spec.
>>> 
>>>> The additional semantics of interpreting these special URI nodes
>>>> differently would be local to the RDF* specs and would not require
>>>> adaptations to existing specs.
>>>  Again, agreed: this would be much less disruptive to the entire
>>> ecosystem –and much less work for us in writing this document ;-).
>>> 
>>> But again I think that blank nodes in embedded triples make this
>>> approach very hard. Defining the correct behaviour of "long IRIs" when
>>> they actually represent embedded blank nodes, if even possible, would
>>> be extremely cumbersome (as opposed to the clear and simple way that
>>> the spec should aim for). That being said, any PR proving me wrong is
>>> welcome.
>>> 
>>>> Holger
>>> 
>>> 
>>> 
>>> Links:
>>> ------
>>> [1]
>>> https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com
>>> [2] http://dbpedia.org
>>> [3] http://geonames.org
>>> [4] https://www.w3.org/TR/rdf11-concepts/#dfn-iri
>>> [5] https://www.w3.org/TR/rdf11-concepts/#dfn-literal
>>> [6] https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node
>>> [7] https://w3c.github.io/rdf-star/#dfn-triple
>>
Received on Thursday, 29 October 2020 00:15:00 UTC