On the TEP semantic extension [was: Re: Revisiting RDF-star semantics (was Re: Why is the RDF-star working group standardising RDF 1.2 and SPARQL 1.2?)] from Thomas Lörtsch on 2023-02-09 (public-rdf-star-wg@w3.org from February 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Thu, 9 Feb 2023 01:13:29 +0100
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, public-rdf-star-wg@w3.org
Message-Id: <14DA2ED4-26D9-47C8-8D34-F0676ADB568A@rat.io>
Olaf,

> On 2. Feb 2023, at 00:26, Olaf Hartig <olaf.hartig@liu.se> wrote:
> 
> Thomas,
> 
> You are repeating the same two issues that you seem to have with TEPs

[0] 

> that you already tried to point out during the CG.

(see e.g.[1])

> Can you please elaborate on them a bit more because a) I still don't get them and b) we have some new people here who haven't been in the CG.

Yes. It’s not my favorite topic, as I would prefer the need for this semantic extension to just go away by defining embedded triples as referentially transparent, but here you go. Also please note that James Anderson wasn’t happy with the proposal either and called it too involved for use in production (my words - I hope James is listening and corrects me if I’m not representing his assessment correctly).

One should, for those that haven’t followed the CG, say first that we seem to start from different assumptions about
- the kind of annotations that are to be expected,
- what is the right attack vector to implement the change in semantics and
- what is the useful default semantics of embedded triples

What kind of annotations are to be expected? It's my impression that you consider annotations mostly under the aspect of provenance (I could try to dig up your comments on GitHub that made me think that, but hopefully that’s not necessary), whereas I am operating under the assumption that quoted triples will be used for any kind of annotation, especially when data from Property Graph land is converted to RDF (annotations that to conservative leaning semanticists might qualify as unruly violations of monotonicity, but which I interpret as legitimate additional detail, just not in the form of an extra regular triple but of an extra annotating triple). RDF-star is intended to ease RDF/LPG interoperability, so it is important to have those use cases in mind. In other words: in my intuition any property can and might be used in annotations. 

W.r.t. the right vector it's obvious that you think that the semantic extension should employ properties to target the embedded triples - the intuition seems to be that some properties are typically used on referentially transparent embedded triples. I however think that it would be much better if individual embedded triples could be declared to be referentially transparent - the intuition being that the need to quote often depends on what is stated, not on what annotations are attached to the statements, e.g. one might regularly want to record the provenance of statements, but only for specific (scandalous, famous, legal) statements would one want to record their exact syntactic representation.

But even more important than those assumptions is probably the underlying design philosophy: the TEP proposal approaches the problem as if it were perfectly normal that embedded triples are referentially opaque and just some corner cases need to employ the TEP approach. However, that intuition is very arguable and in my opinion plain wrong, even dangerous. The extremely restrained approach to entailments in the proposed semantics was justified with the fact that entailments, once made, can’t be taken back. However, it was said, it would be easy to lift the syntactic restriction and make embedded triples behave like any other regular triple. Only after I insisted on more than handwaving assurances was an actual proposal presented of how such an "easy" mechanism could work: the TEP semantic extension that you authored. I expect from RDF-star a semantic extension that does indeed make it easy to address the referentially transparent version of quoted triples: easy to define and easy to use, as easy as if it was ubiquitous - because the use cases needing it are ubiquitous. The TEP semantic extension proposal delivers none of that.


> One of your two points is that "the semantic extension has to be invoked per property per graph" and that "a property [cannot] be declared to always make quoted triples referentially transparent."
> I guess what you mean by this is that, if one wants to communicate to a receiver of an RDF-star graph that a particular property is meant to be a TEP, then that property needs to be declared to be a TEP (please confirm if my interpretation of your sentence here is correct).

Yes, correct.

> Yes that needs to be done.

See example 39 from the CG report [2]:

    << dbr:Linköping dbo:populationTotal "104232"^^xsd:nonNegativeInteger >>
        :measuredOn "2010-12-31"^^xsd:date ;
        :source <https://dbpedia.org/data/Linköping.ttl> .

The CG report ponders the declaration of :measuredOn as TEP:

    :measuredOn rdf:type rdf-star:TransparencyEnablingProperty .

Imagine that data on the population of Danish cities is integrated from different sources (e.g. the respective municipalities) and collected in named graphs per date. The population of Linköping is recorded every day. Consequently, in every graph, every day, you’ll have to declare anew that :measuredOn is a TEP. That is tedious, and easy to forget too.
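
A minimal TriG-star sketch of that situation, assuming (as I read the proposal) that the declaration only takes effect in the graph that contains it. The graph names, the second date with its repeated population figure, and the rdf-star: namespace IRI are placeholders made up for illustration, not taken from the CG report:

    @prefix :         <http://example.org/> .
    @prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
    @prefix dbr:      <http://dbpedia.org/resource/> .
    @prefix dbo:      <http://dbpedia.org/ontology/> .
    # placeholder namespace IRI, assumed for this sketch only
    @prefix rdf-star: <http://example.org/rdf-star#> .

    :graph_2010_12_31 {
        # the TEP declaration has to be present in this graph ...
        :measuredOn rdf:type rdf-star:TransparencyEnablingProperty .
        << dbr:Linköping dbo:populationTotal "104232"^^xsd:nonNegativeInteger >>
            :measuredOn "2010-12-31"^^xsd:date .
    }

    :graph_2011_01_01 {
        # ... and has to be repeated in the next day's graph
        :measuredOn rdf:type rdf-star:TransparencyEnablingProperty .
        << dbr:Linköping dbo:populationTotal "104232"^^xsd:nonNegativeInteger >>
            :measuredOn "2011-01-01"^^xsd:date .
    }

If the declaration is dropped from any one of these graphs, the annotation in that graph silently falls back to the opaque reading.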

> But the same is true for properties that are, e.g., transitive or symmetric. So, you don't like owl:TransitiveProperty and owl:SymmetricProperty either?

I could now employ Pierre-Antoine’s style of argumentation that OWL is out of scope, but of course I won’t. The issue is that OWL properties are defined in an ontology that can be imported if needed (and also they don’t change their meaning). TEP-enabled properties are not available anywhere to import. They have to be declared in each graph anew. That makes quite a difference w.r.t. user convenience. 
If the TEP proposal tried to change that it would have to provide some elaborate mechanisms and conventions:
- it would need to define naming conventions of how to refer to the TEP-variant of :measuredOn (":TEPmeasuredOn" ?),
- it would either need to make ontologies publish such properties themselves, e.g. it would need to persuade schema.org to add a TEP variant to each of its hundreds of properties. But that might still be the easy task compared to persuading hundreds of other ontology providers to update their property definitions. And Wikipedia, DBpedia etc. etc. Good luck with that.
- or it would need to strive to provide them as a service, e.g. like sameAs.cc,
- it would need to educate users, that queries for a property should include a potential TEP-variant etc. 
In this respect the TEP proposal is a stub at best, as it doesn’t address any of these issues. This is certainly not the way to foster interoperability, which should be the aim of standardization.

> Your other point is that a "property [cannot] be used in the same graph on referentially opaque quoted triples AND on referentially transparent quoted triples." Yes, that's true as well. But what is the issue with it? Why would you want to use the very same property with two different meanings? To me, your complaint here is the same as complaining that one cannot use the same property (within the same graph) as both a transitive property and a non-transitive property. Why do you want to do that?

I find it a bit strange that you need to be presented an example for a case that in the abstract is really very clear and straightforward, but here you go. Imagine that population data is integrated from different sources, about different cities in Denmark, and some sources define :measuredOn as TEP, whereas others don’t. How do you deal with that?

Another case: a popular example for the need for referential opacity is Morning Star and Evening Star - both referring to the planet Venus. In a set of statements about celestial bodies I might want to make sure that references to those two stars (planets rather, but well…) are always presented in syntactic accuracy. However I would also like references to all other planets in the solar system to be referentially transparent, for the usual reason: it improves interoperability, connectivity, integration - the core interest of the semantic web.
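
A rough Turtle-star sketch of that case; all names in the example.org namespace, including the annotating property :recordedBy, are hypothetical:

    @prefix : <http://example.org/> .

    # the exact wording matters here: intended to stay referentially opaque
    << :MorningStar :visibleIn :Morning >> :recordedBy :ObserverA .
    << :EveningStar :visibleIn :Evening >> :recordedBy :ObserverA .

    # the wording does not matter here: intended to be referentially transparent
    << :Mars :orbits :Sun >> :recordedBy :ObserverA .

Since the TEP mechanism works per property, :recordedBy is either a TEP for all three annotations or for none of them. The graph cannot express the mix that the comments ask for.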

Yet another example would be that I annotate statements issued during an expert exchange of data on some highly disputed matter. Some of those annotations might warrant a syntactically exact representation of the annotated statement, because very specific identifiers were used and each detail matters. All other statements would profit far more from referentially transparent statement representation because their intended meaning doesn’t depend on the chosen vocabulary. 

My underlying intuition is that, like in natural language, the reasons that we actually go through the trouble to employ quotation (e.g. painting quote marks in the air when speaking) are A) rare and B) dependent on content, i.e. the embedded triple itself. And that unsurprisingly leads to a completely different take on what a useful un-quotation mechanism might look like. Even provenance, which is often considered as benefitting from the proposed semantics, does IMO most of the time NOT require referential opacity, simply because most of the time we are perfectly fine with and even esteem the fact that the semantic web is referentially transparent. If I said 

    << :Alice :bought :Car >> 

and you quote me as 

    << :alice :bought :automobile >> :said :Thomas

I’m perfectly fine with it - it’s what I meant, and yes, I said it. Again, the initial value proposition of the proposed semantics was that some use cases need it whereas all other use cases can easily work on referentially transparent triples. The TEP proposal turns that proposition upside down: now everybody is supposed to do well with referentially opaque quoted triples and those that absolutely want to work with referentially transparent quoted triples have to put in a lot of effort.

Let me state again that the whole semantic web is referentially transparent, and for good reason: this design does not only improve interoperability and integration, it is a pre-requisite to make decentralized vocabularies work. And decentralization of such an essential tool as vocabularies is indeed a very important aspect of the architecture of the semantic web. So why suddenly assume that all annotations would profit from the opposite design? 

There is also a history to this: Named Graphs were defined in the 2005 paper by Carroll et al. (vulgo: by Pat Hayes) as referentially opaque. That design decision was central to the design of the Named Graph semantics and it was motivated in the paper extensively. The paper preceded most implementations and certainly SPARQL. Its authors, especially Pat himself, were highly respected. Yet, did anybody care? No. Did any of the implementations discussed in the RDF 1.1 WG actually implement named graphs as referentially opaque? No. The idea was ignored in practice because the need for such fidelity in representation is much too infrequent to make everybody bother. So where does your certainty stem from that referential opacity is a good default semantics for embedded triples and that the TEP can be so restrictive and still be sufficient?

I need to be blunt: I called this approach "madness" on the GitHub wiki and that’s admittedly not a nice way to put it, but really, I find it
- in principle misguided
- in practice extremely involved to the point of being completely impractical and
- very sloppy in the details.
Given that by a very conservative estimate at least 95% of use cases would be required to use it if they bothered to take the semantics seriously, it would be quite catastrophic if the WG standardized this approach.

If the proposed TEP mechanism was supposed to work at scale and not just for one tiny example case, it would need a lot more effort and polishing, and still it would add yet one more level of complexity to RDF-star. We already seem to have come to the point that it is accepted that occurrences need their own statement, differentiating them from the quoted triple, which is defined to refer to the type. So queries for annotations will have to branch to types and occurrences. With the TEP, queries will also have to account for properties and their TEP-siblings. That makes 4 (four) branches for each query on an annotation. RDF-star seems elegant to newcomers only because it cuts so many corners. Taken seriously (i.e. obeying the proposed semantics) it becomes a real sucker in practice. Consequently the semantics will not be taken seriously. Consequently the semantics, and the TEP, are worthless.
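
To make the four branches concrete, here is a sketch of the four shapes in which one and the same kind of annotation could turn up in Turtle-star data. The properties :occurrenceOf and :TEPmeasuredOn are hypothetical names, assumed only for this illustration:

    @prefix : <http://example.org/> .

    # 1. annotation on the type, plain property
    << :s :p :o >> :measuredOn :d1 .

    # 2. annotation on the type, TEP sibling of the property
    << :s :p :o >> :TEPmeasuredOn :d2 .

    # 3. annotation on an occurrence, plain property
    _:occ1 :occurrenceOf << :s :p :o >> ;
           :measuredOn :d3 .

    # 4. annotation on an occurrence, TEP sibling of the property
    _:occ2 :occurrenceOf << :s :p :o >> ;
           :TEPmeasuredOn :d4 .

A query that wants to find all measurement dates for :s :p :o would have to match every one of these shapes.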


I made an effort to develop an Occurrences vocabulary in summer 2021 [3], but after a promising start you and Pierre-Antoine managed to bury it in FUD. It boils down to adding a property to RDF-star that identifies a referentially transparent occurrence of an embedded triple, e.g.

    _:x rdf-star:referentiallyTransparentOccurrenceOf << :a :b :c >> ;
        :annotation :A .

This would provide an identifier that is equivalent to the subject of an RDF standard reification (at the time I was quite convinced that a reference to the snippet of RDF that contains the occurrence should be added, but I don’t consider that essential anymore). 
IMHO it still is the only workable approach towards annotations on referentially transparent triples if one buys into the proposed semantics of RDF-star or is forced to live with it as a W3C standard (which I still hope I’ll never be).
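
For comparison, a sketch of the same annotation written with RDF standard reification, where _:x plays the corresponding role. The example.org namespace is a placeholder:

    @prefix :    <http://example.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    _:x rdf:type      rdf:Statement ;
        rdf:subject   :a ;
        rdf:predicate :b ;
        rdf:object    :c ;
        :annotation   :A .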


Best,
Thomas




[0] https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#selective-ref-transparency
[1] https://github.com/w3c/rdf-star/issues/170
[2] https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#selective-ref-opacity
[3] https://github.com/w3c/rdf-star/issues/169

> Olaf
> 
> Feb 1, 2023 22:14:04 Thomas Lörtsch <tl@rat.io>:
> 
>> 
>> 
>>> On 1. Feb 2023, at 17:58, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>>> 
>>> 
>>> 
>>> Le 01/02/2023 à 17:55, Thomas Lörtsch a écrit :
>>> [...] Almost all of them would have to employ the TEP mechanism to be semantically sound. [...]
>>> 
>>> 
>>> Excuse my ignorance but what's TEP?
>> 
>> Transparency Enabling Properties, the semantic extension proposed by the CG to lift the syntactic constraints on quoted triples and implement referentially transparent quoted triples, see Sections 6.4.5 'Selective referential transparency' [0] and 7.2 'Extended vocabulary' [1] of the RDF-star CG report.
>> 
>> Note that the semantic extension has to be invoked per property per graph. Neither can a property be declared to always make quoted triples referentially transparent nor can one property be used in the same graph on referentially opaque quoted triples AND on referentially transparent quoted triples.
>> 
>> [0] https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#selective-ref-transparency
>> [1] https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#extended-vocabulary
>> 
>> 
>>> 
>>> -- 
>>> Antoine Zimmermann
>>> École des Mines de Saint-Étienne
>>> 158 cours Fauriel
>>> CS 62362
>>> 42023 Saint-Étienne Cedex 2
>>> France
>>> Tél:+33(0)4 77 49 97 02
>>> http://www.emse.fr/~zimmermann/
Received on Thursday, 9 February 2023 00:13:55 UTC