Re: Blog post about "Provenance in RDF-star" from Pierre-Antoine Champin on 2022-02-09 (public-rdf-star@w3.org from February 2022)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Wed, 9 Feb 2022 14:23:13 +0100
To: Anthony Moretti <anthony.moretti@gmail.com>
Cc: public-rdf-star@w3.org
Message-ID: <d0784a48-abe2-ff94-dacb-f171e4e90a6b@ercim.eu>

On 09/02/2022 07:34, Anthony Moretti wrote:
> Hi Pierre-Antoine
>
> Yes, both of your interpretations are correct.
>
>     Which I read as
>
>     "I am 20% confident that emp22 said that emp38 is an assistant
>     designer"
>     (confidence is about :accordingTo)
>
>     while the intended meaning was
>
>     "I am 20% confident in emp22's claim that emp38 is an assistant
>     designer"
>     (confidence is about :jobTitle)
>
>
> In this case, in my view, the first results in the second.

Possibly -- depending on the model of "confidence" that one uses.

But the fact that one statement entails the other does not mean that the 
two statements are the same (nor equivalent). So we need to be precise 
about what the "raw" meaning of a statement is, before any reasoning is 
involved.
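
To make the distinction concrete, here is a minimal sketch of how the
two readings could be written apart in Turtle-star, using the
quoted-triple syntax of the CG report. The nesting in the first
statement, and the :claim and :source terms in the second, are
illustrative assumptions of mine, not terms taken from the blog post;
other modelings are possible.

```turtle
PREFIX : <http://www.example.org/>

# Reading 1: the confidence annotates the :accordingTo triple itself,
# i.e. "I am 20% confident that emp22 said that emp38 is an
# assistant designer".
<<
    << :employee38 :jobTitle "Assistant Designer" >>
    :accordingTo :employee22
>>
    :confidence 0.2 .

# Reading 2 (hypothetical modeling; :claim and :source are made-up
# terms): the confidence is attached to a node standing for emp22's
# claim, i.e. "I am 20% confident in emp22's claim that emp38 is an
# assistant designer".
<< :employee38 :jobTitle "Assistant Designer" >>
    :claim [ :source :employee22 ; :confidence 0.2 ] .
```

The two statements have different subjects, so neither is a rewriting
of the other; whether the first entails the second is left to whatever
model of "confidence" one adopts.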

> The final confidence level a modeler chooses to assert about a 
> statement can be arrived at in any number of ways as they can choose 
> to take into account any surrounding information they felt was 
> relevant. In the above case, if there was no information other than 
> what was stated, it wouldn't be unreasonable for the modeler to take 
> the 20% confidence that Employee22 made the claim and, based on that, 
> make the assertion themselves with 20% confidence, which is your 
> intended meaning.
>
>     The "three ideas" thread (as well as your answer below) is about a
>     potential alternative model, where time, space and confidence
>     would become core components of statements. Of course, in this
>     different model, the examples of the post would be modeled
>     differently. And yes, the pain points in one model will not be the
>     same as the pain points in the other model.
>
>
> Yes, so I guess that's the point I'm trying to make. It's a structural 
> argument about where time, space, and confidence/certainty should be 
> expressed.
>
> I think most people would agree that it would be a bad idea to model 
> like the following:
>
> {
>     type: Fraction,
>     numerator: 1,
> }
>     denominator: 2
>
> Or
>
> {
>     type: GeoCoordinate,
>     latitude: 38.9,
> }
>     longitude: -77.0
>
> Or
>
> {
>     type: PostalAddress,
>     street: 1600 Pennsylvania Avenue NW,
> }
>     city: Washington
>
> A better way would be:
>
> {
>     type: Fraction,
>     numerator: 1,
>     denominator: 2,
> }
>
> {
>     type: GeoCoordinate,
>     latitude: 38.9,
>     longitude: -77.0,
> }
>
> {
>     type: PostalAddress,
>     street: 1600 Pennsylvania Avenue NW,
>     city: Washington,
> }
>
> I think the same idea applies to a proposition, which is something 
> that should be true, false, or somewhere in-between if we're allowed 
> to express confidence/certainty.
>
> In my view, it's a bad idea to model it like:
>
> {
>     type: Proposition,
>     subject: S,
>     relation: R,
>     object: O,
> }
>     temporalBound: [T1, T2),
>     spatialBound: SB,
>     certainty: C
>
> A better way, it seems to me, would be:
>
> {
>     type: Proposition,
>     subject: S,
>     relation: R,
>     object: O,
>     temporalBound: [T1, T2),
>     spatialBound: SB,
>     certainty: C,
> }
>
> In the above, the proposition is structurally self-contained and can 
> be given a true/false/somewhere in-between value.
>
>     unless I missed something, we have not yet discussed how
>     your alternative model could be reasoned about. Nor should we do
>     that before a clear semantics of the statements is described...
>
>
> When I say "easy to reason about" I'm mainly talking about a person 
> being able to look at a model and easily understand how the temporal 
> and spatial bounds apply. I wouldn't know where to start to write a 
> clear semantics, my only hope is that the ideas I've offered so far 
> might be simple enough for most people to understand and discuss 
> already. If there's more interest I could have a go at a semantics, 
> it's not something I've ever done before though and I'm happy for 
> anyone else to have a go, but is a full semantics already needed to 
> have this early discussion?
>
> Regards
> Anthony
>
> On Tue, Feb 8, 2022 at 12:59 AM Pierre-Antoine Champin 
> <pierre-antoine.champin@ercim.eu> wrote:
>
>     Hi Anthony,
>
>     On 27/01/2022 10:13, Anthony Moretti wrote:
>>     Hi Pierre-Antoine
>>
>>     Thank you for the post.
>>
>>         It is however important to understand that this basic design
>>         has limitations. Namely, each statement made about a
>>         particular triple must be interpretable independently of the
>>         other statements made about that triple. (This is actually a
>>         general feature of RDF, not just RDF-star: two statements
>>         about the same subject must always be interpretable
>>         independently from each other. On the open web, if we assume
>>         that another triple that we have not yet discovered could
>>         change the meaning of the triples that we know, then
>>         reasoning with what we know would become much more hazardous.)
>>
>>
>>     I don't know if I'm right, but I feel like this is highly related
>>     to the idea of statements being "simply true", as people have put
>>     it. To go back to the first email in the "Three ideas" thread, I
>>     feel like time, space, and confidence/certainty are the three
>>     annotations that make any statement "simply true", i.e. make any
>>     statement able to stand alone as a complete unit of description.
>>     It's definitely possible that I haven't thought deeply enough
>>     about this, if so, maybe someone can show me a counterexample
>>     where all those annotations are specified and the entire
>>     statement is not "simply true". But if I'm right, and these
>>     annotations are special, they should be given precedence and
>>     asserted first to avoid ambiguity like that described in the blog
>>     post.
>
>     I think the two discussions are orthogonal.
>
>     The post is about using the RDF-star model, as specified by the CG
>     report, and about the common pitfalls that people should avoid
>     with this model.
>
>     The "three ideas" thread (as well as your answer below) is about a
>     potential alternative model, where time, space and confidence
>     would become core components of statements. Of course, in this
>     different model, the examples of the post would be modeled
>     differently. And yes, the pain points in one model will not be the
>     same as the pain points in the other model.
>
>     Note also that I disagree with the way you rephrase the examples
>     from the blog post into your new model, see below.
>
>>
>>     To further simplify things, time, space, and certainty could be
>>     three positions, rather than four, if the temporal range is given
>>     typical "range" syntax:
>>
>>     Subject Relation Object [T1, T2] SpatialBound Certainty
>>
>>     Any datatype that makes sense for certainty/confidence can be
>>     used in the last position.
>>
>>     Then, with those three positions, the examples in the blog post
>>     could be modified like so:
>>
>>     *Original extended first example:*
>>
>>     << :employee38 :jobTitle "Assistant Designer" >>
>>         :accordingTo :employee22, :employee38 ;
>>         :confidence 0.8 .
>>
>>     Would become:
>>
>>     << :employee38 :jobTitle "Assistant Designer" _ _
>>     "0.8"^^ex:confidence >>
>>         :accordingTo :employee22, :employee38 .
>
>     It may be me misinterpreting your alternative model, but I don't
>     think that the two examples are conveying the same meaning. Let me
>     rephrase in plain English how I interpret them.
>
>     1st example:
>
>     "I am 80% certain that emp38 is an assistant designer, as claimed
>     by emp22 and emp38"
>     (confidence is asserted by me)
>
>     2nd example:
>
>     "emp22 and emp38 both claim that they are 80% certain that emp38
>     is an assistant designer"
>     (confidence is asserted by emp22 and emp38 respectively)
>
>>
>>     *Original problematic example:*
>>
>>     << :employee38 :jobTitle "Assistant Designer" >>
>>         :accordingTo :employee22; :confidence 0.2 .
>>         # we don’t trust employee22 about someone else’s job title
>>
>>     << :employee38 :jobTitle "Assistant Designer" >>
>>         :accordingTo :employee38; :confidence 0.8 .
>>         # we quite trust employee38 about their own job title
>>
>>     Would become:
>>
>>     << :employee38 :jobTitle "Assistant Designer" >>
>>         :accordingTo :employee22 _ _ "0.2"^^ex:confidence .
>     Which I read as
>
>     "I am 20% confident that emp22 said that emp38 is an assistant
>     designer"
>     (confidence is about :accordingTo)
>
>     while the intended meaning was
>
>     "I am 20% confident in emp22's claim that emp38 is an assistant
>     designer"
>     (confidence is about :jobTitle)
>
>>
>>     << :employee38 :jobTitle "Assistant Designer" >>
>>         :accordingTo :employee38 _ _ "0.8"^^ex:confidence .
>>
>>     *It's easy to see what a more complex example might look like:*
>>
>>     << :employee38 :jobTitle "Assistant Designer" _ _
>>     "0.8"^^ex:confidence >>
>>         :accordingTo :employee22 _ _ "0.2"^^ex:confidence .
>>
>>     << :employee38 :jobTitle "Assistant Designer" _ _
>>     "0.8"^^ex:confidence >>
>>         :accordingTo :employee38 _ _ "0.8"^^ex:confidence .
>>
>>     Those three positions can be added to the other statement types I
>>     described, and the whole system becomes consistent, scalable, and
>>     easy to reason about.
>
>     unless I missed something, we have not yet discussed how
>     your alternative model could be reasoned about. Nor should we do
>     that before a clear semantics of the statements is described...
>
>       best
>
>
>>     Apologies for being repetitive, but I really think the holistic
>>     approach has a lot of benefits.
>>
>>     Regards
>>     Anthony
>>
>>     On Thu, Jan 27, 2022 at 8:13 AM Kingsley Idehen
>>     <kidehen@openlinksw.com> wrote:
>>
>>         On 1/26/22 3:34 PM, Pierre-Antoine Champin wrote:
>>>         Dear all,
>>>
>>>         following a discussion during our two last calls, I
>>>         published a post about "Provenance in RDF-star":
>>>
>>>         https://www.w3.org/community/rdf-dev/2022/01/26/provenance-in-rdf-star/
>>>
>>>
>>>         quoting the intro:
>>>
>>>         > In this post, we present some lessons learned by the group
>>>         through discussions and exchanges. This is meant to give
>>>         some insight about the rationale behind RDF-star, and some
>>>         guidelines about how to best use it for modeling provenance
>>>         data.
>>>
>>>         Many thanks to all the participants of the RDF-star group
>>>         for their reviews and feedback on this post.
>>>
>>>           pa
>>>
>>
>>         Hi Pierre-Antoine,
>>
>>         An opening example in that blog post:
>>
>>         PREFIX : <http://www.example.org/>
>>
>>         << :employee38 :jobTitle "Assistant Designer" >>
>>             :accordingTo :employee22, :employee38 ;
>>             :confidence 0.8 .
>>
>>         My variant using RDF as it exists.
>>
>>         ## RDF-Turtle Start ##
>>
>>         # PREFIX : <http://www.example.org/>
>>         PREFIX schema: <http://schema.org/>
>>         PREFIX : <#>
>>
>>         [
>>           :jobTitle "Assistant Designer" ;
>>           schema:identifier :employee38  # if desired,
>>           # inverse-functional-property semantics can be applied to the
>>           # schema:identifier relation.
>>         ] :accordingTo :employee22, :employee38 ;
>>           :confidence 0.8 .
>>
>>         ## RDF-Turtle End ##
>>
>>         What is the difference between both? Is it that your RDF-Star
>>         example expresses a statement (*utterance*) while mine
>>         expresses a fact (*proposition*)?
>>
>>         "A *statement* occurs at a particular time and place.  But a
>>         *fact* is independent of time and place." [1]
>>
>>
>>         Links:
>>
>>         [1]
>>         https://groups.google.com/d/msgid/ontolog-forum/d37df77c62aa4cdab97ad92a30821600%40bestweb.net
>>         -- John F. Sowa post about statements and facts
>>
>>
>>         -- 
>>         Regards,
>>
>>         Kingsley Idehen 
>>         Founder & CEO
>>         OpenLink Software
>>         Home Page: http://www.openlinksw.com
>>         Community Support: https://community.openlinksw.com
>>         Weblogs (Blogs):
>>         Company Blog: https://medium.com/openlink-software-blog
>>         Virtuoso Blog: https://medium.com/virtuoso-blog
>>         Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>
>>         Personal Weblogs (Blogs):
>>         Medium Blog: https://medium.com/@kidehen
>>         Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>>                       http://kidehen.blogspot.com
>>
>>         Profile Pages:
>>         Pinterest: https://www.pinterest.com/kidehen/
>>         Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
>>         Twitter: https://twitter.com/kidehen
>>         Google+: https://plus.google.com/+KingsleyIdehen/about
>>         LinkedIn: http://www.linkedin.com/in/kidehen
>>
>>         Web Identities (WebID):
>>         Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>>                 : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>>
Received on Wednesday, 9 February 2022 13:23:20 UTC