Re: Blog post about "Provenance in RDF-star" from Anthony Moretti on 2022-02-09 (public-rdf-star@w3.org from February 2022)

From: Anthony Moretti <anthony.moretti@gmail.com>
Date: Wed, 9 Feb 2022 17:04:36 +1030
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Cc: public-rdf-star@w3.org
Message-ID: <CACusdfQC9=92ymmMbbMT3auyxQtCr9vzam=LBmqR7pPxvpqiFQ@mail.gmail.com>

Hi Pierre-Antoine

Yes, both of your interpretations are correct.

> Which I read as
>
> "I am 20% confident that emp22 said that emp38 is an assistant designer"
> (confidence is about :accordingTo)
>
> while the intended meaning was
>
> "I am 20% confident in emp22's claim that emp38 is an assistant designer"
> (confidence is about :jobTitle)
>

In this case, in my view, the first results in the second. A modeler can
arrive at the final confidence level they assert about a statement in any
number of ways, taking into account whatever surrounding information they
consider relevant. In the above case, if there were no information other
than what was stated, it wouldn't be unreasonable for the modeler to take
the 20% confidence that Employee22 made the claim and, based on that, make
the assertion themselves with 20% confidence, which is your intended
meaning.
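
A minimal sketch of that default in Python. The function name, the
parameter names, and the multiplication rule used when extra information
is available are assumptions made up for illustration; they are not part
of RDF-star or of any proposal in this thread, and a modeler could just
as well use some other rule.

```python
from typing import Optional


def derived_confidence(claim_was_made: float,
                       source_reliability: Optional[float] = None) -> float:
    """Confidence the modeler asserts in the statement itself."""
    for value in (claim_was_made, source_reliability):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
    if source_reliability is None:
        # No surrounding information: carry the confidence over unchanged.
        return claim_was_made
    # Assumption for illustration only: discount by the source's reliability.
    return claim_was_made * source_reliability


# Only the 20% confidence that Employee22 made the claim is known.
print(derived_confidence(0.2))
```

With nothing else known, the confidence in the attribution is simply
reused as the confidence in the job title statement.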

> The "three ideas" thread (as well as your answer below) is about a
> potential alternative model, where time, space and confidence would become
> core components of statements. Of course, in this different model, the
> examples of the post would be modeled differently. And yes, the pain points
> in one model will not be the same as the pain points in the other model.
>

Yes, so I guess that's the point I'm trying to make. It's a structural
argument about where time, space, and confidence/certainty should be
expressed.

I think most people would agree that it would be a bad idea to model like
the following:

{
    type: Fraction,
    numerator: 1,
}
    denominator: 2

Or

{
    type: GeoCoordinate,
    latitude: 38.9,
}
    longitude: -77.0

Or

{
    type: PostalAddress,
    street: 1600 Pennsylvania Avenue NW,
}
    city: Washington

A better way would be:

{
    type: Fraction,
    numerator: 1,
    denominator: 2,
}

{
    type: GeoCoordinate,
    latitude: 38.9,
    longitude: -77.0,
}

{
    type: PostalAddress,
    street: 1600 Pennsylvania Avenue NW,
    city: Washington,
}

I think the same idea applies to a proposition, which is something that
should be true, false, or somewhere in-between if we're allowed to express
confidence/certainty.

In my view, it's a bad idea to model it like:

{
    type: Proposition,
    subject: S,
    relation: R,
    object: O,
}
    temporalBound: [T1, T2),
    spatialBound: SB,
    certainty: C

A better way, it seems to me, would be:

{
    type: Proposition,
    subject: S,
    relation: R,
    object: O,
    temporalBound: [T1, T2),
    spatialBound: SB,
    certainty: C,
}

In the above, the proposition is structurally self-contained and can be
given a true/false/somewhere in-between value.
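
The same structure can be sketched in Python to make the idea concrete.
This is only an illustration: the class name, the field names, and the
choice of a 0 to 1 float for certainty are assumptions, not a proposed
syntax or semantics. The subject is allowed to be another proposition so
that a statement about a statement carries its own certainty.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Proposition:
    """A self-contained statement; all names here are hypothetical."""
    subject: Union[str, "Proposition"]
    relation: str
    obj: str
    temporal_bound: Optional[Tuple[str, str]] = None  # half-open [T1, T2)
    spatial_bound: Optional[str] = None
    certainty: Optional[float] = None  # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.certainty is not None and not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must be between 0 and 1")


# Inner statement: the job title, held with certainty 0.8.
job = Proposition(":employee38", ":jobTitle", "Assistant Designer",
                  certainty=0.8)

# Outer statement: the attribution to employee22, held with certainty 0.2.
claim = Proposition(job, ":accordingTo", ":employee22", certainty=0.2)

print(claim.certainty, claim.subject.certainty)
```

Each certainty value sits inside the proposition it qualifies, so there
is no question about which statement a given value applies to.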

> Unless I missed something, we have not yet discussed how your
> alternative model could be reasoned about. Nor should we do that before a
> clear semantics of the statements is described...
>

When I say "easy to reason about" I mainly mean that a person can look at
a model and easily understand how the temporal and spatial bounds apply. I
wouldn't know where to start writing a clear semantics. My only hope is
that the ideas I've offered so far are simple enough for most people to
understand and discuss already. If there's more interest I could have a go
at a semantics, though it's not something I've ever done before and I'm
happy for anyone else to try. But is a full semantics already needed to
have this early discussion?

Regards
Anthony

On Tue, Feb 8, 2022 at 12:59 AM Pierre-Antoine Champin <
pierre-antoine.champin@ercim.eu> wrote:

> Hi Anthony,
> On 27/01/2022 10:13, Anthony Moretti wrote:
>
> Hi Pierre-Antoine
>
> Thank you for the post.
>
> It is however important to understand that this basic design has
>> limitations. Namely, each statement made about a particular triple must be
>> interpretable independently of the other statements made about that triple.
>> (This is actually a general feature of RDF, not just RDF-star: two
>> statements about the same subject must always be interpretable
>> independently from each other. On the open web, if we assume that another
>> triple that we have not yet discovered could change the meaning of the
>> triples that we know, then reasoning with what we know would become much
>> more hazardous.)
>
>
> I don't know if I'm right, but I feel like this is highly related to the
> idea of statements being "simply true", as people have put it. To go back
> to the first email in the "Three ideas" thread, I feel like time, space,
> and confidence/certainty are the three annotations that make any statement
> "simply true", i.e. make any statement able to stand alone as a complete
> unit of description. It's definitely possible that I haven't thought deeply
> enough about this, if so, maybe someone can show me a counterexample where
> all those annotations are specified and the entire statement is not "simply
> true". But if I'm right, and these annotations are special, they should be
> given precedence and asserted first to avoid ambiguity like that described
> in the blog post.
>
> I think the two discussions are orthogonal.
>
> The post is about using the RDF-star model, as specified by the CG report,
> and about the common pitfalls that people should avoid with this model.
>
> The "three ideas" thread (as well as your answer below) is about a
> potential alternative model, where time, space and confidence would become
> core components of statements. Of course, in this different model, the
> examples of the post would be modeled differently. And yes, the pain points
> in one model will not be the same as the pain points in the other model.
>
> Note also that I disagree with the way you rephrase the examples from the
> blog post into your new model, see below.
>
>
> To further simplify things, time, space, and certainty could be three
> positions, rather than four, if the temporal range is given typical "range"
> syntax:
>
> Subject Relation Object [T1, T2] SpatialBound Certainty
>
> Any datatype that makes sense for certainty/confidence can be used in the
> last position.
>
> Then, with those three positions, the examples in the blog post could be
> modified like so:
>
> *Original extended first example:*
>
> << :employee38 :jobTitle "Assistant Designer" >>
>     :accordingTo :employee22, :employee38 ;
>     :confidence 0.8 .
>
> Would become:
>
> << :employee38 :jobTitle "Assistant Designer" _ _ "0.8"^^ex:confidence >>
>     :accordingTo :employee22, :employee38 .
>
> It may be me misinterpreting your alternative model, but I don't think
> that the two examples are conveying the same meaning. Let me rephrase in
> plain English how I interpret them.
>
> 1st example :
>
> "I am 80% certain that emp38 is an assistant designer, as claimed by emp22
> and emp38"
> (confidence is asserted by me)
>
> 2nd example
>
> "emp22 and emp38 both claim that they are 80% certain that emp38 is an
> assistant designer"
> (confidence is asserted by emp22 and emp38 respectively)
>
>
> *Original problematic example:*
>
> << :employee38 :jobTitle "Assistant Designer" >>
>     :accordingTo :employee22; :confidence 0.2 .
>     # we don’t trust employee22 about someone else’s job title
>
> << :employee38 :jobTitle "Assistant Designer" >>
>     :accordingTo :employee38; :confidence 0.8 .
>     # we quite trust employee38 about their own job title
>
> Would become:
>
> << :employee38 :jobTitle "Assistant Designer" >>
>     :accordingTo :employee22 _ _ "0.2"^^ex:confidence .
>
> Which I read as
>
> "I am 20% confident that emp22 said that emp38 is an assistant designer"
> (confidence is about :accordingTo)
>
> while the intended meaning was
>
> "I am 20% confident in emp22's claim that emp38 is an assistant designer"
> (confidence is about :jobTitle)
>
>
> << :employee38 :jobTitle "Assistant Designer" >>
>     :accordingTo :employee38 _ _ "0.8"^^ex:confidence .
>
> *It's easy to see what a more complex example might look like:*
>
> << :employee38 :jobTitle "Assistant Designer" _ _ "0.8"^^ex:confidence >>
>     :accordingTo :employee22 _ _ "0.2"^^ex:confidence .
>
> << :employee38 :jobTitle "Assistant Designer" _ _ "0.8"^^ex:confidence >>
>     :accordingTo :employee38 _ _ "0.8"^^ex:confidence .
>
> Those three positions can be added to the other statement types I
> described, and the whole system becomes consistent, scalable, and easy to
> reason about.
>
> Unless I missed something, we have not yet discussed how your
> alternative model could be reasoned about. Nor should we do that before a
> clear semantics of the statements is described...
>
>   best
>
>
> Apologies for being repetitive, but I really think the holistic approach
> has a lot of benefits.
>
> Regards
> Anthony
>
> On Thu, Jan 27, 2022 at 8:13 AM Kingsley Idehen <kidehen@openlinksw.com>
> wrote:
>
>> On 1/26/22 3:34 PM, Pierre-Antoine Champin wrote:
>>
>> Dear all,
>>
>> following a discussion during our two last calls, I published a post
>> about "Provenance in RDF-star":
>>
>> https://www.w3.org/community/rdf-dev/2022/01/26/provenance-in-rdf-star/
>>
>> quoting the intro:
>>
>> > In this post, we present some lessons learned by the group through
>> discussions and exchanges. This is meant to give some insight about the
>> rationale behind RDF-star, and some guidelines about how to best use it for
>> modeling provenance data.
>>
>> Many thanks to all the participants of the RDF-star group for their
>> reviews and feedback on this post.
>>
>>   pa
>>
>>
>> Hi Pierre-Antoine,
>>
>> An opening example in that blog post:
>>
>> PREFIX : <http://www.example.org/>
>>
>> << :employee38 :jobTitle "Assistant Designer" >>
>>     :accordingTo :employee22, :employee38 ;
>>     :confidence 0.8 .
>>
>> My variant using RDF as it exists.
>>
>> ## RDF-Turtle Start ##
>>
>> # PREFIX : <http://www.example.org/>
>> PREFIX schema: <http://schema.org/>
>> PREFIX : <#>
>>
>> [
>>   :jobTitle "Assistant Designer" ;
>>   schema:identifier :employee38
>>   # if desired, inverse-functional-property semantics can be applied to
>>   # the schema:identifier relation.
>> ] :accordingTo :employee22, :employee38 ;
>>   :confidence 0.8 .
>>
>> ## RDF-Turtle End ##
>>
>> What is the difference between both? Is it that your RDF-Star example
>> expresses a statement (*utterance*) while mine expresses a fact (
>> *proposition*)?
>>
>> "A *statement* occurs at a particular time and place.  But a *fact* is
>> independent of time and place." [1]
>>
>>
>> Links:
>>
>> [1]
>> https://groups.google.com/d/msgid/ontolog-forum/d37df77c62aa4cdab97ad92a30821600%40bestweb.net
>> -- John F. Sowa post about statements and facts
>>
>>
>> --
>> Regards,
>>
>> Kingsley Idehen 
>> Founder & CEO
>> OpenLink Software
>> Home Page: http://www.openlinksw.com
>> Community Support: https://community.openlinksw.com
>>
>>
Received on Wednesday, 9 February 2022 06:35:04 UTC