Re: gap analysis (input regarding PML) from Paulo Pinheiro da Silva on 2010-08-06 (public-xg-prov@w3.org from August 2010)

From: Paulo Pinheiro da Silva <paulo@utep.edu>
Date: Fri, 6 Aug 2010 05:29:13 -0600
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: Paul Groth <pgroth@gmail.com>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>, "Arora, Jitin BTE" <jarora@miners.utep.edu>, Tim Lebo <lebot@rpi.edu>, "Deborah L. McGuinness" <dlm@cs.rpi.edu>
Message-ID: <4C5BF209.2020507@utep.edu>
Dear Luc,

Your answer below should definitely facilitate the reading of the papers 
listed by Paul.

Thank you very much,
Paulo.

> Paulo,
>
> Paul never said there was no solution out there. There are several
> (including PML, p-structure, OPM, Provenir, the Provenance Vocabulary,
> etc.) that address, more or less, the technical gaps that have been
> identified. So, we're not ignoring the state of the art, far from it.
>
> However, *NONE* is widespread to the point that we find it on every
> desktop/handheld/web service!
>
> Hence, Paul is making the case for the need for a standard in this area.
>
> To answer your specific question, OPM artifacts are linked to "things"
> by means of the property "value". Things can be serialized as immediate
> values or passed by reference, and referred to by URIs.
>
> The p-structure follows the same approach, with a difference: it takes a
> message-oriented view of the world (where any information is in
> "messages" between parties), and uses a structured key to refer to
> information in messages. This key can also be expressed as a URI.
>
> So, both of them can "connect derivation traces to information sources".
>
> I hope it helps,
> Cheers,
> Luc
>
> On 08/06/2010 10:13 AM, Paulo Pinheiro da Silva wrote:
>> Dear Luc and Paul,
>>
>> Thank you very much for your comments.
>>
>> I consider the connection between derivation traces and information
>> sources to be essential to support some of the identified requirements
>> for a “common” representation of provenance. For instance, I cannot see
>> how provenance can be used to support trust recommendations and result
>> understanding if one cannot use provenance to trace back information
>> sources used to derive a given result.
>>
>> In this case, I would like to refer back to my original message where I
>> identified some aspects of PML that I believe are not covered in OPM,
>> including in this case a supporting infrastructure.
>>
>> As a response to my message, Paul cited a European provenance project
>> that according to Luc is based on p-structure, which pre-dates OPM. The
>> information that p-structure pre-dates OPM, however, is not enough for
>> me to know whether p-structure provides a solution for connecting
>> derivation traces to information sources that I cannot see in OPM or,
>> if it does, why the connection was not propagated to OPM. Also, it does
>> not clarify the relation between p-structure/OPM and the technical
>> issues in the original gap analysis.
>>
>> So, one may ask what the relevance of my questions above is,
>> considering that we should focus on a common representation for
>> provenance. I would like to remind the group of its effort to map other
>> provenance notations to OPM, which somehow implies that OPM provides the
>> basic constructs to support a common provenance notation. So, which
>> other important provenance aspects are we leaving out in our mapping?
>> Also, how can we create a common understanding of provenance aspects not
>> covered in OPM?
>>
>> Unfortunately, I will not be able to attend the meeting this week. In
>> fact, I am very much interested in knowing the answers to my questions.
>>
>> Many thanks,
>> Paulo.
>>
>>> Paulo,
>>>
>>> The papers that Paul cited can be downloaded from www.pasoa.org (they
>>> are also on Mendeley).
>>> The p-structure model pre-dates OPM.
>>>
>>> Paul's point (I believe) is that many of the issues raised in the gap
>>> analysis were addressed by this model, but it is not widely deployed.
>>>
>>> Luc
>>>
>>> On 05/08/10 21:30, Paulo Pinheiro da Silva wrote:
>>>> Hi Paul,
>>>> Thank you very much for your prompt response.
>>>>
>>>> I am glad to see that we all agree that there is an urgent need for a
>>>> common provenance standard. My understanding is that the incubator
>>>> group is paving the way for the development of such a standard.
>>>>
>>>> Regarding your message, I would like to better understand the technical
>>>> aspects of your gap analysis and to learn from it.  So, following your
>>>> mention of the European provenance project please let me know the
>>>> following:
>>>>
>>>>      1) Are you saying that one cannot see the technical issues of your
>>>> gap analysis in the European provenance project?
>>>>
>>>>      2) If the answer to (1) is yes, how can we learn from this
>>>> project?
>>>>
>>>>      3) Is OPM used in the European project?
>>>>
>>>>      4) If the answer to (3) is yes, I would like to understand how OPM
>>>> artifacts are tied to sources, how sources are identified, and how
>>>> provenance information about sources is represented;
>>>>
>>>>      5) If the answer to (3) is no, which provenance representation
>>>> language is used?
>>>>
>>>> In other words, I need to better understand the gap analysis (viz., the
>>>> points behind the analysis) and I believe we should not start from the
>>>> assumption that we don’t know anything about provenance (that appears
>>>> to be the motivation for us to write a related work section).
>>>>
>>>> Many thanks,
>>>> Paulo.
>>>>
>>>> On 8/5/2010 1:08 PM, Paul Groth wrote:
>>>>> Hi Paulo,
>>>>>
>>>>> Thanks for the message. I think the important thing here is the word
>>>>> "common" in what I wrote. By way of illustration...
>>>>>
>>>>> As part of the EU Provenance Project [1], we also designed and
>>>>> implemented an Architecture for Provenance Systems [2, 3]. This
>>>>> architecture included a data model, the p-structure [4], that allowed
>>>>> for the distributed linking and storage of provenance. It specified
>>>>> protocols for querying provenance information [5, 6] and recording it
>>>>> as well. This was designed to work in a scalable setting [7].
>>>>>
>>>>> Obviously, I could go into more detail; this little description is
>>>>> just to point out that I _agree_ with you that there are solutions
>>>>> for many of these problems. However, these solutions are _not_ common
>>>>> and widely deployed, where widely deployed = things like trackbacks,
>>>>> HTML, and probably Dublin Core and RDFa. The point is that while
>>>>> solutions exist within the research community (and some in business),
>>>>> they are by no means common or standard.
>>>>>
>>>>> This is exactly why, personally, I think the W3C should have a
>>>>> standards committee devoted to provenance. There are enough
>>>>> commonalities between provenance technologies that having a standard
>>>>> would help push adoption of provenance on the Web. Furthermore,
>>>>> without a standard it is difficult to implement effectively something
>>>>> like the News Aggregator Scenario over the whole of the web.
>>>>>
>>>>> Cheers,
>>>>> Paul
>>>>>
>>>>>
>>>>> [1] http://www.gridprovenance.org
>>>>> [2] http://eprints.ecs.soton.ac.uk/13216/
>>>>> [3] Moreau, Luc and Groth, Paul and Miles, Simon and Vazquez, Javier
>>>>> and Jiang, Sheng and Munroe, Steve and Rana, Omer and Schreiber,
>>>>> Andreas and Tan, Victor and Varga, Laszlo (2007) The Provenance of
>>>>> Electronic Data. Communications of the ACM, 51 (4). pp. 52-58.
>>>>> [4] Paul Groth, Simon Miles, and Luc Moreau. A Model of Process
>>>>> Documentation to Determine Provenance in Mash-ups. Transactions on
>>>>> Internet Technology (TOIT), 9(1):1-31, 2009.
>>>>> [5] Simon Miles, Paul Groth, Steve Munroe, Sheng Jiang, Thibaut
>>>>> Assandri, and Luc Moreau. Extracting Causal Graphs from an Open
>>>>> Provenance Data Model. Concurrency and Computation: Practice and
>>>>> Experience, 2007.
>>>>> [6] Miles, Simon (2006) Electronically Querying for the Provenance of
>>>>> Entities. In: Proceedings of the International Provenance and
>>>>> Annotation Workshop, May 2006, Chicago, USA.
>>>>> [7] Groth, Paul and Miles, Simon and Fang, Weijian and Wong, Sylvia C.
>>>>> and Moreau, Luc (2005) Recording and Using Provenance in a Protein
>>>>> Compressibility Experiment. In: Proceedings of the 14th IEEE
>>>>> International Symposium on High Performance Distributed Computing
>>>>> (HPDC 2005). Item not available online.
>>>>>
>>>>>
>>>>> Paulo Pinheiro da Silva wrote:
>>>>>> Paul-- Thank you very much for your message.
>>>>>>
>>>>>> All-- I agree with Paul’s statement that there are no well-established
>>>>>> guidelines for using/adopting provenance solutions, and this is a part
>>>>>> of his message on which I would like to see further discussion.
>>>>>> I like Luc’s suggestion of discussing these gaps in terms of queries.
>>>>>> For instance, if you go to
>>>>>>
>>>>>>        http://trust.utep.edu/sparql-pml/query/example
>>>>>>
>>>>>> you will see a large collection of sparql-pml queries answering many
>>>>>> of the questions that require bridging the gaps identified in Paul’s
>>>>>> message. Please note that the queries in the URL above are standard
>>>>>> SPARQL queries based on the use of PML vocabulary. The results used
>>>>>> in the URL come from a repository of PML provenance knowledge in the
>>>>>> domains of earth science (using actual NSF Earthscope and IRIS data
>>>>>> in support of seismology and USGS data in support of earth magnetism),
>>>>>> astronomy (using actual NCAR data in support of space weather), and
>>>>>> logical proofs in support of TPTP. Anyone can actually go to
>>>>>> http://trust.utep.edu/sparql-pml/query/index and write their own
>>>>>> queries or use the basic or advanced use interface
>>>>>> (http://trust.utep.edu/sparql-pml/search/index). [the SPARQL-PML
>>>>>> queries have been developed by Jitin Arora
>>>>>> (http://trust.utep.edu/~jarora/)]
>>>>>>
>>>>>> I would like to emphasize two aspects of PML that may need to be
>>>>>> highlighted so that the group can further appreciate our work and
>>>>>> understand how PML bridges the technical gaps in Paul’s message:
>>>>>>
>>>>>>          1) PML is a collection of three ontologies: PML-Provenance
>>>>>> (or PML-P), PML-Justification (PML-J) and PML-Trust (PML-T). In this
>>>>>> case, most of the provenance concepts in OPM map into concepts
>>>>>> described in the PML-J ontology. This means that most of the elements
>>>>>> in PML-P are concepts not covered in OPM. I will go further and say
>>>>>> that many of these concepts have the role of tying artifacts to
>>>>>> sources as identified in Paul’s message;
>>>>>>
>>>>>>          2) If you revisit our publications, for instance [1], you
>>>>>> will see that PML is just a component (the language component) of a
>>>>>> bigger infrastructure called Inference Web
>>>>>> http://inference-web.org
>>>>>>
>>>>>> In fact, most of the concerns highlighted by Luc in his message about
>>>>>> having a well-defined API, services and other infrastructural
>>>>>> features in support of provenance are exactly the kinds of things
>>>>>> that one should be able to see in the Inference Web.
>>>>>>
>>>>>> With (1) and (2) in mind, I would like to stress one point: most of
>>>>>> the provenance infrastructure mentioned in (2) is in support of
>>>>>> PML-P. In fact, PML-P is the part of the provenance that gets reused
>>>>>> across multiple justification traces and as such needs to be
>>>>>> discovered, aligned, augmented, etc. Further, one of our major
>>>>>> mistakes was to put a lot of effort into trying to come up with a
>>>>>> registration mechanism for PML-P documents called IW-Base [2]. Later
>>>>>> on, after a meeting with Tim Berners-Lee and his W3C team, we learned
>>>>>> that we would need to distribute this approach, which is why we
>>>>>> developed an Inference Web search mechanism for provenance called
>>>>>> IWSearch [3]. Again, anyone can try IWSearch at
>>>>>> http://onto.rpi.edu/iwsearch/
>>>>>> I would like to say that PML and Inference Web were developed from
>>>>>> day 1 to support "linking provenance between sites (i.e. trackback
>>>>>> but for the whole web)". That is the reason why PML has always had
>>>>>> the following properties:
>>>>>>
>>>>>> a) PML identifiers are URIs
>>>>>> b) PML content is in RDF/OWL (used to be in DAML+OIL before OWL)
>>>>>> c) PML justifications are combinable/decomposable [4]
>>>>>> d) RDF/OWL links are used to connect PML documents
>>>>>>
>>>>>> Another point that I would like to make is that the PML-P part of PML
>>>>>> is the one where we connect to many other well-known pieces of
>>>>>> information that we have discussed in this group. For instance, when
>>>>>> it comes to publications, PML-P defines a publication as a kind of
>>>>>> information source and is where we connect PML to Dublin Core
>>>>>> attributes for publications.
>>>>>>
>>>>>> As you see, I have reasons to be uncomfortable with statements that
>>>>>> there is no language “for expressing provenance information that
>>>>>> captures processes as well as the other content dimensions”, no "API
>>>>>> for obtaining/querying provenance information", or no standard "for
>>>>>> linking provenance between sites (i.e. trackback but for the whole
>>>>>> web)".
>>>>>>
>>>>>> Regarding this month of August, I am unfortunately unable to attend
>>>>>> the meeting this week and next week (I will be flying during the time
>>>>>> of the meetings). Also, I believe Deborah will not attend either, due
>>>>>> to personal reasons. Thus, I am asking Tim Lebo from RPI to represent
>>>>>> us and to collect any requests you may have regarding PML so that we
>>>>>> can address them later in case Tim cannot answer your questions right
>>>>>> away.
>>>>>>
>>>>>> Many thanks,
>>>>>> Paulo.
>>>>>>
>>>>>> [the publications below are part of the provenance collection in
>>>>>> Mendeley]
>>>>>>
>>>>>> [1] Deborah L. McGuinness and Paulo Pinheiro da Silva. Explaining
>>>>>> Answers from the Semantic Web: The Inference Web Approach. Journal of
>>>>>> Web Semantics, Vol. 1 No. 4. October 2004, pages 397-413.
>>>>>>
>>>>>> [2] Deborah L. McGuinness, Paulo Pinheiro da Silva, Cynthia Chang.
>>>>>> IWBase: Provenance Metadata Infrastructure for Explaining and
>>>>>> Trusting Answers from the Web. Technical Report KSL-04-07, Knowledge
>>>>>> Systems Laboratory, Stanford University, USA, 2004.
>>>>>>
>>>>>> [3] Paulo Pinheiro da Silva, Geoff Sutcliffe, Cynthia Chang, Li Ding,
>>>>>> Nick del Rio and Deborah McGuinness. Presenting TSTP Proofs with
>>>>>> Inference Web Tools. In Proceedings of IJCAR '08 Workshop on
>>>>>> Practical Aspects of Automated Reasoning (PAAR-2008), August 2008,
>>>>>> Sydney, Australia.
>>>>>>
>>>>>> [4] Paulo Pinheiro da Silva and Deborah L. McGuinness. Combinable
>>>>>> Proof Fragments for the Web. Technical Report KSL-03-04, Knowledge
>>>>>> Systems Laboratory, Stanford University, USA, 2003.
>>>>>>
>>>>>> ([3] is not a paper specifically about IWSearch although  it briefly
>>>>>> describes the tool)
>>>>>>
>>>>>>> Thanks Paul for this proposal for the gap analysis.
>>>>>>> Twice you mention 'exposing' and I thought we could introduce
>>>>>>> 'querying' provenance too.
>>>>>>>
>>>>>>> Also, maybe the gaps could be structured in content vs. APIs.
>>>>>>> Like this, maybe.
>>>>>>>
>>>>>>>
>>>>>>> Content:
>>>>>>> - No common standard for expressing provenance information that
>>>>>>> captures processes as well as the other content dimensions.
>>>>>>> - No guidance for how existing standards can be put together to
>>>>>>> provide provenance (e.g. linking to identity).
>>>>>>>
>>>>>>> APIs (or protocols):
>>>>>>> - No common API for obtaining/querying provenance information
>>>>>>> - No guidance for how application developers should go about
>>>>>>> exposing provenance in their web systems.
>>>>>>> - No well-defined standard for linking provenance between sites
>>>>>>> (i.e. trackback but for the whole web).
>>>>>>>
>>>>>>>
>>>>>>> I also wondered whether they should be structured according to the
>>>>>>> provenance dimensions (so instead of API, break this into
>>>>>>> Use/Management).
>>>>>>>
>>>>>>> Luc
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> As discussed at last week's telecon, I came up with some ideas
>>>>>>>> about the gaps necessary to realize the News Aggregator Scenario.
>>>>>>>> I've put these in the wiki and I append them below to help start
>>>>>>>> the discussion. Let me know what you think.
>>>>>>>>
>>>>>>>> Gap Analysis- News Aggregator
>>>>>>>>
>>>>>>>> For each step within the News Aggregator scenario, there are
>>>>>>>> existing technologies or relevant research that could solve that
>>>>>>>> step. For example, one can properly insert licensing information
>>>>>>>> into a photo using a Creative Commons license and the Extensible
>>>>>>>> Metadata Platform. One can track the origin of tweets either
>>>>>>>> through retweets or using some extraction technologies within
>>>>>>>> Twitter. However, the problem is that across multiple sites there
>>>>>>>> is no common format and API to access and understand provenance
>>>>>>>> information, whether it is explicitly or implicitly determined. To
>>>>>>>> inquire about retweets or inquire about trackbacks one needs to use
>>>>>>>> different APIs and understand different formats. Furthermore, there
>>>>>>>> is no (widely deployed) mechanism to point to provenance
>>>>>>>> information on another site. For example, once a tweet is traced to
>>>>>>>> the end of Twitter there is no way to follow where that tweet came
>>>>>>>> from.
>>>>>>>>
>>>>>>>> Systems largely do not document the software by which changes were
>>>>>>>> made to data and what those pieces of software did to data.
>>>>>>>> However, there are existing technologies that allow this to be
>>>>>>>> done. For example, in a domain-specific setting, XMP allows the
>>>>>>>> transformations of images to be documented. More general formats
>>>>>>>> such as OPM and PML allow this to be expressed but are not
>>>>>>>> currently widely deployed.
>>>>>>>>
>>>>>>>> Finally, while many sites provide for identity and there are
>>>>>>>> several widely deployed standards for identity (e.g., OpenID),
>>>>>>>> there are no existing mechanisms for tying identity to objects or
>>>>>>>> provenance traces. This directly ties to the attribution of objects
>>>>>>>> and provenance.
>>>>>>>>
>>>>>>>> Summing up, there are four existing gaps to realizing the News
>>>>>>>> Aggregator scenario:
>>>>>>>>
>>>>>>>> - No common standard to target for exposing and expressing
>>>>>>>> provenance information that captures processes as well as the other
>>>>>>>> content dimensions.
>>>>>>>> - No well-defined standard for linking provenance between sites
>>>>>>>> (i.e. trackback but for the whole web).
>>>>>>>> - No guidance for how existing standards can be put together to
>>>>>>>> provide provenance (e.g. linking to identity).
>>>>>>>> - No guidance for how application developers should go about
>>>>>>>> exposing provenance in their web systems.
>
> --
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>
> .
>
Received on Friday, 6 August 2010 11:29:47 UTC