Re: gap analysis (input regarding PML)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Thu, 05 Aug 2010 22:10:20 +0100
Message-ID: <EMEW3|72b3d6345a04e0202f84e50fb3a3481bm74MAn08L.Moreau|ecs.soton.ac.uk|4C5B28BC.4050204@ecs.soton.ac.uk>
To: Paulo Pinheiro da Silva <paulo@utep.edu>
CC: Paul Groth <pgroth@gmail.com>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>, "Arora, Jitin BTE" <jarora@miners.utep.edu>, Tim Lebo <lebot@rpi.edu>, "Deborah L. McGuinness" <dlm@cs.rpi.edu>

The papers that Paul cited can be downloaded from www.pasoa.org (they 
are also on Mendeley).
The p-structure model pre-dates OPM.

Paul's point (I believe) is that many of the issues raised in the gap
analysis were addressed by this model, but it is not widely deployed.


On 05/08/10 21:30, Paulo Pinheiro da Silva wrote:
> Hi Paul,
> Thank you very much for your prompt response.
> I am glad to see that we all agree that there is an urgent need for a
> common provenance standard. My understanding is that the incubator
> group is paving the way for the development of such a standard.
> Regarding your message, I would like to better understand the technical
> aspects of your gap analysis and to learn from it. So, following your
> mention of the European provenance project, please let me know the
> following:
>    1) Are you saying that one cannot see the technical issues of your
> gap analysis in the European provenance project?
>    2) If the answer to (1) is yes, how can we learn from this project?
>    3) Is OPM used in the European project?
>    4) If the answer to (3) is yes, I would like to understand how OPM
> artifacts are tied to sources, how sources are identified, and how
> provenance information about sources is represented;
>    5) If the answer to (3) is no, which provenance representation
> language is used?
> In other words, I need to better understand the gap analysis (viz., the
> points behind the analysis) and I believe we should not start from the
> assumption that we don't know anything about provenance (that appears to
> be the motivation for us to write a related work section).
> Many thanks,
> Paulo.
> On 8/5/2010 1:08 PM, Paul Groth wrote:
>> Hi Paulo,
>> Thanks for the message. I think the important thing here is the word
>> "common" in what I wrote. By way of illustration...
>> As part of the  EU Provenance Project [1], we also designed and
>> implemented an Architecture for Provenance Systems [2, 3]. This
>> architecture included a data model, the p-structure [4] that allowed for
>> the distributed linking and storage of provenance. It specified
>> protocols for querying provenance information [5,6] and recording it as
>> well. This was designed to work in a scalable setting [7].
>> Obviously, I could go into more detail; this little description is just
>> to point out that I _agree_ with you that there are solutions for many
>> of these problems. However, these solutions are _not_ common and widely
>> deployed, where widely deployed means things like trackbacks, HTML, and
>> probably Dublin Core and RDFa. The point is that while solutions exist
>> within the research community (and some in business), they are by no
>> means common or standard.
>> This is exactly why, personally, I think the W3C should have a standards
>> committee devoted to provenance. There are enough commonalities between
>> provenance technologies that having a standard would help push adoption
>> of provenance on the Web. Furthermore, without a standard it is
>> difficult to implement something like the News Aggregator Scenario
>> effectively over the whole of the Web.
>> Cheers,
>> Paul
>> [1] http://www.gridprovenance.org
>> [2] http://eprints.ecs.soton.ac.uk/13216/
>> [3] Moreau, Luc and Groth, Paul and Miles, Simon and Vazquez, Javier and
>> Jiang, Sheng and Munroe, Steve and Rana, Omer and Schreiber, Andreas and
>> Tan, Victor and Varga, Laszlo (2007) The Provenance of Electronic Data.
>> Communications of the ACM, 51 (4). pp. 52-58.
>> [4] Paul Groth, Simon Miles, and Luc Moreau. A Model of Process
>> Documentation to Determine Provenance in Mash-ups. Transactions on
>> Internet Technology (TOIT), 9(1):1-31, 2009.
>> [5] Simon Miles, Paul Groth, Steve Munroe, Sheng Jiang , Thibaut
>> Assandri, and Luc Moreau. Extracting Causal Graphs from an Open
>> Provenance Data Model. Concurrency and Computation: Practice and
>> Experience, 2007.
>> [6] Miles, Simon (2006) Electronically Querying for the Provenance of
>> Entities. In: Proceedings of the International Provenance and Annotation
>> Workshop, May 2006, Chicago, USA.
>> [7] Groth, Paul and Miles, Simon and Fang, Weijian and Wong, Sylvia C.
>> and Moreau, Luc (2005) Recording and Using Provenance in a Protein
>> Compressibility Experiment. In: Proceedings of the 14th IEEE
>> International Symposium on High Performance Distributed Computing (HPDC
>> 2005). Item not available online.
>> Paulo Pinheiro da Silva wrote:
>>> Paul-- Thank you very much for your message.
>>> All-- I agree with Paul's statement that there are no well-established
>>> guidelines for using/adopting provenance solutions, and this is a part
>>> of his message that I would like to see discussed further.
>>> I like Luc's suggestion of discussing these gaps in terms of queries.
>>> For instance, if you go to
>>>      http://trust.utep.edu/sparql-pml/query/example
>>> you will see a large collection of sparql-pml queries answering many
>>> of the questions that require bridging the gaps identified in Paul's
>>> message. Please note that the queries in the URL above are standard
>>> SPARQL queries based on the use of PML vocabulary. The results used in
>>> the URL come from a repository of PML provenance knowledge in the
>>> domains of earth science (using actual NSF Earthscope and IRIS data in
>>> support of seismology and USGS data in support of earth magnetism),
>>> astronomy (using actual NCAR data in support of space weather), and
>>> logical proofs in support of TPTP. Anyone can go to
>>> http://trust.utep.edu/sparql-pml/query/index and write their own
>>> queries or use the basic or advanced user interface
>>> (http://trust.utep.edu/sparql-pml/search/index). [the SPARQL-PML
>>> queries have been developed by Jitin Arora
>>> (http://trust.utep.edu/~jarora/)]
>>> I would like to emphasize two aspects of PML that may need to be
>>> highlighted so that the group can further appreciate our work and
>>> understand how PML bridges the technical gaps in Paul's message:
>>>        1) PML is a collection of three ontologies: PML-Provenance (or
>>> PML-P), PML-Justification (PML-J) and PML-Trust (PML-T). Most of the
>>> provenance concepts in OPM map into concepts described in the PML-J
>>> ontology. This means that most of the elements in PML-P are concepts
>>> not covered in OPM. I will go further and say that many of these
>>> concepts have the role of tying artifacts to sources as identified in
>>> Paul's message;
>>>        2) If you revisit our publications, for instance [1], you will
>>> see that PML is just a component (the language component) of a bigger
>>> infrastructure called Inference Web
>>> http://inference-web.org
>>> In fact, most of the concerns highlighted by Luc in his message about
>>> having a well-defined API, services and other infrastructural features
>>> in support of provenance are exactly the kinds of things that one
>>> should be able to see in the Inference Web.
>>> With (1) and (2) in mind, I would like to stress one point: most of
>>> the provenance infrastructure mentioned in (2) is in support of PML-P.
>>> In fact, PML-P is the part of the provenance that gets reused across
>>> multiple justification traces and as such needs to be discovered,
>>> aligned, augmented, etc. Further, one of our major mistakes was to
>>> put a lot of effort into trying to come up with a registration
>>> mechanism for PML-P documents called IW-Base [2]. Later on, after a
>>> meeting with Tim Berners-Lee and his W3C team, we learned that we would
>>> need to distribute this approach, which is why we developed an
>>> Inference Web search mechanism for provenance called IWSearch [3].
>>> Again, anyone can try IWSearch at http://onto.rpi.edu/iwsearch/
>>> I would like to say that PML and Inference Web were developed from day
>>> 1 to support "linking provenance between sites (i.e. trackback but for
>>> the whole web)". That is the reason why PML has always had the
>>> following properties:
>>> a) PML identifiers are URIs
>>> b) PML content is in RDF/OWL (used to be in DAML+OIL before OWL)
>>> c) PML justifications are combinable/decomposable [4]
>>> d) RDF/OWL links are used to connect PML documents
>>> Another point that I would like to make is that the PML-P part of PML
>>> is the one where we connect to many other well-known pieces of
>>> information that we have discussed in this group. For instance, when
>>> it comes to publications, PML-P defines a publication as a kind of
>>> information source and is where we connect PML to Dublin Core
>>> attributes for publications.
>>> As you see, I have reasons to be uncomfortable with statements that
>>> there is no language "for expressing provenance information that
>>> captures processes as well as the other content dimensions", no "API
>>> for obtaining/querying provenance information", and no standard "for
>>> linking provenance between sites (i.e. trackback but for the whole
>>> web)".
>>> Regarding this month of August, I am unfortunately unable to attend
>>> the meetings this week and next week (I will be flying during the
>>> meetings). Also, I believe Deborah will not attend either, for
>>> personal reasons. Thus, I am asking Tim Lebo from RPI to represent us
>>> and to collect any requests you may have about PML so that we can
>>> address them later in case Tim cannot answer your questions right away.
>>> Many thanks,
>>> Paulo.
>>> [the publications below are part of the provenance collection in
>>> Mendeley]
>>> [1] Deborah L. McGuinness and Paulo Pinheiro da Silva. Explaining
>>> Answers from the Semantic Web: The Inference Web Approach. Journal of
>>> Web Semantics, Vol. 1 No. 4. October 2004, pages 397-413.
>>> [2] Deborah L. McGuinness, Paulo Pinheiro da Silva, Cynthia Chang.
>>> IWBase: Provenance Metadata Infrastructure for Explaining and Trusting
>>> Answers from the Web. Technical Report KSL-04-07, Knowledge Systems
>>> Laboratory, Stanford University, USA, 2004.
>>> [3] Paulo Pinheiro da Silva, Geoff Sutcliffe, Cynthia Chang, Li Ding,
>>> Nick del Rio and Deborah McGuinness. Presenting TSTP Proofs with
>>> Inference Web Tools. In Proceedings of IJCAR '08 Workshop on Practical
>>> Aspects of Automated Reasoning (PAAR-2008), August 2008, Sydney,
>>> Australia.
>>> [4] Paulo Pinheiro da Silva and Deborah L. McGuinness. Combinable
>>> Proof Fragments for the Web. Technical Report KSL-03-04, Knowledge
>>> Systems Laboratory, Stanford University, USA, 2003.
>>> ([3] is not a paper specifically about IWSearch although  it briefly
>>> describes the tool)
>>>> Thanks Paul for this proposal for the gap analysis.
>>>> Twice you mention 'exposing' and I thought we could introduce
>>>> 'querying' provenance too.
>>>> Also, maybe the gaps could be structured in content vs. APIs.
>>>> Like this, maybe.
>>>> Content:
>>>> - No common standard for expressing provenance information that
>>>> captures processes as well as the other content dimensions.
>>>> - No guidance for how existing standards can be put together to
>>>> provide provenance (e.g. linking to identity).
>>>> APIs (or protocols):
>>>> - No common API for obtaining/querying provenance information
>>>> - No guidance for how application developers should go about exposing
>>>> provenance in their web systems.
>>>> - No well-defined standard for linking provenance between sites (i.e.
>>>> trackback but for the whole web).
>>>> I also wondered whether they should be structured according to the
>>>> provenance dimensions (so instead of API, break
>>>> this into Use/Management).
>>>> Luc
>>>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>>>> Hi All,
>>>>> As discussed at last week's telecon, I came up with some ideas about
>>>>> the gaps necessary to realize the News Aggregator Scenario. I've put
>>>>> these in the wiki and I append them below to help start the
>>>>> discussion. Let me know what you think.
>>>>> Gap Analysis- News Aggregator
>>>>> For each step within the News Aggregator scenario, there are existing
>>>>> technologies or relevant research that could solve that step. For
>>>>> example, one can properly insert licensing information into a photo
>>>>> using a Creative Commons license and the Extensible Metadata
>>>>> Platform. One can track the origin of tweets either through retweets
>>>>> or using some extraction technologies within Twitter. However, the
>>>>> problem is that across multiple sites there is no common format and
>>>>> API to access and understand provenance information, whether it is
>>>>> explicitly or implicitly determined. To inquire about retweets or
>>>>> trackbacks one needs to use different APIs and understand different
>>>>> formats. Furthermore, there is no (widely deployed) mechanism to
>>>>> point to provenance information on another site. For example, once a
>>>>> tweet is traced to the end of Twitter there is no way to follow where
>>>>> that tweet came from.
>>>>> Systems largely do not document the software by which changes were
>>>>> made to data and what those pieces of software did to the data.
>>>>> However, there are existing technologies that allow this to be done.
>>>>> For example, in a domain-specific setting, XMP allows the
>>>>> transformations of images to be documented. More general formats such
>>>>> as OPM and PML allow this to be expressed but are not currently
>>>>> widely deployed.
>>>>> Finally, while many sites provide for identity and there are several
>>>>> widely deployed standards for identity (e.g. OpenID), there are no
>>>>> existing mechanisms for tying identity to objects or provenance
>>>>> traces. This directly ties to the attribution of objects and
>>>>> provenance.
>>>>> Summing up, there are four existing gaps to realizing the News
>>>>> Aggregator scenario:
>>>>> - No common standard to target for exposing and expressing provenance
>>>>> information that captures processes as well as the other content
>>>>> dimensions.
>>>>> - No well-defined standard for linking provenance between sites (i.e.
>>>>> trackback but for the whole web).
>>>>> - No guidance for how existing standards can be put together to
>>>>> provide provenance (e.g. linking to identity).
>>>>> - No guidance for how application developers should go about exposing
>>>>> provenance in their web systems.
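
The "trackback but for the whole web" gap listed above can be illustrated
with a minimal Python sketch of following provenance links from one site
to the next. This is only an illustration under stated assumptions: the
URIs, the "derived_from" field, and the in-memory RECORDS table are
hypothetical stand-ins, and are not part of OPM, PML, the p-structure, or
any deployed protocol.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the web: each URI maps to a small provenance
# record whose "derived_from" field names the item it came from, possibly
# on a different site. All URIs and field names here are illustrative.
RECORDS: Dict[str, Dict[str, Optional[str]]] = {
    "http://aggregator.example/story/1": {
        "derived_from": "http://blog.example/post/7",
    },
    "http://blog.example/post/7": {
        "derived_from": "http://microblog.example/status/42",
    },
    "http://microblog.example/status/42": {"derived_from": None},
}


def trace_origin(
    uri: str,
    records: Dict[str, Dict[str, Optional[str]]] = RECORDS,
) -> List[str]:
    """Follow derived_from links from uri until the trail ends.

    Returns the visited URIs in order, starting with uri itself. The walk
    stops when a record has no derived_from link, when a URI has no
    published record at all, or when a link points back to a URI that was
    already visited.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = uri
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        record = records.get(current)
        if record is None:
            # No provenance is published for this URI, so the trail ends.
            break
        current = record.get("derived_from")
    return chain
```

Calling trace_origin("http://aggregator.example/story/1") walks from the
aggregated story through the blog post back to the original status. The
sketch only works because every hypothetical site uses the same record
shape; the absence of such a shared format across real sites is the gap
the thread describes.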
Received on Thursday, 5 August 2010 21:12:31 UTC