- From: Paulo Pinheiro da Silva <paulo@utep.edu>
- Date: Fri, 6 Aug 2010 03:13:10 -0600
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: Paul Groth <pgroth@gmail.com>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>, "Arora, Jitin BTE" <jarora@miners.utep.edu>, Tim Lebo <lebot@rpi.edu>, "Deborah L. McGuinness" <dlm@cs.rpi.edu>
Dear Luc and Paul,

Thank you very much for your comments. I consider the connection between derivation traces and information sources to be essential to support some of the identified requirements for a "common" representation of provenance. For instance, I cannot see how provenance can be used to support trust recommendations and result understanding if one cannot use provenance to trace back the information sources used to derive a given result. In this case, I would like to refer back to my original message, where I identified some aspects of PML that I believe are not covered in OPM, including, in this case, a supporting infrastructure.

As a response to my message, Paul cited a European provenance project that, according to Luc, is based on the p-structure, which pre-dates OPM. The information that the p-structure pre-dates OPM, however, is not enough for me to know whether the p-structure provides a solution for connecting derivation traces to information sources that I cannot see in OPM or, if it does, why the connection was not propagated to OPM. It also does not clarify the relation between the p-structure/OPM and the technical issues in the original gap analysis.

So, one may ask what the relevance of my questions above is, considering that we should focus on a common representation for provenance. I would like to recall the group's effort to map other provenance notations to OPM, which somehow implies that OPM provides the basic constructs to support a common provenance notation. So, which other important provenance aspects are we leaving out in our mapping? Also, how can we create a common understanding of provenance aspects not covered in OPM?

Unfortunately, I will not be able to attend the meeting this week. Nevertheless, I am very much interested in hearing some answers to my questions.

Many thanks,
Paulo.

> Paulo,
>
> The papers that Paul cited can be downloaded from www.pasoa.org (they are also on Mendeley). The p-structure model pre-dates OPM.
>
> Paul's point (I believe) is that many of the issues raised in the gap analysis were addressed by this model, but it is not widely deployed.
>
> Luc
>
> On 05/08/10 21:30, Paulo Pinheiro da Silva wrote:
>> Hi Paul,
>>
>> Thank you very much for your prompt response.
>>
>> I am glad to see that we all agree that there is an urgent need for a common provenance standard. My understanding is that the incubator group is paving the way for the development of such a standard.
>>
>> Regarding your message, I would like to better understand the technical aspects of your gap analysis and to learn from it. So, following your mention of the European provenance project, please let me know the following:
>>
>> 1) Are you saying that one cannot see the technical issues of your gap analysis in the European provenance project?
>>
>> 2) If the answer to (1) is yes, how can we learn from this project?
>>
>> 3) Is OPM used in the European project?
>>
>> 4) If the answer to (3) is yes, I would like to understand how OPM artifacts are tied to sources, how sources are identified, and how provenance information about sources is represented;
>>
>> 5) If the answer to (3) is no, which provenance representation language is used?
>>
>> In other words, I need to better understand the gap analysis (viz., the points behind the analysis), and I believe we should not start from the assumption that we don't know anything about provenance (which appears to be the motivation for us to write a related work section).
>>
>> Many thanks,
>> Paulo.
>>
>> On 8/5/2010 1:08 PM, Paul Groth wrote:
>>> Hi Paulo,
>>>
>>> Thanks for the message. I think the important thing here is the word "common" in what I wrote. By way of illustration...
>>>
>>> As part of the EU Provenance Project [1], we also designed and implemented an Architecture for Provenance Systems [2, 3]. This architecture included a data model, the p-structure [4], that allowed for the distributed linking and storage of provenance. It specified protocols for querying provenance information [5, 6] and for recording it as well. This was designed to work in a scalable setting [7].
>>>
>>> Obviously, I could go into more detail; this little description is just to point out that I _agree_ with you that there are solutions for many of these problems. However, these solutions are _not_ common and widely deployed, where "widely deployed" means things like trackbacks, HTML, and probably Dublin Core and RDFa. The point is that while solutions exist within the research community (and some in business), they are by no means common or standard.
>>>
>>> This is exactly why, personally, I think the W3C should have a standards committee devoted to provenance. There are enough commonalities between provenance technologies that having a standard would help push adoption of provenance on the Web. Furthermore, without a standard it is difficult to effectively implement something like the News Aggregator Scenario over the whole of the web.
>>>
>>> Cheers,
>>> Paul
>>>
>>> [1] http://www.gridprovenance.org
>>> [2] http://eprints.ecs.soton.ac.uk/13216/
>>> [3] Moreau, Luc and Groth, Paul and Miles, Simon and Vazquez, Javier and Jiang, Sheng and Munroe, Steve and Rana, Omer and Schreiber, Andreas and Tan, Victor and Varga, Laszlo (2007) The Provenance of Electronic Data. Communications of the ACM, 51 (4). pp. 52-58.
>>> [4] Paul Groth, Simon Miles, and Luc Moreau. A Model of Process Documentation to Determine Provenance in Mash-ups. Transactions on Internet Technology (TOIT), 9(1):1-31, 2009.
>>> [5] Simon Miles, Paul Groth, Steve Munroe, Sheng Jiang, Thibaut Assandri, and Luc Moreau. Extracting Causal Graphs from an Open Provenance Data Model. Concurrency and Computation: Practice and Experience, 2007.
>>> [6] Miles, Simon (2006) Electronically Querying for the Provenance of Entities. In: Proceedings of the International Provenance and Annotation Workshop, May 2006, Chicago, USA.
>>> [7] Groth, Paul and Miles, Simon and Fang, Weijian and Wong, Sylvia C. and Moreau, Luc (2005) Recording and Using Provenance in a Protein Compressibility Experiment. In: Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC 2005). Item not available online.
>>>
>>> Paulo Pinheiro da Silva wrote:
>>>> Paul-- Thank you very much for your message.
>>>>
>>>> All-- I agree with Paul's statement that there are no well-established guidelines for using/adopting provenance solutions, and this is the part of his message that I would like to see discussed further. I like Luc's suggestion of discussing these gaps in terms of queries. For instance, if you go to
>>>>
>>>> http://trust.utep.edu/sparql-pml/query/example
>>>>
>>>> you will see a large collection of SPARQL-PML queries answering many of the questions that require bridging the gaps identified in Paul's message. Please note that the queries in the URL above are standard SPARQL queries based on the use of the PML vocabulary. The results used in the URL come from a repository of PML provenance knowledge in the domains of earth science (using actual NSF EarthScope and IRIS data in support of seismology and USGS data in support of earth magnetism), astronomy (using actual NCAR data in support of space weather), and logical proofs in support of TPTP. Anyone can go to http://trust.utep.edu/sparql-pml/query/index and write their own queries, or use the basic or advanced user interface (http://trust.utep.edu/sparql-pml/search/index). [The SPARQL-PML queries have been developed by Jitin Arora (http://trust.utep.edu/~jarora/).]
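>>>>
>>>> As a minimal sketch of what running one of these queries from a client program could look like, consider the Python fragment below. The endpoint URL, the namespace URIs, and the pmlj:/pmlp: class and property names are assumptions chosen for illustration only; the vocabulary and endpoint actually deployed behind the query pages above may differ.
>>>>
>>>> from SPARQLWrapper import SPARQLWrapper, JSON
>>>>
>>>> # Hypothetical endpoint URL; the pages above expose query forms, and the
>>>> # raw SPARQL endpoint behind them may live at a different address.
>>>> endpoint = SPARQLWrapper("http://trust.utep.edu/sparql-pml/sparql")
>>>>
>>>> # Illustrative query: follow a conclusion (node set) back through its
>>>> # justification step to the information sources it was derived from.
>>>> # The pmlj:/pmlp: terms below are placeholders, not normative PML names.
>>>> endpoint.setQuery("""
>>>>     PREFIX pmlj: <http://inference-web.org/2.0/pml-justification.owl#>
>>>>     PREFIX pmlp: <http://inference-web.org/2.0/pml-provenance.owl#>
>>>>     SELECT ?nodeSet ?source WHERE {
>>>>       ?nodeSet a pmlj:NodeSet ;
>>>>                pmlj:isConsequentOf ?step .
>>>>       ?step pmlj:hasSourceUsage ?usage .
>>>>       ?usage pmlp:hasSource ?source .
>>>>     }
>>>>     LIMIT 10
>>>> """)
>>>> endpoint.setReturnFormat(JSON)
>>>>
>>>> results = endpoint.query().convert()
>>>> for row in results["results"]["bindings"]:
>>>>     print(row["nodeSet"]["value"], "<-", row["source"]["value"])
>>>>
>>>> A traversal along these lines (conclusion, justification step, source usage, source) is the kind of query that lets a trust recommendation trace a result all the way back to the underlying data sets mentioned above.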
>>>>
>>>> I would like to emphasize two aspects of PML that may need to be highlighted so that the group can further appreciate our work and understand how PML bridges the technical gaps in Paul's message:
>>>>
>>>> 1) PML is a collection of three ontologies: PML-Provenance (PML-P), PML-Justification (PML-J), and PML-Trust (PML-T). Most of the provenance concepts in OPM map into concepts described in the PML-J ontology. This means that most of the elements in PML-P are concepts not covered in OPM. I will go further and say that many of these concepts have the role of tying artifacts to sources, as identified in Paul's message;
>>>>
>>>> 2) If you revisit our publications, for instance [1], you will see that PML is just a component (the language component) of a bigger infrastructure called Inference Web (http://inference-web.org).
>>>>
>>>> In fact, most of the concerns highlighted by Luc in his message about having a well-defined API, services, and other infrastructural features in support of provenance are exactly the kinds of things that one should be able to see in Inference Web.
>>>>
>>>> With (1) and (2) in mind, I would like to stress one point: most of the provenance infrastructure mentioned in (2) is in support of PML-P. PML-P is the part of the provenance that gets reused across multiple justification traces and as such needs to be discovered, aligned, augmented, etc. Further, one of our major mistakes was to put a lot of effort into a registration mechanism for PML-P documents called IW-Base [2]. Later on, after a meeting with Tim Berners-Lee and his W3C team, we learned that we would need to distribute this approach, which is why we developed an Inference Web search mechanism for provenance called IWSearch [3]. Again, anyone can try IWSearch at http://onto.rpi.edu/iwsearch/
>>>>
>>>> I would like to say that PML and Inference Web were developed from day one to support "linking provenance between sites (i.e., trackback but for the whole web)." That is the reason why PML has always had the following properties:
>>>>
>>>> a) PML identifiers are URIs;
>>>> b) PML content is in RDF/OWL (it used to be in DAML+OIL before OWL);
>>>> c) PML justifications are combinable/decomposable [4];
>>>> d) RDF/OWL links are used to connect PML documents.
>>>>
>>>> Another point that I would like to make is that the PML-P part of PML is the one where we connect to many other well-known pieces of information that we have discussed in this group. For instance, when it comes to publications, PML-P defines a publication as a kind of information source, and it is where we connect PML to Dublin Core attributes for publications.
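>>>>
>>>> As a small illustration of properties (a)-(d) and of the Dublin Core connection, here is a sketch in Python using rdflib. The namespace URIs, the pmlj:/pmlp: class and property names, and the example document URIs are all assumptions made for illustration; the actual terms are defined by the PML ontologies at inference-web.org.
>>>>
>>>> from rdflib import Graph, Literal, Namespace, URIRef
>>>> from rdflib.namespace import DC, RDF
>>>>
>>>> # Assumed namespace URIs; see the PML-P and PML-J ontologies for the real ones.
>>>> PMLP = Namespace("http://inference-web.org/2.0/pml-provenance.owl#")
>>>> PMLJ = Namespace("http://inference-web.org/2.0/pml-justification.owl#")
>>>>
>>>> g = Graph()
>>>> g.bind("pmlp", PMLP)
>>>> g.bind("pmlj", PMLJ)
>>>> g.bind("dc", DC)
>>>>
>>>> # (a) Everything is identified by a URI, so (d) a justification published
>>>> # on one site can point at provenance published on another site.
>>>> node_set = URIRef("http://siteA.example.org/proofs/answer42#ns1")
>>>> step = URIRef("http://siteA.example.org/proofs/answer42#step1")
>>>> usage = URIRef("http://siteA.example.org/proofs/answer42#usage1")
>>>> source = URIRef("http://siteB.example.org/sources/seismic-report-2010")
>>>>
>>>> # A conclusion, the inference step that produced it, and the usage of an
>>>> # information source by that step (illustrative property names).
>>>> g.add((node_set, RDF.type, PMLJ.NodeSet))
>>>> g.add((node_set, PMLJ.isConsequentOf, step))
>>>> g.add((step, RDF.type, PMLJ.InferenceStep))
>>>> g.add((step, PMLJ.hasSourceUsage, usage))
>>>> g.add((usage, PMLP.hasSource, source))
>>>>
>>>> # The source is a publication (a kind of information source in PML-P),
>>>> # described with Dublin Core attributes; the values are made up.
>>>> g.add((source, RDF.type, PMLP.Publication))
>>>> g.add((source, DC.title, Literal("Example Seismic Velocity Report")))
>>>> g.add((source, DC.creator, Literal("Example Author")))
>>>> g.add((source, DC.date, Literal("2010-08-01")))
>>>>
>>>> # (b) The result is plain RDF/OWL, and (c) the graph can be merged with,
>>>> # or split from, other PML documents before being published.
>>>> print(g.serialize(format="turtle"))
>>>>
>>>> The point of the sketch is only the shape of the data: a justification trace on one site, an information source on another, and Dublin Core metadata hanging off the source, all connected by ordinary RDF links.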
>>>>
>>>> As you see, I have reasons to be uncomfortable with statements that there is no language "for expressing provenance information that captures processes as well as the other content dimensions", or "API for obtaining/querying provenance information", or "for linking provenance between sites (i.e. trackback but for the whole web)".
>>>>
>>>> Regarding this month of August, I am unfortunately unable to attend the meeting this week and next week (I will be flying during the time of the meetings). Also, I believe Deborah will not attend either, due to personal reasons. Thus, I am asking Tim Lebo from RPI to represent us and to collect any requests you may have regarding PML so that we can address them later, in case Tim cannot answer your questions right away.
>>>>
>>>> Many thanks,
>>>> Paulo.
>>>>
>>>> [The publications below are part of the provenance collection in Mendeley.]
>>>>
>>>> [1] Deborah L. McGuinness and Paulo Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Journal of Web Semantics, Vol. 1, No. 4, October 2004, pages 397-413.
>>>>
>>>> [2] Deborah L. McGuinness, Paulo Pinheiro da Silva, Cynthia Chang. IWBase: Provenance Metadata Infrastructure for Explaining and Trusting Answers from the Web. Technical Report KSL-04-07, Knowledge Systems Laboratory, Stanford University, USA, 2004.
>>>>
>>>> [3] Paulo Pinheiro da Silva, Geoff Sutcliffe, Cynthia Chang, Li Ding, Nick del Rio and Deborah McGuinness. Presenting TSTP Proofs with Inference Web Tools. In Proceedings of the IJCAR '08 Workshop on Practical Aspects of Automated Reasoning (PAAR-2008), August 2008, Sydney, Australia.
>>>>
>>>> [4] Paulo Pinheiro da Silva and Deborah L. McGuinness. Combinable Proof Fragments for the Web. Technical Report KSL-03-04, Knowledge Systems Laboratory, Stanford University, USA, 2003.
>>>>
>>>> ([3] is not a paper specifically about IWSearch, although it briefly describes the tool.)
>>>>
>>>>> Thanks Paul for this proposal for the gap analysis. Twice you mention 'exposing', and I thought we could introduce 'querying' provenance too.
>>>>>
>>>>> Also, maybe the gaps could be structured into content vs. APIs. Like this, maybe:
>>>>>
>>>>> Content:
>>>>> - No common standard for expressing provenance information that captures processes as well as the other content dimensions.
>>>>> - No guidance for how existing standards can be put together to provide provenance (e.g. linking to identity).
>>>>>
>>>>> APIs (or protocols):
>>>>> - No common API for obtaining/querying provenance information.
>>>>> - No guidance for how application developers should go about exposing provenance in their web systems.
>>>>> - No well-defined standard for linking provenance between sites (i.e. trackback but for the whole web).
>>>>>
>>>>> I also wondered whether they should be structured according to the provenance dimensions (so instead of API, break this into Use/Management).
>>>>>
>>>>> Luc
>>>>>
>>>>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> As discussed at last week's telecon, I came up with some ideas about the gaps necessary to realize the News Aggregator Scenario. I've put these in the wiki, and I append them below to help start the discussion. Let me know what you think.
>>>>>>
>>>>>> Gap Analysis - News Aggregator
>>>>>>
>>>>>> For each step within the News Aggregator scenario, there are existing technologies or relevant research that could solve that step. For example, one can properly insert licensing information into a photo using a Creative Commons license and the Extensible Metadata Platform (XMP). One can track the origin of tweets either through retweets or using some extraction technologies within Twitter. However, the problem is that across multiple sites there is no common format and API to access and understand provenance information, whether it is explicitly or implicitly determined. To inquire about retweets or about trackbacks, one needs to use different APIs and understand different formats. Furthermore, there is no (widely deployed) mechanism to point to provenance information on another site. For example, once a tweet is traced back as far as Twitter goes, there is no way to follow where that tweet came from.
>>>>>>
>>>>>> Systems largely do not document the software by which changes were made to data and what those pieces of software did to the data. However, there are existing technologies that allow this to be done. For example, in a domain-specific setting, XMP allows the transformations of images to be documented. More general formats such as OPM and PML allow this to be expressed but are not currently widely deployed.
>>>>>>
>>>>>> Finally, while many sites provide for identity and there are several widely deployed standards for identity (OpenID), there are no existing mechanisms for tying identity to objects or provenance traces. This directly ties to the attribution of objects and provenance.
>>>>>>
>>>>>> Summing up, there are four existing gaps to realizing the News Aggregator scenario:
>>>>>>
>>>>>> - No common standard to target for exposing and expressing provenance information that captures processes as well as the other content dimensions.
>>>>>> - No well-defined standard for linking provenance between sites (i.e. trackback but for the whole web).
>>>>>> - No guidance for how existing standards can be put together to provide provenance (e.g. linking to identity).
>>>>>> - No guidance for how application developers should go about exposing provenance in their web systems.
Received on Friday, 6 August 2010 09:13:44 UTC