- From: Paulo Pinheiro da Silva <paulo@utep.edu>
- Date: Fri, 6 Aug 2010 05:29:13 -0600
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: Paul Groth <pgroth@gmail.com>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>, "Arora, Jitin BTE" <jarora@miners.utep.edu>, Tim Lebo <lebot@rpi.edu>, "Deborah L. McGuinness" <dlm@cs.rpi.edu>
Dear Luc,

Your answer below should definitely help in reading the papers listed by Paul.

Thank you very much,
Paulo.

> Paulo,
>
> Paul never said there was no solution out there. There are some (including PML, p-structure, OPM, Provenir, prov voc, etc.) that address, more or less, the technical gaps that have been identified. So, we're not ignoring the state of the art, far from it.
>
> However, *NONE* is widespread to the point that we find it on every desktop/handheld/web service!
>
> Hence, Paul is making the case for the need for a standard in this area.
>
> To answer your specific question, OPM artifacts are linked to "things" by means of the property "value". Things can be serialized as immediate values or passed by reference, and referred to by URIs.
>
> The p-structure follows the same approach, with a difference: it takes a message-oriented view of the world (where all information is contained in "messages" exchanged between parties), and uses a structured key to refer to information in messages. This key can also be expressed as a URI.
>
> So, both of them can "connect derivation traces to information sources".
>
> I hope this helps,
> Cheers,
> Luc
>
> On 08/06/2010 10:13 AM, Paulo Pinheiro da Silva wrote:
>> Dear Luc and Paul,
>>
>> Thank you very much for your comments.
>>
>> I consider the connection between derivation traces and information sources to be essential to support some of the identified requirements for a “common” representation of provenance. For instance, I cannot see how provenance can be used to support trust recommendations and result understanding if one cannot use provenance to trace back to the information sources used to derive a given result.
>>
>> In this regard, I would like to refer back to my original message, where I identified some aspects of PML that I believe are not covered in OPM, including a supporting infrastructure.
>>
>> As a response to my message, Paul cited a European provenance project that, according to Luc, is based on the p-structure, which pre-dates OPM. The information that the p-structure pre-dates OPM, however, is not enough for me to know whether the p-structure provides a solution for connecting derivation traces to information sources that I cannot see in OPM or, if it does, why the connection was not propagated to OPM. Also, it does not clarify the relation between the p-structure/OPM and the technical issues in the original gap analysis.
>>
>> So, one may ask what the relevance of my questions above is, considering that we should focus on a common representation for provenance. I would like to remind the group of its effort to map other provenance notations to OPM, which somehow implies that OPM provides the basic constructs to support a common provenance notation. So, which other important provenance aspects are we leaving out in our mapping? Also, how can we create a common understanding of provenance aspects not covered in OPM?
>>
>> Unfortunately, I will not be able to attend the meeting this week. Nevertheless, I am very much interested in knowing the answers to my questions.
>>
>> Many thanks,
>> Paulo.
>>
>>> Paulo,
>>>
>>> The papers that Paul cited can be downloaded from www.pasoa.org (they are also on Mendeley). The p-structure model pre-dates OPM.
>>>
>>> Paul's point (I believe) is that many of the issues raised in the gap analysis were addressed by this model, but it is not widely deployed.
>>>
>>> Luc
>>>
>>> On 05/08/10 21:30, Paulo Pinheiro da Silva wrote:
>>>> Hi Paul,
>>>>
>>>> Thank you very much for your prompt response.
>>>>
>>>> I am glad to see that we all agree that there is an urgent need for a common provenance standard. My understanding is that the incubator group is paving the way for the development of such a standard.
>>>>
>>>> Regarding your message, I would like to better understand the technical aspects of your gap analysis and to learn from it. So, following your mention of the European provenance project, please let me know the following:
>>>>
>>>> 1) Are you saying that one cannot see the technical issues of your gap analysis in the European provenance project?
>>>>
>>>> 2) If the answer to (1) is yes, how can we learn from this project?
>>>>
>>>> 3) Is OPM used in the European project?
>>>>
>>>> 4) If the answer to (3) is yes, I would like to understand how OPM artifacts are tied to sources, how sources are identified, and how provenance information about sources is represented.
>>>>
>>>> 5) If the answer to (3) is no, which provenance representation language is used?
>>>>
>>>> In other words, I need to better understand the gap analysis (viz., the points behind the analysis), and I believe we should not start from the assumption that we don’t know anything about provenance (which appears to be the motivation for us to write a related work section).
>>>>
>>>> Many thanks,
>>>> Paulo.
>>>>
>>>> On 8/5/2010 1:08 PM, Paul Groth wrote:
>>>>> Hi Paulo,
>>>>>
>>>>> Thanks for the message. I think the important thing here is the word "common" in what I wrote. By way of illustration...
>>>>>
>>>>> As part of the EU Provenance Project [1], we also designed and implemented an Architecture for Provenance Systems [2, 3]. This architecture included a data model, the p-structure [4], which allowed for the distributed linking and storage of provenance. It specified protocols for querying provenance information [5, 6] as well as for recording it. This was designed to work in a scalable setting [7].
>>>>>
>>>>> Obviously, I could go into more detail; this brief description is just to point out that I _agree_ with you that there are solutions for many of these problems.
>>>>> However, these solutions are _not_ common and widely deployed, where widely deployed means things like trackbacks, HTML, and probably Dublin Core and RDFa. The point is that while solutions exist within the research community (and some in business), they are by no means common or standard.
>>>>>
>>>>> This is exactly why, personally, I think the W3C should have a standards committee devoted to provenance. There are enough commonalities between provenance technologies that having a standard would help push adoption of provenance on the Web. Furthermore, without a standard it is difficult to implement something like the News Aggregator Scenario effectively over the whole of the Web.
>>>>>
>>>>> Cheers,
>>>>> Paul
>>>>>
>>>>> [1] http://www.gridprovenance.org
>>>>> [2] http://eprints.ecs.soton.ac.uk/13216/
>>>>> [3] Moreau, Luc and Groth, Paul and Miles, Simon and Vazquez, Javier and Jiang, Sheng and Munroe, Steve and Rana, Omer and Schreiber, Andreas and Tan, Victor and Varga, Laszlo (2007) The Provenance of Electronic Data. Communications of the ACM, 51 (4). pp. 52-58.
>>>>> [4] Paul Groth, Simon Miles, and Luc Moreau. A Model of Process Documentation to Determine Provenance in Mash-ups. Transactions on Internet Technology (TOIT), 9(1):1-31, 2009.
>>>>> [5] Simon Miles, Paul Groth, Steve Munroe, Sheng Jiang, Thibaut Assandri, and Luc Moreau. Extracting Causal Graphs from an Open Provenance Data Model. Concurrency and Computation: Practice and Experience, 2007.
>>>>> [6] Miles, Simon (2006) Electronically Querying for the Provenance of Entities. In: Proceedings of the International Provenance and Annotation Workshop, May 2006, Chicago, USA.
>>>>> [7] Groth, Paul and Miles, Simon and Fang, Weijian and Wong, Sylvia C.
>>>>> and Moreau, Luc (2005) Recording and Using Provenance in a Protein Compressibility Experiment. In: Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC 2005). Item not available online.
>>>>>
>>>>> Paulo Pinheiro da Silva wrote:
>>>>>> Paul-- Thank you very much for your message.
>>>>>>
>>>>>> All-- I agree with Paul’s statement that there are no well-established guidelines for using/adopting provenance solutions, and this is a part of his message that I would like to see discussed further.
>>>>>> I like Luc’s suggestion of discussing these gaps in terms of queries. For instance, if you go to
>>>>>>
>>>>>> http://trust.utep.edu/sparql-pml/query/example
>>>>>>
>>>>>> you will see a large collection of SPARQL-PML queries answering many of the questions that require bridging the gaps identified in Paul’s message. Please note that the queries at the URL above are standard SPARQL queries based on the PML vocabulary. The results shown at the URL come from a repository of PML provenance knowledge in the domains of earth science (using actual NSF EarthScope and IRIS data in support of seismology, and USGS data in support of earth magnetism), astronomy (using actual NCAR data in support of space weather), and logical proofs in support of TPTP. Anyone can go to http://trust.utep.edu/sparql-pml/query/index and write their own queries, or use the basic or advanced user interface (http://trust.utep.edu/sparql-pml/search/index).
>>>>>> [the SPARQL-PML queries have been developed by Jitin Arora (http://trust.utep.edu/~jarora/)]
>>>>>>
>>>>>> I would like to emphasize two aspects of PML that may need to be highlighted so that the group can further appreciate our work and understand how PML bridges the technical gaps in Paul’s message:
>>>>>>
>>>>>> 1) PML is a collection of three ontologies: PML-Provenance (or PML-P), PML-Justification (PML-J) and PML-Trust (PML-T). Most of the provenance concepts in OPM map onto concepts described in the PML-J ontology. This means that most of the elements in PML-P are concepts not covered in OPM. I will go further and say that many of these concepts have the role of tying artifacts to sources, as identified in Paul’s message.
>>>>>>
>>>>>> 2) If you revisit our publications, for instance [1], you will see that PML is just one component (the language component) of a bigger infrastructure called Inference Web (http://inference-web.org).
>>>>>>
>>>>>> In fact, most of the concerns highlighted by Luc in his message about having a well-defined API, services and other infrastructural features in support of provenance are exactly the kinds of things that one should be able to see in the Inference Web.
>>>>>>
>>>>>> With (1) and (2) in mind, I would like to stress one point: most of the provenance infrastructure mentioned in (2) is in support of PML-P. In fact, PML-P is the part of the provenance that gets reused across multiple justification traces and, as such, needs to be discovered, aligned, augmented, etc. Further, one of our major mistakes was to put a lot of effort into trying to come up with a registration mechanism for PML-P documents called IWBase [2].
>>>>>> Later on, after a meeting with Tim Berners-Lee and his W3C team, we learned that we would need to distribute this approach, which is why we developed an Inference Web search mechanism for provenance called IWSearch [3]. Again, anyone can try IWSearch at http://onto.rpi.edu/iwsearch/
>>>>>>
>>>>>> I would like to say that PML and Inference Web were developed from day 1 to support "linking provenance between sites (i.e. trackback but for the whole web)". That is the reason why PML has always had the following properties:
>>>>>>
>>>>>> a) PML identifiers are URIs
>>>>>> b) PML content is in RDF/OWL (it used to be in DAML+OIL before OWL)
>>>>>> c) PML justifications are combinable/decomposable [4]
>>>>>> d) RDF/OWL links are used to connect PML documents
>>>>>>
>>>>>> Another point that I would like to make is that the PML-P part of PML is where we connect to many other well-known pieces of information that we have discussed in this group. For instance, when it comes to publications, PML-P defines a publication as a kind of information source, and it is where we connect PML to Dublin Core attributes for publications.
>>>>>>
>>>>>> As you see, I have reasons to be uncomfortable with statements that there is no language “for expressing provenance information that captures processes as well as the other content dimensions”, no “API for obtaining/querying provenance information”, or no standard “for linking provenance between sites (i.e. trackback but for the whole web)”.
>>>>>>
>>>>>> This August, I am unfortunately unable to attend the meetings this week or next week (I will be flying during the meeting times). I also believe Deborah will not attend, for personal reasons.
>>>>>> Thus, I am asking Tim Lebo from RPI to represent us and to collect any requests you may have regarding PML, so that we can address them later in case Tim cannot answer your questions right away.
>>>>>>
>>>>>> Many thanks,
>>>>>> Paulo.
>>>>>>
>>>>>> [the publications below are part of the provenance collection in Mendeley]
>>>>>>
>>>>>> [1] Deborah L. McGuinness and Paulo Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Journal of Web Semantics, Vol. 1 No. 4. October 2004, pages 397-413.
>>>>>>
>>>>>> [2] Deborah L. McGuinness, Paulo Pinheiro da Silva, Cynthia Chang. IWBase: Provenance Metadata Infrastructure for Explaining and Trusting Answers from the Web. Technical Report KSL-04-07, Knowledge Systems Laboratory, Stanford University, USA, 2004.
>>>>>>
>>>>>> [3] Paulo Pinheiro da Silva, Geoff Sutcliffe, Cynthia Chang, Li Ding, Nick del Rio and Deborah McGuinness. Presenting TSTP Proofs with Inference Web Tools. In Proceedings of IJCAR '08 Workshop on Practical Aspects of Automated Reasoning (PAAR-2008), August 2008, Sydney, Australia.
>>>>>>
>>>>>> [4] Paulo Pinheiro da Silva and Deborah L. McGuinness. Combinable Proof Fragments for the Web. Technical Report KSL-03-04, Knowledge Systems Laboratory, Stanford University, USA, 2003.
>>>>>>
>>>>>> ([3] is not a paper specifically about IWSearch, although it briefly describes the tool)
>>>>>>
>>>>>>> Thanks Paul for this proposal for the gap analysis. Twice you mention 'exposing', and I thought we could introduce 'querying' provenance too.
>>>>>>>
>>>>>>> Also, maybe the gaps could be structured into content vs. APIs, like this:
>>>>>>>
>>>>>>> Content:
>>>>>>> - No common standard for expressing provenance information that captures processes as well as the other content dimensions.
>>>>>>> - No guidance for how existing standards can be put together to provide provenance (e.g. linking to identity).
>>>>>>>
>>>>>>> APIs (or protocols):
>>>>>>> - No common API for obtaining/querying provenance information.
>>>>>>> - No guidance for how application developers should go about exposing provenance in their web systems.
>>>>>>> - No well-defined standard for linking provenance between sites (i.e. trackback but for the whole web).
>>>>>>>
>>>>>>> I also wondered whether they should be structured according to the provenance dimensions (so instead of API, break this into Use/Management).
>>>>>>>
>>>>>>> Luc
>>>>>>>
>>>>>>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> As discussed at last week's telecon, I came up with some ideas about the gaps that need to be addressed to realize the News Aggregator Scenario. I've put these in the wiki and I append them below to help start the discussion. Let me know what you think.
>>>>>>>>
>>>>>>>> Gap Analysis - News Aggregator
>>>>>>>>
>>>>>>>> For each step within the News Aggregator scenario, there are existing technologies or relevant research that could solve that step. For example, one can properly insert licensing information into a photo using a Creative Commons license and the Extensible Metadata Platform (XMP). One can track the origin of tweets either through retweets or using some extraction technologies within Twitter. However, the problem is that across multiple sites there is no common format and API to access and understand provenance information, whether it is explicitly or implicitly determined. To inquire about retweets or trackbacks, one needs to use different APIs and understand different formats.
>>>>>>>> Furthermore, there is no (widely deployed) mechanism to point to provenance information on another site. For example, once a tweet has been traced back to its earliest point within Twitter, there is no way to follow where that tweet came from.
>>>>>>>>
>>>>>>>> Systems largely do not document the software by which changes were made to data and what those pieces of software did to the data. However, there are existing technologies that allow this to be done. For example, in a domain-specific setting, XMP allows the transformations of images to be documented. More general formats such as OPM and PML allow this to be expressed but are not currently widely deployed.
>>>>>>>>
>>>>>>>> Finally, while many sites provide for identity and there are several widely deployed standards for identity (OpenID), there are no existing mechanisms for tying identity to objects or provenance traces. This relates directly to the attribution of objects and provenance.
>>>>>>>>
>>>>>>>> Summing up, there are 4 existing gaps to realizing the News Aggregator scenario:
>>>>>>>>
>>>>>>>> - No common standard to target for exposing and expressing provenance information that captures processes as well as the other content dimensions.
>>>>>>>> - No well-defined standard for linking provenance between sites (i.e. trackback but for the whole web).
>>>>>>>> - No guidance for how existing standards can be put together to provide provenance (e.g. linking to identity).
>>>>>>>> - No guidance for how application developers should go about exposing provenance in their web systems.
>
> -- 
> Professor Luc Moreau
> Electronics and Computer Science    tel: +44 23 8059 4487
> University of Southampton           fax: +44 23 8059 2865
> Southampton SO17 1BJ                email: l.moreau@ecs.soton.ac.uk
> United Kingdom                      http://www.ecs.soton.ac.uk/~lavm
Received on Friday, 6 August 2010 11:29:47 UTC