- From: Paul Groth <pgroth@gmail.com>
- Date: Mon, 26 Jul 2010 10:28:10 +0200
- To: Paulo Pinheiro da Silva <paulo@utep.edu>
- CC: "public-xg-prov@w3.org" <public-xg-prov@w3.org>
Hi Paulo,

Thanks for the explanation and the pointers. This will be useful in doing the gap analysis.

In general, I have the feeling that what makes the news aggregator scenario hard is the fact that no one system is controlled by the same user/organization and that they don't all use the same technology. So I guess the problem is not necessarily technology but more one of interoperability. (I may be wrong on that but you provide some evidence that I may be right).

Anyway, thanks for taking the time,
Paul

Paulo Pinheiro da Silva wrote:
> Hi All,
>
> Sorry for the long message below but I would like to move beyond the
> tags and to further describe the connection of PML and PML-related
> publications to the news aggregation scenario.
>
> The news aggregation scenario says that “many web users would like to
> have mechanisms to automatically determine whether a web document or
> resource can be used, based on the original source of the content.”
> This exact claim is part of the collaborative research that Stanford,
> IBM and Pacific Northwest National Lab developed in the period between
> 2003 and 2005 as briefly described in this IBM report:
>
> http://www.research.ibm.com/UIMA/SUKI/index.html
>
> This project was about aggregation of news from a given domain (e.g.,
> news about a “#panda being moved from Chicago Zoo to Florida”) but
> also about extracting knowledge from this corpus of news articles and
> to use the extracted knowledge to answer complex questions in the
> domain of discussion.
>
> In terms of PML, our goal was to encode the provenance of every piece
> of extracted, derived knowledge and to be able to always track back to
> the original news articles. This approach is in line with the news
> aggregation scenario that “wants to ensure that the news that it
> aggregates are correctly attributed to the right person so that they
> may receive credit.”
>
> PML was used to capture the following provenance information:
> 1) How spans of text were extracted from sources on the web;
> 2) How knowledge was extracted from the spans of text;
> 3) How knowledge was aggregated (for example, dealing with
> co-resolution of identified entities within documents and across
> documents);
> 4) How knowledge was used to derive answers for complex questions
> (for example, explaining the decision of moving the panda from Chicago
> Zoo to Florida);
> 5) More importantly, how was the flow of information from
> unstructured, asserted text to structured, derived data.
>
> For example, using another corpus in another domain, we asked ‘Who is
> the manager of the Mississippi Automated System Project?’ and it was
> answered that ‘Julian Allen is the director of the project’. The
> provenance of the answer is encoded in PML and presented in IWBrowser,
> a web-based PML browser. The link below is going to show you the
> provenance trace (you may need to scroll around to see the entire trace).
>
> http://browser.inference-web.org/iwbrowser/NodeSetBrowser?w=1600&mg=999&st=Dag&fm=Raw&url=http%3A%2F%2Finference-web.org%2Fproofs%2FMississippiAutomatedSystem%2Fns36.owl%23ns36
>
>
> The provenance shows how the answer that ‘Julian Allen was the
> director’ was derived step by step through a bunch of information
> extraction and integration tools. It is relevant to mention that this
> provenance was captured from IBM UIMA where different information
> extraction technologies would compete to produce the best answer for a
> given question. Furthermore, sometimes the information extraction
> technologies would produce different and even conflicting answers for
> the question, which was actually one of the interesting aspects of the
> process for the intelligence community and a challenge for a
> provenance language if PML was not ready to accommodate alternative
> explanations.
>
> The following ISWC 2006 paper provides an overview of the scenario above:
>
> J. William Murdock, Deborah McGuinness, Paulo Pinheiro da Silva, Chris
> Welty, and David Ferrucci. Explaining Conclusions from Diverse
> Knowledge Sources. In Proceedings of the 5th International Semantic
> Web Conference (ISWC2006), Athens, GA, USA, p. 861-872, November 2006.
> http://www.cs.utep.edu/paulo/papers/Murdock_ISWC_2006.pdf
>
> An explanation concerning the alignment of multiple processes to
> explain the common goal of extracting information from unstructured
> data is available in the paper below:
>
> J. William Murdock, Paulo Pinheiro da Silva, David Ferrucci,
> Christopher Welty and Deborah L. McGuinness. Encoding Extraction as
> Inferences. In Proceedings of AAAI Spring Symposium on Metacognition
> on Computation, AAAI Press, Stanford University, USA, pages 92-97,
> 2005. http://www.ksl.stanford.edu/people/pp/papers/Murdock_SSS_2005.pdf
>
> The scenario also mentions that “unfortunately for BlogAgg, the source
> of the information is not often apparent from the data that it
> aggregates from the web. In particular, it must employ teams of people
> to check that selected content is both high-quality and can be used
> legally. The site would like this quality control process to be
> handled automatically.” Regarding trust, we first point to the
> following paper that described how one may compute trust based on
> provenance encoded in PML:
>
> Ilya Zaihrayeu, Paulo Pinheiro da Silva and Deborah L. McGuinness.
> IWTrust: Improving User Trust in Answers from the Web. In Proceedings
> of 3rd International Conference on Trust Management (iTrust2005),
> Springer, Rocquencourt, France, pages 384-392, 2005.
> http://www.cs.utep.edu/paulo/papers/Zaihrayeu_iTrust_2005.pdf
>
> Now, the exact representation and dimensions of trust to be considered
> vary and we would refer to the following paper:
>
> Patricia Victor, Chris Cornelis, Martine De Cock, Paulo Pinheiro da
> Silva. Gradual Trust and Distrust in Recommender Systems. In Fuzzy
> Sets and Systems 160(10): 1367-1382, 2009.
> http://www.cs.utep.edu/paulo/papers/Victor_FSS_2007.pdf
>
> I hope you all can see the connections between our work and the news
> aggregation scenario.
>
> Many thanks,
> Paulo.
Received on Monday, 26 July 2010 08:33:01 UTC