- From: Olaf Hartig <hartig@informatik.hu-berlin.de>
- Date: Tue, 18 May 2010 10:51:01 +0200
- To: Irini Fundulaki <fundul@ics.forth.gr>
- Cc: public-xg-prov@w3.org
- Message-Id: <201005181051.02097.hartig@informatik.hu-berlin.de>
Hey Irini,

On Thursday 13 May 2010 22:17:43 Irini Fundulaki wrote:
> Dear all,
>
> For your information, you can find at
> http://www.csd.uoc.gr/~theohari/files/OnProvenanceOfQueriesForWebData.pdf
> a short paper summarizing the preliminary results of our work on
> representing data provenance for SPARQL queries evaluated on Linked Data.
> In the paper we (a) summarize the benefits of using abstract provenance
> models compared to annotation-based systems,
> (b) present the different models of data provenance developed so
> far in the relational world and (c) their limitations
> in capturing implicit provenance of SPARQL queries.
>
> Any comments/suggestions are more than welcome.
> Looking forward to hearing from you

Interesting work, thanks for sharing. I have several comments, starting with the most important one (which you may not like to hear).

Your whole argument in Sec.4.3 is based on an interpretation of the "trust semantics" from my trust-aware SPARQL extension tSPARQL. You should explain this semantics in the paper because it is fundamental to everything that follows. However, that's not the main problem. The problem is that your interpretation of the tSPARQL semantics is wrong. Sorry to say that.

The tSPARQL algebra is an extension of the SPARQL algebra that does not affect the results of the original SPARQL algebra operators. This means that, for the same input mappings, the tSPARQL operators produce exactly the same output mappings as the corresponding SPARQL operators do. Not more, not less. The tSPARQL algebra only augments the semantics of the SPARQL operators by defining the trust values that have to be associated with the results of an operator, based on the trust values associated with the corresponding input mappings.
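To make this point concrete, here is a rough sketch in Python of a LeftJoin over trust-weighted solution mappings. It is only an illustration, not the definition from the tSPARQL spec: the names `compatible`, `left_join`, `pessimistic_tm` and `optimistic_tm` are made up for this mail, trust values are reduced to booleans, and filter conditions are ignored. The input mappings are the ones from the example in your paper.

```python
from typing import Callable, Dict, List, Tuple

Mapping = Dict[str, str]            # variable name -> RDF term (as a string)
Weighted = Tuple[Mapping, bool]     # (solution mapping, boolean trust value)
TrustMerge = Callable[[bool, bool], bool]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == term for var, term in m1.items() if var in m2)


def pessimistic_tm(t1: bool, t2: bool) -> bool:
    """Assumed pessimistic trust merge: trusted only if both inputs are."""
    return t1 and t2


def optimistic_tm(t1: bool, t2: bool) -> bool:
    """Assumed optimistic trust merge: trusted if at least one input is."""
    return t1 or t2


def left_join(left: List[Weighted], right: List[Weighted],
              tm: TrustMerge) -> List[Weighted]:
    """LeftJoin sketch: the mappings are computed as in plain SPARQL,
    the trust merge function only decides the attached trust values."""
    result: List[Weighted] = []
    for m1, t1 in left:
        matches = [(m2, t2) for m2, t2 in right if compatible(m1, m2)]
        if matches:
            # Joined mappings carry the merged trust value.
            for m2, t2 in matches:
                result.append(({**m1, **m2}, tm(t1, t2)))
        else:
            # Unmatched left mappings are kept with their own trust value.
            result.append((m1, t1))
    return result


omega_1 = [({"?x": "d", "?y": "b"}, True), ({"?x": "f", "?y": "g"}, True)]
omega_4 = [({"?y": "b", "?z": "c"}, False)]

pessimistic = left_join(omega_1, omega_4, pessimistic_tm)
optimistic = left_join(omega_1, omega_4, optimistic_tm)
```

Both calls return the same two mappings; only the trust value attached to the joined mapping differs with the choice of merge function. No third mapping can appear, whatever merge function is used.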
Therefore, if you apply tSPARQL semantics to the example in your paper (Table 8) for

  LeftJoin ( Omega_1 , Omega_4 )

where Omega_1 contains two trust weighted solution mappings:

  mu_1 := ( {(?x,d),(?y,b)} , true )
  mu_2 := ( {(?x,f),(?y,g)} , true )

and Omega_4 contains:

  mu_17 := ( {(?y,b),(?z,c)} , false )

then the LeftJoin operator (Definition 2.8 in the tSPARQL spec) returns two results:

  mu'_19 := ( {(?x,d),(?y,b),(?z,c)} , tm(true,false) )
  mu_20 := ( {(?x,f),(?y,g)} , true )

That's it. Nothing more. If you assume a pessimistic trust merge function that defines tm(true,false) := false then mu'_19 becomes mu_19 as in your example. But there is no mu_21 as in your example. How do you determine this mapping?

Maybe you mixed up your example with an example that additionally uses the ensure trust operator EnTrust that is introduced by tSPARQL (there is nothing like this in pure SPARQL). If you apply such an EnTrust to Omega_4 *before* you do the LeftJoin then you get mu_21 (but no mu_19 anymore). However, such an algebra expression (with an EnTrust operator) has a different semantics than the expression without the EnTrust.

Further comments:

* It is not clear what you mean by "abstract provenance model". Sec.2.1 does not give an appropriate definition. It is also not clear what a "provenance expression" is or what you understand by "abstract operations" (second sentence in Sec.2.2).

* As an alternative approach to the use of an "abstract provenance model" you discuss the annotation of source data with values (Sec.2.2). That's, basically, what I do with tSPARQL, only that the trust values in my case are not assumed to be fixed (i.e. calculated once) and global (i.e. not subjective). However, it might seem a bit strange that you base your whole argument in Sec.4.3 on my annotation based approach when you state in Sec.2.2 that annotation based approaches are unsuitable in the context of Linked Data.
* In the first paragraph of Sec.3.1 you introduce a boolean trust based example. You may want to mention here that a more sceptical/pessimistic user may associate both operators \oplus and \odot with a logical AND.

* In Sec.4.4 you write that for DESCRIBE queries "the output contains all triples that have that value as a subject, predicate or object." That's not true. The SPARQL spec does not prescribe what exactly the result of a DESCRIBE query is.

* In the paper you mention several times that you aim to find a provenance model for Linked Data or for SPARQL queries over Linked Data. However, that's not what you discuss in the paper. Everything that you do in the paper is related to SPARQL queries. There is nothing Linked Data specific about this. The execution of SPARQL queries over Linked Data is only a good use case for your work.

Please don't take these comments as a general dismissal of your work. I really like your analysis; it is a very valuable contribution!

Greetings,
Olaf
Received on Tuesday, 18 May 2010 08:53:04 UTC