- From: Irini Fundulaki <fundul@ics.forth.gr>
- Date: Thu, 20 May 2010 11:50:05 +0300
- To: Olaf Hartig <hartig@informatik.hu-berlin.de>
- CC: public-xg-prov@w3.org, Vassilis Christophides <christop@ics.forth.gr>, Grigoris Karvounarakis <gregkar@gmail.com>, Yannis Theoharis <ytheohar@gmail.com>
- Message-ID: <4BF4F7BD.7090702@ics.forth.gr>
First of all, apologies to the list for the long email.
Olaf, all, you can find our comments inlined.

On 5/18/10 11:51 AM, Olaf Hartig wrote:
>
> Interesting work, thanks for sharing.
>
> I have several comments, starting with the most important one (that you may
> not like to hear).
>
> Your whole argumentation in Sec.4.3 is based on an interpretation of the
> "trust semantics" from my trust-aware SPARQL extension tSPARQL. You should
> explain this semantics in the paper because it is fundamental to everything
> that follows. However, that's not the main problem. The problem is that your
> interpretation of the tSPARQL semantics is wrong. Sorry to say that.
> The tSPARQL algebra is an extension of the SPARQL algebra that does not affect
> the results of the original SPARQL algebra operators. This means, for the same
> input mappings the tSPARQL operators produce exactly the same output mappings
> as the corresponding SPARQL operator does. Not more, not less. The tSPARQL
> algebra only augments the semantics of the SPARQL operators by defining the
> trust values that have to be associated with results of an operator based on
> the trust values associated with the corresponding input mappings. Therefore,
> if you apply tSPARQL semantics to the example in your paper (Tables 8) for
>
> LeftJoin ( Omega_1 , Omega_4 )
>
> where Omega_1 contains two trust weighted solution mappings:
>
> mu_1 := ( {(?x,d),(?y,b)} , true )
> mu_2 := ( {(?x,f),(?y,g)} , true )
>
> and Omega_4 contains:
>
> mu_17 := ( {(?y,b),(?z,c)} , false )
>
> then the LeftJoin operator (Definition 2.8 in the tSPARQL spec) returns two
> results:
>
> mu'_19 := ( {(?x,d),(?y,b),(?z,c)} , tm(true,false) )
> mu_20 := ( {(?x,f),(?y,g)} , true )
>
> That's it. Nothing more. If you assume a pessimistic trust merge function that
> defines
>
> tm(true,false) := false
>
> then mu'_19 becomes mu_19 as in your example.
> But there is no mu_21 as in your example. How do you determine this mapping?
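
The LeftJoin behaviour described in the quoted text above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the definition from the tSPARQL spec: the names `compatible`, `left_join` and `pessimistic_tm` are hypothetical, the filter condition of LeftJoin is left out, and the trust merge function is assumed to be a logical AND.

```python
from typing import Callable, Dict, List, Tuple

Mapping = Dict[str, str]
Weighted = Tuple[Mapping, bool]  # (solution mapping, trust value)


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def left_join(left: List[Weighted], right: List[Weighted],
              tm: Callable[[bool, bool], bool]) -> List[Weighted]:
    """Produce the same mappings as a plain left join; only merge trust values."""
    out: List[Weighted] = []
    for mu_l, t_l in left:
        matches = [(mu_r, t_r) for mu_r, t_r in right if compatible(mu_l, mu_r)]
        if matches:
            for mu_r, t_r in matches:
                out.append(({**mu_l, **mu_r}, tm(t_l, t_r)))
        else:
            # No compatible partner: the left mapping passes through unchanged.
            out.append((mu_l, t_l))
    return out


def pessimistic_tm(a: bool, b: bool) -> bool:
    # Assumed merge function: tm(true, false) := false, i.e. logical AND.
    return a and b


omega_1 = [({"?x": "d", "?y": "b"}, True), ({"?x": "f", "?y": "g"}, True)]
omega_4 = [({"?y": "b", "?z": "c"}, False)]

result = left_join(omega_1, omega_4, pessimistic_tm)
for mapping, trust in result:
    print(mapping, trust)
```

Under these assumptions the sketch yields two results, matching mu_19 and mu_20 from the quoted example.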
>
> Maybe, you mixed up your example with an example that additionally uses the
> ensure trust operator EnTrust that is introduced by tSPARQL (there is nothing
> like this in pure SPARQL). Given you apply such an EnTrust to Omega_4 *before*
> you do the LeftJoin then you get mu_21 (but no mu_19 anymore). However, such
> an algebra expression (with an EnTrust operator) has a different semantics than
> the expression without the EnTrust.

We believe that you might have misread the paper. Our objective is to
understand the requirements of the boolean trust application where the SPARQL
OPTIONAL operator is concerned (because of the lack of support for the left
outer join in the relational context). For this, we adopt the semantics of the
EnTrust operator for the **boolean trust application only**, but not, in
general, the annotation mechanism that you propose in your work.

We believe that the main point worth noting here is that a mapping annotated
with the false value is treated as absent from the mapping set and, vice versa,
a mapping that does not appear in the mapping set is considered untrusted
(i.e., annotated with the false value). This holds because of the strict
semantics of the boolean trust application. By no means do we change the
semantics of SPARQL!

As you can see in Table VIII-b, \mu_17 in \Omega_4 is annotated with false,
i.e. \Omega_4 is deemed empty. Therefore,
\Omega_1 LeftOuterJoin \Omega_4 = \Omega_1, as shown in Table VIII-c. Mappings
annotated with false can be omitted from the tables, since they are treated as
absent (as we explain in the paper); however, we included them for presentation
reasons.

Finally, note that in the paper we use boolean and ranked trust as indicative
examples from a large body of applications (bag semantics, view maintenance &
update, access control, probabilistic databases) to discuss requirements of
provenance models.
These cannot be supported by an annotation mechanism defined for the trust
assessment application only.

> Further comments:
>
> * It is not clear what you mean by "abstract provenance model". Sec.2.1 does
> not give an appropriate definition. It is also not clear what a "provenance
> expression" is or what you understand of a "abstract operations" (second
> sentence in Sec.2.2).
>

We introduce the term "abstract provenance model" to distinguish provenance
models from the different annotation models. As we state in the paper, "[...]
abstract provenance models to capture the relationship of query results with
source data along with the query operators that combined them". An abstract
provenance model consists of provenance tokens (i.e., source data annotations)
and provenance operators that record the query operators. A provenance
expression is an expression that involves those abstract tokens and operators.
Depending on the application, one can evaluate these abstract provenance
expressions by replacing the tokens with concrete values and the abstract
operators with concrete ones. As we discuss in the paper, in the case of boolean
trust, the former can be replaced with true/false and the latter with
conjunction or disjunction. For ranked trust, the former can be positive
integers and the latter the max and '+' functions on those.

To the best of our understanding, the model that you propose in tSPARQL
resembles an abstract provenance model since, in the end, the triples are
annotated with expressions and not with the values true/false. The EnTrust
model is closer to annotation computation than the former.

> * As an alternative approach to the use of an "abstract provenance model" you
> discuss the annotation of source data with values (Sec.2.2). That's,
> basically, what I do with tSPARQL, only that the trust values in my case are
> not assumed to be fixed (i.e. calculated once) and global (i.e. not
> subjective).
> However, it might seem a bit strange that you base your whole
> argumentation in Sec.4.3 on my annotation based approach when you state in
> Sec.2.2 that annotation based approaches are unsuitable in the context of
> Linked Data.
> * In the first paragraph of Sec.3.1 you introduce a boolean trust based
> example. You may want to mention here that a more sceptical/pessimistic user
> may associate both operators \oplus and \odot with the a logical AND.

This is exactly the benefit of an abstract provenance model when compared to
annotation-based models! One does not need to compute and store the provenance
of the result for every application and user, as would be the case with
annotation computations. One simply has to choose the appropriate tokens and
operators and perform the computation once.

> * In Sec.4.4 you write that for DESCRIBE queries "the output contains all
> triples that have that value as a subject, predicate or object." That's not
> true. The SPARQL spec does not prescribe what exactly the result of a
> DESCRIBE query is.

As far as the semantics of the SPARQL DESCRIBE operator are concerned, there is
no formal definition, but according to the SPARQL standard the informal
semantics are captured by what we discuss in the paper. [See 10.4 Descriptions
of Resources at http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050419/]

> * In the paper you mention several times that you aim to find a provenance
> model for Linked Data or for SPARQL queries over Linked Data. However, that's
> not what you discuss in the paper. Everything that you do in the paper is
> related to SPARQL queries. There is nothing Linked Data specific about this.
> The execution of SPARQL queries over Linked Data is only a good use case for
> your work.

Well, Linked Data is expressed in RDF, which is queried with SPARQL. Linked
Data is a global dataspace where data from different sources are integrated and
accessed by a large set of users.
Consequently, Linked Data is an excellent motivation for provenance
applications with requirements that cannot be fully addressed by
annotation-based models as we clearly discuss in the paper.

> Please don't take these comments as a general deprecation of your work.
> I really like your analysis; it is a very valuable contribution!
>
> Greetings,
> Olaf

Best,
Irini
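
As a rough illustration of how an abstract provenance expression, as discussed in the message above, might be evaluated once tokens and operators are replaced by concrete ones, here is a minimal Python sketch. The classes `Token` and `Op`, the function `evaluate`, the token names and the example expression are all hypothetical. The message names only the concrete functions (conjunction/disjunction, max and '+'), so which function stands in for \oplus and which for \odot is an assumption made here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Token:
    name: str  # abstract provenance token, i.e. a source data annotation


@dataclass(frozen=True)
class Op:
    symbol: str  # "oplus" or "odot" (abstract provenance operators)
    left: "Expr"
    right: "Expr"


Expr = Union[Token, Op]


def evaluate(expr: Expr, values: Dict[str, T],
             oplus: Callable[[T, T], T], odot: Callable[[T, T], T]) -> T:
    """Replace tokens by concrete values and abstract operators by concrete ones."""
    if isinstance(expr, Token):
        return values[expr.name]
    lhs = evaluate(expr.left, values, oplus, odot)
    rhs = evaluate(expr.right, values, oplus, odot)
    return oplus(lhs, rhs) if expr.symbol == "oplus" else odot(lhs, rhs)


# Hypothetical provenance expression: (t1 odot t2) oplus t3
expr = Op("oplus", Op("odot", Token("t1"), Token("t2")), Token("t3"))

# Boolean trust (assumed assignment: oplus -> disjunction, odot -> conjunction).
bool_values = {"t1": True, "t2": False, "t3": True}
boolean = evaluate(expr, bool_values, lambda a, b: a or b, lambda a, b: a and b)

# A more pessimistic user: both operators mapped to logical AND.
pessimistic = evaluate(expr, bool_values,
                       lambda a, b: a and b, lambda a, b: a and b)

# Ranked trust (assumed assignment: oplus -> max, odot -> '+').
rank_values = {"t1": 1, "t2": 3, "t3": 2}
ranked = evaluate(expr, rank_values, max, lambda a, b: a + b)

print(boolean, pessimistic, ranked)  # True False 4
```

The expression is stored once; each application or user only supplies a different choice of token values and concrete operators.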
Received on Thursday, 20 May 2010 08:45:12 UTC