- From: Irini Fundulaki <fundul@ics.forth.gr>
- Date: Fri, 21 May 2010 16:25:01 +0300
- To: Olaf Hartig <hartig@informatik.hu-berlin.de>
- CC: public-xg-prov@w3.org, Vassilis Christophides <christop@ics.forth.gr>, Grigoris Karvounarakis <gregkar@gmail.com>, Yannis Theoharis <ytheohar@gmail.com>
Olaf, all

On 5/20/10 5:34 PM, Olaf Hartig wrote:
> Hey Irini,
>
> On Thursday 20 May 2010 10:50:05 Irini Fundulaki wrote:
>
>> We believe that the main point worth noting here, is that a mapping
>> annotated with the false value is treated as absent from the mapping set,
>> and vice versa, a mapping that does not appear in the mapping set is
>> considered as untrusted (i.e., annotated with the false value). This
>> holds because of the strict semantics of the boolean trust
>> application.
>>
> If that's your assumption, fine. But it's a wrong adaptation of the tSPARQL
> semantics. A boolean-based adaptation of the tSPARQL algebra would not remove
> a solution mapping only because it is associated with the trust value 'false'.
> Removing these mappings would only happen if there are EnTrust operators in the
> algebra expression. That's not the case in your examples.
> Users of tSPARQL have to add EnTrust operators (by adding ENSURE TRUST clauses
> to the query) to explicitly declare for which parts of their query the output
> mappings have to be ignored if they are not trustworthy. Hence, users have the
> choice. That's not the case in your adaptation where each mapping that is not
> trustworthy is ignored immediately. What you seem to assume for your adaptation
> of the tSPARQL semantics is that each operator in an algebra expression is
> wrapped in an EnTrust operator. However, that's not the idea of tSPARQL and,
> thus, that's not what should be referred to as the "trust semantics"
> introduced by the tSPARQL document as you do in your paper.
>
>
We would like to clarify that we are using the semantics of the EnTrust operator (defined in the tSPARQL specification document) for the boolean trust application when SPARQL OPTIONAL is used. At this point, we are discussing the requirements that the boolean trust application imposes on a provenance model.
The semantics for the boolean trust application are clear to us from existing work on relational data for positive relational algebra queries (and consequently for the SPARQL fragment without OPTIONAL). To this end, we adopted the semantics of the EnTrust operator for OPTIONAL.

>> By no means do we change the semantics of SPARQL! As you can see in
>> Table VIII-b \mu_17 in \Omega_4 is annotated with false, i.e. \Omega_4
>> is deemed as empty. Therefore, the result of \Omega_1 LeftOuterJoin
>> \Omega_4 = \Omega_1 as shown in Table VIII-c. Mappings annotated with
>> false can be omitted from the Tables, since they are treated as absent
>> (as we explain in the paper), however we included them for
>> presentation reasons.
>>
> Ah, now I see your point: in the case mu_17 is assigned the trust value
> 'false' during a trust assessment that implements your provenance based
> approach (i.e. that makes use of the provenance expression), you assume this
> mapping did not exist during query execution and that's why the query engine
> would calculate mu_21 and mu_20.
> However, as far as I understand you propose to do these trust assessments at
> an arbitrary time when the query results have been determined. This means,
> when the query was executed mu_17 did exist (because by that time it is not
> clear what trust value would be associated to it in a later trust assessment
> procedure). For that reason, the query engine would never calculate mu_21.
> Hence, the query could never attach a provenance expression to mu_21 - there
> is no mu_21.
>
>
In Table VIII we discuss only the requirements that the boolean trust application imposes on a provenance model.

Now, concerning the evaluation of queries for provenance-aware applications and for abstract provenance models: in general, query language operators work on the support of relations (mapping bags in the case of SPARQL), that is, on the relation's tuples.
As shown in [1], the support of an annotated relation consists of all the tuples that are not annotated with the neutral element ("0") of the abstract sum ("+"). In the case of the boolean trust semiring, "0" is instantiated as "false" (see [1]). In this respect, the support of the relation changes when the annotation tokens change, but the semantics of the employed query language does not.

The solution that we propose is to capture the semantics of the OPTIONAL operator by recording the provenance expressions for some mappings that do not belong to the support of the query result (those whose expressions will eventually evaluate to "false" for boolean trust). To conclude: first we evaluate the query under the query language semantics, and then the evaluation of the provenance expressions determines the support of the query result.

[1] T. J. Green, G. Karvounarakis, V. Tannen. Provenance Semirings. PODS 2007.

>> [...]
>> We introduce the term "abstract provenance model" to distinguish the
>> provenance models from the different annotation models.
>>
> Okay. However, aren't "abstract provenance models" a special kind of
> annotation models? They annotate the source data and solutions with a
> provenance expression.
>
>
True. But we tried to make the distinction clearer, since in the case of abstract provenance models the annotations are expressions on tokens.

>
>> > * In Sec.4.4 you write that for DESCRIBE queries "the output contains
>> > all triples that have that value as a subject, predicate or object."
>> > That's not true. The SPARQL spec does not prescribe what exactly the
>> > result of a DESCRIBE query is.
>>
>> As far as the semantics of the DESCRIBE SPARQL operator are concerned:
>> there is no formal definition of the semantics, but according to the
>> SPARQL standard, the informal semantics are captured by what we
>> discuss in the paper.
>> [See 10.4 Descriptions of Resources at
>> http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050419/]
>>
> No. Even the informal semantics
> (http://www.w3.org/TR/rdf-sparql-query/#descriptionsOfResources)
> says: "The RDF returned [...] may include
> information about other resources: for example, the RDF data for a book may
> also include details about the author."
> The query result in the example in 10.4.3 of the SPARQL spec includes triples
> that don't contain the blank node that was bound to ?x.
>
>
We will have a closer look at the formal semantics for DESCRIBE when they become available. Thanks for letting us know about this detail.

>> > * In the paper you mention several times that you aim to find a provenance
>> > model for Linked Data or for SPARQL queries over Linked Data. However,
>> > that's not what you discuss in the paper. Everything that you do in the
>> > paper is related to SPARQL queries. There is nothing Linked Data specific
>> > about this.
>> > The execution of SPARQL queries over Linked Data is only a good use case
>> > for your work.
>>
>> Well, Linked Data is expressed in RDF, which is queried with SPARQL.
>> Linked Data is a global dataspace where data from different sources are
>> integrated and accessed by a large set of users. Consequently, Linked Data
>> is an excellent motivation for provenance applications with requirements
>> that cannot be fully addressed by annotation-based models as we clearly
>> discuss in the paper.
>>
> Sure, it is an excellent motivation. However, you don't work on a provenance
> model for Linked Data as you write in your Conclusions section.
>
>
So, what are the data provenance requirements for Linked Data that are not addressed by the provenance models discussed in the paper?

> Greetings,
> Olaf
>
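
The two-step evaluation described above (first evaluate the query, then evaluate the recorded provenance expressions to determine the support) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the model from the paper: the token names `t1` and `t17`, all helper functions, and in particular the use of negation for the unextended branch of the left outer join are hypothetical choices made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, bool]                    # trust value assigned to each source token
Expr = Callable[[Valuation], bool]             # provenance expression, evaluated later
SolutionMapping = Tuple[Tuple[str, str], ...]  # sorted (variable, value) pairs
Annotated = List[Tuple[SolutionMapping, Expr]]


def token(name: str) -> Expr:
    """Provenance expression consisting of a single (hypothetical) token."""
    return lambda v: v[name]


def compatible(m1: SolutionMapping, m2: SolutionMapping) -> bool:
    """Two mappings are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d2[k] == val for k, val in d1.items() if k in d2)


def merge(m1: SolutionMapping, m2: SolutionMapping) -> SolutionMapping:
    return tuple(sorted({**dict(m1), **dict(m2)}.items()))


def left_outer_join(left: Annotated, right: Annotated) -> Annotated:
    """Step 1: evaluate the query once, recording expressions for BOTH the
    extended and the unextended mappings, whatever the later trust values."""
    out: Annotated = []
    for m1, e1 in left:
        partners = [(m2, e2) for m2, e2 in right if compatible(m1, m2)]
        for m2, e2 in partners:
            # Extended mapping: both inputs must be trusted (boolean product).
            out.append((merge(m1, m2), lambda v, e1=e1, e2=e2: e1(v) and e2(v)))
        # Unextended mapping: assumption of this sketch, it holds when
        # no compatible right-hand mapping turns out to be trusted.
        exprs = [e2 for _, e2 in partners]
        out.append((m1, lambda v, e1=e1, ps=exprs: e1(v) and not any(p(v) for p in ps)))
    return out


def support(rel: Annotated, v: Valuation) -> List[SolutionMapping]:
    """Step 2: keep the mappings whose expression is not 'false' (the '0')."""
    return [m for m, e in rel if e(v)]


omega_left: Annotated = [((("x", "a"),), token("t1"))]
omega_right: Annotated = [((("x", "a"), ("y", "b")), token("t17"))]
result = left_outer_join(omega_left, omega_right)

untrusted = support(result, {"t1": True, "t17": False})
trusted = support(result, {"t1": True, "t17": True})
```

With `t17` untrusted, only the unextended mapping remains in the support; with both tokens trusted, only the extended one does. The annotated result is computed once, and only the valuation of the tokens changes between the two calls to `support`.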
Received on Friday, 21 May 2010 13:20:24 UTC