- From: Olaf Hartig <ohartig@uwaterloo.ca>
- Date: Thu, 8 Aug 2013 09:21:23 -0400
- To: <public-lod@w3.org>
- CC: Axel Ngonga <ngonga@informatik.uni-leipzig.de>
Hi Axel,

I have received a private invitation for participating in this survey before. I declined participation because some of the key questions are too imprecise or misleading. In a private response to the invitation I raised my concerns and pointed out the particular issues that I have with the survey. Since these issues still exist in the current version of the survey, I would like to make my concerns public:

Your question about "Support for catalog/index update" comes with the claim that "automatic index updates [...] ensure the retrieval of complete results."

First, there is an implicit assumption in this claim that I don't agree with; namely, the assumption that there exists a dependency between i) completeness of those query results that can be computed by using an index and ii) up-to-dateness of the index. I do not see how such a dependency can be justified. What I would instead assume is that the up-to-dateness of query results (rather than their completeness) is dependent on the up-to-dateness of the index (of course, the respective notions of "up-to-dateness" need to be defined more precisely).

However, what worries me more is the term "complete results" - it is not clear what such a thing is. More precisely: In the context of "Question 1: Result completeness" you speak about "all solutions to [a given SPARQL 1.0] query (100% recall)." To my knowledge there does not exist a definition of what a solution for evaluating a SPARQL 1.0 query over a federation of SPARQL endpoints is (and, hence, not what all of them are). Without such a definition it makes no sense to talk about "complete results."

BTW, mentioning "100% recall" does not help either. The notion of recall in information retrieval (that I assume you refer to here) is defined based on an understanding of relevance. It is not entirely clear (let alone well-defined) which SPARQL solution mappings are "relevant" for a given pair consisting of a SPARQL 1.0 query and a federation of SPARQL endpoints.
I can of course assume what your understanding of relevance in this case is, but my assumption may not match your actual understanding (or that of any other participant in the survey). The issue with set semantics vs. bag semantics that I raise in the following is an example of such a mismatch.

"Question 7: Duplicate Detection" is based on the implicit assumptions that i) some form of set-based query semantics is used and that ii) set-based semantics is the only reasonable choice. While the first assumption is not worth commenting on any further (I mentioned the lack of a well-defined query semantics before), I would certainly disagree with the latter. A bag semantics that allows me to obtain duplicates (each of which has different provenance of course) is not always a bad thing. For instance, I may choose to trust a solution that appears multiple times in the query result more than a solution that appears only once.

Finally, I see that you have adjusted "Question 2: Privacy." However, I still find it misleading. In particular, authentication and access rights are hardly "privacy information." Instead, those are mechanisms for ensuring data security.

Best,
Olaf

On Thursday 08 August 2013 09:47:35 Axel Ngonga wrote:
> Dear all,
>
> We are currently carrying out a survey on SPARQL Query Federation
> Systems, which can be found at
>
> https://docs.google.com/forms/d/1J5r_VOxDdTPhgxDnPsKJw8hjpXE4dj2gGGujco-xoLM
> /edit
>
> The aim of this survey is to provide an overview of the functionality of
> state-of-the-art systems. If you happen to have developed such a system,
> could you please fill in the information on this system into the survey
> form? This should take less than 10 minutes/system. The online survey
> form will remain online until August 23rd. Please do not hesitate to
> contact me if you have any questions.
>
> Best regards and thanks,
> Axel
Received on Thursday, 8 August 2013 13:21:52 UTC