Re: SPARQL Query Federation Survey

Hi Axel,

I received a private invitation to participate in this survey earlier. 
I declined because some of the key questions are too imprecise or 
misleading. In a private response to the invitation I raised my concerns 
and pointed out the particular issues that I have with the survey. Since 
these issues still exist in the current version of the survey, I would 
like to make my concerns public:

Your question about "Support for catalog/index update" comes with the claim  
that "automatic index updates [...] ensure the retrieval of complete results."  
First, there is an implicit assumption in this claim that I don't agree with; 
namely, the assumption that there exists a dependency between i) completeness 
of those query results that can be computed by using an index and ii) 
up-to-dateness of the index. I do not see how such a dependency can be 
justified. 
What I would instead assume is that the up-to-dateness of query results 
(rather than their completeness) is dependent on the up-to-dateness of the 
index (of course, the respective notions of "up-to-dateness" need to be defined 
more precisely).

However, what worries me more is the term "complete results" - it is not clear 
what such a thing is. More precisely: In the context of "Question 1: Result 
completeness" you speak about "all solutions to [a given SPARQL 1.0] query 
(100% recall)." To my knowledge there does not exist a definition of what a 
solution is when a SPARQL 1.0 query is evaluated over a federation of SPARQL 
endpoints (and, hence, no definition of what all of them are). Without such a 
definition it makes no sense to talk about "complete results."
BTW, mentioning "100% recall" does not help either. The notion of recall in 
information retrieval (which I assume you refer to here) is defined based on 
an understanding of relevance. It is not entirely clear (let alone 
well-defined) which SPARQL solution mappings are "relevant" for a given pair 
consisting of a SPARQL 1.0 query and a federation of SPARQL endpoints. I can 
of course guess what your understanding of relevance in this case is, but my 
guess may not match your actual understanding (or that of any other 
participant in the survey). The issue with set semantics vs. bag semantics 
that I raise in the following is an example of such a mismatch.
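
To make the dependency on relevance concrete, here is a minimal Python sketch. 
Everything in it is made up for illustration: the solution mappings, the two 
candidate notions of "relevant," and the helper name `recall` are my 
assumptions, not something taken from the survey or from any system. It only 
shows that the same set of retrieved solutions yields different recall values 
depending on which notion of relevance is chosen.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined without any relevant items")
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical solution mappings, written as (variable, value) pairs.
retrieved = [("x", "a"), ("x", "b")]

# Assumption 1: "relevant" means every solution of the query over the
# union of all data in the federation (may need joins across endpoints).
relevant_over_union = {("x", "a"), ("x", "b"), ("x", "c")}

# Assumption 2: "relevant" means the union of what each endpoint returns
# when it evaluates the query on its own.
relevant_per_endpoint = {("x", "a"), ("x", "b")}

recall_1 = recall(retrieved, relevant_over_union)    # 2 of 3 relevant found
recall_2 = recall(retrieved, relevant_per_endpoint)  # -> 1.0
```

Under the second assumption the system reports "100% recall," under the first 
it does not, although the retrieved solutions are identical.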

"Question 7: Duplicate Detection" is based on the implicit assumptions that i) 
some form of set-based query semantics is used and that ii) set-based 
semantics is the only reasonable choice. While the first assumption is not 
worth commenting on any further (I mentioned the lack of a well-defined query 
semantics before), I would certainly disagree with the latter. A bag semantics 
that allows me to obtain duplicates (each of which has different provenance of 
course) is not always a bad thing. For instance, I may choose to trust a 
solution that appears multiple times in the query result more than a solution 
that appears only once.
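
As a rough sketch of what I mean (in Python, with hypothetical endpoint names 
and solution mappings that I invented for this example), a bag keeps the 
multiplicity and the provenance of each solution, whereas a set discards both:

```python
from collections import Counter

# Hypothetical per-endpoint results; each solution mapping is written
# as a (variable, value) pair. Names and data are illustrative only.
results_by_endpoint = {
    "endpointA": [("x", "alice")],
    "endpointB": [("x", "alice"), ("x", "bob")],
    "endpointC": [("x", "alice")],
}

# Bag semantics: keep duplicates, so every solution has a multiplicity.
bag = Counter(
    solution
    for solutions in results_by_endpoint.values()
    for solution in solutions
)

# Set semantics: duplicates are eliminated, multiplicities are lost.
set_result = set(bag)

# Provenance: which endpoints contributed each solution.
provenance = {}
for endpoint, solutions in results_by_endpoint.items():
    for solution in solutions:
        provenance.setdefault(solution, []).append(endpoint)

# A consumer may rank solutions by how many endpoints support them.
ranked = [solution for solution, _ in bag.most_common()]
```

Here `bag` records that ("x", "alice") was returned three times and 
("x", "bob") once; after duplicate elimination that distinction is gone.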

Finally, I see that you have adjusted "Question 2: Privacy." However, I still 
find it misleading. In particular, authentication and access rights are hardly 
"privacy information." Instead, those are mechanisms for ensuring data 
security.

Best,
Olaf


On Thursday 08 August 2013 09:47:35 Axel Ngonga wrote:
> Dear all,
> 
> We are currently carrying out a survey on SPARQL Query Federation
> Systems, which can be found at
> 
> https://docs.google.com/forms/d/1J5r_VOxDdTPhgxDnPsKJw8hjpXE4dj2gGGujco-xoLM/edit
> 
> The aim of this survey is to provide an overview of the functionality of
> state-of-the-art systems. If you happen to have developed such a system,
> could you please fill in the information on this system into the survey
> form? This should take less than 10 minutes/system. The online survey
> form will remain online until August 23rd. Please do not hesitate to
> contact me if you have any questions.
> 
> Best regards and thanks,
> Axel

Received on Thursday, 8 August 2013 13:21:52 UTC