- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 8 Aug 2006 06:54:37 -0700
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
>On Aug 7, 2006, at 8:40 PM, Pat Hayes wrote:
>
>>>Slight emendation:
>>>
>>>On Aug 7, 2006, at 5:22 PM, Bijan Parsia wrote:
>>>[snip]
>>>
>>>>"""The answer set of a query is the largest set of query answers
>>>>that are entailed by the answer KB such that no answer in the set
>>>>is entailed by any other answer in the set."""
>>>>
>>>>Non-redundancy.
>>>[snip]
>>>
>>>DQL distinguishes between the answer set and the response set:
>>>
>>>"""Response Set
>>>While there are no global requirements on a response set other
>>>than that all its members are correct answers, it is recommended
>>>that servers ensure that answer bundles do not contain duplicate
>>>or redundant answers, i.e., answers which are subsumed by other
>>>answers. One answer subsumes another if its bindings are a
>>>superset of the bindings in the other answer. Servers that are
>>>able to guarantee that their response sets contain no duplicate
>>>answers can be called non-repeating. Servers that are able to
>>>guarantee that their response sets contain no duplicate or
>>>redundant answers can be called terse. Servers that are able to
>>>guarantee that their response sets will be correctly terminated
>>>with 'none' can be called complete."""
>>>
>>>OWLQL (<http://ksl-web.stanford.edu/KSL_Abstracts/KSL-03-14.html>)
>>>as a more elaborate discussion.
>>>
>>>I think I prefer the way that SPARQL does it, if DISTINCT gets
>>>fixed. I certainly don't want to have the granularity of
>>>redundancy placed at the server level.
>>
>>I still think this is the best stance for the standard to adopt.
>
>It's silly. We can easily do this on a query by query level and let
>servers do the best they can and communicate when they can't do
>better.
>
>>I can see a perfectly good utility for servers which run fast but
>>do not *guarantee* non-redundancy.
>
>So they should fault if the user requests it. Which I think they are
>by saying "DISTINCT".

OK, I see now what your point is. I agree, though see below.
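
The DQL passage quoted above defines subsumption purely in terms of
bindings (one answer subsumes another if its bindings are a superset),
so the "non-repeating" and "terse" properties can be sketched in a few
lines. This is only an illustration under assumptions: answers are
modelled as plain dictionaries from variable name to term, and the
function names are invented here, not taken from DQL, OWL-QL or SPARQL.
It implements the syntactic superset test only, not the entailment-based
notion of redundancy discussed elsewhere in this thread.

```python
from typing import Dict, List

# Assumption: an answer is a mapping from variable name to bound term.
Answer = Dict[str, str]


def subsumes(a: Answer, b: Answer) -> bool:
    """True if a's bindings are a superset of b's bindings."""
    return all(var in a and a[var] == term for var, term in b.items())


def non_repeating(answers: List[Answer]) -> List[Answer]:
    """Drop exact duplicates, keeping first occurrences in order."""
    seen = set()
    out = []
    for ans in answers:
        key = frozenset(ans.items())
        if key not in seen:
            seen.add(key)
            out.append(ans)
    return out


def terse(answers: List[Answer]) -> List[Answer]:
    """Drop duplicates and any answer subsumed by a different answer."""
    unique = non_repeating(answers)
    # After de-duplication no two answers are equal, so a subsuming
    # answer other than `a` itself is necessarily a strict superset.
    return [a for a in unique
            if not any(b is not a and subsumes(b, a) for b in unique)]


answers = [
    {"x": "ex:a"},
    {"x": "ex:a", "y": "ex:b"},
    {"x": "ex:a"},
    {"x": "ex:c"},
]
unique_answers = non_repeating(answers)
terse_answers = terse(answers)
```

The pairwise check in terse() is quadratic in the number of answers,
which gives a rough feel for why a server might prefer not to guarantee
it on large answer sets.
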
>>Im quite sanguine with this because the economic pressures on
>>servers and customers seem to converge on eliminating redundancy
>>where practicable: there is no motivation for anyone to
>>gratuitously introduce redundancy for no reason,
>
>And yet they do. Plus it's not always gratuitious, but yet not
>desired. E.g., aggregation, or just multiple people entering data
>over time.

True, it happens. But the question is, whose responsibility is it to
fix snarky data? I don't see why a query engine should take on the
responsibility to do this. I see a SPARQL engine as basically a broker
between someone who has information stored and someone else who needs
information. It's not the broker's job, necessarily, to fix up the data
perfectly. Although I agree with your point above that if the user asks
for a form of perfection, then the engine should deliver it or fail.

>>only to save the considerable work involved in checking for
>>non-redundancy when an absolute guarantee of nonredundancy is not
>>required.
>
>Then don't include the DISTINCT keyword and all conforming servers
>will behave as you like.
>
>>I would vastly prefer to use such a server than one that times out
>>trying to establish a minimal answer set, particularly when we
>>might be talking about answer sets with high orders of magnitude.
>
>Pat, I guess what you don't understand is

Indeed, I did *not* understand that this was your position. OK, we are
much closer than it seemed.

>that, as I've said several times now

Sorry I missed that.

>, it's perfectly reasonable to allow for (reasonable) redundancy
>with a plain select clause (this is how SQL works, I believe) but
>have a non-redundant answer set *WHEN THE USER REQUESTS IT*.

I agree.

>And the natural way for a user to request it is with DISTINCT. But
>then we have to define what a non-redundant answer *is*.

True.
>And in the standard conforming reading of the RDF Semantics, it's
>going to involve some work and cannot involve treating BNodes as
>denoting terms. At least, it would not be sensible to do so.

I take your point, but let me respond to it. IF we want to allow for
the possibility of 'told bnodes', i.e. if we want to allow bnodes to be
delivered as answer bindings and then subsequently re-used in later
queries, with the intention of asking for more information about the
thing asserted to exist, then what appears to be redundancy in answers
might not really be redundant: there may be answers ex:a and _:x given
that are not redundant because the KB contains information about _:x
which was not mentioned in the query but which can be used to
distinguish it from ex:a. (What this amounts to, formally, is allowing
bnode scopes to extend across multiple answer documents, by the way;
hence bnodes act even more like true names: in effect, the entire
sequence of transactions takes place inside the scope of the
existential quantifier.)

I am not sure what we should do about DISTINCT in these circumstances.
It seems to be asking too much to require that the semantics actually
establishes (not (= a b)) for every pair of bindings. Maybe it is
simply too complicated to be both nonredundant and also play told-bnode
games, and we should simply ignore this matter and apply DISTINCT to
the actual answer bindings, regardless of what other information is
available in the KB (?)

>And you can get your desired behavior from *EVERY* SPARQL server by
>not including the DISTINCT. Why do you want to go shopping around
>for servers etc.

Honestly, I did not appreciate that this was the position you were
arguing for. I thought you wanted to impose DISTINCT on every SELECT
query in every conforming engine, in effect: and it was that position
which I was opposed to. But it seems we have been violently agreeing
about this for some time. Sorry about the misunderstanding.
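
The two readings of DISTINCT contrasted above can be put side by side
in a small sketch. It is deliberately simplified and rests on
assumptions not in the thread: result rows are tuples of term strings,
anything starting with "_:" is a bnode, bnode labels are treated as
local to a single row, and both function names are made up for the
illustration. The first function applies DISTINCT to the actual answer
bindings, so ex:a and _:x are both kept (the told-bnode-friendly
option). The second treats bnodes as purely existential and drops a row
when some other row is an instance of it.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]  # assumption: one term string per selected variable


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def distinct_on_bindings(rows: List[Row]) -> List[Row]:
    """Syntactic DISTINCT: bnode labels are compared like ordinary names."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def instance_of(general: Row, specific: Row) -> bool:
    """True if a consistent substitution for bnodes maps general to specific."""
    if len(general) != len(specific):
        return False
    mapping: Dict[str, str] = {}
    for g, s in zip(general, specific):
        if is_bnode(g):
            # The same bnode must map to the same term throughout the row.
            if mapping.setdefault(g, s) != s:
                return False
        elif g != s:
            return False
    return True


def existential_distinct(rows: List[Row]) -> List[Row]:
    """Drop rows that say no more than some other remaining row."""
    result = distinct_on_bindings(rows)
    for row in list(result):
        if any(other != row and instance_of(row, other) for other in result):
            result.remove(row)
    return result


rows = [("ex:a",), ("_:x",), ("ex:a",)]
kept_syntactic = distinct_on_bindings(rows)
kept_existential = existential_distinct(rows)
```

With these rows the syntactic version keeps both ex:a and _:x, while
the existential version keeps only ex:a. Neither decides whether _:x
and ex:a actually denote different things, which is the (not (= a b))
question left open above.
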
>>>If I can't compute a non-redundant answer because I've run out of
>>>resources, I should timeout/fault with out of memory, whatever. If
>>>I have an imcomplete minimizer, I should be able to verify that
>>>that my answer set is minimal, or fault.
>>
>>*You* should, yes. That is, the user has the option of computing a
>>minimal answer set if it is absolutely required.
>
>Obviously, I was speaking as a server.

I misunderstood you. Again, I apologize.

>Frankly, I think this sort of behavior is exactly the sort of thing
>that should be standardize and in the query language. I sincerely
>doubt most users have the sophistication to get it right, and I
>don't see why they should have to.
>
>*Who* was going on about putting burdens on the implementers instead
>of the users? This is a clear case of a ridiculous burden on the
>users. And there's a nice optout for the implementers: Don't support
>DISTINCT and advertize that.

I agree, that is exactly the kind of 'free market' option I would like
to allow. Purely as a political point, it would be nice if an engine
could do this and still count as conforming with the spec.

Pat
--
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   cell
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Tuesday, 8 August 2006 13:54:59 UTC