- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Mon, 14 Aug 2006 04:32:02 +0100
- To: Pat Hayes <phayes@ihmc.us>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Aug 13, 2006, at 5:01 AM, Pat Hayes wrote: >> On Aug 11, 2006, at 9:30 PM, Pat Hayes wrote: [snip] >> Of course, their major use is in definitions of alternative >> semantics. I think this point should not be put aside. Even within >> the bounds of our charter, we have simple, rdf, rdfs, and let's >> call it "assertional"/matching (possible) entailment relations to >> consider. I think the "virtual graph"/deductive closure approach >> suffers from a number of problems > > It's not universal, but I think it has many merits in the cases > where it applies. The fact that it does apply is itself a kind of > badge of computational simplicity that can be worn with pride for > some applications, and I gather (to my great surprise) that for > RDFS at least, it is in fact a computationally feasible way to > proceed, even on quite large KBs. I don't think we should entirely > trash it for the simpler entailment cases. Whatever approach we sanction should be clearly laid out. Virtual graphs are, in fact, rather tricky things, for example, in the RDFS case. (And I've built and worked with such, i.e., forward chaining RDFS systems.) Basically, they are all incomplete, and typically a bit suspect. None of them, afaik, treat BNodes existentially. (And how should they deal with redundancy? The story needs to be told there since we'll be treating certain semantic conditions differently, at least than a naive approach.) I think this all goes hand in hand with being able to specify which under which semantics you intend your query to be interpreted. We can provide a "hooky" framework that doesn't necessarily have a unified semantics across every variant. Though if we can maintain the BGP matching/algebra split, things will be *much* easier. >> , and prefer an approach which refers to the semantics of the >> relations, which opens the door for extensions to OWL. Or I prefer >> an approach where each is defined distinctly (though I prefer that >> less than the general approach). >> >>> If anything, the definitions we have been crafting simply make >>> the basic idea of pattern-matching more and more opaque. I do not >>> mean to argue against having a semantic analysis, in support of >>> semantic interoperability: but interoperability is served by the >>> answer set being (1) unique, and (2) semantically coherent. For >>> the latter, it is not necessary that the entire answer set be >>> *defined* semantically, only that the answers in it be required >>> to *satisfy* appropriate semantic conditions. >> >> This is true. But we want some guarantees that the procedural >> definition yields the semantically correct answers. > > Agreed. We would need something close to a formal proof of this in > the spec. I'm glad we agree on this key point. >> [snip] >>> I have another motive for this suggestion. As you may have been >>> able to figure out from my recent emails with Bijan, I am >>> strongly in favor of allowing SPARQL to deliver redundant answers >>> when it is dealing with a redundant KB, hence not requiring >>> absolutely irredundant answers. >> >> Though these should, IMHO, be available from the language. > > If we can possibly do this, I agree. But as a last resort, a spec > without this is better than no spec, or a spec so hard to follow > that nobody reads it. Well, the functionality and how it's presented are somewhat orthogonal. Non-redundancy of answers can be specified in the section on distinct as a post processing (I believe). That seems easy enough to understand. >> Otherwise, it's very hard to say that we are truly doing RDF >> query. Of course, there are cases where you want access to the non- >> redundant graph *per se* (e.g., editors), but I'm not as sanguine >> about that as i use to be. Browsers (in the sense of portals) and >> editors are very different. Thus, I believe the problem of >> ensuring bnode stability thoughout a "session" and getting exactly >> the redundancy in a graph should be separated. > > They seem to be related, though. The two cases in the first problem > (session scope vs. answer document scope, AKA told bnodes vs. no > told bnodes) seem to suggest different notions of redundancy, is > what bothers me. See the final example in my message to Jorge to > make the point. If the bnodes have document scope, and cannot be > referred to again, then ?x/A , ?x/_:b seems clearly redundant. But > if that can be a told bnode, you may be able to find out more about > it than you can about A. As in my :mochi example in my DISTINCT issue raising. Clearly the algebra can introduce redundancy that the graph didn't have. In the context of a single query, I think it's correct to eliminate that introduced redundancy (in the context of a DISTINCT) since, really, that's what you asked for. I.e., consider: SELECT DISTINCT ?x ?y {?x ?p ?y} :bob :loves :mochi. :bob :hates :mochi. The redundant answer set is: ?x ?y :bob :mochi :bob :mochi And I think having that as a result of DISTINCT is enormously counterintuitive. So I think we can generalize it to the BNode case, at least in some mode. (I'm arguing for inclusion.) If you don't have BNode stability, then it's all moot anyway. You could always recover distinct answers (in the mochi cases) by projecting an additional variable. > And by phrasing everything so that either option is possible (which > I think is good, BTW, and would not like to go back on), we have > kind of shot ourselves in the foot as far as being able to decide > on one of these as being THE single correct way is concerned, seems > to me. Or perhaps, only the second option is feasible, so that > there can be answer sets which *seem* redundant when you don't have > told bnodes (but aren't, er, really). Here's a problem, in general, I have about told BNodes, they are not really part of RDF as specified. They *are* part of RDF as practiced. But part of our job is to make specs cohere and to make things, generally, less confusing than more. So, I would prefer to tackle this head on rather than slide around it. Tackling it head on, imho, means either respecting the current semantics, or *changing* them. Now we aren't chartered to change them, so I would like to request that the Chair talk to the coordination group to see what possible options are. >> In fact, we should only use the latter to allow for specifying a >> standard for "acceptable" redundancy that is consistent across >> implementations (as Pat discusses below). >> >>> On the other hand, it is clearly desireable to limit the >>> unbounded amount of redundancy that a simple entailment condition >>> would allow. Stating an appropriate compromise position between >>> these extremes is difficultwhen we limited to using only >>> semantic notions and terminology, IMO largely because the very >>> idea of redundancy here is 'semantically invisible', i.e. a >>> redundant graph and its lean subgraphs are semantically >>> indistinguishable. Hence the need to protect the entailment-based >>> definitions with notions which really are not semantic at all, >>> such as the scoping set. In contrast, these scoping ideas are >>> trivial to express in an algorithmic framework, and the results >>> are intuitively very clear and easy to understand. So I think >>> that this way of dividing up the definitional work between a >>> semantic necessity and a syntactic/algorithmic sufficiency might >>> allow us to quickly and easily find a way to deal with redundancy >>> in answers which will be more or less right for practical >>> deployment. >> [snip] >> >> I do think that this should be *available* and the *default*, but >> i think too that if we don't make the irredundent answer sets >> available (if only by special user demand) then we aren't cohering >> with the semantics of RDF. > > Hmm. Not sure I agree with the "semantics of RDF" point exactly, > but never mind. Let me put it this way, either redundancy is semantically significant, or it isn't. (Or significant in some cases and not in others.) There's one story where redundancy is like redundancy in SQL, that is, a *purely* computational hack/convenience/whathaveyou. DISTINCT gives you the "real" answer set, in some sense (at least from the POV of the underlying formalism). So, if we treat BNodes as told systematically throughout the processing of a query (and even unto separate queries), always, and only, then we are treating redundancy which is *not* significant, by the semantics of the underlying formalism, as very significant. I think it would be a legitimate objection to the spec if that were the case. I think if the group wants to do that, then it should face the tension with the RDF Semantics document head on. (Personally, I would prefer that BNodes were *not* existential variables, but local names that have specific renaming conventions. Most software wouldn't have to change at all. It would cohere with most actual use and treatment of BNodes. It would be computationally saner and easier to mesh with deductive database systems. Etc. But I don't see that we just get to *do* that in SPARQL.) > OK, lets say I agree, provisionally :-). Lets move to a discussion > of exactly how to do this using DISTINCT, and therefore exactly > what it means (see above). > >> I'll raise an issue and explain the point. > > Sounds good. See my other message. Cheers, Bijan.
Received on Monday, 14 August 2006 03:32:19 UTC