- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Mon, 14 Aug 2006 13:21:35 +0100
- To: andy.seaborne@hp.com
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Aug 14, 2006, at 9:51 AM, Seaborne, Andy wrote:

> We seem to have lost some text at some time in the past: it used to say:
>
> """
> Definition: Distinct Solution Sequence
>
> A Distinct Solution Sequence has no two solutions the same.
>
> For a solution sequence S = ( S1, S2, . . . , Sn), then write set(S) for the
> set of solution sequences in S.
>
> distinct(S) = (Si | Si != Sj for all i != j) and set(distinct(S)) = set(S)
> """

That doesn't help (though it is nicer) until "!=" is defined.

> There is a layering with
>
> * modifiers
> * algebra
> * BGP matching
>
> so DISTINCT is not directly referring to the matching but to the solutions.

Er...I don't understand what you mean here. I only think of DISTINCT as referring to the end solutions, that is, what is ultimately reported back from the evaluation of a query. This may require work at various stages of the processing, I suppose, but I'd imagine that that would be merely optimization.

> So it's that "!=" :: I think it would be better to use language here and not
> "!=" because it might imply a specific relationship to "!=" in filters.
> "not the same" should mean "not the same when doing graph pattern matching"

I don't understand this, though I agree on the need for a specific definition instead of relying on undefined symbols or words.

> D-entailment is not required of all systems.

Then I think we need a mechanism to indicate when this is required or not. If D-entailment is not done, does that mean all tests involving numeric entities fail?

> So, if D-entailment were done in BGP matching, it should include that; if
> D-entailment were not done, it should not include that.
>
> Data:
> :x :p 1 .
> :y :p "01"^^xsd:integer .
>
> Query:
> SELECT DISTINCT ?v { ?a :p ?v }
>
> should have one solution if
>
> { :x :p ?v . :y :p ?v . }
>
> matches else it should have two.

Hmm. Again, I would have done it by analysis of the results. Need to think more about it.
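Editorial illustration (not part of the original message or of any SPARQL draft): the two readings of "!=" for the quoted data can be sketched in Python. The tuple representation of literals and the names `term_key`, `value_key`, and `distinct` are assumptions made only for this sketch. Comparing by term keeps `1` and `"01"^^xsd:integer` apart; comparing by value, roughly what D-entailment for xsd:integer would give, merges them.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def term_key(literal):
    """Compare literals as RDF terms: lexical form plus datatype."""
    lexical, datatype = literal
    return (lexical, datatype)

def value_key(literal):
    """Compare literals by value (hypothetical stand-in for D-entailment).

    Only xsd:integer is interpreted here; other datatypes fall back to
    term comparison.
    """
    lexical, datatype = literal
    if datatype == XSD_INTEGER:
        return (int(lexical), datatype)
    return (lexical, datatype)

def distinct(solutions, key):
    """Keep the first solution for each key, preserving order."""
    seen = set()
    result = []
    for solution in solutions:
        k = key(solution)
        if k not in seen:
            seen.add(k)
            result.append(solution)
    return result

# Bindings of ?v for the data in the quoted example.
solutions = [("1", XSD_INTEGER), ("01", XSD_INTEGER)]

by_term = distinct(solutions, term_key)    # two solutions
by_value = distinct(solutions, value_key)  # one solution
```

Both variants work on the result sequence alone; they differ only in which notion of sameness is plugged in.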
This is not an unreasonable approach but it seems to lead to counterintuitive results.

> Bijan Parsia wrote:
[snip]
> > BNODES:
> >
> > BNodes are much harder, overall. Consider the following answer set:
> >
> > 1) ?x ?y
> > _:x :mochi
> > :Bijan :mochi
> >
> > One (distinct) answer, or two?
>
> Can't tell - it depends on the data and isn't a characteristic of
> the result set alone.

This is what I don't understand. It seems clearly a characteristic of the result set alone. Consider a Constructed graph from that result set:

_:x :loves :mochi.
:Bijan :loves :mochi.

(Template ?x loves ?y)

This is clearly redundant. We can tell by the results alone.

> Placing the burden on the calculation of redundancy that requires
> inspecting the whole dataset is too much of a burden as we have
> discussed before.

We've discussed it in the context of the default answers (i.e., of non-DISTINCT). I don't recall discussing it in the context of DISTINCT. A pointer to that discussion would be helpful, thanks.

If you want to be permissive, why not take the attitude that the spec has to D-entailment? Personally, I think we cannot avoid dealing with multiple sorts of entailment, even in the RDF case, even aside from RDFS.

[snip]
> > I would also like to put in a very strong push for a strong
> > anti-redundancy reading. I think 1) and 2) should have only one answer
> > (if DISTINCT). The principle is that no DISTINCT set of answers should
> > contain redundancy. This is akin to a lean graph, and is likely
> > similarly computationally expensive. (Note that source graph leanness is
> > not sufficient, as 3) shows). Thus, I think this is more characteristic
> > of the semantics of RDF. I would encourage also text that made the
> > decision parallel that of what I've seen of SQL ALL vs. DISTINCT, to wit,
> > that ALL is a *computational* compromise and not intended
> > to correspond to the "math" of the situation. For many purposes, of
> > course, that's just fine.
> > Redundancy for time is a sensible tradeoff.
> > And I applaud having predictable, "minimal" redundancy, that is, no more
> > than what is in the graph. That's computationally and implementationally
> > straightforward. However, I think we should *not* encourage a "semantic"
> > reading of that redundancy, wherein people interpret the redundancy as
> > a *significant* part of the data.
> >
> > In other words, we're not supporting editors that care about the
> > specific assertional structure of a graph.
>
> The structural access is an important use case.

For *DISTINCT* queries? I'd be surprised. However, we have to balance that use case against others, and against consistency with existing specifications, yes?

> Supporting editors wanting to access the structural and redundant
> nature of the graph is reasonable.

Surely that's a pretty small market, I would think.

> It is also one that people expect to work.

But if they expect wrongly? Giving *semantic weight* to structural redundancy pretty clearly, I would argue, violates the semantics of RDF. And is inconsistent, since we do not respect URI redundancy (why not?). Editors are a *very* specialized use case and a rather dangerous one to generalize from (portals are different, I think).

I think it's very important that the query language not give *misleading* answers. Thus, I think we should have a non-redundant mode in some shape or form (we could have multiple semantics, for example, as I proposed back in the day), or we should challenge the current RDF semantics *explicitly*. Obviously, this is not in our charter, so we have to at least kick it up a level.

I think, from a deployment and practice point of view, that the existential reading of BNodes is *wrong*. That is, the RDF working group made the *wrong* choice in formalizing them that way. But it *is* the choice made, and there are some interesting aspects of it. But I don't think we get to eat our cake and say that we're toasting marshmallows.
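Editorial illustration (not part of the original message): the claim that redundancy in answer set 1) can be detected from the results alone can be sketched as a row-by-row check. This is a simplification under stated assumptions: rows are tuples of strings, terms starting with `_:` are blank nodes, and no blank node is shared between rows. A full anti-redundancy reading would need something closer to graph leaning. The names `subsumes` and `non_redundant` are hypothetical.

```python
def is_bnode(term):
    """Assumption for this sketch: blank nodes are written with a '_:' prefix."""
    return term.startswith("_:")

def subsumes(general, specific):
    """True if consistently replacing blank nodes in `general` yields `specific`."""
    if len(general) != len(specific):
        return False
    mapping = {}
    for g, s in zip(general, specific):
        if is_bnode(g):
            # A blank node may map to any term, but always to the same one.
            if mapping.setdefault(g, s) != s:
                return False
        elif g != s:
            return False
    return True

def non_redundant(rows):
    """Drop rows that are instances-in-waiting of another row.

    A row is dropped when it maps onto a different row; if two rows map
    onto each other, only the later one is dropped.
    """
    rows = list(dict.fromkeys(rows))  # remove exact duplicates, keep order
    kept = []
    for i, row in enumerate(rows):
        redundant = any(
            j != i
            and subsumes(row, other)
            and (not subsumes(other, row) or j < i)
            for j, other in enumerate(rows)
        )
        if not redundant:
            kept.append(row)
    return kept

# Answer set 1) from the quoted message.
answers = [("_:x", ":mochi"), (":Bijan", ":mochi")]
lean_answers = non_redundant(answers)
```

Under these assumptions the `_:x` row is dropped because it says nothing that the `:Bijan` row does not already say, which is the "one answer" reading argued for above.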
Cheers, Bijan.
Received on Monday, 14 August 2006 12:21:46 UTC