- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Mon, 14 Aug 2006 13:21:35 +0100
- To: andy.seaborne@hp.com
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Aug 14, 2006, at 9:51 AM, Seaborne, Andy wrote:
> We seem to have lost some text at some time in the past: it used to
> say:
>
> """
> Definition: Distinct Solution Sequence
>
> A Distinct Solution Sequence has no two solutions the same.
>
> For a solution sequence S = ( S1, S2, . . . , Sn), then write set(S) for the
> set of solution sequences in S.
>
> distinct(S) = (Si | Si != Sj for all i != j) and set(distinct(S)) = set(S)
> """
That doesn't help (though it is nicer) until "!=" is defined.
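To make the dependence on "!=" concrete, here is a minimal Python sketch of the quoted definition. It is not spec text; the names `distinct`, `same`, and `same_term` are illustrative assumptions, and solutions are modelled as plain dictionaries from variable names to terms written as strings. The equality test is passed in as a parameter, which is exactly the part the definition leaves open.

```python
from typing import Callable, Dict, List

# Assumption: a solution is a mapping from variable name to a term string.
Solution = Dict[str, str]


def distinct(solutions: List[Solution],
             same: Callable[[Solution, Solution], bool]) -> List[Solution]:
    """Keep the first solution of every group that `same` treats as equal."""
    kept: List[Solution] = []
    for s in solutions:
        # A solution is dropped only if an already kept one is "the same".
        if not any(same(s, k) for k in kept):
            kept.append(s)
    return kept


def same_term(a: Solution, b: Solution) -> bool:
    """One possible reading of "the same": identical variables and terms."""
    return a == b


S = [{"x": ":a"}, {"x": ":b"}, {"x": ":a"}]
result = distinct(S, same_term)
```

With `same_term` the third solution is dropped as a repeat of the first. Any other reading of "!=" can be supplied as a different `same` function without touching `distinct` itself.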
> There is a layering with
>
> * modifiers
> * algebra
> * BGP matching
>
> so DISTINCT is not directly referring to the matching but to the
> solutions.
Er...I don't understand what you mean here. I only think of DISTINCT
as referring to the end solutions, that is, what is ultimately
reported back from the evaluation of a query. This may require work
at various stages of the processing, I suppose, but I'd imagine that
that would be merely optimization.
> So it's that "!=" :: I think it would be better to use language here and not
> "!=" because it might imply a specific relationship to "!=" in filters.
> "not the same" should mean "not the same when doing graph pattern matching"
I don't understand this, though I agree on the need for a specific
definition instead of relying on undefined symbols or words.
> D-entailment is not required of all systems.
Then I think we need a mechanism to indicate when this is required or
not. If D-entailment is not done, does that mean all tests involving
numeric entities fail?
> So, if D-entailment were done in BGP matching, it should include that; if
> D-entailment were not done, it should not include that.
>
> Data:
> :x :p 1 .
> :y :p "01"^^xsd:integer .
>
> Query:
> SELECT DISTINCT ?v { ?a :p ?v }
>
> should have one solution if
>
> { :x :p ?v . :y :p ?v . }
>
> matches else it should have two.
Hmm. Again, I would have done it by analysis of the results. Need to
think more about it. This is not an unreasonable approach but it
seems to lead to counterintuitive results.
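For illustration, the two outcomes for the data above can be sketched in Python. This is only a sketch under stated assumptions: literals are modelled as (lexical form, datatype) pairs, the plain `1` is taken to be an xsd:integer, and the helper names `value_key` and `distinct_bindings` are hypothetical. Mapping xsd:integer lexical forms to their values stands in for datatype-aware matching; nothing here is a real D-entailment implementation.

```python
from typing import Callable, Hashable, List, Tuple

# Assumption: a literal is a (lexical form, datatype) pair.
Literal = Tuple[str, str]

XSD_INTEGER = "xsd:integer"


def value_key(lit: Literal) -> Hashable:
    """Compare xsd:integer literals by value, everything else by term."""
    lexical, datatype = lit
    if datatype == XSD_INTEGER:
        return (int(lexical), datatype)
    return lit


def distinct_bindings(bindings: List[Literal],
                      key: Callable[[Literal], Hashable]) -> List[Literal]:
    """Drop later bindings whose key has already been seen."""
    seen = set()
    out: List[Literal] = []
    for b in bindings:
        k = key(b)
        if k not in seen:
            seen.add(k)
            out.append(b)
    return out


# Bindings of ?v for the two triples in the quoted data.
bindings = [("1", XSD_INTEGER), ("01", XSD_INTEGER)]

by_term = distinct_bindings(bindings, key=lambda lit: lit)
by_value = distinct_bindings(bindings, key=value_key)
```

Here `by_term` keeps both bindings and `by_value` keeps only the first, which are the "two" and "one" solution cases from the quoted example.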
> Bijan Parsia wrote:
[snip]
> > BNODES:
> >
> > BNodes are much harder, overall. Consider the following answer set:
> >
> > 1) ?x ?y
> > _:x :mochi
> > :Bijan :mochi
> >
> > One (distinct) answer, or two?
>
> Can't tell - it depends on the data and isn't a characteristic of
> the result set alone.
This is what I don't understand. It seems clearly a characteristic of
the result set alone.
Consider a Constructed graph from that result set:
_:x :loves :mochi.
:Bijan :loves :mochi.
(Template: ?x :loves ?y)
This is clearly redundant. We can tell by the results alone.
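A rough Python sketch of such a results-only check follows. It is a simplification, not a leanness algorithm: each row is examined on its own, so a bnode shared across several rows is not handled correctly, and the names `is_bnode`, `subsumed_by`, and `lean_rows` are hypothetical. A row counts as redundant when consistently replacing its bnodes by other terms yields another row of the same result set.

```python
from typing import Dict, List, Tuple

# Assumption: a row is a tuple of term strings; bnodes are written "_:name".
Row = Tuple[str, ...]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def subsumed_by(row: Row, other: Row) -> bool:
    """True if consistently replacing the bnodes of `row` yields `other`."""
    if len(row) != len(other):
        return False
    mapping: Dict[str, str] = {}
    for a, b in zip(row, other):
        if is_bnode(a):
            # The same bnode must map to the same term everywhere in the row.
            if mapping.setdefault(a, b) != b:
                return False
        elif a != b:
            return False
    return True


def lean_rows(rows: List[Row]) -> List[Row]:
    """Drop rows that another row makes redundant (row-by-row check only)."""
    kept: List[Row] = []
    for i, row in enumerate(rows):
        redundant = any(
            j != i
            and subsumed_by(row, other)
            # If two rows subsume each other, keep only the earlier one.
            and (not subsumed_by(other, row) or j < i)
            for j, other in enumerate(rows)
        )
        if not redundant:
            kept.append(row)
    return kept


rows = [("_:x", ":mochi"), (":Bijan", ":mochi")]
lean = lean_rows(rows)
```

For the answer set 1) above, only the `:Bijan` row survives, and the check never looks at the source data.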
> Placing the burden on the calculation of redundancy that requires
> inspecting the whole dataset is too much of a burden as we have
> discussed before.
We've discussed it in the context of the default answers (i.e., of
non-DISTINCT). I don't recall discussing it in the context of
DISTINCT. A pointer to that discussion would be helpful, thanks.
If you want to be permissive, why not take the attitude that the spec
has to do D-entailment?
Personally, I think we cannot avoid dealing with multiple sorts of
entailment, even in the RDF case, even aside from RDFS.
[snip]
> > I would also like to put in a very strong push for a strong
> > anti-redundancy reading. I think 1) and 2) should have only one answer
> > (if DISTINCT). The principle is that no DISTINCT set of answers should
> > contain redundancy. This is akin to a lean graph, and is likely
> > similarly computationally expensive. (Note that source graph leanness is
> > not sufficient, as 3) shows). Thus, I think this is more characteristic
> > of the semantics of RDF. I would also encourage text that made the
> > decision parallel that of what I've seen of SQL ALL vs. DISTINCT, to wit,
> > that ALL is a *computational* compromise and not intended
> > to correspond to the "math" of the situation. For many purposes, of
> > course, that's just fine. Redundancy for time is a sensible tradeoff.
> > And I applaud having predictable, "minimal" redundancy, that is, no more
> > than what is in the graph. That's computationally and implementationally
> > straightforward. However, I think we should *not* encourage a "semantic"
> > reading of that redundancy, wherein people interpret the redundancy as
> > a *significant* part of the data.
> >
> > In other words, we're not supporting editors that care about the
> > specific assertional structure of a graph.
>
> The structural access is an important use case.
For *DISTINCT* queries? I'd be surprised. However, we have to balance
that use case against others, and against consistency with existing
specifications, yes?
> Supporting editors wanting to access the structural and redundant
> nature of the graph is reasonable.
Surely that's a pretty small market, I would think.
> It is also one that people expect to work.
But what if they expect wrongly? Giving *semantic weight* to
structural redundancy pretty clearly, I would argue, violates the
semantics of RDF. And it is inconsistent, since we do not respect URI
redundancy (why not?). Editors are a *very* specialized use case and
a rather dangerous one to generalize from (portals are different, I
think).
I think it's very important that the query language not give
*misleading* answers. Thus, I think we should have a non-redundant
mode in some shape or form (we could have multiple semantics, for
example, as I proposed back in the day), or we should challenge the
current RDF semantics *explicitly*. Obviously, this is not in our
charter, so we have to at least kick it up a level.
I think, from a deployment and practice point of view, that the
existential reading of BNodes is *wrong*. That is, the RDF working
group made the *wrong* choice in formalizing them that way. But it
*is* the choice made, and there are some interesting aspects of it.
But I don't think we get to eat our cake and say that we're toasting
marshmallows.
Cheers,
Bijan.
Received on Monday, 14 August 2006 12:21:46 UTC