Re: SPARQL, named graphs and default graph

Andy Seaborne wrote:
> Nuutti Kotivuori wrote:
>> Nuutti Kotivuori wrote:
>>> - There is no way to explicitly match statements that have an
>>> unknown origin, since the origins are just distinct blank nodes.
>> Hey! I just thought of something.
>> This of course assumes that SPARQL would allow blank nodes to be graph
>> names.
>> SELECT ?s ?p ?o
>> WHERE { GRAPH ?g { ?s ?p ?o }
>> FILTER isBlank(?g) }
>> Would this work?
>
> Would this work to avoid relying on blank nodes for graphs because,
> strictly, does not allow blank nodes as GRAPH names.

This didn't quite make sense to me. But if I understood correctly,
you'd like to avoid relying on blank nodes as graph names, since the
SPARQL spec doesn't allow them.

> Minting URIs is always possible so giving things explicit names is
> always possible.  (Analogy: This is a bit like Steve's one-extra slot
> for system management; sometimes using that extra indirection helps.
> It is the system name for the graph.).

Yes, minting URIs is always possible - and regardless of the choice
made, the user of an API can always do that themselves.

> One way to use this is to make one graph the manifest of what's in the
> store. That can be the default graph.
>
> _:a :origin   <somePlace> .
> _:a :assigned <myChoiceOfURI> .
>
> SELECT ?s ?p ?o
> WHERE {
> GRAPH ?g { ?s ?p ?o }
> ?blank :assigned ?g .
> ?blank :origin ?origin . }
>
> finds all the statements with a known origin. If "unknown" becomes no
> :origin triple, then the query can find them by the idiom of finding
> where a triple isn't mentioned:
>
> SELECT ?s ?p ?o
> WHERE {
> GRAPH ?g { ?s ?p ?o }
> ?b :assigned ?g .
> OPTIONAL { ?b :origin ?origin }
> FILTER (!bound(?origin))
> }
> (All triples in graphs with no associated :origin recorded)

Yes, this is possible. But, it would be one more stage of indirection
which would complicate things - I would want to make sure that the
user can still do:

SELECT ?s ?p ?o
WHERE { GRAPH <http://www.example.com/#something> { ?s ?p ?o } }

And it would work as assumed. So this solution comes closer to the one
I mentioned in the original mail.

But the real problem with approaches like this is that I need to be
able to store several million triples - and it might be that the
majority of them have "no origin". I don't want to bear the storage
cost of several million generated URIs (UUID URIs or otherwise). So
I'd have to come up with some really inventive way of encoding the
intermediate nodes without incurring any storage costs.
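
To make the storage concern concrete, here is a minimal Python sketch of
one possible encoding. It assumes the store keeps quads and marks every
"no origin" statement with a single shared marker instead of a generated
URI. QuadStore, NO_ORIGIN and the method names are hypothetical and do
not come from any real API.

```python
# Hypothetical sketch; QuadStore and NO_ORIGIN are illustrative names only.
NO_ORIGIN = None  # one shared marker, so nothing is minted per statement


class QuadStore:
    """Stores (graph, subject, predicate, object) tuples in memory."""

    def __init__(self):
        self.quads = []

    def add(self, s, p, o, graph=NO_ORIGIN):
        # Statements added without a graph name get the shared marker.
        self.quads.append((graph, s, p, o))

    def in_graph(self, graph):
        # Rough analogue of:
        #   SELECT ?s ?p ?o WHERE { GRAPH <graph> { ?s ?p ?o } }
        return [(s, p, o) for g, s, p, o in self.quads if g == graph]

    def unknown_origin(self):
        # Statements that were stored without any graph name.
        return [(s, p, o) for g, s, p, o in self.quads if g is NO_ORIGIN]


store = QuadStore()
store.add(":s", ":p", ":a", graph="http://www.example.com/#something")
store.add(":a", ":q", ":b")  # no origin recorded

named = store.in_graph("http://www.example.com/#something")
unknown = store.unknown_origin()
```

This corresponds to the "one shared name" choice: the per-statement cost
is a single reference to the marker, but all statements of unknown origin
end up looking like one graph.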

> Statements of unknown origin can go into separate graphs or all into
> one graph as the app chooses.  One decision to be made is around
> whether
>
> :s :p :a .
> :a :q :b .
>
> is supposed to match the pattern { ?s :p ?o . ?o :q ?v }
> where :s :p :a and :a :q :b are unknown or of different origins.
>
> This is one way to differentiate context mechanisms for statements
> from named graphs.

Yes, this is the same decision I raised earlier: whether to use one
shared blank node or several distinct ones. I'm still not sure which
would be better for the user in the common case.
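
The difference between the two choices can be illustrated with a small
Python sketch. It assumes quads are plain (graph, s, p, o) tuples and
that a pattern like { ?s :p ?o . ?o :q ?v } is evaluated either per graph
or over the merge of all graphs. The function match_chain and the graph
labels are hypothetical, chosen only for this illustration.

```python
# Hypothetical sketch; match_chain and the graph labels are illustrative.
def match_chain(quads, p1, p2, per_graph):
    """Match { ?s p1 ?o . ?o p2 ?v } over (graph, s, p, o) tuples.

    With per_graph=True both triples must come from the same graph;
    with per_graph=False the graphs are treated as merged.
    """
    results = []
    for g1, s, p, o in quads:
        if p != p1:
            continue
        for g2, s2, q, v in quads:
            if q != p2 or s2 != o:
                continue
            if per_graph and g1 != g2:
                continue
            results.append((s, o, v))
    return results


# Each statement of unknown origin gets its own distinct name.
distinct = [("_:g1", ":s", ":p", ":a"), ("_:g2", ":a", ":q", ":b")]
# All statements of unknown origin share a single name.
shared = [("_:g", ":s", ":p", ":a"), ("_:g", ":a", ":q", ":b")]

distinct_per_graph = match_chain(distinct, ":p", ":q", per_graph=True)
distinct_merged = match_chain(distinct, ":p", ":q", per_graph=False)
shared_per_graph = match_chain(shared, ":p", ":q", per_graph=True)
```

With distinct names the pattern only matches when the graphs are merged;
with one shared name it matches even when evaluated per graph.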

> Blank nodes are first class objects in RDF but as existential
> variables you have to know what they range over.  The thing I like
> about named graphs is that it makes the minimal assumption as to the
> semantics across graphs.  There is no implied relationship between the
> domains being described by the graphs. That is, it has no fixed
> semantics.  It's the way the app uses the graphs that makes the
> connections.  In the absence of a standard approach to these kinds of
> provenance issues, that's as far as it can go and be general.
>
> [[
> I find these two useful items by Pat Hayes about blank nodes:
> http://www.ihmc.us/users/phayes/RDFGraphSyntax.html
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0153.html
> ]]

Oh, cool! Those were a nice read. And I'm glad to see that my
comprehension of blank nodes was rather correct. Now that I'm certain
of that, I can go and try to grok entailment, graph merges and all
that again :-)

-- Naked

Received on Thursday, 14 September 2006 12:22:44 UTC