Re: SPARQL, named graphs and default graph from Nuutti Kotivuori on 2006-09-13 (public-sparql-dev@w3.org from July to September 2006)

From: Nuutti Kotivuori <naked@iki.fi>
Date: Wed, 13 Sep 2006 17:20:39 +0300
To: Richard Cyganiak <richard@cyganiak.de>
Cc: public-sparql-dev@w3.org
Message-ID: <87ejufalvs.fsf@aka.i.naked.iki.fi>

Richard Cyganiak wrote:
> Without having thought through all the consequences ...

Discussion is good! All input is appreciated.

> Some of your options are not really possible with named graphs
> because graphs need to be *named*, that is, the name *must* be a URI
> and not a blank node. Blank nodes are always scoped to a single
> graph, and using blank nodes as graph labels would make it impossible
> to refer to a named graph from the outside world. This excludes #3
> and #4.

The true reality of blank nodes isn't really clear to me at all, so I
will have to try and bend my mind over them some more. At least at the
program level, I don't have this kind of a restriction - blank nodes
are scoped to a store (actually even more, but that's irrelevant) in
my case - so different named graphs could even share a blank node, if
necessary.

> In SPARQL, the default graph is structurally and syntactically
> handled so differently from the other graphs that I wouldn't consider
> using it for the same kind of data. That is, I tend to reserve the
> default graph for metadata or the merge of all named graphs. This
> excludes #1 and #5.

Yes, I'd rather not force the default graph to be reserved for this
purpose only.

> #6 has the problem of re-using a single URI for many different things
> -- the statements of unknown origin in Alice's store, *and* the
> statements of unknown origin in Bob's store. While workable, this is
> not an elegant solution.

Yes, it definitely isn't an elegant solution - but if everything else
fails, that at least works somewhat :-)

> I would suggest that Alice and Bob each mint a new URI for the graph
> containing the statements of unknown origin *in their own store*. Or
> mint a new URI to hold each individual statement, or anything in
> between. Since the owner of a URI gets to say what the meaning of the
> URI is, they can declare that this chunk of URI space is reserved for
> this purpose (assuming Alice and Bob each own a chunk of URI space).
>
> I wonder why you discounted this solution?
>
> I also question the existence of "statements without a known origin".
> They surely didn't just pop up magically inside your triple store,
> eh? I guess it's more like "statements whose origin I don't want to
> model".

I did think of this solution and I did discount it for a reason. I'm
thinking about this at the level of a Store API designer, not the end
user of the store (the end user being the programmer who uses the API).

If Alice and Bob wanted to mint a new URI for such a graph, or for
each individual statement, they can do so. Nothing is preventing them
from doing it.

But, there are several use cases where Alice and Bob don't want the
burden of getting such a URI themselves. They just want to add
statements to a store and perhaps separate only some special external
data in a separate named graph. The statements may be added from a
stream of statements without any origin information, or even any
information on whether the stream is an aggregate of several graphs. Or
the statements may be added completely separately just by some
application software.

So, I don't want to *force* Alice and Bob to always think about this
issue. I don't want them to have to declare new URIs for just this
purpose when all they want to do is use a plain-old-rdf store with
some added spices. If I forced them, then I'd pretty much make all
statements quadlets with the origin as a mandatory piece of
information.
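
As a rough illustration of that design choice, here is a minimal
sketch of a store where the origin is optional rather than mandatory.
The names Quad, Store, add and statements are hypothetical and are not
taken from any existing store's API; the ex: and foaf: terms are just
placeholder strings.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    # None means "no origin was given", not "origin is the default graph URI".
    origin: Optional[str] = None


class Store:
    """Hypothetical store: statements may or may not carry an origin."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, origin=None):
        # The caller is never forced to supply an origin.
        self._quads.add(Quad(subject, predicate, obj, origin))

    def statements(self, origin=None):
        # With origin=None this returns only the statements of unknown origin.
        return {q for q in self._quads if q.origin == origin}


store = Store()
# Plain-old-RDF usage: no origin at all.
store.add("ex:alice", "foaf:knows", "ex:bob")
# The "added spice": some external data kept apart in a named graph.
store.add("ex:bob", "foaf:name", '"Bob"', origin="http://example.org/external")
```

Making the origin mandatory would amount to removing the default
value from add(), which is exactly the burden described above.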

There are of course other solutions somewhat similar to this way of
thinking.

I could automatically generate a URI for each statement store and
assign all the added statements with that as an origin. But that's not
exactly right, as I don't know if the statements belong to the same
graph or not. Also, it might make combining information from multiple
stores a bit tricky as we lose the bit of information that told us
that we didn't know the origin of these statements. And I'm pretty
certain there'll be weird corner cases when the origin is just
magically decided like that.
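
A small sketch of the per-store generated URI, and of the information
it loses, assuming a hypothetical AutoOriginStore class and using
urn:uuid: URIs purely as an example of a generated name:

```python
import uuid


class AutoOriginStore:
    """Hypothetical store that invents one origin URI for itself."""

    def __init__(self):
        # One generated URI per store, e.g. "urn:uuid:...".
        self.default_origin = uuid.uuid4().urn
        self.quads = set()

    def add(self, s, p, o, origin=None):
        # Statements without an origin silently get the store's own URI.
        self.quads.add((s, p, o, origin or self.default_origin))


alice = AutoOriginStore()
bob = AutoOriginStore()
alice.add("ex:a", "ex:p", "ex:b")
bob.add("ex:c", "ex:p", "ex:d")

# After combining the two stores, both statements look as if they had a
# real, known origin. Nothing in the data says "origin was unknown".
merged = alice.quads | bob.quads
origins = {quad[3] for quad in merged}
```

Here origins holds two distinct generated URIs, and a consumer of
merged cannot tell them apart from origins a user chose deliberately.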

Also, I could somewhat force the user to decide the origin himself,
but help him as much as possible in that. If the data is read from a
file, then always use the file path as an origin. If they are read
from a stream, generate a URI for the stream. If they are added
separately then just generate a URI separately. But I dislike this
approach even more than forcing the user to give the URI. This is
because we might accidentally lump several statements from distinct
sources into the same URI if we just come up with something directly
based on the source - like if reading from a file, the file might be
just an intermediate file with a fixed name and doesn't identify the
original source. The one thing I'd like to avoid is making the user
feel uncertain about the magic of deciding a source.
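
The intermediate-file problem can be sketched as follows. This is only
an illustration: origin_for_file and load are hypothetical helpers, the
store is a plain dict from origin to a set of triples, and import.nt
stands in for a fixed-name intermediate file.

```python
from pathlib import Path


def origin_for_file(path):
    # Derive the origin directly from the file location (a file: URI).
    return Path(path).resolve().as_uri()


def load(store, path, triples):
    # The triples are passed in directly to keep the sketch self-contained;
    # a real loader would parse them from the file.
    origin = origin_for_file(path)
    for triple in triples:
        store.setdefault(origin, set()).add(triple)


store = {}
# First import: import.nt was exported from source A.
load(store, "import.nt", [("ex:a", "ex:p", "ex:b")])
# Second import: the same file name, now overwritten with data from source B.
load(store, "import.nt", [("ex:c", "ex:p", "ex:d")])
```

Both statements end up under the same file: URI even though they come
from distinct sources, which is the accidental lumping described above.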

In any case, I'm still kinda undecided on what's the best way to go
forward. I was already thinking of making a magic blank node that
would always be distinct (used only in one triplet) that would be
stored without a blank node identifier at all. When a second statement
would be made with the same origin, this blank node would then have to
be converted to a normal blank node that could be shared between the
statements. But this again seems a bit beyond basic RDF, although it
would be just an implementation detail, an optimization, kind of.
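
A minimal sketch of that optimization, under the assumption that the
API lets the caller say "same origin as that earlier statement". The
class LazyOriginStore, the same_origin_as parameter and the _:bN labels
are all hypothetical. Re-adding an existing triple is not handled.

```python
import itertools


class LazyOriginStore:
    """Hypothetical store with implicit, identifier-less origin nodes."""

    def __init__(self):
        # triple -> None (implicit, unshared blank node) or a label "_:bN"
        self._origin = {}
        self._ids = itertools.count(1)

    def add(self, triple, same_origin_as=None):
        if same_origin_as is None:
            # Distinct origin used by this one statement only:
            # no blank node identifier needs to be stored.
            self._origin[triple] = None
            return
        label = self._origin[same_origin_as]
        if label is None:
            # First time the origin is shared: convert the implicit
            # node into a normal blank node with an identifier.
            label = f"_:b{next(self._ids)}"
            self._origin[same_origin_as] = label
        self._origin[triple] = label

    def origin(self, triple):
        return self._origin[triple]


store = LazyOriginStore()
t1 = ("ex:a", "ex:p", "ex:b")
t2 = ("ex:a", "ex:q", "ex:c")
t3 = ("ex:x", "ex:p", "ex:y")
store.add(t1)                     # implicit origin, nothing allocated
store.add(t2, same_origin_as=t1)  # t1's origin becomes a real blank node
store.add(t3)                     # unrelated statement stays implicit
```

From the outside, every statement still behaves as if it had its own
blank node origin; only shared origins cost an identifier.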

-- Naked
Received on Wednesday, 13 September 2006 14:21:04 UTC