- From: Nuutti Kotivuori <naked@iki.fi>
- Date: Wed, 13 Sep 2006 17:20:39 +0300
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: public-sparql-dev@w3.org
Richard Cyganiak wrote:
> Without having thought through all the consequences ...

Discussion is good! All input is appreciated.

> Some of your options are not really possible with named graphs
> because graphs need to be *named*, that is, the name *must* be a URI
> and not a blank node. Blank nodes are always scoped to a single
> graph, and using blank nodes as graph labels would make it impossible
> to refer to a named graph from the outside world. This excludes #3
> and #4.

The true reality of blank nodes isn't really clear to me at all, so I
will have to try and bend my mind over them some more. At least at the
program level, I don't have this kind of restriction - blank nodes
are scoped to a store (actually even more, but that's irrelevant) in
my case - so different named graphs could even share a blank node, if
necessary.

> In SPARQL, the default graph is structurally and syntactically
> handled so differently from the other graphs that I wouldn't consider
> using it for the same kind of data. That is, I tend to reserve the
> default graph for metadata or the merge of all named graphs. This
> excludes #1 and #5.

Yes, I'd rather not force the default graph to be reserved for this
purpose only.

> #6 has the problem of re-using a single URI for many different things
> -- the statements of unknown origin in Alice's store, *and* the
> statements of unknown origin in Bob's store. While workable, this is
> not an elegant solution.

Yes, it definitely isn't an elegant solution - but if everything else
fails, that at least works somewhat :-)

> I would suggest that Alice and Bob each mint a new URI for the graph
> containing the statements of unknown origin *in their own store*. Or
> mint a new URI to hold each individual statement, or anything in
> between. Since the owner of a URI gets to say what the meaning of the
> URI is, they can declare that this chunk of URI space is reserved for
> this purpose (assuming Alice and Bob each own a chunk of URI space).
>
> I wonder why you discounted this solution?
>
> I also question the existence of "statements without a known origin".
> They surely didn't just pop up magically inside your triple store,
> eh? I guess it's more like "statements whose origin I don't want to
> model".

I did think of this solution and I did discount it for a reason. I'm
thinking about this at the level of a Store API designer, not the end
user of the store (the end user being the programmer who uses the API).

If Alice and Bob want to mint a new URI for such a graph, or for each
individual statement, they can do so. Nothing is preventing them from
doing it.

But there are several use cases where Alice and Bob don't want the
burden of coming up with such a URI themselves. They just want to add
statements to a store and perhaps keep only some special external data
in a separate named graph. The statements may be added from a stream
of statements without any origin information, or even without
information about whether the stream is an aggregate of several graphs
or not. Or the statements may be added completely separately, just by
some application software.

So, I don't want to *force* Alice and Bob to always think about this
issue. I don't want them to have to declare new URIs for just this
purpose when all they want to do is use a plain-old-RDF store with
some added spices. If I forced them, then I'd pretty much make all
statements quadlets with the origin as a mandatory piece of
information.

There are of course other solutions somewhat similar to this way of
thinking. I could automatically generate a URI for each statement
store and assign it as the origin of all the added statements. But
that's not exactly right, as I don't know whether the statements
belong to the same graph or not. Also, it might make combining
information from multiple stores a bit tricky, as we lose the bit of
information that told us that we didn't know the origin of these
statements.
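To make that last trade-off concrete, here is a minimal Python sketch.
It is not taken from any actual store: the `Quad` and `Store` classes,
the `merge` helper, and the `urn:example:` and `http://example.org/`
URIs are all hypothetical, made up purely for illustration. It compares
a store where the origin is optional with one that silently assigns a
per-store URI.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Quad:
    subject: str
    predicate: str
    obj: str
    origin: Optional[str] = None  # None means "origin unknown" (assumption)


class Store:
    """Hypothetical store; auto_origin mimics 'generate a URI per store'."""

    def __init__(self, auto_origin: Optional[str] = None):
        self.auto_origin = auto_origin
        self.quads: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, origin: Optional[str] = None) -> None:
        # If the caller gives no origin, fall back to the store-wide one
        # (which is itself None when nothing is auto-generated).
        if origin is None:
            origin = self.auto_origin
        self.quads.add(Quad(s, p, o, origin))

    def unknown_origin(self) -> Set[Quad]:
        return {q for q in self.quads if q.origin is None}


def merge(*stores: Store) -> Store:
    """Combine several stores into a new one, keeping origins as recorded."""
    merged = Store()
    for store in stores:
        merged.quads |= store.quads
    return merged


# Variant 1: origin is optional, so "unknown" survives a merge.
alice, bob = Store(), Store()
alice.add("ex:a", "ex:knows", "ex:b")
alice.add("ex:a", "ex:name", "Alice", origin="http://example.org/external")
bob.add("ex:c", "ex:knows", "ex:d")
merged_plain = merge(alice, bob)

# Variant 2: each store silently assigns its own generated URI.
alice2 = Store(auto_origin="urn:example:alice-store")
bob2 = Store(auto_origin="urn:example:bob-store")
alice2.add("ex:a", "ex:knows", "ex:b")
bob2.add("ex:c", "ex:knows", "ex:d")
merged_minted = merge(alice2, bob2)

print(len(merged_plain.unknown_origin()), len(merged_minted.unknown_origin()))
```

In the second variant the merged store can no longer tell that nobody
actually knew where those statements came from; they just look like
members of two ordinary named graphs.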
And I'm pretty certain there'll be weird corner cases when the origin
is just magically decided like that.

Also, I could somewhat force the user to decide the origin himself,
but help him as much as possible in that. If the data is read from a
file, then always use the file path as the origin. If the statements
are read from a stream, generate a URI for the stream. If they are
added separately, then just generate a URI separately. But I dislike
this approach even more than forcing the user to give the URI. This is
because we might accidentally lump several statements from distinct
sources under the same URI if we just come up with something directly
based on the source - for example, when reading from a file, the file
might be just an intermediate file with a fixed name that doesn't
identify the original source. The one thing I'd like to avoid is
making the user feel uncertain about the magic of deciding a source.

In any case, I'm still kinda undecided on what's the best way to go
forward. I was already thinking of making a magic blank node that
would always be distinct (used only in one triplet) and would be
stored without a blank node identifier at all. When a second statement
is made with the same origin, this blank node would then have to be
converted to a normal blank node that could be shared between the
statements. But this again seems a bit beyond basic RDF, although it
would be just an implementation detail, an optimization, kind of.

-- Naked
Received on Wednesday, 13 September 2006 14:21:04 UTC