- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Wed, 13 Sep 2006 17:49:23 +0200
- To: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
- Cc: Nuutti Kotivuori <naked@iki.fi>, public-sparql-dev@w3.org
Hi Chimezie,
On 13 Sep 2006, at 15:45, Chimezie Ogbuji wrote:
> On Wed, 13 Sep 2006, Richard Cyganiak wrote:
>> Some of your options are not really possible with named graphs
>> because graphs need to be *named*, that is, the name *must* be a
>> URI and not a blank node.
>
> I don't agree. What's the source of this assertion?
The discussion is about SPARQL, so I assumed the definition of Named
Graphs from the SPARQL spec would apply. See also various papers from
Bizer et al., e.g. [1]. As Dan pointed out, there's no community
consensus on whether Named Graphs are a good thing or not, but the
definitions that use this very term seem to require URIs as graph
names. Contexts are not Named Graphs.
[snip]
> Well, Blank nodes used within a graph can't be referred to directly
> but they can still be matched by SPARQL - doesn't make them any
> less useful. The problem isn't the use of Blank nodes for graph
> names but the lack of a mechanism [2] to match the graph name(s)
> associated
> with a node. Given how closely coupled SPARQL is with (admittedly
> informal) named graph semantics, I would expect to be able to
> answer questions such as:
>
> "What are the graph names in which all the statements about
> <someIRI> are asserted?"
I'm afraid I'm missing the point here. Why not this?
SELECT DISTINCT ?graph WHERE { GRAPH ?graph { <someIRI> ?p ?o } }
(Now of course the problem is that when I allow blank nodes as graph
labels, then the answer to this query might be: "a blank node, a
blank node, and another blank node".)
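To make that parenthetical concrete, here is a minimal sketch in plain
Python (no SPARQL engine involved). It models a store as a list of
(graph, subject, predicate, object) tuples and mimics what the query
above asks for. The BNode class, the graphs_mentioning helper and the
example.org data are all made up for illustration.

```python
from itertools import count


class BNode:
    """Hypothetical stand-in for a blank node: it has identity but no name."""
    _ids = count()

    def __init__(self):
        # Internal label only; it means nothing outside this store.
        self._label = f"_:b{next(self._ids)}"

    def __repr__(self):
        return "a blank node"


def graphs_mentioning(quads, subject):
    """Rough equivalent of asking for the DISTINCT graph labels of all
    graphs that contain at least one triple with the given subject."""
    seen = []
    for g, s, p, o in quads:
        if s == subject and g not in seen:
            seen.append(g)
    return seen


# Made-up sample data: one URI-named graph, two graphs labelled by blank nodes.
some_iri = "http://example.org/someIRI"
b1, b2 = BNode(), BNode()
quads = [
    ("http://example.org/graphs/alice", some_iri, "http://example.org/p", "1"),
    (b1, some_iri, "http://example.org/p", "2"),
    (b2, some_iri, "http://example.org/q", "3"),
    (b1, "http://example.org/other", "http://example.org/p", "4"),
]

result = graphs_mentioning(quads, some_iri)
print(result)
```

The URI in the result can be pasted into a follow-up query; the two
blank-node answers are distinct but give the client nothing it could
use to refer back to them.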
[snip]
> If BNodes are used for existential assertions about nodes, why
> wouldn't they be used as existential assertions about graphs?
I can offer my personal and subjective viewpoint: If you extend RDF
triples with a fourth element that works exactly as the others, then
it instantly raises the question: why not add a fifth element? Or a
sixth?
I think that three is the sweet spot, but in practice triples often
occur in "bags", and sometimes it's useful to be able to talk about
these "bags", and I find that Named Graphs provide exactly the
minimum of machinery necessary to do that, and nothing more.
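One way to picture that "minimum of machinery", as a rough sketch in
plain Python: a named graph is a pair of a URI and a bag of triples,
and a dataset is a mapping from names to bags. The Dataset class, its
scheme-based URI check and the example.org names are illustrative
assumptions, not something taken from [1] or the SPARQL spec.

```python
from urllib.parse import urlparse


class Dataset:
    """Illustrative model: graph name (a URI string) -> set of triples."""

    def __init__(self):
        self.graphs = {}

    def add(self, name, triple):
        # Assumption for this sketch: a name counts as a URI if it has a scheme.
        if not isinstance(name, str) or not urlparse(name).scheme:
            raise ValueError("graph name must be a URI")
        self.graphs.setdefault(name, set()).add(triple)

    def triples(self, name):
        return self.graphs.get(name, set())


ds = Dataset()
ds.add("http://example.org/graphs/import-1",
       ("http://example.org/a", "http://example.org/knows", "http://example.org/b"))

# Talking *about* a bag is done with ordinary triples that use its name.
ds.add("http://example.org/graphs/meta",
       ("http://example.org/graphs/import-1",
        "http://example.org/source",
        "http://example.org/feed.rdf"))
```

Nothing here treats the name as a fourth triple position; it is only a
key for looking up a bag, and statements about the bag are still plain
triples.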
I'm sure that a full-blown fourth element (and fifth) would offer
lots of interesting possibilities, but personally I haven't come
across any urgent need for it. Named Graphs, as defined in [1] and
SPARQL, work well for me. YMMV, of course.
Yours,
Richard
[1] http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Carroll_etall-TrustWorkshop-ISWC2004.pdf
> And if there is some semantic consequence, it furthers the argument
> that the formalisms for named graphs should be well articulated
> before they are tightly integrated into a query language.
>
>> I would suggest that Alice and Bob each mint a new URI for the
>> graph containing the statements of unknown origin *in their own
>> store*. Or mint a new URI to hold each individual statement, or
>> anything in between. Since the owner of a URI gets to say what the
>> meaning of the URI is, they can declare that this chunk of URI
>> space is reserved for this purpose (assuming Alice and Bob each
>> own a chunk of URI space).
>>
>> I wonder why you discounted this solution?
>
> I don't think it's an elegant solution when we already have the
> means (within 'vanilla' RDF Model Theory) to express existential
> assertions - which is exactly the scenario here.
>
> If a graph label is nothing but a name associated with a set of
> graphs, why should it not behave the same as the name associated
> with a node within a graph?
>
>> I also question the existence of "statements without a known
>> origin". They surely didn't just pop up magically inside your
>> triple store, eh? I guess it's more like "statements whose origin
>> I don't want to model".
>
> How different is this from "nodes whose names I don't care to
> maintain / model?"
>
> [1] http://ninebynine.org/RDFNotes/UsingContextsWithRDF.html#xtocid-6303976
> [2] http://copia.ogbuji.net/blog/2006-07-14/querying-named-rdf-graph-aggregate
>
> Chimezie Ogbuji
> Lead Systems Analyst
> Thoracic and Cardiovascular Surgery
> Cleveland Clinic Foundation
> 9500 Euclid Avenue/ W26
> Cleveland, Ohio 44195
> Office: (216)444-8593
> ogbujic@ccf.org
>
>
>>
>>
>> On 11 Sep 2006, at 19:51, Nuutti Kotivuori wrote:
>>
>>> This isn't exactly a SPARQL question, but it is very closely
>>> related. I will first outline the question context.
>>>
>>> Assume an RDF statement store, which has a mechanism for tracking
>>> statement origin (scope, context, graph, source, whatever). Many of
>>> the statements have a distinct origin, or source graph, they were
>>> imported from. But there are also those which either seemingly have
>>> no origin, or the origin is not known. The origin of these
>>> statements has to be handled somehow. We'll come to the specific
>>> choices later on.
>>>
>>> This statement store offers a SPARQL query interface into it. The
>>> facilities for querying named graphs in SPARQL would obviously be
>>> used to query the different origins in the store. But there are two
>>> things to decide. First, how should statements without an origin be
>>> accessed in SPARQL? There are several choices on this, which I will
>>> outline below. And related to the first one, second, what should the
>>> default graph be for the queries if none is given explicitly?
>>>
>>> I will list a few possibilities and mention the problems and
>>> benefits that seem to result from them as a basis for discussion.
>>> 1. Unknown origin is a distinct node, but separate from all URIs,
>>>    blank nodes or literals. The default graph for the query is the
>>>    graph of the unknown origin nodes.
>>>    - Separation of identifier spaces, no fear of any overlap. The
>>>      graph of statements with unknown origin is separate from any
>>>      named graph.
>>>    - Since there is no way to represent the unknown origin in SPARQL
>>>      syntax, the default graph is the only way to access the nodes
>>>      in that graph.
>>>    - The nodes in the unknown origin graph are not matched by any
>>>      graph query, since the name of the graph could not be returned
>>>      reasonably. That is:
>>>
>>>        SELECT ?g ?s ?o ?p
>>>        WHERE { GRAPH ?g { ?s ?p ?o } }
>>>
>>>      cannot return ?g for the unknown origin graph.
>>> 2. Unknown origin is a distinct node, as above. The default graph is
>>>    the RDF merge of all graphs in the store, including the
>>>    statements with an unknown origin.
>>>    - The problems above.
>>>    - In addition, there is no way to select nodes that explicitly
>>>      have an unknown origin. (Or is there? Could one match all the
>>>      statements for which there is no graph with the same statement?
>>>      In any case, this would be quite contorted.)
>>> 3. Unknown origin is represented by a distinct blank node; that is,
>>>    every statement has its own blank node as the graph name, which
>>>    is not shared with any of the other statements. The default graph
>>>    is the RDF merge of all graphs in the store, including the
>>>    statements with an unknown origin.
>>>    - This is probably closest to accurate modelling of the
>>>      situation. We know every statement has an origin, we just don't
>>>      know what it is - a situation commonly modelled with a blank
>>>      node. Also, we don't know which statements might share an
>>>      origin, so until we know better, we make them all distinct.
>>>    - The origin of the statements is nicely queryable with SPARQL
>>>      queries and every statement has an origin, even if unknown.
>>>    - Queries which specify several statements from a single graph
>>>      will not match the statements with unknown origins as it cannot
>>>      be confirmed that they would be from the same graph.
>>>    - There is no way to match the origin of a single statement as
>>>      there is no way to match a certain blank node explicitly. The
>>>      current SPARQL treats it as an open variable(?).
>>>    - There is no way to explicitly match statements that have an
>>>      unknown origin, since the origins are just distinct blank
>>>      nodes.
>>>    - Possibly hard to implement, because of the number of distinct
>>>      blank nodes.
>>> 4. Unknown origin is represented by a singleton blank node; that is,
>>>    every statement with an unknown origin shares one single blank
>>>    node as the graph name. The default graph is the RDF merge of all
>>>    graphs in the store.
>>>    - Lumps all statements with an unknown origin under a single
>>>      named graph. Queries which match several statements from a
>>>      single graph will match statement sets from unknown origin as
>>>      well.
>>>    - The origin of the statements is nicely queryable with SPARQL
>>>      queries and every statement has an origin, even if unknown.
>>>    - There is no way to explicitly match statements that have an
>>>      unknown origin, since the origin is a single blank node. If the
>>>      application provided a magic type for this blank node (_:x a
>>>      rdfx:UnknownOrigin), this could be matched with:
>>>
>>>        SELECT ?s ?o ?p
>>>        WHERE { ?g a rdfx:UnknownOrigin .
>>>                GRAPH ?g { ?s ?p ?o } }
>>>
>>>      But this again is quite contorted. (The same could be applied
>>>      to the third case as well, but the implementation of that would
>>>      be really tricky to make efficient.)
>>> 5. Unknown origin is represented by a singleton blank node as
>>>    above. The default graph is the singleton blank node of unknown
>>>    origin.
>>>    - Mostly as above, but in the common case, explicitly matching
>>>      statements that have an unknown origin would be easy in just
>>>      matching the statements from the default graph.
>>> 6. Unknown origin is represented by a well known URI that is shared
>>>    universally. The default graph is the RDF merge of all graphs in
>>>    the store.
>>>    - Somewhat incorrectly asserts that the statements have a certain
>>>      origin, even though we don't know the origin.
>>>    - The origin of the statements is nicely queryable with SPARQL.
>>>    - Statements with an unknown origin can be easily explicitly
>>>      matched by comparing them against the well known URI.
>>>    - Assigns a special meaning to a URI.
>>>    - Hard to coordinate with a number of people implementing similar
>>>      solutions if not standardized.
>>> Some other variants of the above were omitted, since their problems
>>> and benefits are easily reasoned.
>>>
>>> On irc, 'chimezie' outlined the problem as such:
>>>
>>> 17:35 chimezie:#swig => Hmm.. well, seems like what is missing is a
>>>   good definition of a 'name for nodes that don't have an explicit
>>>   context'
>>> 17:36 chimezie:#swig => or rather 'a name for the context of nodes
>>>   that aren't assigned to a context explicitely'
>>>
>>> So, I'm out for some input on what might be the sanest route
>>> through this.
>>>
>>> TIA,
>>> -- Naked
>>
>>
>
Received on Wednesday, 13 September 2006 15:49:31 UTC