Re: Indirect Graph Identification

On Fri, Dec 9, 2011 at 6:42 AM, Kjetil Kjernsmo <kjekje@ifi.uio.no> wrote:
> On Friday 09 December 2011 10:51, Gregg Reynolds wrote:
>
> In my opinion, what makes this non-RESTful (RESTless?) is that the
> Request-URI is different from the (embedded) graph URI. You're manipulating
> a different resource than you're identifying.

Hm, I don't think so.  The resource is identified by the entire URI,
not just the embedded URI.

>  Now, that wouldn't be a
> problem in practice if nobody ever saw the embedded URI. If that had always
> been the case, I really wouldn't care what URI you're using, but in practice,
> I strongly suspect that's not going to be the case.
>
> So, there's a graph http://example.org/foo. We both make a copy of it and

That's a very strong assumption.  It is more likely (or at least just as
likely) that we will construct our graphs from different sources, using
different techniques, etc.

> stuff it into our public endpoints with the same URI. Then, we use the
> Graph Store protocol to manipulate these graphs using indirect graph
> manipulation. Now, there are three different resources with the same URI.

Ah, but they're not the same URI!  Only the embedded URI is the same,
and it's only a part of the composite name that actually identifies
the resource.  By itself it is not sufficient to identify the
resource.
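To make the compositional point concrete, here is a minimal Python sketch.  It assumes the indirect form used in this thread, a `graph` query parameter carrying the graph IRI, percent-encoded here with the standard library's `urlencode` (other encodings are possible).  The second service URL, `http://example.net/rdf-graphs/service`, and the helper name `indirect_graph_uri` are hypothetical.

```python
from urllib.parse import urlencode


def indirect_graph_uri(service: str, graph_iri: str) -> str:
    """Build an indirect graph identifier (hypothetical helper):
    the service URI plus a percent-encoded ``graph`` query parameter."""
    return f"{service}?{urlencode({'graph': graph_iri})}"


embedded = "http://example.org/foo"

# Two endpoints (the second is hypothetical) using the same embedded IRI.
mine = indirect_graph_uri("http://example.com/rdf-graphs/service", embedded)
yours = indirect_graph_uri("http://example.net/rdf-graphs/service", embedded)

print(mine)
print(yours)
# The embedded IRI is identical, but the composite identifiers differ.
print(mine == yours)
```

The embedded IRI is only one component; the service URI is the other, and only the pair names the resource.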

> Eeeeevil! In this case, the
> http://example.com/rdf-graphs/service?graph=http://example.org/foo URI is
> irrelevant, even though, yes, they could easily be made different, and yes,
> each of these URIs would then correctly identify a different graph. But that
> only takes the attention away from the real issue: At different endpoints,
> http://example.org/foo identifies different graphs.
>
> I fear that this will cause very significant problems in the future for
> SPARQL federation, where agents try to get an idea of what is on different
> endpoints by exploring service descriptions, etc. They will not be able to
> rely on two identical URIs identifying the same graph. Therefore, the
> Request-URI should be the same as the graph_uri.

Ok, now I see the issue very clearly, and it is this:  if you allow
embedded identifiers you inevitably end up with a clash between global
and local semantics.  Embedded URIs are effectively local names.  They
may be globally unique syntactically, but their semantics is (are?)
scoped to the URI in which they are embedded.  This is a pragmatic
fact, not a theoretical definition (see Wittgenstein et al. on meaning
as use).  On the other hand, RDF wants URIs to have global,
universalized semantics.  But we can't have it both ways; we could
declare "embedded URIs shall have universal semantics" but it would
have little practical effect, simply because that is not in fact the
way people use scoped names, and it would be impossible to obey such a
law anyway without perfect knowledge.

(Or you could say that the notion of RESTful needs to be amended to
handle the problem of embedded URIs generally.)

The real problem is the notion that URIs have some kind of inherent
stable meaning simply by virtue of being URIs.  That's the evil bit;
it's a complete hoax.  In reality URIs have no meaning at all beyond
the way they are actually used, which cannot be legislated.  Names and
categories are inherently fuzzy, local, and unstable, but most
RDF-speak is premised on the fiction that they are crisp, universal,
and eternal.

Here's a real world example: we're working on the morphologies of
Afro-Asiatic.  You're working on Cushitic; I'm working on Omotic.  We
could use different URIs to name our graphs after the language
families.  But the line between Cushitic and Omotic is not clear, so
we decide to just use a "morphology" URI.  On your server, it means
mostly Cushitic; on mine it means mostly Omotic.  Same embedded URI,
different resources, no evil.  We can imagine that the community of
linguists might decide as a matter of convention to use the same
"morphology" URI for any sort of RDF morphological data (assume
they've agreed on a schema).  Then if you want to know what meaning a
particular researcher gives to that URI you inspect his data.  It's
morphology stuff, but it's Chinese.  Again no evil.  On the contrary,
this seems like the Right Way to do it to me; clients are interested
in the data, after all, not the graph URI.  Over time, these graphs
change as researchers add, edit, and trade data.  The "meaning" of the
"morphology" URI (as a *type*) is set by the user community (not the
URI itself, which need not even be human-readable) in very broad
terms, but the meaning of any particular *token* of that URI in use at
a particular time is local to the individual researcher's database.

Maybe a simpler example: think of a development environment where
multiple programmers are using the same embedded URI in their dev
endpoints but they're all working on different data.  Seems perfectly
normal to me.
>
> Also, I really don't see the problem of just giving the copied graph a new
> URI.

For copies that would be fine, I reckon; maybe a special "copy-of" op
could be defined (or maybe OWL has something along those lines).  But
copying is only one case.  And given the open semantics of RDF, we
cannot even define what a copy is precisely (what are the semantic
boundaries of an RDF resource that has links to other resources?)  The
best we could do is say this is /intended/ to be a copy of that.

> If you in your app need to know the "embedded URI", you would know how
> to get it, either by violating the opacity of the URI or by having some
> statements saying where it came from.
>
>> Not really; it's the composite that
>> identifies the triples, not the embedded IRI alone.
>
> If that had been the case, it would help, but in practice, this will often
> not be the case, which is what makes it evil.

I think it will always be the case, by definition.  The meaning of a
URI with a graph param is compositional.
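The compositional reading also runs in reverse: a client that wants the embedded IRI has to parse the query string, i.e. step outside URI opacity, as Kjetil notes.  A minimal standard-library sketch; the helper name `split_indirect_uri` is hypothetical, and it assumes exactly one `graph` parameter.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_indirect_uri(uri: str) -> tuple[str, str]:
    """Split an indirect graph identifier into (service URI, embedded IRI).

    Hypothetical helper; assumes exactly one ``graph`` query parameter.
    Any other query parameters are dropped from the returned service URI.
    """
    parts = urlsplit(uri)
    graphs = parse_qs(parts.query).get("graph", [])  # percent-decodes values
    if len(graphs) != 1:
        raise ValueError(f"expected exactly one graph parameter in {uri!r}")
    service = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return service, graphs[0]


service, graph = split_indirect_uri(
    "http://example.com/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Ffoo"
)
print(service)
print(graph)
```

Neither half alone identifies the triples; the pair does.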

-Gregg

Received on Friday, 9 December 2011 14:04:28 UTC