Re: RDF-ISSUE-5 (Graph Literals): Should we define Graph Literal datatypes? [RDF Graphs]

On Fri, 2011-03-04 at 19:26 -0600, Pat Hayes wrote:
> On Mar 4, 2011, at 3:59 PM, RDF Working Group Issue Tracker wrote:
> 
> > 
> > RDF-ISSUE-5 (Graph Literals): Should we define Graph Literal datatypes? [RDF Graphs]
> > 
> > http://www.w3.org/2011/rdf-wg/track/issues/5
> > 
> > Raised by: Sandro Hawke
> > On product: RDF Graphs
> > 
> > We could define datatypes, such as ser:rdfxml and ser:turtle, whose
> > lexical space is the set of valid document strings in RDF/XML, Turtle,
> > etc, and whose value space contains the corresponding RDF graphs.
> > 
> > This would allow people to use ordinary RDF tools to express facts involving RDF graphs, such as that some graph was obtained from some URI at some point in time, or that some person claims some graph is true or false.
> 
> Allow me to cast doubt on this claim. I do not believe that graph literals (in contrast to named graphs) would in fact provide such functionality in practice. For several reasons.

I hesitate to reply because of your regrets (and please accept my
condolences), but maybe I can have the honor of having my e-mail first
in the myriad sacks that await your return.  Or... maybe this way we
can have it all perfectly sorted before your return.

Anyway, I completely agree with what you say in favor of RDF data
talking about g-boxes -- I agree that's often the best approach -- but
I think there are other use cases for graph literals.

It would be good to work through the use cases and try to figure out the
costs and benefits of addressing each with each approach.  I might
manage to do that.

> 1. This would allow such 'metaRDF' descriptions only for the case where the object graph - the one being described - is completely specified by its full textual representation. This would make such metaRDF almost unusable for large object graphs, and exceedingly awkward, at best, for all but toy object graphs. For any graph, the g-text is a much more verbose way to refer to it than a URI would be. 

I think there are lots of real apps that use many tiny (but not "toy")
graphs of 1-50 triples, often where the graph represents an 'object' or
a simple claim.    The data store might have a billion triples, but the
granularity of the metadata is often per-triple or close to it.

Yes, the URI is probably still smaller, but it presents its own
problems, like worrying about the box contents changing.

I don't think apps like this weigh in heavily for either design.   

(I certainly agree situations with very large graphs get 'exceedingly
awkward' for graph literals.)
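
To make the tiny-graph case concrete, here is a minimal sketch of what
a graph literal could look like: a typed literal whose lexical form is
a document string and whose value is the set of triples it parses to.
Everything in it is an assumption for illustration: the ser: namespace
IRI is made up (the issue names ser:turtle but gives no namespace), and
parse_tiny_turtle handles only a toy one-triple-per-line subset, not
real Turtle.

```python
from dataclasses import dataclass

# Hypothetical datatype IRI; the issue text gives no namespace for "ser:".
SER_TURTLE = "http://example.org/ser#turtle"


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str   # the document string (lexical space)
    datatype: str  # IRI naming the datatype


def parse_tiny_turtle(text):
    """Parse a toy subset: one 's p o .' statement per line, terms
    separated by whitespace. This is NOT a real Turtle parser."""
    triples = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith("."):
            raise ValueError("statement must end with '.': " + line)
        parts = line[:-1].split()
        if len(parts) != 3:
            raise ValueError("expected exactly three terms: " + line)
        triples.add(tuple(parts))
    return frozenset(triples)


def value_of(literal):
    """Map a graph literal to its value: an immutable set of triples."""
    if literal.datatype != SER_TURTLE:
        raise ValueError("not a graph literal: " + literal.datatype)
    return parse_tiny_turtle(literal.lexical)


# Two different document strings that denote the same two-triple graph.
a = TypedLiteral(
    "<urn:x:s> <urn:x:p> <urn:x:o> .\n<urn:x:s> <urn:x:q> <urn:x:r> .",
    SER_TURTLE,
)
b = TypedLiteral(
    "<urn:x:s> <urn:x:q> <urn:x:r> .\n<urn:x:s> <urn:x:p> <urn:x:o> .",
    SER_TURTLE,
)
```

Here a and b differ as lexical forms, yet value_of(a) and value_of(b)
are the same graph, which is the lexical-space versus value-space
distinction the proposal relies on.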

> 2. The full textual representation of a graph does not, ironically, serve to "identify" it in the sense required. Suppose I publish some RDF in a box with a URI. The URI identifies the box, but it does not identify the graph. The very same graph might be a snapshot of a different box with a different provenance and history and authority claiming it to be true. It is the box, not the graph, which will be asserted or will have a history or be deprecated, etc. But a graph literal of a snapshot of a box does not identify the box. Even if we say that such a literal identifies any box whose snapshot is equivalent to the literal, the task of checking such equivalence is NP-complete (an old result of Jeremy's) so we have hamstrung our implementations ahead of time. And this is probably not a good rule to adopt, in any case, even if it were computationally cheap.

I think there are different use cases here.   Sometimes we want to
reason about the box, sometimes we want to reason about a graph that
might have been in certain boxes at certain times.   It's nice to be
able to talk about both the boxes and the graphs.

I think you're proposing that whenever we want to talk about a known
graph, we first put it in a box, and then talk about that box.   That's
the technique, as I mentioned, that I've been using in my own code for
many years.  It doesn't really need any new specs.  So, yes, it works,
but I think it'd be somewhat clearer to be able to talk about the graphs
themselves, sometimes.

(I can live with either answer to ISSUE-5; I just think it's something
we'll need to think about and decide.)
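
On the cost of the equivalence check raised in point 2, a rough sketch
may help show where the expense comes from. It assumes graphs are sets
of (s, p, o) string triples and that blank nodes are the terms starting
with "_:". It is a brute-force illustration, not how a production
store would do it: it tries every one-to-one renaming of blank nodes,
so the work grows factorially with the number of blank nodes. Graphs
without blank nodes reduce to a plain set comparison.

```python
from itertools import permutations


def is_bnode(term):
    # Assumption of this sketch: blank nodes are written "_:label".
    return term.startswith("_:")


def bnodes(graph):
    """Return the sorted blank node labels used anywhere in the graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def equivalent(g1, g2):
    """True if g1 and g2 are the same set of triples up to a one-to-one
    renaming of blank nodes. Brute force over all bijections."""
    g1, g2 = frozenset(g1), frozenset(g2)
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = frozenset(
            tuple(mapping.get(term, term) for term in triple)
            for triple in g1
        )
        if renamed == g2:
            return True
    return False


g1 = {("_:a", "p", "_:b"), ("_:b", "p", "o")}
g2 = {("_:y", "p", "_:x"), ("_:x", "p", "o")}   # g1 with a->y, b->x
g3 = {("_:x", "p", "_:y"), ("_:x", "p", "o")}   # different shape
```

With these inputs, equivalent(g1, g2) holds and equivalent(g1, g3)
does not.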

> 3. It is completely unnecessary, if we have named graphs. A named graph has a name which refers to it and identifies its box. Most descriptive languages, including RDF, use names in this way to make assertions about the things named. AFAIKS, nothing is gained by making such a graph into a literal instead of simply using its name to refer to it. And this use of graph names requires no changes to any RDF syntax (or indeed semantics.)

Again, I'm all in favor of being explicit about boxes, sometimes giving
them URI names, sometimes maybe referring to them using blank nodes.
But I disagree that "nothing is gained by making such a graph into a
literal instead of simply using its name to refer to it."   You mean
instead of simply using the name of a box that currently happens to
contain it?  What if that box's contents change?  How do you find out
which URI names which graph?  These issues can be addressed, but in some
cases, particularly when you're working with boxes whose contents change
rapidly (current stock price) or continually (current temperature), I
think it's probably better to also have an explicit notion of the graphs
themselves.
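
The box-versus-graph distinction can be sketched in a few lines. The
GBox class, its method names, and the example URIs are all hypothetical;
this is just one way to model "a box whose contents change" next to
"the graph that was in it at some moment".

```python
class GBox:
    """A mutable container of triples with a URI (hypothetical model)."""

    def __init__(self, uri, triples=()):
        self.uri = uri
        self._triples = set(triples)

    def replace(self, triples):
        # The box keeps its URI while its contents change.
        self._triples = set(triples)

    def snapshot(self):
        # The graph currently in the box, as an immutable value.
        return frozenset(self._triples)


reading_20 = ("ex:sensor", "ex:reading", '"20"')
reading_21 = ("ex:sensor", "ex:reading", '"21"')

box = GBox("http://example.org/temperature", [reading_20])
mirror = GBox("http://example.org/mirror", [reading_20])

before = box.snapshot()
box.replace([reading_21])
after = box.snapshot()
```

After the update, box.uri is unchanged but it now names a box holding a
different graph, while the value held in before is still the old graph,
and it is also the graph currently in mirror. A statement keyed on the
URI follows the box; a statement keyed on the graph value does not.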

    -- Sandro

> Pat Hayes
> 
> 
> > 
> > This would address some of the use cases for quads, reification, named
> > graphs, etc, with a mechanism that is very simple to understand and
> > relatively easy to implement.
> > 
> > Languages (like Turtle and RDF/XML) could be extended to provide
> > syntactic sugar for these literals, much as Turtle provides a nicer
> > syntax for numbers, but that is not necessary for these literals to be
> > useful and is not part of this proposal.
> > 
> > Some discussion in http://lists.w3.org/Archives/Public/public-rdf-wg/2011Mar/0130.html
> > 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 

Received on Saturday, 5 March 2011 05:17:12 UTC