Re: Unasserted triples, Contexts and things that go bump in the night.

Pat,

As I've said, these developments greatly interest me.  I'll make some 
comments on your text, then toss in my own suggestion.

At 11:51 PM 3/19/02 -0800, Pat Hayes wrote:

>1. Introduction: why bother?

[...]

>2. Dark triples.

I love that expression!!

>The simplest proposal is therefore to simply allow an RDF graph to contain 
>triples which do not make any assertions. This allows such 'dark triples' 
>to be used by other languages to encode syntax for more complex 
>expressions (or indeed for any other purpose, as far as RDF is 
>concerned.). The point of this is that RDF triples can be used both to 
>make some simple assertions and also as a datastructure, but these two 
>uses tend to trip over one another. Allowing datastructures to be sets of 
>dark triples frees them from accidentally making assertions that are 
>inappropriate to the intentions of the user of the datastructure.
>
>This could be done in several ways. One idea is to allow an RDF graph to 
>contain two kinds of triples, so triples need one extra bit. We could 
>encode this in N-triples by having two ways to terminate a triple, so that
>
>ex:judy ex:age ex:whatever .
>
>is an asserted triple but
>
>ex:judy ex:age ex:whatever ;
>
>is an unasserted triple. I confess to having no idea how to represent 
>something analogous to this in RDF/XML, however.

(We can fix this later - syntax is the easy bit, I think.)

>Alternatively, the unasserted triples could all be isolated in a subgraph, 
>perhaps stored in a separate file. This amounts to treating an RDF graph 
>as a merge of two subgraphs, one consisting of all the asserted triples 
>and the other of all the unasserted triples. This requires no extension to 
>the language itself, but it requires some kind of convention to indicate 
>that an entire graph is unasserted. This seems to be the simplest idea and 
>the one that requires the least change to the language. This could be done 
>in RDF/XML by a property tag in the header.

I really dislike anything that enforces physical document boundaries (e.g. 
requires that things be stored in separate files).  Ultimately, though, I 
think that's a syntactic issue.

>Or, we could introduce a more restrictive convention so that only certain 
>kinds of triple can be unasserted. For example, many of the 
>language-extension proposals involve the use of certain kinds of 
>container-like structures to encode syntax (eg daml:list). We could 
>introduce a general-purpose 'dark container' which acts much like a bag or 
>list, but is simply invisible to the RDF model theory. I think this would 
>be less generally useful, however, since it would not allow other users or 
>languages to use their own constructions.

I rather like the "dark container" idea, though I think I'm missing 
something, as it seems rather similar to contexts (below).

>Another possibility is to allow certain namespaces to be declared to be 
>dark, so that any triple using a property from a dark namespace is 
>considered to be unasserted. Again, this does not require any change to 
>the syntax, but only some extra conventions to be added to the language. 
>This would allow a language to use its own particular namespace to create 
>sets of triples for the purposes of encoding its particular syntax (and 
>perhaps some other language to use a different namespace to encode its 
>syntax) .

I'm uneasy about doing a separation on namespace lines.  I think that 
overloads namespaces.

>3. Contexts, whatever.
>
>Allowing dark triples allows other languages to encode syntax in RDF 
>triples. A more ambitious kind of change to RDF would be to provide a 
>general-purpose syntax-construction service of its own. Syntax is 
>essentially trees, and a natural way to build trees out of triples would 
>be to provide some way for a set of triples to be the subject or object of 
>a triple. This would provide a very expressive general-purpose technique 
>for creating arbitrarily complex tree structures.
>
>One way to do this would be to explicitly introduce a 'context' mechanism 
>into the basic RDF syntax, as Tim did in N3. This amounts to having some 
>way to refer to a set of triples, ie an RDF graph, as a single entity 
>which can stand in the subject or object position of another triple. For 
>example, we could extend Ntriples by some notation (eg curly brackets) 
>which allows RDF to indicate an entire graph rather than a single node. 
>However, this would be a major change to RDF syntax.

Maybe not so major?  Jonathan Borden has proposed allowing 
<rdf:RDF>...</rdf:RDF> to be used as a resource within an RDF graph.  Some 
extension to the abstract graph syntax would be needed to distinguish the 
triples contained within such a structure, and N-triples would have to be 
extended to accommodate it.

>Another way is used by Jos' Euler engine, where the graphs are simply 
>referred to by a URL used as a uriref. This requires no change to RDF 
>syntax at all, but it does require some convention to ensure that urirefs 
>used in this way are 'dereferenced', ie are treated as labels being used 
>to indicate an RDF graph, rather than simply as a referring name. The 
>model theory would need to be modified to reflect whatever convention was 
>used, but the change would be easy. Euler uses the property name of the 
>triple to determine whether or not to dereference, but I am not sure if 
>that would be adequate as a general mechanism. One way to generalize this 
>would be to combine it with the idea mentioned earlier, and allow a 
>namespace to be declared to be 'dereferencing'.

I'm really uncomfortable about having a dependency on dereferencing 
here.  I strongly feel that RDF formats should stand independently of the 
protocols that may be used to access them.

(One of my personal interests is applications that operate in disconnected 
mode via messaging;  e.g. my INET 2001 presentation: 
http://www.ninebynine.org/Presentations/INET2001-Cinderella-20010608.PDF. 
In such environments, it is convenient to bundle the various pieces 
together.  There are MIME mechanisms that can do this, but they are not 
widely supported and in some cases would just add unnecessary application 
complexity.)

>The natural way to understand either of these conventions would be that 
>the triples in the 'inner' graphs - those that are inside the curly 
>brackets, or can be got at by dereferencing - are 'dark', i.e. unasserted. 
>Since they are not included in the top-level RDF graph, this would be 
>pretty easy to do. The extra semantic complexity of these proposals comes 
>from making sense of the top-level triples that have these new structures 
>as subject or object.

But if the (sub)graph is just a denotation of another resource, don't the 
existing semantics work fine?  Core RDF would have no way to relate its 
semantics to the internal structure of such a graph resource, but I think 
that's exactly the territory that should be left to future extensions.

>Although these context ideas are appealing, they do rather go beyond the 
>current state of RDF. My own suggestion would be to stick to the more 
>boring 'dark triples' alternative at present, and leave the question of 
>what is the best way to enrich RDF syntax open at the present time, and 
>assume that people will experiment. All we really need to do is to provide 
>some way in which RDF can, as it were, get out of the way. I would suggest 
>that the easiest way to do that would be to say that the general form of 
>an RDF graph is TWO sets of triples, and the triples in the first set have 
>the usual model-theoretic interpretation, and the triples in the second 
>set have no RDF interpretation at all. The second set can be optional, so 
>if it isn't mentioned then it's assumed to be empty. That makes all 
>existing RDF graphs into the new kind of graph at no cost.

...

My thoughts are this:  syntactically, allow for collections of triples to 
be used in a subject or object position.  Semantically, simply treat the 
triples as dark, but have the syntactic construct denote a resource that is 
used in the normal way in the semantics of the containing graph.  For 
N-triples, adopt the {...} syntax from N3.

e.g., for provenance:

   { some-statements } ex:comeFrom ex:someDocument .

The construct { some-statements } corresponds to a node labelled with a 
graph.  The node denotes some resource, treated pretty much the same as any 
other node for the purposes of semantics;  i.e. it denotes a resource in 
the domain of discourse.  RDFcore says nothing about the meaning of 
some-statements.
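
A minimal sketch of that arrangement, in Python.  The names GraphNode and 
dark_triples are hypothetical, made up for illustration;  they are not part 
of any RDF specification or library.  The point is only that the inner 
triples label a node and never appear among the asserted triples:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class GraphNode:
    """Hypothetical node labelled with a set of dark triples: { ... }."""
    triples: FrozenSet[Tuple[str, str, str]]


def dark_triples(graph):
    """Collect the triples that label nodes; none of them is asserted."""
    dark = set()
    for s, _p, o in graph:
        for term in (s, o):
            if isinstance(term, GraphNode):
                dark |= term.triples
    return dark


# { some-statements } ex:comeFrom ex:someDocument .
some_statements = GraphNode(frozenset({("ex:judy", "ex:age", "ex:whatever")}))
graph = {(some_statements, "ex:comeFrom", "ex:someDocument")}
```

Here the containing graph holds exactly one asserted triple, and the 
statement about ex:judy is reachable only through the node that it labels.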

Some questions that arise are:


1. Is the graph tidy on graph-labels?  I think not;  e.g. consider:

   { some-statements } ex:comeFrom ex:someDocument ;
                       ex:writtenBy ex:somePerson .

   { some-statements } ex:comeFrom ex:anotherDocument .

Given a rule of inference:

   ?s ex:comeFrom ?doc .
   ?s ex:writtenBy ?person .

=>

   ?doc dc:author ?person .

We can entail:

   ex:someDocument dc:author ex:somePerson .

But can we also entail:

   ex:anotherDocument dc:author ex:somePerson .

???
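
To make the two possible answers concrete, here is a rough Python sketch.  
The functions graph_node, apply_rule and example are hypothetical names, 
and the single inner triple is an arbitrary stand-in for some-statements.  
With tidy=True, two occurrences of the same statements give the same node;  
with tidy=False, each occurrence gives a fresh node:

```python
from itertools import count

_fresh = count()  # source of distinct ids for non-tidy nodes


def graph_node(statements, tidy):
    """Return a node for { statements } under the chosen tidiness policy."""
    label = frozenset(statements)
    return ("graph", label) if tidy else ("graph", label, next(_fresh))


def apply_rule(triples):
    """?s ex:comeFrom ?doc . ?s ex:writtenBy ?person . => ?doc dc:author ?person ."""
    return {
        (doc, "dc:author", person)
        for (s1, p1, doc) in triples if p1 == "ex:comeFrom"
        for (s2, p2, person) in triples if p2 == "ex:writtenBy" and s1 == s2
    }


def example(tidy):
    stmts = [("ex:a", "ex:b", "ex:c")]  # stand-in for some-statements
    first = graph_node(stmts, tidy)     # first occurrence of { ... }
    second = graph_node(stmts, tidy)    # second occurrence of { ... }
    return apply_rule({
        (first, "ex:comeFrom", "ex:someDocument"),
        (first, "ex:writtenBy", "ex:somePerson"),
        (second, "ex:comeFrom", "ex:anotherDocument"),
    })
```

Under this sketch, example(False) yields only the dc:author triple for 
ex:someDocument, while example(True) also yields the one for 
ex:anotherDocument.  That second entailment is the one I would want to avoid.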


2. How would this be handled in a "triple-store"?  I think that this is 
where the concept of dark triples may be helpful:  the graph that labels a 
node would be encoded by triples that are marked as dark and, more than 
this, linked to the resource corresponding to that node, which may itself 
be part of a 'bright' triple.  Thus, the structural information is 
available to be interpreted according to some layered language semantics.
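
One way a store might record this, sketched in Python.  The class and 
method names (TripleStore, bright, graph_of) and the node id "_:g1" are 
assumptions for illustration only, not a description of any existing 
triple-store.  A triple is dark exactly when it carries a link to the node 
it helps to label:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredTriple:
    s: str
    p: str
    o: str
    labels: Optional[str] = None  # node this dark triple labels; None if bright


class TripleStore:
    def __init__(self):
        self._rows = set()

    def add(self, s, p, o, labels=None):
        """Store a triple; pass labels=<node id> to store it as dark."""
        self._rows.add(StoredTriple(s, p, o, labels))

    def bright(self):
        """Triples that get the usual RDF interpretation."""
        return {(r.s, r.p, r.o) for r in self._rows if r.labels is None}

    def graph_of(self, node):
        """Dark triples grouped under one node, for a layered language to read."""
        return {(r.s, r.p, r.o) for r in self._rows if r.labels == node}


store = TripleStore()
store.add("ex:judy", "ex:age", "ex:whatever", labels="_:g1")  # dark
store.add("_:g1", "ex:comeFrom", "ex:someDocument")           # bright
```

A plain RDF application would call only bright();  an extension language 
would call graph_of("_:g1") to recover the grouping.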

...

What I'm trying to do here is push the dark triples idea just a little bit 
further to recognize groupings of such triples that can be treated as 
resources.  I can see that dark triples alone would be enough to encode 
this kind of thing, but I expect that the work of maintaining groupings 
(which I'm fairly sure will be wanted) will fall upon individual 
applications, which will likely end up doing the same thing in different 
ways.  I think if we go down the route of laying groundwork for 
extensibility, there's real value to standardizing the grouping mechanism 
-- and I think the common use of N3 formulae/contexts in real applications 
of RDF supports this view.

I'll also observe that, in the absence of subgraphs of dark triples, what I 
describe here reduces to exactly the current base RDF.  Therefore, simple 
applications that only process/generate RDF "ground facts" need not be 
affected.

#g


-------------------
Graham Klyne
<GK@NineByNine.org>

Received on Wednesday, 20 March 2002 08:09:49 UTC