Re: Why do we name nodes and not edges? from Stephen Williams on 2012-07-30 (semantic-web@w3.org from July 2012)

From: Stephen Williams <sdw@lig.net>
Date: Mon, 30 Jul 2012 10:47:22 -0700
To: semantic-web@w3.org
Message-ID: <5016C8AA.2060609@lig.net>
On 7/29/12 6:09 AM, Nathan wrote:
> David Booth wrote:
>> Another approach (instead of reification, which I personally hate), is
>> to use named graphs.  Named graphs have to be used differently, but can
>> often solve the same use case.
>>
>> For RDF stores that store everything as quads anyway, my guess is that
>> even if you have only one named graph per triple it would likely involve
>> less overhead than reification, but perhaps one or more of the
>> developers of such stores can comment on that more authoritatively.
>>
>
> As I understand it, Melvin is looking for a well-defined function that would allow one to canonicalize a triple (edge) into a 
> unique URI, such that f(subject, predicate, object) = edge:123234234 .
>
> Reification allows you to name a triple, but it's not in a canonical form with a unique name per triple.

At a W3C plenary at MIT several years ago, I asked TBL why triples and not quads.  He replied that they are quads: the 
fourth element is just usually implied (or something close to that).

I've long thought that we need unique identification of each triple and to be able to uniquely group arbitrary subsets of 
statements in a "triple store" so that the subset can be referred to easily.  My solution is to represent "triples" as pents: 
triple+ID+context, where context is very general purpose and semi-automatically maintained.  Going further, I am mostly 
convinced that it should be a "hex" with two kinds of context: provenance / certainty (time stamps, source, several types of 
trust) and statement subset association.  (There is one further level needed in my system, but I won't go into that here yet.)  
I need to implement this soon and have a number of ideas about how this should work to be efficient and scalable.
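
A minimal sketch of what such a "hex" record might look like, purely as an illustration.  The class names, field names, and 
the members() helper are all assumptions made for this example; nothing here is an existing design or implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Provenance:
    """First kind of context: where a statement came from and how far to trust it."""
    source: Optional[str] = None      # e.g. the document the statement was read from
    timestamp: Optional[str] = None   # e.g. an ISO 8601 string
    trust: Optional[float] = None     # hypothetical single trust score


@dataclass(frozen=True)
class Hex:
    """Triple + ID + two kinds of context (hypothetical layout)."""
    subject: str
    predicate: str
    object: str
    statement_id: str
    provenance: Provenance = field(default_factory=Provenance)
    subsets: FrozenSet[str] = frozenset()   # second kind of context: subset association

    def as_triple(self):
        """Drop the ID and both contexts, leaving a plain triple."""
        return (self.subject, self.predicate, self.object)


def members(store, subset):
    """Return every statement in `store` that belongs to the named subset."""
    return [h for h in store if subset in h.subsets]
```

Here as_triple() corresponds to exporting with the metadata elided, and members() to referring to an arbitrary subset of 
statements by name.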

Please let me know if you are interested in exploring the idea and helping to implement this in one way or another.  In 
particular, I need (and may create) a SQLite-like licensed library (Apache 2, MIT, or a commercial license with few 
restrictions, etc.) that can be used widely without restriction.  This may of course just be a layering on SQLite initially, 
although that likely won't be efficient and scalable enough for my purposes.
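
As a rough sketch of what an initial layering on SQLite could look like, using Python's standard sqlite3 module: the table 
names, column names, and helper functions below are assumptions for illustration only, and no claim is made about efficiency 
or scalability.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a statement store with a hypothetical two-table schema."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS statement (
            id          INTEGER PRIMARY KEY,   -- serial number local to this database
            subject     TEXT NOT NULL,
            predicate   TEXT NOT NULL,
            object      TEXT NOT NULL,
            source      TEXT,                  -- provenance / certainty context
            asserted_at TEXT,
            trust       REAL
        );
        CREATE TABLE IF NOT EXISTS subset_member (
            subset       TEXT NOT NULL,        -- statement subset association
            statement_id INTEGER NOT NULL REFERENCES statement(id),
            PRIMARY KEY (subset, statement_id)
        );
    """)
    return db


def add_statement(db, s, p, o, source=None, asserted_at=None, trust=None, subsets=()):
    """Insert one statement and return its serial ID."""
    cur = db.execute(
        "INSERT INTO statement (subject, predicate, object, source, asserted_at, trust) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (s, p, o, source, asserted_at, trust),
    )
    sid = cur.lastrowid
    db.executemany(
        "INSERT INTO subset_member (subset, statement_id) VALUES (?, ?)",
        [(name, sid) for name in subsets],
    )
    return sid


def triples_in(db, subset):
    """Return the plain triples of a named subset, metadata elided."""
    rows = db.execute(
        "SELECT s.subject, s.predicate, s.object FROM statement s "
        "JOIN subset_member m ON m.statement_id = s.id "
        "WHERE m.subset = ? ORDER BY s.id",
        (subset,),
    )
    return rows.fetchall()
```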

With current standards, this would be externalized as reified RDF if "everything" were exported, or simple triples if the 
metadata is elided.  Probably a new twist on external representation would be useful.  Additionally, based on my work related to 
W3C EXI and my own binary XML work, I have had a number of ideas related to a binary RDF/pent/hex/ntuple interchange format.  
This is also something I'm going to need soon.

Named graphs are the beginnings of how to do this, and everything could be done through the fourth term in a quad.  However, 
this is likely to be cumbersome and I don't see current implementations actually solving the problem properly yet.

>
> In logic we assign symbols to statements all the time (~A & B), but not in a well defined way where each unique statement has 
> exactly one canonical name.
>
> An interesting question, is whether two identical triples (edges) from different documents would share the same canonicalized 
> form, or whether the provenance / named graph would need to be part of the canonicalization. More of a f(subject, predicate, 
> object, graph) = <edge:graph#123wer234d23> where 123wer234d23 is a hash(subject, predicate, object).

This is one good solution.  Another, sometimes applicable, is to just use serial numbers relative to some database.  One 
semantic web idiom is that the only unambiguous reference to a triple or set of triples is a complete restatement of those 
triples.  It is basically equivalent, however, to define a temporary term in a local context, such as A = {set of triples}, and 
then make statements about A.  An externalized set should be able to do that and even reference a subset in a database elsewhere.
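
A minimal sketch of the hash-based naming Nathan describes, assuming all three terms are absolute IRIs and using SHA-256 over 
an N-Triples-style line.  The edge: scheme follows the example in his message; the function name, the choice of hash, and the 
digest length are assumptions, not an agreed canonicalization.

```python
import hashlib


def edge_uri(subject, predicate, obj, graph=None, digest_chars=16):
    """Derive a name for a triple from its terms (hypothetical scheme).

    Assumes subject, predicate and obj are absolute IRIs; literals and
    blank nodes would need their own canonical serialization.
    """
    line = f"<{subject}> <{predicate}> <{obj}> ."
    digest = hashlib.sha256(line.encode("utf-8")).hexdigest()[:digest_chars]
    if graph is None:
        # f(subject, predicate, object): same name in every document
        return f"edge:{digest}"
    # f(subject, predicate, object, graph): name is scoped to the graph
    return f"edge:{graph}#{digest}"
```

Called without a graph, identical triples from different documents share one name; called with a graph, the same triple gets a 
distinct name per graph while the fragment after "#" stays the same.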

>
> One use case for this (from Melvin) would be to apply weights to statements: { X :magnitude 10 } where X is a URI which 
> identifies the statement { :Bob :trusts :Mary } .

There are many cases where you need to describe provenance, trust/probability, and make statements about groups of statements. 
It shouldn't be so hard or confusing.

>
> Best,
>
> Nathan

sdw
Received on Monday, 30 July 2012 17:47:48 UTC