Re: Why do we name nodes and not edges? from Ivan Mikhailov on 2012-07-31 (semantic-web@w3.org from July 2012)

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Tue, 31 Jul 2012 14:34:19 +0700
To: Melvin Carvalho <melvincarvalho@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <1343720059.17455.40.camel@octo.iv.dev.null>

On Wed, 2012-07-25 at 17:07 +0200, Melvin Carvalho wrote:
> Why don't edges get the same treatment, i.e. encouragement to give it a
> (universal) name.  Is it even practical?

It is indeed practical in some special cases. In fact, this data model
is much older than RDF. When LISP systems kept statements as (P S O)
lists, every list had an address, so it could be placed in the S or O
position of another statement (putting it in the P position would be
technically possible as well, but I don't know of any real example).
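A minimal Python sketch of that older model, assuming Python object identity as a stand-in for a LISP list's address: each statement is a (P S O) record, and because it has an identity of its own, it can sit in the S or O position of another statement. The names `Statement`, `knows`, `claim` and `knows_again` are hypothetical.

```python
from dataclasses import dataclass


# eq=False keeps identity-based equality and hashing, so every Statement
# behaves like a LISP list with its own address, even when its contents
# match another statement's contents.
@dataclass(frozen=True, eq=False)
class Statement:
    p: object  # predicate
    s: object  # subject (may itself be a Statement)
    o: object  # object (may itself be a Statement)


# An ordinary statement about two resources.
knows = Statement("knows", "alice", "bob")

# A second statement that refers to the first by placing it in the O position.
claim = Statement("claims", "carol", knows)

# A structurally identical statement still has a different "address".
knows_again = Statement("knows", "alice", "bob")

print(claim.o is knows)       # True
print(knows == knows_again)   # False: identity, not content, names a statement
```

Using identity rather than content as the statement's name is what lets two identical-looking edges be told apart, which is the property the question asks about.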

It's out of the RDF mainstream because of the space and processing-time
costs of the extra data field and the indexes on it. First, note two
heuristics:

N times more index trees means an N times slower run, or a box that
costs N times more.

N times more statements in the same number of index trees means a
log(N) times slower run, or a box that costs from log(N) to N times more.
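The two heuristics can be written as a rough cost model. This is only a sketch: it models just the slowdown on fixed hardware, and the base of the logarithm is an assumption (base 2 here), since the heuristic leaves it open.

```python
import math


def slowdown_more_trees(n: float) -> float:
    """Heuristic 1: n times more index trees -> about n times slower on the same box."""
    return n


def slowdown_more_statements(n: float, base: float = 2.0) -> float:
    """Heuristic 2: n times more statements in the same trees -> about log(n) times slower.

    Meaningful for n > 1; the log base is an assumption, not part of the heuristic.
    """
    return math.log(n, base)


# Compare both heuristics for the same growth factor n.
for n in (2, 4, 6, 8):
    print(f"n={n}: more trees x{slowdown_more_trees(n):.2f}, "
          f"more statements x{slowdown_more_statements(n):.2f}")
```

For any n > 1 the first heuristic gives the larger slowdown, which is why adding index trees is the expensive direction in the argument that follows.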

Next, note that big applications need both G as an additional field and
ACID properties. For a database, reasonable coverage of a G,S,P,O table
with indexes requires at least 4 full or 3+2 "partial+full" index trees,
and adding a fifth field would multiply the number of trees by a factor
of 3 to 5 rather than simply add one more index. According to the
heuristics above, that costs much more than multiplying the number of
stored statements by 6, which is what storing an extra description like
this for every statement amounts to:

[] a Statement ; graph G ; subject S ; predicate P ; object O .
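One way to see why a fifth field multiplies the number of index trees is to check which lookup patterns a set of index key orders can serve with a prefix scan: a pattern is served when its bound fields form a prefix of some index order. The Python sketch below assumes four hypothetical full index orders over G, S, P, O (illustrative only, not Virtuoso's actual scheme) and then naively extends each order with a fifth field, here called `E` for a hypothetical edge identifier.

```python
from itertools import combinations


def serves(bound: tuple, order: str) -> bool:
    """True if the bound fields are exactly a prefix of the index key order."""
    return set(order[: len(bound)]) == set(bound)


def uncovered(fields: str, orders: list) -> list:
    """All non-empty sets of bound fields that no index can serve by a prefix scan."""
    missing = []
    for k in range(1, len(fields) + 1):
        for bound in combinations(fields, k):
            if not any(serves(bound, order) for order in orders):
                missing.append("".join(bound))
    return missing


# Hypothetical set of four full index trees over a G,S,P,O quad table.
quad_orders = ["SPOG", "POGS", "OSGP", "GSPO"]
print(uncovered("GSPO", quad_orders))  # ['GP', 'GO']

# Naively appending a fifth field E to every existing order.
five_orders = [order + "E" for order in quad_orders]
missing = uncovered("GSPOE", five_orders)
print(len(missing), "of", 2 ** 5 - 1, "patterns uncovered")  # 17 of 31 patterns uncovered
```

Four orders leave only two quad patterns unserved, but the same four orders with a fifth field appended leave more than half of the five-field patterns unserved; covering them takes additional trees, not one extra index. The exact multiplier depends on which access patterns an application actually needs.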

So there is no "scientific" or "philosophical" reason to leave edges
unnamed "by default"; it's all about money. As a database vendor, we get
related questions from customers quite regularly, but no one has found
the fifth column practical enough to write a feature request and sign a
contract.

The workaround for small systems is to keep G unique. A "one triple per
graph" policy turns the graph IRI into a convenient edge IRI, and the
application developer can use the existing infrastructure for free.
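A minimal sketch of the "one triple per graph" workaround, using a plain in-memory set of (G, S, P, O) quads rather than any real store API: each new edge gets a freshly minted graph IRI, which then serves as the edge's name in other statements. The namespace `http://example.org/edge/`, the annotation graph `http://example.org/meta`, and the `QuadStore` class are all hypothetical.

```python
from itertools import count

EDGE_NS = "http://example.org/edge/"    # hypothetical namespace for minted graph IRIs
META_GRAPH = "http://example.org/meta"  # hypothetical graph holding statements about edges


class QuadStore:
    """Toy quad store: a set of (graph, subject, predicate, object) tuples."""

    def __init__(self):
        self.quads = set()
        self._next_id = count(1)

    def add_edge(self, s, p, o):
        """Store one triple in its own new graph and return that graph IRI as the edge name."""
        g = f"{EDGE_NS}{next(self._next_id)}"
        self.quads.add((g, s, p, o))
        return g

    def annotate(self, edge, p, o):
        """Make a statement about an edge by using its graph IRI as the subject."""
        self.quads.add((META_GRAPH, edge, p, o))

    def triple_of(self, edge):
        """Return the single triple stored under the given edge (graph) IRI."""
        matches = [(s, p, o) for (g, s, p, o) in self.quads if g == edge]
        if len(matches) != 1:
            raise ValueError("one-triple-per-graph policy violated")
        return matches[0]


store = QuadStore()
edge = store.add_edge("ex:alice", "ex:knows", "ex:bob")
store.annotate(edge, "ex:since", "2012")
print(edge)                   # http://example.org/edge/1
print(store.triple_of(edge))  # ('ex:alice', 'ex:knows', 'ex:bob')
```

Nothing here needs a fifth column: the existing G field doubles as the edge name, at the price of one graph per triple, which is why this suits small systems.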

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

Received on Tuesday, 31 July 2012 07:34:56 UTC