temporal nature of T-box (and A-box), and why bNodes should rarely (if ever!) be used from Ted Thibodeau Jr on 2011-03-08 (public-rdb2rdf-wg@w3.org from March 2011)

From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Date: Tue, 8 Mar 2011 10:20:49 -0500
To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Cc: Kingsley Idehen <kidehen@openlinksw.com>
Message-Id: <CED215AD-BB43-479E-BDCA-EFDFC6A1DC42@openlinksw.com>
All --

As we're talking about the most basic, the most common, the ultimate
in "I know nothing" mapping in the Direct/Default Mapping, we must
remember that this will be used against RDB schemas that are not
fully developed, that have not been fully considered, mapped out,
planned, etc. -- as well as fully-fledged, heavily-vetted enterprise
application-serving schemas.

Thus -- they will change.  (Even the long-lived ones will change,
given enough time -- or just inconvenient timing of interactions.)

This means that *all* generated URIs, whether for Classes or 
Relationships or Attributes or Entities, will change -- Cool URIs
notwithstanding.  Even RowID-based URIs may change over time, due
to DBMS migration from hardware to hardware, depending on the
methods used.

Thus, any time an RDB2RDF Ontology (i.e., a Direct Mapping Graph,
the T-box) is generated from an RDB Schema -- and further, any 
time instance data is generated (that is, triples involving 
Entities described by that RDB, i.e., an Instance Data Graph -- 
the A-box) -- these Graphs and URIs *must* be treated as temporary.

(In the end, the transformation here is not so much RDB to RDF, 
as it is RDB to DDB -- Relational Database to Deductive Database.)

However -- once transformed to RDF, Cool URIs are strongly to be
desired.  Making everything a bNode is *not* helpful to Linked
Data, nor any other long-term use of EAV+CR or RDF.

It is useful to be able to say "this thing was once described
thus, and now is described so."  It is useful to be able to
say *when* the original description was retrieved, or accurate, 
or asserted -- and the same about the *new* description.

When this is done well, and as SPARQL and the Linked Data Web 
mature -- you will be able to say "DESCRIBE <URI> AS OF <date>" -- 
and get whatever was "known" about that thing as of that moment.  
You will also be able to ask things like "Who has lived at 
<address>?" or "Where has <person> lived?" or "Who has owned 
<property>?" or "How have the Top 25 Shareholders and Board 
Members of the Fortune (10, 100, 500, 1000) been interconnected 
over the past (10, 25, 50, 100) years?"
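
To make the "who has lived where" questions concrete, here is a
minimal sketch in Python.  It assumes each statement carries the
period over which it held; the example.org URIs, the RESIDENCIES
list, and both function names are hypothetical, invented only for
this illustration.

```python
from datetime import date

# Hypothetical URIs, used only for this illustration.
ALICE = "http://example.org/person/alice"
BOB = "http://example.org/person/bob"
ADDR1 = "http://example.org/address/1"
ADDR2 = "http://example.org/address/2"

# (person, address, valid_from, valid_until); None means "still current".
RESIDENCIES = [
    (BOB, ADDR1, date(2001, 7, 1), None),
    (ALICE, ADDR1, date(1990, 1, 1), date(2001, 6, 30)),
    (ALICE, ADDR2, date(2001, 7, 1), None),
]


def who_has_lived_at(address):
    """Return every person ever recorded at `address`, earliest first."""
    rows = sorted((r for r in RESIDENCIES if r[1] == address),
                  key=lambda r: r[2])
    return [person for person, _, _, _ in rows]


def where_has_lived(person):
    """Return every address ever recorded for `person`, earliest first."""
    rows = sorted((r for r in RESIDENCIES if r[0] == person),
                  key=lambda r: r[2])
    return [address for _, address, _, _ in rows]
```

Both questions only work because superseded statements are kept
rather than overwritten, and because each person and address keeps
one name across all of them.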

These are not the sorts of questions you can easily ask of RDBMS
through SQL -- and this is part of why we want to transform the 
data which is now found (and should remain, for many purposes!)
in those RDBMS, or at least how we interact with it.

Ontologies are like source code, in many ways.  Versions happen.
Instance data, likewise.  Everything is naturally found within
some context -- and that context must be taken into consideration
when you change your observational perspective.

Imagine you look at an apple, and describe it today.  (Red, 137 
grams, 63 cubic centimeters, etc.)  Now wait a month ... or a year.
Describe it again.  (Brown, 35 grams, 38 cubic centimeters, etc.)

It's the same apple.  RFID tag proves it.  It's the same entity;
it should be referred to by the same name/URI.  But the information
about it has changed.  Neither description is *wrong*, *if* those
AV pairs (or the graphs holding them) have time-stamp data somehow
associated with them.
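
The apple can be sketched in a few lines of Python.  This is only an
illustration, assuming each AV pair is stored with the date it was
observed; the URI, the attribute names, the observation dates, and
describe_as_of() are all hypothetical.

```python
from datetime import date

APPLE = "http://example.org/apple/42"  # hypothetical URI for the apple

# (entity, attribute, value, observed_on) -- dates are made up.
OBSERVATIONS = [
    (APPLE, "color", "red", date(2011, 3, 8)),
    (APPLE, "mass_g", 137, date(2011, 3, 8)),
    (APPLE, "volume_cm3", 63, date(2011, 3, 8)),
    (APPLE, "color", "brown", date(2011, 4, 8)),
    (APPLE, "mass_g", 35, date(2011, 4, 8)),
    (APPLE, "volume_cm3", 38, date(2011, 4, 8)),
]


def describe_as_of(entity, when):
    """Return the latest known value of each attribute on or before `when`."""
    description = {}
    for ent, attr, value, observed_on in sorted(OBSERVATIONS,
                                                key=lambda o: o[3]):
        if ent == entity and observed_on <= when:
            # Later observations replace earlier ones for the same attribute.
            description[attr] = value
    return description
```

Asking for the apple as of 2011-03-20 gives the red, 137-gram
description; asking as of 2011-05-01 gives the brown, 35-gram one.
Both answers hang off the same URI, and neither contradicts the
other.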

The same hypothetical applies to RDB T-box and A-box information,
and likewise to RDF T-box and A-box information.

Many things last a long time, and need to be described several
times at different points in their "lives."  We don't always know
what things have such long lives, and what things don't -- so it's
best to be able to always refer to the same Entity by a definite
Identifier (URI) -- even if that URI has no real meaning when it's
originally minted, and even if at some point you come up with an
inherently meaningful URI -- because owl:sameAs and similar special
Relationships can be used to draw necessary connections over time.
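
A rough sketch of how owl:sameAs draws that connection, in plain
Python rather than any particular RDF library.  The two example.org
URIs, the vocabulary terms, and the function names are hypothetical;
only the owl:sameAs URI itself is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs: an opaque one minted first, a meaningful one later.
OLD = "http://example.org/id/0001"
NEW = "http://example.org/customer/acme"

TRIPLES = [
    (OLD, "http://example.org/vocab/name", "Acme"),
    (NEW, "http://example.org/vocab/city", "Burlington"),
    (NEW, OWL_SAME_AS, OLD),
]


def same_as_closure(uri, triples):
    """Collect every URI linked to `uri` by owl:sameAs, in either direction."""
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


def describe(uri, triples):
    """Return (predicate, object) pairs known under any alias of `uri`."""
    aliases = same_as_closure(uri, triples)
    return sorted((p, o) for s, p, o in triples
                  if s in aliases and p != OWL_SAME_AS)
```

With the single owl:sameAs triple in place, describe(OLD, TRIPLES)
and describe(NEW, TRIPLES) return the same two facts, so nothing said
under the original URI is lost when the better one comes along.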

But these connections *cannot* be drawn when entirely ephemeral
bNodes are used for those Entities, or Attributes, etc.  UUID-based
dereferenceable URIs are fine for the purposes bNodes have often
served -- because UUIDs are persistent over time, and each UUID
can be forced to only ever refer to a single entity.  bNodes cannot
have such restrictions placed on them ... and therein lies their doom.
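
One possible way to mint such URIs, sketched in Python.  The choice
of name-based (version 5) UUIDs derived from a table name and primary
key is an assumption made for this example, as are the base URI and
both function names; storing a random UUID alongside each row would
serve equally well, and would also survive a change of key.

```python
import uuid

BASE = "http://example.org/id/"  # hypothetical base for minted URIs
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, BASE)


def mint_uri(table, primary_key):
    """Derive a URI that is the same on every run for the same table and key."""
    return BASE + str(uuid.uuid5(NAMESPACE, f"{table}/{primary_key}"))


def blank_node_labels(rows):
    """Label rows the way a serializer might: by position in this one dump."""
    return {row: f"_:b{i}" for i, row in enumerate(rows)}


# Two exports of the same rows, in a different order.
first = blank_node_labels(["people/7", "people/8"])
second = blank_node_labels(["people/8", "people/7"])

# The bNode label for people/7 differs between exports ...
labels_differ = first["people/7"] != second["people/7"]
# ... while the minted URI does not.
uri_is_stable = mint_uri("people", 7) == mint_uri("people", 7)
```

A bNode label means something only inside the one document that
carries it, so there is nothing to hang a later owl:sameAs or a
time-stamp on.  The minted URI can be repeated in every export.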

I hope this starts to clarify what I've been talking about in our
concalls.  But please feel free to ask for more, or raise objections
to anything you don't agree with.  Discussion is usually helpful.

Regards,

Ted

--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
                            //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
       10 Burlington Mall Road, Suite 265, Burlington MA 01803
                                http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
                              http://www.openlinksw.com/blog/~kidehen/
   Universal Data Access and Virtual Database Technology Providers
Received on Tuesday, 8 March 2011 15:21:19 UTC