Re: temporal nature of T-box (and A-box), and why bNodes should rarely (if ever!) be used from Alexandre Bertails on 2011-03-09 (public-rdb2rdf-wg@w3.org from March 2011)

From: Alexandre Bertails <bertails@w3.org>
Date: Wed, 09 Mar 2011 11:29:59 -0500
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>, Kingsley Idehen <kidehen@openlinksw.com>
Message-ID: <1299688199.16232.1343.camel@simplet>
Ted,

I agree with a lot of your remarks and fully understand your concern
about the temporal aspect of data. Please read my comments below in the
perspective of the Direct Mapping. R2RML folks may have other opinions.

On Tue, 2011-03-08 at 10:20 -0500, Ted Thibodeau Jr wrote: 
> All --
> 
> As we're talking about the most basic, the most common, the ultimate
> in "I know nothing" mapping in the Direct/Default Mapping, we must
> remember that this will be used against RDB schema that are not 
> fully developed, that have not been fully considered, mapped out,
> planned, etc. -- as well as fully-fledged, heavily-vetted enterprise
> application-serving schema.

Relational databases don't understand the notion of "revision" by
default; it's not part of the model.

Also, the Direct Mapping is meant to be a simple function from RDB to
RDF [1]. You can easily chain it with other transformations. I believe
that handling the temporal aspect in such a transformation instead of
the Direct Mapping itself makes a lot of sense.
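
To illustrate what I mean by chaining, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration only: the dict-based
database model, the example.org base URI, the simplified URI scheme, and the
names direct_map and stamp are hypothetical and are not taken from the
Direct Mapping document. The point is only that the mapping stays a plain
function of the database, while the temporal information is added by a
separate step applied to its result.

```python
from typing import Dict, FrozenSet, List, Tuple

Row = Dict[str, object]
Database = Dict[str, List[Row]]   # table name -> rows (hypothetical model)
Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]

BASE = "http://example.org/db/"   # assumed base URI, not from the spec


def direct_map(db: Database, pk: Dict[str, str]) -> FrozenSet[Triple]:
    """Map every cell to a triple; the output depends only on db and pk."""
    triples = set()
    for table, rows in db.items():
        key = pk[table]
        for row in rows:
            subject = f"<{BASE}{table}/{key}={row[key]}>"
            for column, value in row.items():
                predicate = f"<{BASE}{table}#{column}>"
                triples.add((subject, predicate, f'"{value}"'))
    return frozenset(triples)


def stamp(graph: FrozenSet[Triple], retrieved_at: str) -> FrozenSet[Quad]:
    """Separate transformation: attach a retrieval time to each triple."""
    return frozenset((s, p, o, retrieved_at) for s, p, o in graph)


# Chain the two steps: the mapping knows nothing about time.
db = {"people": [{"id": 7, "name": "Bob"}]}
pk = {"people": "id"}
snapshot = stamp(direct_map(db, pk), "2011-03-09T16:29:56Z")
```

With this split, anyone who needs revisions can keep one stamped snapshot per
run, and anyone who does not can use the output of direct_map as it is.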

> 
> Thus -- they will change.  (Even the long-lived ones will change,
> given enough time -- or just inconvenient timing of interactions.)
> 
> This means that *all* generated URIs, whether for Classes or 
> Relationships or Attributes or Entities, will change -- Cool URIs
> notwithstanding.  Even RowID-based URIs may change over time, due
> to DBMS migration from hardware to hardware, depending on the
> methods used.

Yes, they will change, but we don't care! The denotational semantics is
just a function, so its only requirement is that the result of the
Direct Mapping depends only on its input, the RDB database. It does not
depend on time.

Of course, you need a notion of equality to compare the results in the
presence of blank nodes. Fortunately, it's already defined in a
Recommendation and it's called graph isomorphism [2].
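
For small graphs, that equality can be sketched as a brute-force search for
a renaming of blank nodes. This is only an illustration under my own
assumptions: terms are plain strings, a blank node is any term starting with
"_:", and the function names are hypothetical. It tries every bijection
between the blank nodes of the two graphs, so it is exponential and not
something to run on real data.

```python
from itertools import permutations
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Assumed convention: blank node labels start with '_:'."""
    return term.startswith("_:")


def blank_nodes(graph: Set[Triple]) -> List[str]:
    """Return the blank node labels used in the graph, in sorted order."""
    return sorted({t for triple in graph for t in triple if is_blank(t)})


def isomorphic(a: Iterable[Triple], b: Iterable[Triple]) -> bool:
    """True if some bijection between blank nodes turns graph a into b."""
    g1, g2 = set(a), set(b)
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        # URIs and literals map to themselves; only blank nodes are renamed.
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


# Same shape, different blank node labels.
left = {("_:x", "<p>", '"1"'), ("_:x", "<q>", "_:y")}
right = {("_:b1", "<q>", "_:b0"), ("_:b1", "<p>", '"1"')}
same = isomorphic(left, right)
```

Two runs of the mapping over the same database may label their blank nodes
differently, and a check of this kind is what still lets us call the results
equal.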

The equality of URIs is syntactical. If you use a UUID-like process to
generate them, you'll have to give us a way to compare two graphs that
are supposed to be isomorphic. Ivan proposed something based on SPARQL
but in my experience, defining an equality relation is hard and we
should avoid it.
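
A small sketch of the problem, again under assumptions of my own (the
example.org URI scheme, the urn:uuid form, and all function names are
hypothetical): when URIs are derived from the key, two runs over the same
rows give the same set of triples, so plain set equality is enough. When
they are minted with a random UUID, the two runs differ syntactically and
some extra comparison has to be defined.

```python
import uuid
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
NAME = "<http://example.org/db/people#name>"   # assumed predicate URI


def mint_by_key(table: str, key: object) -> str:
    """Deterministic: the same row always gets the same URI."""
    return f"<http://example.org/db/{table}/id={key}>"


def mint_by_uuid(table: str, key: object) -> str:
    """Random: every call produces a fresh URI, whatever the row is."""
    return f"<urn:uuid:{uuid.uuid4()}>"


def map_rows(rows: List[Dict[str, object]],
             mint: Callable[[str, object], str]) -> Set[Triple]:
    """Emit one name triple per row, using the given URI minting function."""
    return {(mint("people", r["id"]), NAME, f'"{r["name"]}"') for r in rows}


rows = [{"id": 7, "name": "Bob"}, {"id": 8, "name": "Sue"}]

# Key-based URIs: syntactic equality of the two results holds.
stable = map_rows(rows, mint_by_key) == map_rows(rows, mint_by_key)

# UUID-based URIs: the two results are different sets of triples.
unstable = map_rows(rows, mint_by_uuid) == map_rows(rows, mint_by_uuid)
```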

> 
> Thus, any time an RDB2RDF Ontology (i.e., a Direct Mapping Graph,
> the T-box) is generated from an RDB Schema -- and further, any 
> time instance data is generated (that is, triples involving 
> Entities described by that RDB, i.e., an Instance Data Graph -- 
> the A-box) -- these Graphs and URIs *must* be treated as temporary.
> 
> (In the end, the transformation here is not so much RDB to RDF, 
> as it is RDB to DDB -- Relational Database to Deductive Database.)

The denotational semantics [3] just denotes RDB objects in the RDF
world. I have never seen RDB people say that a particular tuple is
persistent over time, nor do they let people know whether it was
modified. If they want to do so, they *encode* this information.

You basically want to embed new information that does not exist in
RDB per se: it is only deduced. So it's definitely beyond the goal of
the Direct Mapping.

> 
> However -- once transformed to RDF, Cool URIs are strongly to be
> desired.  Making everything a bNode is *not* helpful to Linked
> Data, nor any other long-term use of EAV+CR or RDF.
> 
> It is useful to be able to say "this thing was once described
> thus, and now is described so."  It is useful to be able to
> say *when* the original description was retrieved, or accurate, 
> or asserted -- and the same about the *new* description.
> 
> When this is done well, and as SPARQL and the Linked Data Web 
> mature -- you will be able to say "DESCRIBE <URI> AS OF <date>" -- 
> and get whatever was "known" about that thing as of that moment.  
> You will also be able to ask things like "Who has lived at 
> <address>?" or "Where has <person> lived?" or "who has owned 
> <property>?" or "How have the Top 25 Shareholders and Board 
> Members of the Fortune (10, 100, 500, 1000) been interconnected 
> over the past (10, 25, 50, 100) years?"

That's indeed an interesting question. But in my opinion this is beyond
RDB2RDF. I would call this thing RDB+revisions2RDF.

Also, if I understand correctly, your only point against the use of
blank nodes is this use case.

> 
> These are not the sorts of questions you can easily ask of RDBMS
> through SQL -- and this is part of why we want to transform the 
> data which is now found (and should remain, for many purposes!)
> in those RDBMS, or at least how we interact with it.
> 
> Ontologies are like source code, in many ways.  Versions happen.
> Instance data, likewise.  Everything is naturally found within
> some context -- and that context must be taken into consideration
> when you change your observational perspective.

When I started working on the Direct Mapping, one of my requests was for
the Working Group to define the input and the output. Eric and I
spent time asking people to review our model for RDB [4]. Our definition
of RDB is basically the subset of SQL that lets you describe a
relational database and its data.

Over the past year, people tried to add more and more orthogonal
concepts to RDB:
* data modeling considerations, such as
  * many-to-many relation
  * hierarchical table
* revisions / temporal nature

I find it very sad that for a group called RDB2RDF, we are still
arguing about what RDB actually is...

Alexandre Bertails.

[1] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#database-semantics
[2] http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#section-graph-equality
[3] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#denotational-semantics
[4] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#RDB

> 
> Imagine you look at an apple, and describe it today.  (Red, 137 
> grams, 63 cubic centimeters, etc.)  Now wait a month ... or a year.
> Describe it again.  (Brown, 35 grams, 38 cubic centimeters, etc.)
> 
> It's the same apple.  RFID tag proves it.  It's the same entity;
> it should be referred to by the same name/URI.  But the information
> about it has changed.  Neither description is *wrong*, *if* those
> AV pairs (or the graphs holding them) have time-stamp data somehow
> associated with them.
> 
> The same hypothetical applies to RDB T-box and A-box information,
> and likewise to RDF T-box and A-box information.
> 
> Many things last a long time, and need to be described several
> times at different points in their "lives."  We don't always know
> what things have such long lives, and what things don't -- so it's
> best to be able to always refer to the same Entity by a definite
> Identifier (URI) -- even if that URI has no real meaning when it's
> originally minted, and even if at some point you come up with an
> inherently meaningful URI -- because owl:sameAs and similar special
> Relationships can be used to draw necessary connections over time.
> 
> But these connections *cannot* be drawn when entirely ephemeral
> bNodes are used for those Entities, or Attributes, etc.  UUID-based
> dereferenceable URIs are fine for such purposes as bNodes have often
> been used -- because UUIDs are persistent over time, and each UUID 
> can be forced to only ever refer to a single entity.  bNodes cannot
> have such restrictions placed on them ... and therein lies their doom.
> 
> I hope this starts to clarify what I've been talking about in our
> concalls.  But please feel free to ask for more, or raise objections
> to anything you don't agree with.  Discussion is usually helpful.
> 
> Regards,
> 
> Ted
> 
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
> 
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
>                             //              http://twitter.com/TallTed
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>        10 Burlington Mall Road, Suite 265, Burlington MA 01803
>                                 http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                               http://www.openlinksw.com/blog/~kidehen/
>    Universal Data Access and Virtual Database Technology Providers
Received on Wednesday, 9 March 2011 16:29:56 UTC