Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from David Booth on 2014-03-07 (semantic-web@w3.org from March 2014)

From: David Booth <david@dbooth.org>
Date: Fri, 07 Mar 2014 15:00:13 -0500
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: semantic-web <semantic-web@w3.org>
Message-ID: <531A254D.1010903@dbooth.org>
Hi Alan,

On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
> Can you explain what you mean by "RDF's ability to allow multiple data
> models to peacefully coexist, interconnected, in the same data" ?

Yes.  Here is an imprecise illustration, on slides 10-17:
http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
(I took some artistic liberties blurring class/instance distinctions in 
that diagram.)

And here is a more precise example that cleanly distinguishes classes 
from instances:
http://tinyurl.com/pzsgf7f
(I've also attached the same illustration, for offline readers.)

In this latter example (of a hypothetical systolic blood pressure 
measurement), the same information is represented according to two 
different models/schemas/vocabularies/ontologies, v1 (green) and v2 
(red).  (I am using the terms model, schema, vocabulary and ontology 
loosely and somewhat interchangeably here.)

In the v1 model, the systolic blood pressure is indicated in RDF like this:

   ex:patient319 foaf:name "John Doe" ;
     v1:bps ex1:bp_023 .

   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
     v1:value 120 .

Whereas in the v2 model, the same information is represented 
differently, in RDF like this:

   ex:patient319 foaf:name "John Doe" ;
     v2:bps ex2:bp_409 .

   ex2:bp_409 a v2:SystolicBP ;
     v2:pressure 120 ;
     v2:units v2:mmHg ;
     v2:bodyPosition v2:sitting .

Thus, although ex1:bp_023 and ex2:bp409 capture the same blood pressure 
information, they represent that information differently.  Nonetheless, 
both representations can peacefully coexist in the same merged RDF data 
without conflict, which might happen, for example, if one is derived 
from the other through inference.

Furthermore, the relationship between these classes, 
v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the relationship 
between the corresponding v1 and v2 instance data, can also be 
explicitly captured in RDF, as the v1v2:SystolicBP_Transform (yellow) 
relationship:

   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .

Inference rules for v1v2:SystolicBP_Transform could therefore convert a 
v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement or 
vice versa.

This example only illustrated the case where the transformation from one 
model to the other is lossless and thus reversible.  Usually that isn't 
the case.  Relating models and transforming between them is *not* easy, 
but at least RDF makes it possible to explicitly indicate these 
relationships.

Obviously some intelligence must be exercised to avoid, for example, 
accidentally thinking that ex:bp_023 and ex2:bp_409 represent two 
distinct blood pressure measurements, and thereby double counting them, 
but that's easy enough to do.

Also, there isn't always a desire to relate or transform between models. 
  Sometimes some data is related and other data is not, and it is all 
still merged into the same RDF graph.  In fact, the point may be to 
connect that part of the data that *is* related and let the rest coexist 
without being connected (or at least not *directly* connected).

The point is that these data models can peacefully coexist in RDF data 
without conflict: applications using the v1 model against the merged 
data might only see v1 instance data, whereas applications using the v2 
model might only see the v2 data.  That's qualitatively different than 
in the world of XML, for example, where one schema generally wants to be 
"on top", and when you merge XML of different schemas, you need to 
create a new "top" schema.  That is the difference that I have so often 
tried to explain to people outside the RDF community, and what I am 
trying to capture succinctly in a term or phrase.   It isn't an easy 
idea to convey to those who are accustomed to a schema-centric approach. 
  I think a catchy but descriptive term or phrase could help.

Thanks,
David

>
> -Alan
>
>
> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
> <mailto:david@dbooth.org>> wrote:
>
>     I -- and I'm sure many others -- have struggled for years trying to
>     succinctly describe RDF's ability to allow multiple data models to
>     peacefully coexist, interconnected, in the same data.  For data
>     integration, this is a key strength of RDF that distinguishes it
>     from other information representation languages such as XML.   I
>     have tried various terms over the years -- most recently "schema
>     promiscuous" -- but have not yet found one that I think really nails
>     it, so I would love to get other people's thoughts.
>
>     This google doc lists several candidate terms, some pros and cons,
>     and allows you to indicate which ones you like best:
>     http://goo.gl/zrXQgj
>
>     Please have a look and indicate your favorite(s).  You may also add
>     more ideas and comments to it.  The document can be edited by anyone
>     with the URL.
>
>     Thanks!
>     David Booth
>
>
Attachments

image/png attachment: Screenshot_from_2014-03-07_14:20:06.png
Received on Friday, 7 March 2014 20:00:41 UTC