Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Timothy W. Cook on 2014-03-08 (semantic-web@w3.org from March 2014)

From: Timothy W. Cook <tim@mlhim.org>
Date: Sat, 8 Mar 2014 18:36:54 -0300
To: David Booth <david@dbooth.org>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, semantic-web <semantic-web@w3.org>
Message-ID: <CA+=OU3V9hgxkVTDoNtBu_5CKffAZqoNXEUTFG-L4LxnydJ9kqg@mail.gmail.com>
A very interesting and I think, foundational discussion.  David, thanks for
bringing it up.
Below is a discussion of why I believe that RDF should be considered a
layer over data models or maybe as 'semantic glue'.

 David, we are working on the same type of problem but from slightly
different perspectives.  The presentation that you linked to re:KnowMED, is
very important and I recall seeing it before.  I'll take this opportunity
to comment on it since it is in the context of this discussion.  It
indicates that you propose RDF as a language to be used in the exchange of
healthcare data.  Then on slide #5 you say it isn't enough to 'get us
there'.  So I am not sure how much of this is marketing swagger and how
much is hard fact.

 On slide #8 item #2 we are 100% in agreement.  But then on slide #9 you
are mixing apples and oranges.  XML and RDF have two different purposes
that work well together.

 On further slides, your Blue, Green and Red customers exactly indicate
what I mean by RDF being an essential layer on top of multiple models.

 What happens further in the presentation is where we disagree.  You assert
that RDF should be the language used to actually 'exchange' data. This is
where RDF and the tools around it (AFAIK) are not mature enough to perform.
 Several times you have mentioned 'semantics and not syntax'. This is a
huge mistake.  You must have both in order to ensure data quality and
meaning.  Secondly we know from history that top-down consensus in
healthcare concept modelling is an impossibility.[1]

 In your post describing the BP screenshot you said:
 "Thus, although ex1:bp_023 and ex2:bp409 capture the same blood pressure
information, they represent that information differently.  Nonetheless,
both representations can peacefully coexist in the same merged RDF data
without conflict, which might happen, for example, if one is derived from
the other through inference."
I take this to mean that you are representing the exact same BP measurement
data in two different ways?  Your use case, 'by inference' is a little
fuzzy for me.  If it is derivation by inference, it will just be an in
memory representation and not persisted; correct?   Regardless, having the
same data instance twice in the same application is in complete
contradiction to good data quality management.  As you go on to explain,
now you must add application intelligence to analyze whether two data
instances are the same, to avoid counting them as two separate
instances.  This approach is very dangerous, in addition to adding
complexity and cost to the applications.    However, having the ability to
determine if two different data instances exactly match the same concept is
essential.  Minor differences such as the position of the patient (sitting
or prone) or the type of instrument used to perform the measurement or the
location on the body (left upper arm or right thigh, etc.) that the
measurement was taken are all important.  They may or may not rule in or
out specific measurements, based on the intended use of the query results.
 This is where RDF is essential: do these two instances point to exactly
the same code in a controlled vocabulary, etc.?    These questions are
essential to having the ability to perform machine based reasoning over the
data repository; whether at the point of care or for research purposes.
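
A minimal sketch of the double counting problem, in Python, using the
triples from the quoted v1/v2 example as plain tuples.  The helper names
(objects, normalise, count_measurements) are hypothetical and the mapping
rules are an assumption about what the v1 class name implies; this is not
an RDF library or an inference engine.

```python
# Triples copied from the quoted v1/v2 example, as plain (s, p, o) tuples.
MERGED = {
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex:patient319", "v2:bps", "ex2:bp_409"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
}


def objects(graph, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def normalise(graph, node):
    """Reduce a v1 or v2 measurement node to (pressure, units, position)."""
    types = objects(graph, node, "rdf:type")
    if "v1:SystolicBPSitting_mmHg" in types:
        # Assumption: units and position are implied by the v1 class name.
        return (objects(graph, node, "v1:value")[0], "mmHg", "sitting")
    if "v2:SystolicBP" in types:
        return (
            objects(graph, node, "v2:pressure")[0],
            objects(graph, node, "v2:units")[0].split(":")[1],
            objects(graph, node, "v2:bodyPosition")[0].split(":")[1],
        )
    raise ValueError(f"unknown measurement type for {node}")


def count_measurements(graph, patient):
    """Return (naive count, count after normalising both representations)."""
    nodes = objects(graph, patient, "v1:bps") + objects(graph, patient, "v2:bps")
    return len(nodes), len({normalise(graph, n) for n in nodes})


print(count_measurements(MERGED, "ex:patient319"))
```

The naive count sees two measurements and the normalised count sees one.
Note that two genuinely separate readings of 120 mmHg would also collapse
into one here, because the example data carries no timestamp.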

Referring back for a moment to 'the same data instance' situation.  It is
essential to have additional information (meta-data) to determine if two
instances are exactly the same.  This can legitimately occur during
aggregation for research or systemic quality analysis.  Unique patient
identifiers along with datetime stamps are ideal.  However, the patient
identifier issue is an ongoing problem that is actually implementation
context and application specific.  It is outside of the context of data
quality and management.
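
As a rough illustration of using a patient identifier and a datetime stamp
to detect the same instance during aggregation, here is a Python sketch.
The record layout, the field names and the choice of key are all
assumptions made for the example, not part of any specification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BPMeasurement:
    """Hypothetical record; the field names are illustrative only."""
    patient_id: str
    taken_at: datetime
    systolic_mmhg: int
    body_position: str  # e.g. "sitting" or "prone"
    body_site: str      # e.g. "left upper arm"
    source_model: str   # which model produced this representation

    def identity_key(self):
        # Two records count as the same measurement when the patient, the
        # timestamp, the value and the clinical context all match.  The
        # source model is deliberately left out of the key.
        return (
            self.patient_id,
            self.taken_at,
            self.systolic_mmhg,
            self.body_position,
            self.body_site,
        )


def deduplicate(measurements):
    """Keep the first record seen for each identity key, in input order."""
    seen = {}
    for m in measurements:
        seen.setdefault(m.identity_key(), m)
    return list(seen.values())


when = datetime(2014, 3, 8, 9, 30)
records = [
    BPMeasurement("p319", when, 120, "sitting", "left upper arm", "v1"),
    BPMeasurement("p319", when, 120, "sitting", "left upper arm", "v2"),
    BPMeasurement("p319", when, 120, "prone", "left upper arm", "v2"),
]
unique = deduplicate(records)
```

Here the first two records collapse into one, while the third is kept
because the body position differs.  How the patient identifier itself is
established remains implementation specific, as noted above.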

 Slide #22 clearly indicates that there is an expectation that RDF is used
as a common format.  However, as I said earlier, the current implementation
of RDF is not robust enough to perform this function, UNLESS, there is a
global expert consensus on all healthcare concepts so that models may be
created and distributed from a central authority.  This is simply
unrealistic as history has shown and is formalized in the Cavalini-Cook
theory [1].

The reason that I state that RDF is not capable, at this point of maturity,
is that it doesn't support the ability to represent syntactic structures in
a multi-level model environment.  IOW: There is no ability (AFAIK) to
express a common reference model and then derive concept models that impose
further constraints.  A multi-level model approach is essential in order to
abstract the syntax and semantics of each concept out of the application
source code and repository schemas so that they can be shared between
disparate applications.  This is what provides for full syntactic and
semantic interoperability.
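
To make the two levels concrete, here is a small Python analogy.  It is
only a sketch of the idea of restriction: the class QuantityType, its
restrict and validate methods, and the numeric limits are hypothetical,
and it is not how MLHIM or XML Schema implements this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantityType:
    """Reference-model level: a generic quantity with optional constraints."""
    name: str
    allowed_units: Optional[Tuple[str, ...]] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def restrict(self, name, allowed_units=None, min_value=None, max_value=None):
        """Concept-model level: derive a type that only narrows this one."""
        units = self.allowed_units if allowed_units is None else tuple(allowed_units)
        if self.allowed_units is not None and not set(units) <= set(self.allowed_units):
            raise ValueError("a restriction may not add units")
        lo = self.min_value if min_value is None else min_value
        hi = self.max_value if max_value is None else max_value
        if self.min_value is not None and lo < self.min_value:
            raise ValueError("a restriction may not lower the minimum")
        if self.max_value is not None and hi > self.max_value:
            raise ValueError("a restriction may not raise the maximum")
        return QuantityType(name, units, lo, hi)

    def validate(self, value, units):
        """Return a list of constraint violations; empty means valid."""
        errors = []
        if self.allowed_units is not None and units not in self.allowed_units:
            errors.append(f"{self.name}: units {units!r} not allowed")
        if self.min_value is not None and value < self.min_value:
            errors.append(f"{self.name}: {value} is below {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            errors.append(f"{self.name}: {value} is above {self.max_value}")
        return errors


# Reference model: any non-negative quantity (illustrative limit).
quantity = QuantityType("Quantity", min_value=0)
# Concept model: narrows the reference type (illustrative limits).
systolic = quantity.restrict("SystolicBP", allowed_units=("mmHg",), max_value=400)
```

Because a concept model can only narrow its parent, any instance that is
valid against the concept model is also valid against the reference
model, which is what lets disparate applications share the same models.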

A multi-level model approach may or may not be useful in many domains.
 Specifically, human engineered domains that we fully understand can be
modeled as one level representations.  However, biological domains that
involve evolutionary complexity are quite different, primarily because we
do not fully understand them, so our science and understanding are
constantly changing.  Additionally, it appears that the data has a much longer
lifetime of significance than other domains.  Therefore the data should be
initially captured and represented in a manner that makes it as future
proof and reusable as possible.  In healthcare, the most semantically rich
point of any information is at the point of care.  Every point of
transition/translation after that will most assuredly lose context.  As a
brief example, reference ranges for conditions change over time.  It is
essential that data captured today be expressed in the context of today's
knowledge, even 20 or more years from now.  The concept model around high
blood pressure is different than it was 10 years ago.
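
A short Python sketch of that point: the reference range travels with the
observation, so the data can still be read in the context in which it was
captured.  The class names are hypothetical and the numeric limits are
invented for illustration; they are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceRange:
    low: float
    high: float
    source: str  # guideline or vocabulary the range was taken from


@dataclass(frozen=True)
class Observation:
    value: float
    units: str
    recorded_on: str  # ISO 8601 date
    reference_range: ReferenceRange  # the range in force at capture time

    def within(self, rng: Optional[ReferenceRange] = None) -> bool:
        """Check against the captured range, or against another one."""
        rng = self.reference_range if rng is None else rng
        return rng.low <= self.value <= rng.high


# Invented numbers, purely to show the mechanism.
range_then = ReferenceRange(90.0, 140.0, "hypothetical guideline, older edition")
range_now = ReferenceRange(90.0, 130.0, "hypothetical guideline, newer edition")
obs = Observation(135.0, "mmHg", "2004-03-08", range_then)

print(obs.within())           # judged by the knowledge of its own time
print(obs.within(range_now))  # judged by a later range
```

Keeping the original range with the instance means a later reader can
tell both how the value was interpreted when recorded and how it compares
with current knowledge.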

Where RDF shines is that in a syntactic model of a concept designed to
capture reference ranges and other metadata, it can be used to provide
external semantic context to that model, whether that context exists in a
controlled vocabulary or even in free text documents such as clinical
guidelines.

In the Multi-Level Healthcare Information Modelling (MLHIM) approach we
developed a conceptual reference model to provide a basis for software
implementations. While the MLHIM model doesn't preclude other
serializations, we found that XML Schema 1.1 does provide the prerequisites
for implementing both a reference model and concept models.  This means
that we can have full validation of instance data back to the W3C
specifications.  We mark up the concept models (using XML Schema 1.1
annotations) with RDF, which provides the computable semantic links for
each model as defined by the modeller.  These models can now be created by
domain experts (with additional knowledge modelling training) so that
software developers do not have to interpret the meanings.

The concept models are now fully detached from any specific implementation
and can be shared to use for validating instance data in the context in
which it was recorded.  I believe that this is the closest we have to
semantic interoperability, to date.  I am of course open for discussion and
debate on the issue.  I used the acronym 'AFAIK' a few times above.  I used
this because my last serious attempt to use RDF for this purpose was in
2010/2011.  I know that there is a continuous maturing process going on.  I
believe that there may come a day when RDF and OWL can be used exclusively
for syntactic and semantic representation and reasoning.  But AFAIK, not
today.

 We have a significant number of peer-reviewed publications about MLHIM and
academic as well as other implementations. I am happy to share those with
the group or you may peruse the links in my signature line as well as
www.mlhim.org and the specs are openly downloadable from here[2] as a
package and as source from here [3].

We also have almost 2000 datatypes converted from other modeling
approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
complexTypes to be used in concept models.  You can review those as well as
download some example concept models from here[4].  Free registration is
required to download the models.

 Kind Regards,
 Tim


 [1]
https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
 [2]
https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
 [3]  https://github.com/mlhim/
 [4]  http://www.ccdgen.com




On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:

> Hi Alan,
>
>
> On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
>
>> Can you explain what you mean by "RDF's ability to allow multiple data
>> models to peacefully coexist, interconnected, in the same data" ?
>>
>
> Yes.  Here is an imprecise illustration, on slides 10-17:
> http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
> (I took some artistic liberties blurring class/instance distinctions in
> that diagram.)
>
> And here is a more precise example that cleanly distinguishes classes from
> instances:
> http://tinyurl.com/pzsgf7f
> (I've also attached the same illustration, for offline readers.)
>
> In this latter example (of a hypothetical systolic blood pressure
> measurement), the same information is represented according to two
> different models/schemas/vocabularies/ontologies, v1 (green) and v2
> (red).  (I am using the terms model, schema, vocabulary and ontology
> loosely and somewhat interchangeably here.)
>
> In the v1 model, the systolic blood pressure is indicated in RDF like this:
>
>   ex:patient319 foaf:name "John Doe" ;
>     v1:bps ex1:bp_023 .
>
>   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
>     v1:value 120 .
>
> Whereas in the v2 model, the same information is represented differently,
> in RDF like this:
>
>   ex:patient319 foaf:name "John Doe" ;
>     v2:bps ex2:bp_409 .
>
>   ex2:bp_409 a v2:SystolicBP ;
>     v2:pressure 120 ;
>     v2:units v2:mmHg ;
>     v2:bodyPosition v2:sitting .
>
> Thus, although ex1:bp_023 and ex2:bp409 capture the same blood pressure
> information, they represent that information differently.  Nonetheless,
> both representations can peacefully coexist in the same merged RDF data
> without conflict, which might happen, for example, if one is derived from
> the other through inference.
>
> Furthermore, the relationship between these classes,
> v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the relationship
> between the corresponding v1 and v2 instance data, can also be explicitly
> captured in RDF, as the v1v2:SystolicBP_Transform (yellow) relationship:
>
>   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
>
> Inference rules for v1v2:SystolicBP_Transform could therefore convert a
> v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement or
> vice versa.
>
> This example only illustrated the case where the transformation from one
> model to the other is lossless and thus reversible.  Usually that isn't the
> case.  Relating models and transforming between them is *not* easy, but at
> least RDF makes it possible to explicitly indicate these relationships.
>
> Obviously some intelligence must be exercised to avoid, for example,
> accidentally thinking that ex:bp_023 and ex2:bp_409 represent two distinct
> blood pressure measurements, and thereby double counting them, but that's
> easy enough to do.
>
> Also, there isn't always a desire to relate or transform between models.
>  Sometimes some data is related and other data is not, and it is all still
> merged into the same RDF graph.  In fact, the point may be to connect that
> part of the data that *is* related and let the rest coexist without being
> connected (or at least not *directly* connected).
>
> The point is that these data models can peacefully coexist in RDF data
> without conflict: applications using the v1 model against the merged data
> might only see v1 instance data, whereas applications using the v2 model
> might only see the v2 data.  That's qualitatively different than in the
> world of XML, for example, where one schema generally wants to be "on top",
> and when you merge XML of different schemas, you need to create a new "top"
> schema.  That is the difference that I have so often tried to explain to
> people outside the RDF community, and what I am trying to capture
> succinctly in a term or phrase.   It isn't an easy idea to convey to those
> who are accustomed to a schema-centric approach.  I think a catchy but
> descriptive term or phrase could help.
>
> Thanks,
> David
>
>
>> -Alan
>>
>>
>> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
>> <mailto:david@dbooth.org>> wrote:
>>
>>     I -- and I'm sure many others -- have struggled for years trying to
>>     succinctly describe RDF's ability to allow multiple data models to
>>     peacefully coexist, interconnected, in the same data.  For data
>>     integration, this is a key strength of RDF that distinguishes it
>>     from other information representation languages such as XML.   I
>>     have tried various terms over the years -- most recently "schema
>>     promiscuous" -- but have not yet found one that I think really nails
>>     it, so I would love to get other people's thoughts.
>>
>>     This google doc lists several candidate terms, some pros and cons,
>>     and allows you to indicate which ones you like best:
>>     http://goo.gl/zrXQgj
>>
>>     Please have a look and indicate your favorite(s).  You may also add
>>     more ideas and comments to it.  The document can be edited by anyone
>>     with the URL.
>>
>>     Thanks!
>>     David Booth
>>
>>
>>


-- 
MLHIM VIP Signup: http://goo.gl/22B0U
============================================
Timothy Cook, MSc           +55 21 994711995
MLHIM http://www.mlhim.org
Like Us on FB: https://www.facebook.com/mlhim2
Circle us on G+: http://goo.gl/44EV5
Google Scholar: http://goo.gl/MMZ1o
LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
Received on Saturday, 8 March 2014 21:37:24 UTC