Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Michael Brunnbauer on 2014-03-09 (semantic-web@w3.org from March 2014)

From: Michael Brunnbauer <brunni@netestate.de>
Date: Sun, 9 Mar 2014 15:48:50 +0100
To: "Timothy W. Cook" <tim@mlhim.org>
Cc: semantic-web <semantic-web@w3.org>
Message-ID: <20140309144850.GA24523@netestate.de>
Hello Timothy,

MLHIM seems to be annotated data models - with optional RDF annotations.

The claims regarding interoperability and semantics are a bit exaggerated, IMO.

If we had something like annotated portable RDB schemas, would they carry less
meaning and would applications built with them be less interoperable than with
MLHIM?

In order to make applications completely interoperable and remove all
implicit semantics from their code, you have to abolish the applications
themselves, replacing them with some standard component. This is probably as
futile as the ontology/data model to rule them all.

I agree that the proposition of XML Schema is alluring: The information about
the data model used and how to validate the data is always present and the
tools for validation are already there.

You did not use RDF because it has no standard way to do this - which is
unfortunate.

You could have created a way and tools to do this in RDF. Did you fear the 
necessary effort or the risk to adoption?
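
To make "a way and tools to do this in RDF" a bit more concrete, here is a
rough sketch of what such a check might look like. It is only an illustration:
the SHAPES format and the validate() function are invented for this example and
are not part of any RDF standard, and triples are modelled as plain Python
tuples rather than with an RDF library.

```python
RDF_TYPE = "rdf:type"

# Hypothetical "shape": for each class, the required predicates and the
# Python type expected for their values. Invented for this sketch.
SHAPES = {
    "v2:SystolicBP": {
        "v2:pressure": int,
        "v2:units": str,
        "v2:bodyPosition": str,
    },
}

def validate(triples, shapes=SHAPES):
    """Return a list of violation messages (empty if the data conforms)."""
    errors = []
    for subject, predicate, cls in triples:
        if predicate != RDF_TYPE or cls not in shapes:
            continue
        for required, expected_type in shapes[cls].items():
            values = [o for s, p, o in triples
                      if s == subject and p == required]
            if len(values) != 1:
                errors.append(f"{subject}: expected exactly one {required}, "
                              f"found {len(values)}")
            elif not isinstance(values[0], expected_type):
                errors.append(f"{subject}: {required} should be "
                              f"{expected_type.__name__}")
    return errors

good = [
    ("ex2:bp_409", RDF_TYPE, "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
]
bad = good[:2]  # units and body position are missing

print(validate(good))
print(validate(bad))
```

The point of the sketch is only that the data model travels as data (SHAPES)
rather than being buried in application code, which is the property that makes
XML Schema attractive.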

It seems that XML Schema allows vocabulary reuse down to the property/attribute
level - but the temptation to create one's own terms instead of reusing others
seems to be greater than with RDF. Having some of the semantics in the XML
Schema layer and more of it in the RDF layer on top of it is definitely a
drawback.

How many implementors will just ignore the optional RDF layer?

Regards,

Michael Brunnbauer

On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
> A very interesting and I think, foundational discussion.  David, thanks for
> bringing it up.
> Below is a discussion of why I believe that RDF should be considered a
> layer over data models or maybe as 'semantic glue'.
> 
>  David, we are working on the same type of problem but from slightly
> different perspectives.  The presentation that you linked to re:KnowMED, is
> very important and I recall seeing it before.  I'll take this opportunity
> to comment on it since it is in the context of this discussion.  It
> indicates that you propose RDF as a language to be used in the exchange of
> healthcare data.  Then on slide #5 you say it isn't enough to 'get us
> there'.  So I am not sure how much of this is marketing swagger and how
> much is hard fact.
> 
>  On slide #8 item #2 we are 100% in agreement.  But then on slide #9 you
> are mixing apples and oranges.  XML and RDF have two different purposes
> that work well together.
> 
>  On further slides, your Blue, Green and Red customers exactly indicate
> what I mean by RDF being an essential layer on top of multiple models.
> 
>  What happens further in the presentation is where we disagree.  You assert
> that RDF should be the language used to actually 'exchange' data. This is
> where RDF and the tools around it (AFAIK) are not mature enough to perform.
>  Several times you have mentioned 'semantics and not syntax'. This is a
> huge mistake.  You must have both in order to ensure data quality and
> meaning.  Secondly we know from history that top-down consensus in
> healthcare concept modelling is an impossibility.[1]
> 
>  In your post describing the BP screenshot you said:
>  "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood pressure
> information, they represent that information differently.  Nonetheless,
> both representations can peacefully coexist in the same merged RDF data
> without conflict, which might happen, for example, if one is derived from
> the other through inference."
> I take this to mean that you are representing the exact same BP measurement
> data in two different ways?  Your use case, 'by inference' is a little
> fuzzy for me.  If it is derivation by inference, it will just be an in
> memory representation and not persisted; correct?  Regardless, the
> existence of two copies of the same data instance in the same application
> is in complete contradiction to good data quality management.  As you go on
> to explain, now you must add application intelligence to analyze whether or
> not two data instances are the same, to avoid counting them as two separate
> instances.  This approach is very dangerous, in addition to adding
> complexity and cost to the applications.    However, having the ability to
> determine if two different data instances exactly match the same concept is
> essential.  Minor differences such as the position of the patient (sitting
> or prone) or the type of instrument used to perform the measurement or the
> location on the body (left upper arm or right thigh, etc.) that the
> measurement was taken are all important.  They may or may not rule in or
> out specific measurements, based on the intended use of the query results.
>  This is where RDF is essential: do these two instances point to exactly
> the same code in a controlled vocabulary, etc.?    These questions are
> essential to having the ability to perform machine based reasoning over the
> data repository; whether at the point of care or for research purposes.
> 
> Referring back for a moment to 'the same data instance' situation: it is
> essential to have additional information (meta-data) to determine if two
> instances are exactly the same.  This can legitimately occur during
> aggregation for research or systemic quality analysis.  Unique patient
> identifiers along with datetime stamps are ideal.  However, the patient
> identifier issue is an ongoing problem that is actually implementation
> context and application specific.  It is outside of the context of data
> quality and management.
> 
>  Slide #22 clearly indicates that there is an expectation that RDF is used
> as a common format.  However, as I said earlier, the current implementation
> of RDF is not robust enough to perform this function, UNLESS, there is a
> global expert consensus on all healthcare concepts so that models may be
> created and distributed from a central authority.  This is simply
> unrealistic as history has shown and is formalized in the Cavalini-Cook
> theory [1].
> 
> The reason that I state that RDF is not capable, at this point of maturity,
> is that it doesn't support the ability to represent syntactic structures in
> a multi-level model environment.  IOW: There is no ability (AFAIK) to
> express a common reference model and then derive concept models that impose
> further constraints.  A multi-level model approach is essential in order to
> abstract the syntax and semantics of each concept out of the application
> source code and repository schemas so that they can be shared between
> disparate applications.  This is what provides for full syntactic and
> semantic interoperability.
> 
> A multi-level model approach may or may not be useful in many domains.
>  Specifically, human engineered domains that we fully understand can be
> modeled as one level representations.  However, biological domains that
> involve evolutionary complexity are quite different.  Primarily because we
> do not fully understand them so our science and understanding is constantly
> changing.  Additionally, it appears that the data has a much longer
> lifetime of significance than other domains.  Therefore the data should be
> initially captured and represented in a manner that makes it as future
> proof and reusable as possible.  In healthcare, the most semantically rich
> point of any information is at the point of care.  Every point of
> transition/translation after that will most assuredly lose context.  As a
> brief example; reference ranges for conditions change over time.  It is
> essential that data captured today be expressed in the context of today's
> knowledge, even 20 or more years from now.  The concept model around high
> blood pressure is different than it was 10 years ago.
> 
> Where RDF shines is that in a syntactic model of a concept designed to
> capture reference ranges and other metadata, it can be used to provide
> external semantic context to that model, whether that context exists in a
> controlled vocabulary or even in free text documents such as clinical
> guidelines.
> 
> In the Multi-Level Healthcare Information Modelling (MLHIM) approach we
> developed a conceptual reference model to provide a basis for software
> implementations. While the MLHIM model doesn't preclude other
> serializations, we found that XML Schema 1.1 does provide the prerequisites
> for implementing both a reference model and concept models.  This means
> that we can have full validation of instance data back to the W3C
> specifications.  We mark up the concept models (using XML Schema 1.1
> annotations) with RDF, which provides the computable semantic links for each
> model as defined by the modeller.  These models can now be created by
> domain experts (with additional knowledge modelling training) so that
> software developers do not have to interpret the meanings.
> 
> The concept models are now fully detached from any specific implementation
> and can be shared to use for validating instance data in the context in
> which it was recorded.  I believe that this is the closest we have to
> semantic interoperability, to date.  I am of course open for discussion and
> debate on the issue.  I used the acronym 'AFAIK' a few times above.  I used
> this because my last serious attempt to use RDF for this purpose was in
> 2010/2011.  I know that there is a continuous maturing process going on.  I
> believe that there may come a day when RDF and OWL can be used exclusively
> for syntactic and semantic representation and reasoning.  But AFAIK, not
> today.
> 
>  We have a significant number of peer-reviewed publications about MLHIM and
> academic as well as other implementations. I am happy to share those with
> the group or you may peruse the links in my signature line as well as
> www.mlhim.org and the specs are openly downloadable from here[2] as a
> package and as source from here [3].
> 
> We also have almost 2000 datatypes converted from other modeling
> approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
> complexTypes to be used in concept models.  You can review those as well as
> download some example concept models from here[4].  Free registration is
> required to download the models.
> 
>  Kind Regards,
>  Tim
> 
> 
>  [1]
> https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
>  [2]
> https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
>  [3]  https://github.com/mlhim/
>  [4]  http://www.ccdgen.com
> 
> 
> 
> 
> On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:
> 
> > Hi Alan,
> >
> >
> > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
> >
> >> Can you explain what you mean by "RDF's ability to allow multiple data
> >> models to peacefully coexist, interconnected, in the same data" ?
> >>
> >
> > Yes.  Here is an imprecise illustration, on slides 10-17:
> > http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
> > (I took some artistic liberties blurring class/instance distinctions in
> > that diagram.)
> >
> > And here is a more precise example that cleanly distinguishes classes from
> > instances:
> > http://tinyurl.com/pzsgf7f
> > (I've also attached the same illustration, for offline readers.)
> >
> > In this latter example (of a hypothetical systolic blood pressure
> > measurement), the same information is represented according to two
> > different models/schemas/vocabularies/ontologies, v1 (green) and v2
> > (red).  (I am using the terms model, schema, vocabulary and ontology
> > loosely and somewhat interchangeably here.)
> >
> > In the v1 model, the systolic blood pressure is indicated in RDF like this:
> >
> >   ex:patient319 foaf:name "John Doe" ;
> >     v1:bps ex1:bp_023 .
> >
> >   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
> >     v1:value 120 .
> >
> > Whereas in the v2 model, the same information is represented differently,
> > in RDF like this:
> >
> >   ex:patient319 foaf:name "John Doe" ;
> >     v2:bps ex2:bp_409 .
> >
> >   ex2:bp_409 a v2:SystolicBP ;
> >     v2:pressure 120 ;
> >     v2:units v2:mmHg ;
> >     v2:bodyPosition v2:sitting .
> >
> > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood pressure
> > information, they represent that information differently.  Nonetheless,
> > both representations can peacefully coexist in the same merged RDF data
> > without conflict, which might happen, for example, if one is derived from
> > the other through inference.
> >
> > Furthermore, the relationship between these classes,
> > v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the relationship
> > between the corresponding v1 and v2 instance data, can also be explicitly
> > captured in RDF, as the v1v2:SystolicBP_Transform (yellow) relationship:
> >
> >   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
> >
> > Inference rules for v1v2:SystolicBP_Transform could therefore convert a
> > v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement or
> > vice versa.
> >
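
The v1-to-v2 direction of such an inference rule could be sketched roughly as
follows. This is an illustration under stated assumptions, not David's actual
rule: triples are plain Python tuples, and both the v1_to_v2() function and the
"ex2:from_..." naming scheme for derived nodes are invented for the example.
The units and body position are filled in from what the v1 class name already
implies.

```python
RDF_TYPE = "rdf:type"

def derived_name(v1_node):
    # Hypothetical naming scheme for nodes produced by the transform.
    return "ex2:from_" + v1_node.split(":", 1)[1]

def v1_to_v2(triples):
    """Derive v2:SystolicBP triples from v1:SystolicBPSitting_mmHg data."""
    v1_nodes = {s for s, p, o in triples
                if p == RDF_TYPE and o == "v1:SystolicBPSitting_mmHg"}
    derived = []
    for s, p, o in triples:
        if s in v1_nodes and p == "v1:value":
            node = derived_name(s)
            derived += [
                (node, RDF_TYPE, "v2:SystolicBP"),
                (node, "v2:pressure", o),
                # Units and position are implicit in the v1 class name.
                (node, "v2:units", "v2:mmHg"),
                (node, "v2:bodyPosition", "v2:sitting"),
            ]
        elif p == "v1:bps" and o in v1_nodes:
            # Re-attach the derived measurement to the same patient.
            derived.append((s, "v2:bps", derived_name(o)))
    return derived

v1_data = [
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", RDF_TYPE, "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
]

for triple in v1_to_v2(v1_data):
    print(triple)
```

Merging v1_data with the derived triples gives a single graph in which both
representations sit side by side.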
> > This example only illustrated the case where the transformation from one
> > model to the other is lossless and thus reversible.  Usually that isn't the
> > case.  Relating models and transforming between them is *not* easy, but at
> > least RDF makes it possible to explicitly indicate these relationships.
> >
> > Obviously some intelligence must be exercised to avoid, for example,
> > accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two distinct
> > blood pressure measurements, and thereby double counting them, but that's
> > easy enough to do.
> >
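
One possible way to exercise that intelligence, sketched under a strong
assumption: whatever derives one representation from the other also records
the fact with a link. The "ex:derivedFrom" predicate and count_measurements()
below are hypothetical names invented for this example. Without such a link
(or reliable patient identifiers and timestamps) the two nodes are
indistinguishable from two separate readings.

```python
RDF_TYPE = "rdf:type"
DERIVED_FROM = "ex:derivedFrom"  # hypothetical predicate, not a standard term
BP_CLASSES = {"v1:SystolicBPSitting_mmHg", "v2:SystolicBP"}

def count_measurements(triples):
    """Count blood pressure nodes, skipping any derived from another one."""
    nodes = {s for s, p, o in triples
             if p == RDF_TYPE and o in BP_CLASSES}
    derived = {s for s, p, o in triples
               if p == DERIVED_FROM and o in nodes}
    return len(nodes - derived)

merged = [
    ("ex1:bp_023", RDF_TYPE, "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex2:bp_409", RDF_TYPE, "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", DERIVED_FROM, "ex1:bp_023"),
]

print(count_measurements(merged))
```

Dropping the last triple from the merged data makes the same function count
two measurements, which is exactly the double counting at issue.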
> > Also, there isn't always a desire to relate or transform between models.
> >  Sometimes some data is related and other data is not, and it is all still
> > merged into the same RDF graph.  In fact, the point may be to connect that
> > part of the data that *is* related and let the rest coexist without being
> > connected (or at least not *directly* connected).
> >
> > The point is that these data models can peacefully coexist in RDF data
> > without conflict: applications using the v1 model against the merged data
> > might only see v1 instance data, whereas applications using the v2 model
> > might only see the v2 data.  That's qualitatively different than in the
> > world of XML, for example, where one schema generally wants to be "on top",
> > and when you merge XML of different schemas, you need to create a new "top"
> > schema.  That is the difference that I have so often tried to explain to
> > people outside the RDF community, and what I am trying to capture
> > succinctly in a term or phrase.   It isn't an easy idea to convey to those
> > who are accustomed to a schema-centric approach.  I think a catchy but
> > descriptive term or phrase could help.
> >
> > Thanks,
> > David
> >
> >
> >> -Alan
> >>
> >>
> >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
> >> <mailto:david@dbooth.org>> wrote:
> >>
> >>     I -- and I'm sure many others -- have struggled for years trying to
> >>     succinctly describe RDF's ability to allow multiple data models to
> >>     peacefully coexist, interconnected, in the same data.  For data
> >>     integration, this is a key strength of RDF that distinguishes it
> >>     from other information representation languages such as XML.   I
> >>     have tried various terms over the years -- most recently "schema
> >>     promiscuous" -- but have not yet found one that I think really nails
> >>     it, so I would love to get other people's thoughts.
> >>
> >>     This google doc lists several candidate terms, some pros and cons,
> >>     and allows you to indicate which ones you like best:
> >>     http://goo.gl/zrXQgj
> >>
> >>     Please have a look and indicate your favorite(s).  You may also add
> >>     more ideas and comments to it.  The document can be edited by anyone
> >>     with the URL.
> >>
> >>     Thanks!
> >>     David Booth
> >>
> >>
> >>
> 
> 
> -- 
> MLHIM VIP Signup: http://goo.gl/22B0U
> ============================================
> Timothy Cook, MSc           +55 21 994711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Received on Sunday, 9 March 2014 14:49:14 UTC