Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Timothy W. Cook on 2014-03-10 (semantic-web@w3.org from March 2014)

From: Timothy W. Cook <tim@mlhim.org>
Date: Mon, 10 Mar 2014 18:57:26 -0300
To: Michael Brunnbauer <brunni@netestate.de>
Cc: semantic-web <semantic-web@w3.org>
Message-ID: <CA+=OU3U8Eh9NdK2Bkk1z1F2W9i5uJQ7_Pqfj=cLF7Ndv_nTXpQ@mail.gmail.com>
We do not disagree that RDF has a much cleaner approach to creating
meaningful connections between information components.  As I stated
earlier, I originally thought that I would use RDF and/or OWL to build
MLHIM.  The reality is that it just isn't robust enough in some aspects, nor
is the ecosystem around it mature enough yet.

I have stated in papers and social media posts that I assume MLHIM 3.x will
be RDF.  I still believe that, but we do not know if that will be in 5
years or 10 or even longer.





On Mon, Mar 10, 2014 at 6:15 PM, Michael Brunnbauer <brunni@netestate.de> wrote:

>
> Hello Timothy,
>
> I am not a friend of the data model / ontology distinction but I will use
> it here: A data model generally has less semantics, reusability and
> explicit knowledge than an ontology.
>
> You can map XML Schema to OWL automatically but what you have then is still
> more data model than ontology.
>
> With your approach, the step from data model to ontology is discrete while
> with RDF, it would be continuous.
>
> Regards,
>
> Michael Brunnbauer
>
> On Sun, Mar 09, 2014 at 05:59:49PM -0300, Timothy W. Cook wrote:
> > On Sun, Mar 9, 2014 at 11:48 AM, Michael Brunnbauer <brunni@netestate.de>
> > wrote:
> >
> > >
> > > Hello Timothy,
> > >
> > > MLHIM seems to be annotated data models - with optional RDF
> > > annotations.
> > >
> > Somewhat, but the models are restrictions of a common reference model.
> > Each model represents a concept that is as broad or narrow as the
> > modeller chooses.  The annotations must be optional.  It is up to the
> > domain experts/knowledge modellers to determine the resultant quality.
> >
> >
> >
> >
> > > The claims regarding interoperability and semantics are a bit
> > > exaggerated, IMO.
> > >
> > >
> >
> > I suppose your opinion will change when you decide to put some study into
> > the matter.
> >
> >
> >
> > > If we had something like annotated portable RDB schemas, would they
> > > carry less meaning and would applications built with them be less
> > > interoperable than with MLHIM?
> > >
> > >
> > If you were able to share those concept models between applications and
> > they were restrictions of a common reference model, then yes, they would
> > be the same.
> >
> >
> >
> > > In order to make applications completely interoperable and remove all
> > > implicit semantics from their code, you have to abolish them -
> > > replacing them with some standard component. This is probably as futile
> > > as the ontology/data model to rule them all.
> > >
> >
> > Further study will show that there are paths to operate along in the
> > interim.  But yes, the eventual goal would be for a common healthcare
> > reference model.
> >
> >
> > >
> > > I agree that the proposition of XML Schema is alluring: The information
> > > about the data model used and how to validate the data is always
> > > present and the tools for validation are already there.
> > >
> > > You did not use RDF because it has no standard way to do this - which
> > > is unfortunate.
> > >
> >
> > It is unfortunate.  After working with the openEHR Foundation on
> > multi-level modelling for a decade using a domain-specific language, it
> > was an easy realization that a relatively small group of people could not
> > create the high-quality tools needed for a DSL in any reasonable amount
> > of time.
> > I began looking for alternatives.  OWL and RDF would be my first choices
> > for implementation.  They just weren't and still aren't mature enough to
> > do everything needed.  Remember, as I stated before, the MLHIM reference
> > model is a conceptual information model.  I chose XML because I did not
> > see anything else with that capability and widespread adoption.  I knew
> > very little about XML Schema prior to this, so I did not choose it
> > because it was already my hammer.  I spent a lot of time on a language
> > learning curve and had to wait for tools to catch up to XML Schema 1.1.
> >
> > >
> > > You could have created a way and tools to do this in RDF. Did you fear
> > > the necessary effort or the risk to adoption?
> > >
> >
> > (see above)
> > Given time, talent and money, openEHR could do it with the Archetype
> > Definition Language.  But it would never be as ubiquitous as XML.
> >
> >
> > > It seems that XML Schema allows vocabulary reuse down to the
> > > property/attribute level - but the temptation to create own terms
> > > instead of reusing others seems to be greater than with RDF. Having
> > > some of the semantics in the XML Schema layer and more of it in the RDF
> > > layer on top of it definitely is a drawback.
> > >
> > >
> > There may be other/additional approaches that may help improve MLHIM.  I
> > am certainly open to and welcome dialog about it.  The specifications
> > (such as they are at this point) are openly available under a Creative
> > Commons license.  Feel free to join the discussion on social media
> > (Google Plus preferred).
> >
> >
> >
> > > How many implementors will just ignore the optional RDF layer?
> > >
> >
> > You must realize that software developers do not have control of the
> > models in this approach.  Domain experts who understand a little bit of
> > how to use the CCD-Gen are the ones responsible for building the models.
> > In the process of teaching them this activity, they are also taught the
> > importance of the quality of their models, which ultimately decides the
> > quality of their data.
> >
> > The MLHIM ecosystem allows for closed-loop concept models (CCDs) to be
> > developed as well as openly licensed CCDs.  There may eventually be
> > 10,000 blood pressure CCDs in the open.  But like most things, we predict
> > that most people will reuse a model that is good and openly available,
> > instead of building their own.
> >
> > I can't decide for the experts nor do I want to control what is or is
> > not a good model for any particular implementation.  All I can do is
> > offer them a real solution that is bottom up and under their control
> > instead of slow moving international standards bodies that can't keep up
> > with the changing science.
> >
> > Thanks for your feedback.   Explaining MLHIM in words is always a
> > learning experience for me.
> >
> > Regards,
> > Tim
> >
> >
> >
> >
> >
> > >
> > > Regards,
> > >
> > > Michael Brunnbauer
> > >
> > > On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
> > > > A very interesting and, I think, foundational discussion.  David,
> > > > thanks for bringing it up.
> > > > Below is a discussion of why I believe that RDF should be considered
> > > > a layer over data models or maybe as 'semantic glue'.
> > > >
> > > >  David, we are working on the same type of problem but from slightly
> > > > different perspectives.  The presentation that you linked to
> > > > re: KnowMED is very important and I recall seeing it before.  I'll
> > > > take this opportunity to comment on it since it is in the context of
> > > > this discussion.  The presentation indicates that you propose RDF as a
> > > > language to be used in the exchange of healthcare data.  Then on slide
> > > > #5 you say it isn't enough to 'get us there'.  So I am not sure how
> > > > much of this is marketing swagger and how much is hard fact.
> > > >
> > > >  On slide #8 item #2 we are 100% in agreement.  But then on slide #9
> > > > you are mixing apples and oranges.  XML and RDF have two different
> > > > purposes that work well together.
> > > >
> > > >  On further slides, your Blue, Green and Red customers indicate exactly
> > > > what I mean by RDF being an essential layer on top of multiple models.
> > > >
> > > >  What happens further in the presentation is where we disagree.  You
> > > > assert that RDF should be the language used to actually 'exchange'
> > > > data.  This is where RDF and the tools around it (AFAIK) are not mature
> > > > enough to perform.  Several times you have mentioned 'semantics and
> > > > not syntax'.  This is a huge mistake.  You must have both in order to
> > > > ensure data quality and meaning.  Secondly, we know from history that
> > > > top-down consensus in healthcare concept modelling is an
> > > > impossibility.[1]
> > > >
> > > >  In your post describing the BP screenshot you said:
> > > >  "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
> > > > pressure information, they represent that information differently.
> > > > Nonetheless, both representations can peacefully coexist in the same
> > > > merged RDF data without conflict, which might happen, for example, if
> > > > one is derived from the other through inference."
> > > > I take this to mean that you are representing the exact same BP
> > > > measurement data in two different ways?  Your use case, 'by
> > > > inference', is a little fuzzy for me.  If it is derivation by
> > > > inference, it will just be an in-memory representation and not
> > > > persisted; correct?  Regardless, the existence of the same data
> > > > instance twice in the same application is in complete contradiction
> > > > to good data quality management.  As you go on to explain, now you
> > > > must add application intelligence to analyze whether or not two data
> > > > instances are the same, to avoid counting them as two separate
> > > > instances.  This approach is very dangerous, in addition to adding
> > > > complexity and cost to the applications.  However, having the ability
> > > > to determine if two different data instances exactly match the same
> > > > concept is essential.  Minor differences such as the position of the
> > > > patient (sitting or prone), the type of instrument used to perform
> > > > the measurement, or the location on the body (left upper arm or right
> > > > thigh, etc.) where the measurement was taken are all important.  They
> > > > may or may not rule in or out specific measurements, based on the
> > > > intended use of the query results.  This is where RDF is essential:
> > > > do these two instances point to exactly the same code in a controlled
> > > > vocabulary, etc.?  These questions are essential to having the
> > > > ability to perform machine-based reasoning over the data repository,
> > > > whether at the point of care or for research purposes.
> > > > Referring back for a moment to 'the same data instance' situation: it
> > > > is essential to have additional information (metadata) to determine
> > > > if two instances are exactly the same.  This can legitimately occur
> > > > during aggregation for research or systemic quality analysis.  Unique
> > > > patient identifiers along with datetime stamps are ideal.  However,
> > > > the patient identifier issue is an ongoing problem that is actually
> > > > implementation-context and application specific.  It is outside of
> > > > the context of data quality and management.
> > > >
> > > >  Slide #22 clearly indicates that there is an expectation that RDF is
> > > > used as a common format.  However, as I said earlier, the current
> > > > implementation of RDF is not robust enough to perform this function
> > > > UNLESS there is a global expert consensus on all healthcare concepts
> > > > so that models may be created and distributed from a central
> > > > authority.  This is simply unrealistic, as history has shown and as
> > > > is formalized in the Cavalini-Cook theory [1].
> > > >
> > > > The reason that I state that RDF is not capable, at this point of
> > > > maturity, is that it doesn't support the ability to represent
> > > > syntactic structures in a multi-level model environment.  IOW: There
> > > > is no ability (AFAIK) to express a common reference model and then
> > > > derive concept models that issue further constraints.  A multi-level
> > > > model approach is essential in order to abstract the syntax and
> > > > semantics of each concept out of the application source code and
> > > > repository schemas so that they can be shared between disparate
> > > > applications.  This is what provides for full syntactic and semantic
> > > > interoperability.
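
To make the reference-model / concept-model idea concrete, here is a rough
Python analogy.  It is not MLHIM's reference model or its XML Schema:
QuantityType, restrict() and the numeric bounds are invented purely for
illustration.  The only point it tries to show is that a concept model may
narrow what the reference model allows and never widen it, so an instance
that is valid against the concept model is also valid against the reference
model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantityType:
    """Reference-model level: a generic quantity (hypothetical type)."""
    units: Optional[str] = None          # None means any units are allowed
    min_value: Optional[float] = None    # None means unbounded
    max_value: Optional[float] = None

    def restrict(self, units=None, min_value=None, max_value=None):
        """Derive a concept-level type; constraints may only be narrowed."""
        if self.units is not None and units not in (None, self.units):
            raise ValueError("units are already fixed by the parent model")
        new_min = self.min_value if min_value is None else min_value
        new_max = self.max_value if max_value is None else max_value
        if self.min_value is not None and new_min < self.min_value:
            raise ValueError("restriction would widen the lower bound")
        if self.max_value is not None and new_max > self.max_value:
            raise ValueError("restriction would widen the upper bound")
        return QuantityType(units or self.units, new_min, new_max)

    def validate(self, value, units):
        """Check one instance value against this model's constraints."""
        if self.units is not None and units != self.units:
            return False
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


# Reference model: any non-negative quantity, in any units.
REFERENCE_QUANTITY = QuantityType(min_value=0)
# Concept model: systolic BP in mmHg; the 0-400 range is illustrative only.
SYSTOLIC_BP = REFERENCE_QUANTITY.restrict(units="mmHg", max_value=400)
```

With these definitions, SYSTOLIC_BP.validate(120, "mmHg") passes while the
same value in "kPa" is rejected by the concept model but still accepted by
the looser reference model.
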
> > > >
> > > > A multi-level model approach may or may not be useful in many
> > > > domains.  Specifically, human-engineered domains that we fully
> > > > understand can be modeled as one-level representations.  However,
> > > > biological domains that involve evolutionary complexity are quite
> > > > different, primarily because we do not fully understand them, so our
> > > > science and understanding are constantly changing.  Additionally, it
> > > > appears that the data has a much longer lifetime of significance than
> > > > in other domains.  Therefore the data should be initially captured
> > > > and represented in a manner that makes it as future-proof and
> > > > reusable as possible.  In healthcare, the most semantically rich
> > > > point of any information is at the point of care.  Every point of
> > > > transition/translation after that will most assuredly lose context.
> > > > As a brief example: reference ranges for conditions change over
> > > > time.  It is essential that data captured today be expressed in the
> > > > context of today's knowledge, even 20 or more years from now.  The
> > > > concept model around high blood pressure is different than it was 10
> > > > years ago.
> > > >
> > > > Where RDF shines is that, in a syntactic model of a concept designed
> > > > to capture reference ranges and other metadata, it can be used to
> > > > provide external semantic context to that model, whether that context
> > > > exists in a controlled vocabulary or even in free-text documents such
> > > > as clinical guidelines.
> > > >
> > > > In the Multi-Level Healthcare Information Modelling (MLHIM) approach
> > > > we developed a conceptual reference model to provide a basis for
> > > > software implementations.  While the MLHIM model doesn't preclude
> > > > other serializations, we found that XML Schema 1.1 does provide the
> > > > prerequisites for implementing both a reference model and concept
> > > > models.  This means that we can have full validation of instance data
> > > > back to the W3C specifications.  We mark up the concept models (using
> > > > XML Schema 1.1 annotations) with RDF, which provides the computable
> > > > semantic links for each model as defined by the modeller.  These
> > > > models can now be created by domain experts (with additional
> > > > knowledge modelling training) so that software developers do not have
> > > > to interpret the meanings.
> > > >
> > > > The concept models are now fully detached from any specific
> > > > implementation and can be shared to use for validating instance data
> > > > in the context in which it was recorded.  I believe that this is the
> > > > closest we have to semantic interoperability, to date.  I am of
> > > > course open for discussion and debate on the issue.  I used the
> > > > acronym 'AFAIK' a few times above.  I used this because my last
> > > > serious attempt to use RDF for this purpose was in 2010/2011.  I know
> > > > that there is a continuous maturing process going on.  I believe that
> > > > there may come a day when RDF and OWL can be used exclusively for
> > > > syntactic and semantic representation and reasoning.  But AFAIK, not
> > > > today.
> > > >
> > > >  We have a significant number of peer-reviewed publications about
> > > > MLHIM, as well as academic and other implementations.  I am happy to
> > > > share those with the group, or you may peruse the links in my
> > > > signature line as well as www.mlhim.org.  The specs are openly
> > > > downloadable as a package from here [2] and as source from here [3].
> > > >
> > > > We also have almost 2000 datatypes converted from other modeling
> > > > approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
> > > > complexTypes to be used in concept models.  You can review those as
> > > > well as download some example concept models from here [4].  Free
> > > > registration is required to download the models.
> > > >
> > > >  Kind Regards,
> > > >  Tim
> > > >
> > > >
> > > >  [1]  https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
> > > >  [2]  https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
> > > >  [3]  https://github.com/mlhim/
> > > >  [4]  http://www.ccdgen.com
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org>
> wrote:
> > > >
> > > > > Hi Alan,
> > > > >
> > > > >
> > > > > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
> > > > >
> > > > >> Can you explain what you mean by "RDF's ability to allow multiple
> data
> > > > >> models to peacefully coexist, interconnected, in the same data" ?
> > > > >>
> > > > >
> > > > > Yes.  Here is an imprecise illustration, on slides 10-17:
> > > > >
> > >
> http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
> > > > > (I took some artistic liberties blurring class/instance
> distinctions in
> > > > > that diagram.)
> > > > >
> > > > > And here is a more precise example that cleanly distinguishes
> classes
> > > from
> > > > > instances:
> > > > > http://tinyurl.com/pzsgf7f
> > > > > (I've also attached the same illustration, for offline readers.)
> > > > >
> > > > > In this latter example (of a hypothetical systolic blood pressure
> > > > > measurement), the same information is represented according to two
> > > > > different models/schemas/vocabularies/ontologies, v1 (green) and v2
> > > > > (red).  (I am using the terms model, schema, vocabulary and
> ontology
> > > > > loosely and somewhat interchangeably here.)
> > > > >
> > > > > In the v1 model, the systolic blood pressure is indicated in RDF
> like
> > > this:
> > > > >
> > > > >   ex:patient319 foaf:name "John Doe" ;
> > > > >     v1:bps ex1:bp_023 .
> > > > >
> > > > >   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
> > > > >     v1:value 120 .
> > > > >
> > > > > Whereas in the v2 model, the same information is represented
> > > differently,
> > > > > in RDF like this:
> > > > >
> > > > >   ex:patient319 foaf:name "John Doe" ;
> > > > >     v2:bps ex2:bp_409 .
> > > > >
> > > > >   ex2:bp_409 a v2:SystolicBP ;
> > > > >     v2:pressure 120 ;
> > > > >     v2:units v2:mmHg ;
> > > > >     v2:bodyPosition v2:sitting .
> > > > >
> > > > > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
> > > > > pressure information, they represent that information differently.
> > > > > Nonetheless, both representations can peacefully coexist in the same
> > > > > merged RDF data without conflict, which might happen, for example, if
> > > > > one is derived from the other through inference.
> > > > >
> > > > > Furthermore, the relationship between these classes,
> > > > > v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the
> relationship
> > > > > between the corresponding v1 and v2 instance data, can also be
> > > explicitly
> > > > > captured in RDF, as the v1v2:SystolicBP_Transform (yellow)
> > > relationship:
> > > > >
> > > > >   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
> > > > >
> > > > > Inference rules for v1v2:SystolicBP_Transform could therefore convert
> > > > > a v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP
> > > > > measurement or vice versa.
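
A minimal sketch of what such inference rules could look like, written in
plain Python rather than an RDF rule language.  Triples are modelled as
(subject, predicate, object) tuples, and the function names v1_to_v2 and
v2_to_v1 are hypothetical, not part of any RDF toolkit.  It assumes, as in
the example above, that the v1 class itself implies mmHg units and a sitting
position, which is what makes this particular round trip lossless.

```python
RDF_TYPE = "rdf:type"


def v1_to_v2(triples, v1_node, v2_node):
    """Derive v2:SystolicBP triples from one v1:SystolicBPSitting_mmHg node."""
    facts = {(p, o) for s, p, o in triples if s == v1_node}
    if (RDF_TYPE, "v1:SystolicBPSitting_mmHg") not in facts:
        return set()
    derived = {
        (v2_node, RDF_TYPE, "v2:SystolicBP"),
        # Units and position are implicit in the v1 class name.
        (v2_node, "v2:units", "v2:mmHg"),
        (v2_node, "v2:bodyPosition", "v2:sitting"),
    }
    derived |= {(v2_node, "v2:pressure", o) for p, o in facts if p == "v1:value"}
    return derived


def v2_to_v1(triples, v2_node, v1_node):
    """Reverse direction; only applies to sitting measurements in mmHg."""
    facts = {(p, o) for s, p, o in triples if s == v2_node}
    required = {
        (RDF_TYPE, "v2:SystolicBP"),
        ("v2:units", "v2:mmHg"),
        ("v2:bodyPosition", "v2:sitting"),
    }
    if not required <= facts:
        return set()
    derived = {(v1_node, RDF_TYPE, "v1:SystolicBPSitting_mmHg")}
    derived |= {(v1_node, "v1:value", o) for p, o in facts if p == "v2:pressure"}
    return derived


v1_data = {
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", RDF_TYPE, "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
}
v2_data = v1_to_v2(v1_data, "ex1:bp_023", "ex2:bp_409")
round_trip = v2_to_v1(v2_data, "ex2:bp_409", "ex1:bp_023")
```

A v2 measurement taken in another position or unit would make v2_to_v1
return nothing, which is one simple way a transform can fail to be
reversible.
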
> > > > >
> > > > > This example only illustrated the case where the transformation
> from
> > > one
> > > > > model to the other is lossless and thus reversible.  Usually that
> > > isn't the
> > > > > case.  Relating models and transforming between them is *not* easy,
> > > but at
> > > > > least RDF makes it possible to explicitly indicate these
> relationships.
> > > > >
> > > > > Obviously some intelligence must be exercised to avoid, for example,
> > > > > accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two
> > > > > distinct blood pressure measurements, and thereby double counting
> > > > > them, but that's easy enough to do.
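
One way the double-counting check might be sketched, in plain Python with
tuples standing in for triples.  The helper canonical_measurements is
hypothetical: it reduces each measurement to a model-independent key of
(patient, value, units, body position).  Note that the example data carries
no timestamp, so two genuinely separate readings of 120 would also collapse
into one here; real data would need a time or encounter identifier in the
key.

```python
def canonical_measurements(triples):
    """Map each blood pressure node to a model-independent key (sketch)."""
    props_by_node = {}
    for s, p, o in triples:
        props_by_node.setdefault(s, {})[p] = o

    keys = {}
    for patient, predicate, node in triples:
        if predicate not in ("v1:bps", "v2:bps"):
            continue
        props = props_by_node.get(node, {})
        if props.get("rdf:type") == "v1:SystolicBPSitting_mmHg":
            # The v1 class name itself encodes units and position.
            keys[node] = (patient, props["v1:value"], "mmHg", "sitting")
        elif props.get("rdf:type") == "v2:SystolicBP":
            keys[node] = (
                patient,
                props["v2:pressure"],
                props["v2:units"].split(":", 1)[1],
                props["v2:bodyPosition"].split(":", 1)[1],
            )
    return keys


merged = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex:patient319", "v2:bps", "ex2:bp_409"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
}

keys = canonical_measurements(merged)
distinct_measurements = set(keys.values())
```

Here two nodes map to a single key, so a count over distinct_measurements
sees one measurement rather than two.
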
> > > > >
> > > > > Also, there isn't always a desire to relate or transform between
> > > models.
> > > > >  Sometimes some data is related and other data is not, and it is
> all
> > > still
> > > > > merged into the same RDF graph.  In fact, the point may be to
> connect
> > > that
> > > > > part of the data that *is* related and let the rest coexist without
> > > being
> > > > > connected (or at least not *directly* connected).
> > > > >
> > > > > The point is that these data models can peacefully coexist in RDF
> > > > > data without conflict: applications using the v1 model against the
> > > > > merged data might only see v1 instance data, whereas applications
> > > > > using the v2 model might only see the v2 data.  That's qualitatively
> > > > > different than in the world of XML, for example, where one schema
> > > > > generally wants to be "on top", and when you merge XML of different
> > > > > schemas, you need to create a new "top" schema.  That is the
> > > > > difference that I have so often tried to explain to people outside
> > > > > the RDF community, and what I am trying to capture succinctly in a
> > > > > term or phrase.   It isn't an easy idea to convey to those who are
> > > > > accustomed to a schema-centric approach.  I think a catchy but
> > > > > descriptive term or phrase could help.
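
The idea that each application only sees its own model can be illustrated
with a small filter over the merged data.  This is a sketch under
assumptions: triples are plain Python tuples, and the view() function is a
hypothetical stand-in for whatever a v1-only or v2-only query would select.

```python
def view(triples, prefixes):
    """Keep only triples expressed in the given vocabulary prefixes.

    For rdf:type triples the class (object) decides; otherwise the predicate.
    """
    selected = set()
    for s, p, o in triples:
        term = o if p == "rdf:type" else p
        if isinstance(term, str) and term.split(":", 1)[0] in prefixes:
            selected.add((s, p, o))
    return selected


merged = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex:patient319", "v2:bps", "ex2:bp_409"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
}

v1_view = view(merged, {"foaf", "v1"})
v2_view = view(merged, {"foaf", "v2"})
```

Both views are cut from the same merged set, and neither needs a new "top"
schema to exist first; the shared foaf:name triple appears in both.
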
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > >
> > > > >> -Alan
> > > > >>
> > > > >>
> > > > >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
> > > > >> <mailto:david@dbooth.org>> wrote:
> > > > >>
> > > > >>     I -- and I'm sure many others -- have struggled for years
> trying
> > > to
> > > > >>     succinctly describe RDF's ability to allow multiple data
> models to
> > > > >>     peacefully coexist, interconnected, in the same data.  For
> data
> > > > >>     integration, this is a key strength of RDF that distinguishes
> it
> > > > >>     from other information representation languages such as XML.
>   I
> > > > >>     have tried various terms over the years -- most recently
> "schema
> > > > >>     promiscuous" -- but have not yet found one that I think really
> > > nails
> > > > >>     it, so I would love to get other people's thoughts.
> > > > >>
> > > > >>     This google doc lists several candidate terms, some pros and
> cons,
> > > > >>     and allows you to indicate which ones you like best:
> > > > >>     http://goo.gl/zrXQgj
> > > > >>
> > > > >>     Please have a look and indicate your favorite(s).  You may
> also
> > > add
> > > > >>     more ideas and comments to it.  The document can be edited by
> > > anyone
> > > > >>     with the URL.
> > > > >>
> > > > >>     Thanks!
> > > > >>     David Booth
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > > > --
> > > > MLHIM VIP Signup: http://goo.gl/22B0U
> > > > ============================================
> > > > Timothy Cook, MSc           +55 21 994711995
> > > > MLHIM http://www.mlhim.org
> > > > Like Us on FB: https://www.facebook.com/mlhim2
> > > > Circle us on G+: http://goo.gl/44EV5
> > > > Google Scholar: http://goo.gl/MMZ1o
> > > > LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
> > >
> > > --
> > > ++  Michael Brunnbauer
> > > ++  netEstate GmbH
> > > ++  Geisenhausener Straße 11a
> > > ++  81379 München
> > > ++  Tel +49 89 32 19 77 80
> > > ++  Fax +49 89 32 19 77 89
> > > ++  E-Mail brunni@netestate.de
> > > ++  http://www.netestate.de/
> > > ++
> > > ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> > > ++  USt-IdNr. DE221033342
> > > ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> > > ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
> > >
> >
> >
> >
>
>



-- 
MLHIM VIP Signup: http://goo.gl/22B0U
============================================
Timothy Cook, MSc           +55 21 994711995
MLHIM http://www.mlhim.org
Like Us on FB: https://www.facebook.com/mlhim2
Circle us on G+: http://goo.gl/44EV5
Google Scholar: http://goo.gl/MMZ1o
LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
Received on Monday, 10 March 2014 21:57:56 UTC