Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Martynas Jusevičius on 2014-03-09 (semantic-web@w3.org from March 2014)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Sun, 9 Mar 2014 22:12:33 +0100
To: "Timothy W. Cook" <tim@mlhim.org>
Cc: semantic-web <semantic-web@w3.org>, Michael Brunnbauer <brunni@netestate.de>
Message-ID: <CAE35Vmzok-q=3o0ZitF7bENPWPYC3jaxMuOm_5RNMrNxaBA3HA@mail.gmail.com>
Hey all,

Regarding RDF validation - I guess you all know about SPIN constraints,
right? They're SPARQL-based.

http://spinrdf.org/spin.html#spin-constraints
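
To make the idea concrete without pulling in a SPARQL engine, here is a minimal plain-Python sketch of what such a constraint does. It is an analogy only, not the SPIN API: triples are modelled as tuples, the constraint is an ordinary function standing in for an ASK query (where, as I understand the SPIN convention, a true result signals a violation), and the rule "exactly one numeric v1:value per measurement" as well as the instance ex1:bp_024 are assumptions made up for illustration. The v1 terms are borrowed from David's example further down the thread.

```python
# Plain-Python analogy of a SPARQL-based constraint check (NOT the SPIN API).
# Triples are (subject, predicate, object) tuples; prefixed names are strings.

RDF_TYPE = "rdf:type"

graph = {
    ("ex1:bp_023", RDF_TYPE, "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    # Hypothetical bad instance: typed, but carries no v1:value.
    ("ex1:bp_024", RDF_TYPE, "v1:SystolicBPSitting_mmHg"),
}


def instances_of(graph, cls):
    """Return every subject that is typed as cls."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}


def violates_value_constraint(graph, this):
    """Stand-in for an ASK query: True means the constraint is violated.

    Assumed rule: each measurement has exactly one numeric v1:value.
    """
    values = [o for (s, p, o) in graph if s == this and p == "v1:value"]
    if len(values) != 1:
        return True
    value = values[0]
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, bool) or not isinstance(value, (int, float))


def check(graph, cls):
    """Run the constraint on each instance of cls; return sorted violators."""
    return sorted(
        i for i in instances_of(graph, cls)
        if violates_value_constraint(graph, i)
    )


violations = check(graph, "v1:SystolicBPSitting_mmHg")
print(violations)
```

In real SPIN the function body would be a SPARQL query attached to the class, so the validation rule travels with the model rather than living in application code.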


Martynas

graphityhq.com


On Mar 9, 2014 10:03 PM, "Timothy W. Cook" <tim@mlhim.org> wrote:

> On Sun, Mar 9, 2014 at 11:48 AM, Michael Brunnbauer <brunni@netestate.de> wrote:
>
>>
>> Hello Timothy,
>>
>> MLHIM seems to be annotated data models - with optional RDF annotations.
>>
> Somewhat, but the models are restrictions of a common reference
> model.  Each model represents a concept that is as broad or narrow as the
> modeller chooses.  The annotations must be optional.  It is up to the
> domain experts/knowledge modellers to determine the resultant quality.
>
>
>
>
>> The claims regarding interoperability and semantics are a bit
>> exaggerated, IMO.
>>
>>
>
> I suppose your opinion will change when you decide to put some study into
> the matter.
>
>
>
>> If we had something like annotated portable RDB schemas, would they carry
>> less
>> meaning and would applications built with them be less interoperable than
>> with
>> MLHIM?
>>
>>
> If you were able to share those concept models between applications and
> they were restrictions of a common reference model, then yes, they would be
> the same.
>
>
>
>> In order to make applications completely interoperable and remove all
>> implicit semantics from their code, you have to abolish them - replacing
>> them
>> with some standard component. This is probably as futile as the
>> ontology/data
>> model to rule them all.
>>
>
> Further study will show that there are paths to operate along in the
> interim.  But yes, the eventual goal would be for a common healthcare
> reference model.
>
>
>>
>> I agree that the proposition of XML Schema is alluring: The information
>> about
>> the data model used and how to validate the data is always present and the
>> tools for validation are already there.
>>
>> You did not use RDF because it has no standard way to do this - which is
>> unfortunate.
>>
>
> It is unfortunate.  After working with the openEHR Foundation on
> multi-level modelling for a decade using a domain specific language it was
> an easy realization that a relatively small group of people could not
> create the high quality tools needed for a DSL in any reasonable amount of
> time.
> I began looking for alternatives.  OWL and RDF would be my first choices
> for implementation.  They just weren't and still aren't mature enough to do
> everything needed.  Remember, as I stated before, the MLHIM reference model
> is a conceptual information model.  I chose XML because I did not see
> anything else with that capability and widespread adoption. I knew very little
> about XML Schema prior to this.  So I did not choose it because it was my
> hammer already.  I spent a lot of time on a long learning curve and had to
> wait for tools to catch up to XML Schema 1.1.
>
>>
>> You could have created a way and tools to do this in RDF. Did you fear the
>> necessary effort or the risk to adoption?
>>
>
> (see above)
> Given time, talent and money, openEHR could do it with the Archetype
> Definition Language. But it would never be as ubiquitous as XML.
>
>
>> It seems that XML Schema allows vocabulary reuse down to the
>> property/attribute
>> level - but the temptation to create own terms instead of reusing others
>> seems
>> to be greater than with RDF. Having some of the semantics in the XML
>> Schema
>> layer and more of it in the RDF layer on top of it definitely is a
>> drawback.
>>
>>
> There may be other/additional approaches that may help improve MLHIM.  I
> am certainly open to and welcome dialog about it.  The specifications (such
> that they are at this point) are openly available under a Creative Commons
> license.  Feel free to join the discussion on social media (Google Plus
> preferred).
>
>
>
>> How many implementors will just ignore the optional RDF layer?
>>
>
> You must realize that software developers do not have control of the
> models in this approach.  Domain experts that understand a little bit of
> how to use the CCD-Gen are the ones responsible for building the models.
>  In the process of teaching them this activity, they are also taught the
> importance of the quality of their models and it ultimately decides the
> quality of their data.
>
> The MLHIM eco-system allows for closed loop concept models (CCDs) to be
> developed as well as openly licensed CCDs.  There may eventually be 10,000
> blood pressure CCDs in the open.  But like most things, we predict that
> most people will reuse a model that is good and openly available, instead
> of building their own.
>
> I can't decide for the experts nor do I want to control what is or is not
> a good model for any particular implementation.  All I can do is offer them
> a real solution that is bottom up and under their control instead of slow
> moving international standards bodies that can't keep up with the changing
> science.
>
> Thanks for your feedback.   Explaining MLHIM in words is always a learning
> experience for me.
>
> Regards,
> Tim
>
>
>
>
>
>>
>> Regards,
>>
>> Michael Brunnbauer
>>
>> On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
>> > A very interesting and I think, foundational discussion.  David, thanks
>> for
>> > bringing it up.
>> > Below is a discussion of why I believe that RDF should be considered a
>> > layer over data models or maybe as 'semantic glue'.
>> >
>> >  David, we are working on the same type of problem but from slightly
>> > different perspectives.  The presentation that you linked to
>> re:KnowMED, is
>> > very important and I recall seeing it before.  I'll take this
>> opportunity
>> > to comment on it since it is in the context of this discussion.  The
>> > indicates that you propose RDF as a language to be used in the exchange
>> of
>> > healthcare data.  Then on slide #5 you say it isn't enough to 'get us
>> > there'.  So I am not sure how much of this is marketing swagger and how
>> > much is hard fact.
>> >
>> >  On slide #8 item #2 we are 100% in agreement.  But then on slide #9 you
>> > are mixing apples and oranges.  XML and RDF have two different purposes
>> > that work well together.
>> >
>> >  On further slides, your Blue, Green and Red customers exactly indicate
>> > what I mean by RDF being an essential layer on top of multiple models.
>> >
>> >  What happens further in the presentation is where we disagree.  You
>> assert
>> > that RDF should be the language used to actually 'exchange' data. This
>> > is where RDF and the tools around it (AFAIK) are not mature enough to
>> perform.
>> >  Several times you have mentioned 'semantics and not syntax'. This is a
>> > huge mistake.  You must have both in order to ensure data quality and
>> > meaning.  Secondly we know from history that top-down consensus in
>> > healthcare concept modelling is an impossibility.[1]
>> >
>> >  In your post describing the BP screenshot you said:
>> >  "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
>> pressure
>> > information, they represent that information differently.  Nonetheless,
>> > both representations can peacefully coexist in the same merged RDF data
>> > without conflict, which might happen, for example, if one is derived
>> from
>> > the other through inference."
>> > I take this to mean that you are representing the exact same BP
>> measurement
>> > data in two different ways?  Your use case, 'by inference' is a little
>> > fuzzy for me.  If it is derivation by inference, it will just be an in
>> > memory representation and not persisted; correct?   Regardless, the
>> > existence of the same data instance, in the same application is in
>> complete
>> > contradiction to good data quality management.  As you go on to explain,
>> > now you must add application intelligence to analyze whether or not two
>> > data instances are the same or not to avoid counting them as two
>> separate
>> > instances.  This approach is very dangerous, in addition to adding
>> > complexity and cost to the applications.    However, having the ability
>> to
>> > determine if two different data instances exactly match the same
>> concept is
>> > essential.  Minor differences such as the position of the patient
>> (sitting
>> > or prone) or the type of instrument used to perform the measurement or
>> the
>> > location on the body (left upper arm or right thigh, etc.) that the
>> > measurement was taken are all important.  They may or may not rule in or
>> > out specific measurements, based on the intended use of the query
>> results.
>> >  This is where RDF is essential, do these two instances point to exactly
>> > the same code in a controlled vocabulary, etc.?    These questions are
>> > essential to having the ability to perform machine based reasoning over
>> the
>> > data repository; whether at the point of care or for research purposes.
>> >
>> > Referring back for a moment to 'the same data instance' situation.  It
>> is
>> > essential to have additional information (meta-data) to determine if two
>> > instances are exactly the same.  This can legitimately occur during
>> > aggregation for research or systemic quality analysis.  Unique patient
>> > identifiers along with datetime stamps are ideal.  However, the patient
>> > identifier issue is an ongoing problem that is actually implementation
>> > context and application specific.  It is outside of the context of data
>> > quality and management.
>> >
>> >  Slide #22 clearly indicates that there is an expectation that RDF is
>> used
>> > as a common format.  However, as I said earlier, the current
>> implementation
>> > of RDF is not robust enough to perform this function, UNLESS, there is a
>> > global expert consensus on all healthcare concepts so that models may be
>> > created and distributed from a central authority.  This is simply
>> > unrealistic as history has shown and is formalized in the Cavalini-Cook
>> > theory [1].
>> >
>> > The reason that I state that RDF is not capable, at this point of
>> maturity,
>> > is that it doesn't support the ability to represent syntactic
>> structures in
>> > a multi-level model environment.  IOW: There is no ability (AFAIK) to
>> > express a common reference model and then derive concept models that
>> issue
>> > further constraints.  A multi-level model approach is essential in
>> order to
>> > abstract the syntax and semantics of each concept out of the application
>> > source code and repository schemas so that they can be shared between
>> > disparate applications.  This is what provides for full syntactic and
>> > semantic interoperability.
>> >
>> > A multi-level model approach may or may not be useful in many domains.
>> >  Specifically, human engineered domains that we fully understand can be
>> > modeled as one level representations.  However, biological domains that
>> > involve evolutionary complexity are quite different.  Primarily because
>> we
>> > do not fully understand them so our science and understanding is
>> constantly
>> > changing.  Additionally, it appears that the data has a much longer
>> > lifetime of significance than other domains.  Therefore the data should
>> be
>> > initially captured and represented in a manner that makes it as future
>> > proof and reusable as possible.  In healthcare, the most semantically
>> rich
>> > point of any information is at the point of care.  Every point of
>> > transition/translation after that will most assuredly lose context.  As
>> a
>> > brief example; reference ranges for conditions change over time.  It is
>> > essential that data captured today be expressed in the context of
>> today's
>> > knowledge, even 20 or more years from now.  The concept model around
>> high
>> > blood pressure is different than it was 10 years ago.
>> >
>> > Where RDF shines is that in a syntactic model of a concept designed to
>> > capture reference ranges and other metadata, it can be used to provide
>> > external semantic context to that model.  Whether that context exists
>> in a
>> > controlled vocabulary or even free text documents such as clinical
>> > guidelines.
>> >
>> > In the Multi-Level Healthcare Information Modelling (MLHIM) approach we
>> > developed a conceptual reference model to provide a basis for software
>> > implementations. While the MLHIM model doesn't preclude other
>> > serializations, we found that XML Schema 1.1 does provide the
>> prerequisites
>> > for implementing both a reference model and concept models.  This
>> means
>> > that we can have full validation of instance data back to the W3C
>> > specifications.  We mark up the concept models (XML Schema 1.1
>> > annotations) with RDF, providing the computable semantic links for each
>> > model as defined by the modeller.  These models can now be created by
>> > domain experts (with additional knowledge modelling training) so that
>> > software developers do not have to interpret the meanings.
>> >
>> > The concept models are now fully detached from any specific
>> implementation
>> > and can be shared to use for validating instance data in the context in
>> > which it was recorded.  I believe that this is the closest we have to
>> > semantic interoperability, to date.  I am of course open for discussion
>> and
>> > debate on the issue.  I used the acronym 'AFAIK' a few times above.  I
>> used
>> > this because my last serious attempt to use RDF for this purpose was in
>> > 2010/2011.  I know that there is a continuous maturing process going
>> on.  I
>> > believe that there may come a day when RDF and OWL can be used
>> exclusively
>> > for syntactic and semantic representation and reasoning.  But AFAIK, not
>> > today.
>> >
>> >  We have a significant number of peer-reviewed publications about MLHIM
>> and
>> > academic as well as other implementations. I am happy to share those
>> with
>> > the group or you may peruse the links in my signature line as well as
>> > www.mlhim.org and the specs are openly downloadable from here[2] as a
>> > package and as source from here [3].
>> >
>> > We also have  almost 2000 datatypes converted from other modeling
>> > approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
>> > complexTypes to be used in concept models.  You can review those as
>> well as
>> > download some example concept models from here[4].  Free registration is
>> > required to download the models.
>> >
>> >  Kind Regards,
>> >  Tim
>> >
>> >
>> >  [1]
>> >
>> https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
>> >  [2]
>> >
>> https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
>> >  [3]  https://github.com/mlhim/
>> >  [4]  http://www.ccdgen.com
>> >
>> >
>> >
>> >
>> > On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:
>> >
>> > > Hi Alan,
>> > >
>> > >
>> > > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
>> > >
>> > >> Can you explain what you mean by "RDF's ability to allow multiple
>> data
>> > >> models to peacefully coexist, interconnected, in the same data" ?
>> > >>
>> > >
>> > > Yes.  Here is an imprecise illustration, on slides 10-17:
>> > >
>> http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
>> > > (I took some artistic liberties blurring class/instance distinctions
>> in
>> > > that diagram.)
>> > >
>> > > And here is a more precise example that cleanly distinguishes classes
>> from
>> > > instances:
>> > > http://tinyurl.com/pzsgf7f
>> > > (I've also attached the same illustration, for offline readers.)
>> > >
>> > > In this latter example (of a hypothetical systolic blood pressure
>> > > measurement), the same information is represented according to two
>> > > different models/schemas/vocabularies/ontologies, v1 (green) and v2
>> > > (red).  (I am using the terms model, schema, vocabulary and ontology
>> > > loosely and somewhat interchangeably here.)
>> > >
>> > > In the v1 model, the systolic blood pressure is indicated in RDF like
>> this:
>> > >
>> > >   ex:patient319 foaf:name "John Doe" ;
>> > >     v1:bps ex1:bp_023 .
>> > >
>> > >   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
>> > >     v1:value 120 .
>> > >
>> > > Whereas in the v2 model, the same information is represented
>> differently,
>> > > in RDF like this:
>> > >
>> > >   ex:patient319 foaf:name "John Doe" ;
>> > >     v2:bps ex2:bp_409 .
>> > >
>> > >   ex2:bp_409 a v2:SystolicBP ;
>> > >     v2:pressure 120 ;
>> > >     v2:units v2:mmHg ;
>> > >     v2:bodyPosition v2:sitting .
>> > >
>> > > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
>> pressure
>> > > information, they represent that information differently.
>>  Nonetheless,
>> > > both representations can peacefully coexist in the same merged RDF
>> data
>> > > without conflict, which might happen, for example, if one is derived
>> from
>> > > the other through inference.
>> > >
>> > > Furthermore, the relationship between these classes,
>> > > v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the
>> relationship
>> > > between the corresponding v1 and v2 instance data, can also be
>> explicitly
>> > > captured in RDF, as the v1v2:SystolicBP_Transform (yellow)
>> relationship:
>> > >
>> > >   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
>> > >
>> > > Inference rules for v1v2:SystolicBP_Transform could therefore convert
>> a
>> > > v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement
>> or
>> > > vice versa.
>> > >
>> > > This example only illustrated the case where the transformation from
>> one
>> > > model to the other is lossless and thus reversible.  Usually that
>> isn't the
>> > > case.  Relating models and transforming between them is *not* easy,
>> but at
>> > > least RDF makes it possible to explicitly indicate these
>> relationships.
>> > >
>> > > Obviously some intelligence must be exercised to avoid, for example,
>> > > accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two
>> distinct
>> > > blood pressure measurements, and thereby double counting them, but
>> that's
>> > > easy enough to do.
>> > >
>> > > Also, there isn't always a desire to relate or transform between
>> models.
>> > >  Sometimes some data is related and other data is not, and it is all
>> still
>> > > merged into the same RDF graph.  In fact, the point may be to connect
>> that
>> > > part of the data that *is* related and let the rest coexist without
>> being
>> > > connected (or at least not *directly* connected).
>> > >
>> > > The point is that these data models can peacefully coexist in RDF data
>> > > without conflict: applications using the v1 model against the merged
>> data
>> > > might only see v1 instance data, whereas applications using the v2
>> model
>> > > might only see the v2 data.  That's qualitatively different than in
>> the
>> > > world of XML, for example, where one schema generally wants to be "on
>> top",
>> > > and when you merge XML of different schemas, you need to create a new
>> "top"
>> > > schema.  That is the difference that I have so often tried to explain
>> to
>> > > people outside the RDF community, and what I am trying to capture
>> > > succinctly in a term or phrase.   It isn't an easy idea to convey to
>> those
>> > > who are accustomed to a schema-centric approach.  I think a catchy but
>> > > descriptive term or phrase could help.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > >
>> > >> -Alan
>> > >>
>> > >>
>> > >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
>> > >> <mailto:david@dbooth.org>> wrote:
>> > >>
>> > >>     I -- and I'm sure many others -- have struggled for years trying
>> to
>> > >>     succinctly describe RDF's ability to allow multiple data models
>> to
>> > >>     peacefully coexist, interconnected, in the same data.  For data
>> > >>     integration, this is a key strength of RDF that distinguishes it
>> > >>     from other information representation languages such as XML.   I
>> > >>     have tried various terms over the years -- most recently "schema
>> > >>     promiscuous" -- but have not yet found one that I think really
>> nails
>> > >>     it, so I would love to get other people's thoughts.
>> > >>
>> > >>     This google doc lists several candidate terms, some pros and
>> cons,
>> > >>     and allows you to indicate which ones you like best:
>> > >>     http://goo.gl/zrXQgj
>> > >>
>> > >>     Please have a look and indicate your favorite(s).  You may also
>> add
>> > >>     more ideas and comments to it.  The document can be edited by
>> anyone
>> > >>     with the URL.
>> > >>
>> > >>     Thanks!
>> > >>     David Booth
>> > >>
>> > >>
>> > >>
>> >
>> >
>> > --
>> > MLHIM VIP Signup: http://goo.gl/22B0U
>> > ============================================
>> > Timothy Cook, MSc           +55 21 994711995
>> > MLHIM http://www.mlhim.org
>> > Like Us on FB: https://www.facebook.com/mlhim2
>> > Circle us on G+: http://goo.gl/44EV5
>> > Google Scholar: http://goo.gl/MMZ1o
>> > LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>>
>> --
>> ++  Michael Brunnbauer
>> ++  netEstate GmbH
>> ++  Geisenhausener Straße 11a
>> ++  81379 München
>> ++  Tel +49 89 32 19 77 80
>> ++  Fax +49 89 32 19 77 89
>> ++  E-Mail brunni@netestate.de
>> ++  http://www.netestate.de/
>> ++
>> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
>> ++  USt-IdNr. DE221033342
>> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>>
>
>
>
> --
> MLHIM VIP Signup: http://goo.gl/22B0U
> ============================================
> Timothy Cook, MSc           +55 21 994711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>
Received on Sunday, 9 March 2014 21:13:02 UTC