Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Eric Prud'hommeaux on 2014-03-10 (semantic-web@w3.org from March 2014)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 10 Mar 2014 03:46:33 -0400
To: Martynas Jusevičius <martynas@graphity.org>
Cc: "Timothy W. Cook" <tim@mlhim.org>, semantic-web <semantic-web@w3.org>, Michael Brunnbauer <brunni@netestate.de>
Message-ID: <20140310074630.GA7438@w3.org>
* Martynas Jusevičius <martynas@graphity.org> [2014-03-09 22:12+0100]
> Hey all,
> 
> Regarding RDF validation - I guess you all know about SPIN constraints,
> right? They're SPARQL-based.
> 
> http://spinrdf.org/spin.html#spin-constraints

There are a few systems out there for this, including a DSL called
"Shape Expressions" which can be executed directly or compiled to
validating SPARQL queries.

  http://www.w3.org/2013/ShEx/FancyShExDemo?schemaURL=Examples/Issue-simple-annotated.shex&dataURL=test/Issue-pass-date.ttl&colorize=1

  http://www.w3.org/2013/ShEx/Primer

It is my hope that W3C will begin standardizing RDF Validation
(interface definition, etc.) in the very near future.
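As a rough illustration of what such a validator checks, here is a minimal hand-rolled sketch in Python. It is not ShEx, SPIN or SPARQL and uses none of their APIs. Triples are plain (subject, predicate, object) tuples, and the shape format (each predicate mapped to an allowed type or value set plus a cardinality range) is a hypothetical stand-in. The shape loosely follows David Booth's v2 blood-pressure example quoted further down; the body positions other than v2:sitting are assumptions.

```python
# Hand-rolled shape check in the spirit of ShEx / SPIN constraints.
# Not the ShEx or SPIN API: triples are (subject, predicate, object)
# tuples and the shape format is a hypothetical dict.

# Hypothetical shape: predicate -> (allowed Python type or set of values, min, max)
SYSTOLIC_BP_SHAPE = {
    "v2:pressure": (int, 1, 1),
    "v2:units": ({"v2:mmHg"}, 1, 1),
    # v2:standing / v2:supine are assumed values, not taken from the thread
    "v2:bodyPosition": ({"v2:sitting", "v2:standing", "v2:supine"}, 0, 1),
}


def validate(graph, node, shape):
    """Return violation messages for `node`; an empty list means it conforms."""
    errors = []
    for pred, (allowed, lo, hi) in shape.items():
        values = [o for (s, p, o) in graph if s == node and p == pred]
        if not lo <= len(values) <= hi:
            errors.append(f"{pred}: expected {lo}..{hi} values, got {len(values)}")
        for v in values:
            ok = isinstance(v, allowed) if isinstance(allowed, type) else v in allowed
            if not ok:
                errors.append(f"{pred}: value {v!r} not allowed")
    return errors


data = {
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
}
print(validate(data, "ex2:bp_409", SYSTOLIC_BP_SHAPE))  # []
```

The shape is checked against the graph after the fact rather than governing how the data is written; a real validator would also distinguish IRIs from literals and check datatypes.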


> Martynas
> 
> graphityhq.com
> 
> 
> On Mar 9, 2014 10:03 PM, "Timothy W. Cook" <tim@mlhim.org> wrote:
> 
> > On Sun, Mar 9, 2014 at 11:48 AM, Michael Brunnbauer <brunni@netestate.de> wrote:
> >
> >>
> >> Hello Timothy,
> >>
> >> MLHIM seems to be annotated data models - with optional RDF annotations.
> >>
> > Somewhat, but the models are restrictions of a common reference
> > model.  Each model represents a concept that is as broad or narrow as the
> > modeller chooses.  The annotations must be optional.  It is up to the
> > domain experts/knowledge modellers to determine the resultant quality.
> >
> >
> >
> >
> >> The claims regarding interoperability and semantics are a bit
> >> exaggerated, IMO.
> >>
> >>
> >
> > I suppose your opinion will change when you decide to put some study into
> > the matter.
> >
> >
> >
> >> If we had something like annotated portable RDB schemas, would they carry
> >> less meaning and would applications built with them be less interoperable
> >> than with MLHIM?
> >>
> >>
> > If you were able to share those concept models between applications and
> > they were restrictions of a common reference model, then yes, they would be
> > the same.
> >
> >
> >
> >> In order to make applications completely interoperable and remove all
> >> implicit semantics from their code, you have to abolish them - replacing
> >> them with some standard component. This is probably as futile as the
> >> ontology/data model to rule them all.
> >>
> >
> > Further study will show that there are paths to operate along in the
> > interim.  But yes, the eventual goal would be for a common healthcare
> > reference model.
> >
> >
> >>
> >> I agree that the proposition of XML Schema is alluring: The information
> >> about the data model used and how to validate the data is always present
> >> and the tools for validation are already there.
> >>
> >> You did not use RDF because it has no standard way to do this - which is
> >> unfortunate.
> >>
> >
> > It is unfortunate.  After working with the openEHR Foundation on
> > multi-level modelling for a decade using a domain-specific language, it
> > was an easy realization that a relatively small group of people could not
> > create the high-quality tools needed for a DSL in any reasonable amount of
> > time.
> > I began looking for alternatives.  OWL and RDF would be my first choices
> > for implementation.  They just weren't, and still aren't, mature enough to
> > do everything needed.  Remember, as I stated before, the MLHIM reference
> > model is a conceptual information model.  I chose XML because I did not
> > see anything else with that capability and widespread adoption.  I knew
> > very little about XML Schema prior to this, so I did not choose it because
> > it was already my hammer.  I spent a lot of time on a long learning curve
> > and had to wait for tools to catch up to XML Schema 1.1.
> >
> >>
> >> You could have created a way and tools to do this in RDF. Did you fear the
> >> necessary effort or the risk to adoption?
> >>
> >
> > (see above)
> > Given time, talent and money, openEHR could do it with the Archetype
> > Definition Language. But it would never be as ubiquitous as XML.
> >
> >
> >> It seems that XML Schema allows vocabulary reuse down to the
> >> property/attribute level - but the temptation to create own terms instead
> >> of reusing others seems to be greater than with RDF. Having some of the
> >> semantics in the XML Schema layer and more of it in the RDF layer on top
> >> of it definitely is a drawback.
> >>
> >>
> > There may be other/additional approaches that may help improve MLHIM.  I
> > am certainly open to and welcome dialog about it.  The specifications (such
> > as they are at this point) are openly available under a Creative Commons
> > license.  Feel free to join the discussion on social media (Google Plus
> > preferred).
> >
> >
> >
> >> How many implementors will just ignore the optional RDF layer?
> >>
> >
> > You must realize that software developers do not have control of the
> > models in this approach.  Domain experts who understand a little bit of
> > how to use the CCD-Gen are the ones responsible for building the models.
> > In the process of teaching them this activity, they are also taught the
> > importance of the quality of their models, since it ultimately determines
> > the quality of their data.
> >
> > The MLHIM ecosystem allows for closed-loop concept models (CCDs) to be
> > developed as well as openly licensed CCDs.  There may eventually be 10,000
> > blood pressure CCDs in the open.  But like most things, we predict that
> > most people will reuse a model that is good and openly available, instead
> > of building their own.
> >
> > I can't decide for the experts, nor do I want to control what is or is not
> > a good model for any particular implementation.  All I can do is offer them
> > a real solution that is bottom-up and under their control, instead of
> > slow-moving international standards bodies that can't keep up with the
> > changing science.
> >
> > Thanks for your feedback.   Explaining MLHIM in words is always a learning
> > experience for me.
> >
> > Regards,
> > Tim
> >
> >
> >
> >
> >
> >>
> >> Regards,
> >>
> >> Michael Brunnbauer
> >>
> >> On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
> >> > A very interesting and, I think, foundational discussion.  David, thanks
> >> > for bringing it up.
> >> > Below is a discussion of why I believe that RDF should be considered a
> >> > layer over data models or maybe as 'semantic glue'.
> >> >
> >> >  David, we are working on the same type of problem but from slightly
> >> > different perspectives.  The presentation that you linked to re:KnowMED
> >> > is very important and I recall seeing it before.  I'll take this
> >> > opportunity to comment on it since it is in the context of this
> >> > discussion.  The presentation indicates that you propose RDF as a
> >> > language to be used in the exchange of healthcare data.  Then on slide
> >> > #5 you say it isn't enough to 'get us there'.  So I am not sure how much
> >> > of this is marketing swagger and how much is hard fact.
> >> >
> >> >  On slide #8 item #2 we are 100% in agreement.  But then on slide #9 you
> >> > are mixing apples and oranges.  XML and RDF have two different purposes
> >> > that work well together.
> >> >
> >> >  On further slides, your Blue, Green and Red customers exactly indicate
> >> > what I mean by RDF being an essential layer on top of multiple models.
> >> >
> >> >  What happens further in the presentation is where we disagree.  You
> >> > assert that RDF should be the language used to actually 'exchange' data.
> >> > This is where RDF and the tools around it (AFAIK) are not mature enough
> >> > to perform.  Several times you have mentioned 'semantics and not syntax'.
> >> > This is a huge mistake.  You must have both in order to ensure data
> >> > quality and meaning.  Secondly, we know from history that top-down
> >> > consensus in healthcare concept modelling is an impossibility.[1]
> >> >
> >> >  In your post describing the BP screenshot you said:
> >> >  "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
> >> > pressure information, they represent that information differently.
> >> > Nonetheless, both representations can peacefully coexist in the same
> >> > merged RDF data without conflict, which might happen, for example, if
> >> > one is derived from the other through inference."
> >> > I take this to mean that you are representing the exact same BP
> >> > measurement data in two different ways?  Your use case, 'by inference',
> >> > is a little fuzzy for me.  If it is derivation by inference, it will just
> >> > be an in-memory representation and not persisted; correct?  Regardless,
> >> > the existence of the same data instance twice in the same application is
> >> > in complete contradiction to good data quality management.  As you go on
> >> > to explain, you must now add application intelligence to analyze whether
> >> > or not two data instances are the same, to avoid counting them as two
> >> > separate instances.  This approach is very dangerous, in addition to
> >> > adding complexity and cost to the applications.  However, having the
> >> > ability to determine if two different data instances exactly match the
> >> > same concept is essential.  Minor differences such as the position of
> >> > the patient (sitting or prone), the type of instrument used to perform
> >> > the measurement, or the location on the body (left upper arm or right
> >> > thigh, etc.) where the measurement was taken are all important.  They
> >> > may or may not rule specific measurements in or out, based on the
> >> > intended use of the query results.
> >> >  This is where RDF is essential: do these two instances point to exactly
> >> > the same code in a controlled vocabulary, etc.?  These questions are
> >> > essential to having the ability to perform machine-based reasoning over
> >> > the data repository, whether at the point of care or for research
> >> > purposes.
> >> >
> >> > Referring back for a moment to 'the same data instance' situation: it
> >> > is essential to have additional information (metadata) to determine if
> >> > two instances are exactly the same.  This can legitimately occur during
> >> > aggregation for research or systemic quality analysis.  Unique patient
> >> > identifiers along with datetime stamps are ideal.  However, the patient
> >> > identifier issue is an ongoing problem that is actually implementation
> >> > context and application specific.  It is outside of the context of data
> >> > quality and management.
> >> >
> >> >  Slide #22 clearly indicates that there is an expectation that RDF is
> >> > used as a common format.  However, as I said earlier, the current
> >> > implementation of RDF is not robust enough to perform this function,
> >> > UNLESS there is a global expert consensus on all healthcare concepts so
> >> > that models may be created and distributed from a central authority.
> >> > This is simply unrealistic, as history has shown and as is formalized in
> >> > the Cavalini-Cook theory [1].
> >> >
> >> > The reason that I state that RDF is not capable, at this point of
> >> > maturity, is that it doesn't support the ability to represent syntactic
> >> > structures in a multi-level model environment.  IOW: There is no ability
> >> > (AFAIK) to express a common reference model and then derive concept
> >> > models that impose further constraints.  A multi-level model approach is
> >> > essential in order to abstract the syntax and semantics of each concept
> >> > out of the application source code and repository schemas so that they
> >> > can be shared between disparate applications.  This is what provides for
> >> > full syntactic and semantic interoperability.
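As a rough illustration of the two-level idea (not MLHIM's implementation and not XML Schema's restriction mechanism), the Python sketch below models a generic reference-model quantity and a concept model derived from it by restriction only. The class name, bounds and unit sets are hypothetical. Because derivation may only narrow constraints, anything valid against the derived concept model is also valid against the reference model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityConstraint:
    """Hypothetical constraint on a measured quantity (not MLHIM's actual types)."""
    min_value: float
    max_value: float
    units: frozenset

    def accepts(self, value, unit):
        return unit in self.units and self.min_value <= value <= self.max_value

    def restrict(self, min_value=None, max_value=None, units=None):
        """Derive a narrower constraint; refuse anything that would widen this one."""
        new_min = self.min_value if min_value is None else min_value
        new_max = self.max_value if max_value is None else max_value
        new_units = self.units if units is None else frozenset(units)
        if new_min < self.min_value or new_max > self.max_value or not new_units <= self.units:
            raise ValueError("a derived model may only add constraints, not relax them")
        return QuantityConstraint(new_min, new_max, new_units)


# Level 1: a generic reference-model quantity (bounds and units are assumed).
reference_quantity = QuantityConstraint(0.0, 1e6, frozenset({"mmHg", "kPa", "cm", "kg"}))
# Level 2: a systolic blood pressure concept model, derived by restriction (assumed bounds).
systolic_bp = reference_quantity.restrict(min_value=0, max_value=300, units={"mmHg"})

print(systolic_bp.accepts(120, "mmHg"))       # True
print(systolic_bp.accepts(120, "kg"))         # False
print(reference_quantity.accepts(120, "kg"))  # True
```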
> >> >
> >> > A multi-level model approach may or may not be useful in many domains.
> >> > Specifically, human-engineered domains that we fully understand can be
> >> > modeled as one-level representations.  However, biological domains that
> >> > involve evolutionary complexity are quite different, primarily because
> >> > we do not fully understand them, so our science and understanding are
> >> > constantly changing.  Additionally, it appears that the data has a much
> >> > longer lifetime of significance than in other domains.  Therefore the
> >> > data should be initially captured and represented in a manner that
> >> > makes it as future-proof and reusable as possible.  In healthcare, the
> >> > most semantically rich point of any information is at the point of care.
> >> > Every point of transition/translation after that will most assuredly
> >> > lose context.  As a brief example, reference ranges for conditions
> >> > change over time.  It is essential that data captured today be expressed
> >> > in the context of today's knowledge, even 20 or more years from now.
> >> > The concept model around high blood pressure is different than it was
> >> > 10 years ago.
> >> >
> >> > Where RDF shines is in providing external semantic context to a
> >> > syntactic model of a concept that is designed to capture reference
> >> > ranges and other metadata, whether that context exists in a controlled
> >> > vocabulary or even in free-text documents such as clinical guidelines.
> >> >
> >> > In the Multi-Level Healthcare Information Modelling (MLHIM) approach we
> >> > developed a conceptual reference model to provide a basis for software
> >> > implementations.  While the MLHIM model doesn't preclude other
> >> > serializations, we found that XML Schema 1.1 does provide the
> >> > prerequisites for implementing both a reference model and concept
> >> > models.  This means that we can have full validation of instance data
> >> > back to the W3C specifications.  We mark up the concept models (via XML
> >> > Schema 1.1 annotations) with RDF, which provides the computable semantic
> >> > links for each model as defined by the modeller.  These models can now
> >> > be created by domain experts (with additional knowledge modelling
> >> > training) so that software developers do not have to interpret the
> >> > meanings.
> >> >
> >> > The concept models are now fully detached from any specific
> >> > implementation and can be shared to use for validating instance data in
> >> > the context in which it was recorded.  I believe that this is the
> >> > closest we have to semantic interoperability, to date.  I am of course
> >> > open for discussion and debate on the issue.  I used the acronym 'AFAIK'
> >> > a few times above.  I used this because my last serious attempt to use
> >> > RDF for this purpose was in 2010/2011.  I know that there is a
> >> > continuous maturing process going on.  I believe that there may come a
> >> > day when RDF and OWL can be used exclusively for syntactic and semantic
> >> > representation and reasoning.  But AFAIK, not today.
> >> >
> >> >  We have a significant number of peer-reviewed publications about MLHIM
> >> > and academic as well as other implementations.  I am happy to share
> >> > those with the group, or you may peruse the links in my signature line
> >> > as well as www.mlhim.org; the specs are openly downloadable as a package
> >> > from here [2] and as source from here [3].
> >> >
> >> > We also have almost 2000 datatypes converted from other modeling
> >> > approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
> >> > complexTypes to be used in concept models.  You can review those as
> >> > well as download some example concept models from here [4].  Free
> >> > registration is required to download the models.
> >> >
> >> >  Kind Regards,
> >> >  Tim
> >> >
> >> >
> >> >  [1] https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
> >> >  [2] https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
> >> >  [3] https://github.com/mlhim/
> >> >  [4] http://www.ccdgen.com
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:
> >> >
> >> > > Hi Alan,
> >> > >
> >> > >
> >> > > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
> >> > >
> >> > >> Can you explain what you mean by "RDF's ability to allow multiple
> >> > >> data models to peacefully coexist, interconnected, in the same data"?
> >> > >>
> >> > >
> >> > > Yes.  Here is an imprecise illustration, on slides 10-17:
> >> > > http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
> >> > > (I took some artistic liberties blurring class/instance distinctions
> >> > > in that diagram.)
> >> > >
> >> > > And here is a more precise example that cleanly distinguishes classes
> >> > > from instances:
> >> > > http://tinyurl.com/pzsgf7f
> >> > > (I've also attached the same illustration, for offline readers.)
> >> > >
> >> > > In this latter example (of a hypothetical systolic blood pressure
> >> > > measurement), the same information is represented according to two
> >> > > different models/schemas/vocabularies/ontologies, v1 (green) and v2
> >> > > (red).  (I am using the terms model, schema, vocabulary and ontology
> >> > > loosely and somewhat interchangeably here.)
> >> > >
> >> > > In the v1 model, the systolic blood pressure is indicated in RDF like
> >> > > this:
> >> > >
> >> > >   ex:patient319 foaf:name "John Doe" ;
> >> > >     v1:bps ex1:bp_023 .
> >> > >
> >> > >   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
> >> > >     v1:value 120 .
> >> > >
> >> > > Whereas in the v2 model, the same information is represented
> >> > > differently, in RDF like this:
> >> > >
> >> > >   ex:patient319 foaf:name "John Doe" ;
> >> > >     v2:bps ex2:bp_409 .
> >> > >
> >> > >   ex2:bp_409 a v2:SystolicBP ;
> >> > >     v2:pressure 120 ;
> >> > >     v2:units v2:mmHg ;
> >> > >     v2:bodyPosition v2:sitting .
> >> > >
> >> > > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
> >> > > pressure information, they represent that information differently.
> >> > > Nonetheless, both representations can peacefully coexist in the same
> >> > > merged RDF data without conflict, which might happen, for example, if
> >> > > one is derived from the other through inference.
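A minimal sketch of that merge in Python, assuming the two snippets above are held as sets of (subject, predicate, object) tuples, with prefixed names and literals both written as plain strings for brevity. With no blank nodes involved, merging RDF graphs is a set union: nothing is overwritten, and the single triple the two graphs share (the foaf:name) collapses into one copy.

```python
# The v1 and v2 examples above as sets of (subject, predicate, object) triples.
# Simplification: prefixed names and literals are both plain strings here;
# a real store would expand prefixes to IRIs and type the literals.
v1_graph = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
}
v2_graph = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v2:bps", "ex2:bp_409"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_409", "v2:bodyPosition", "v2:sitting"),
}

# Without blank nodes, merging RDF graphs is set union: no triple is replaced,
# and the shared foaf:name triple appears only once.
merged = v1_graph | v2_graph

print(len(v1_graph), len(v2_graph), len(merged))  # 4 6 9
```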
> >> > >
> >> > > Furthermore, the relationship between these classes,
> >> > > v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the relationship
> >> > > between the corresponding v1 and v2 instance data, can also be
> >> > > explicitly captured in RDF, as the v1v2:SystolicBP_Transform (yellow)
> >> > > relationship:
> >> > >
> >> > >   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
> >> > >
> >> > > Inference rules for v1v2:SystolicBP_Transform could therefore convert a
> >> > > v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement or
> >> > > vice versa.
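One way such rules might look, sketched in Python over (subject, predicate, object) tuples rather than in any particular rule engine. The v1v2:SystolicBP_Transform triple only names the relationship, so the property mapping is written by hand and is an assumption, as are the minted node names (ex2:from_..., ex1:from_...). The reverse rule fires only when the units and body position match what the v1 class name implies.

```python
# Hypothetical rule bodies for v1v2:SystolicBP_Transform. The schema triple
# only names the relationship, so this property mapping is an assumption
# written by hand. Minted node names ("ex2:from_...", "ex1:from_...") are
# made up for the sketch.


def v1_to_v2(graph):
    """Derive v2:SystolicBP triples for each v1:SystolicBPSitting_mmHg node."""
    derived = set()
    for (s, p, o) in graph:
        if p == "rdf:type" and o == "v1:SystolicBPSitting_mmHg":
            value = next((v for (s2, p2, v) in graph if s2 == s and p2 == "v1:value"), None)
            if value is None:
                continue  # incomplete v1 record: nothing to derive
            new = "ex2:from_" + s.split(":", 1)[1]
            derived |= {
                (new, "rdf:type", "v2:SystolicBP"),
                (new, "v2:pressure", value),
                # units and position are implied by the v1 class name
                (new, "v2:units", "v2:mmHg"),
                (new, "v2:bodyPosition", "v2:sitting"),
            }
    return derived


def v2_to_v1(graph):
    """Reverse rule: fires only when units and position fit the v1 class."""
    derived = set()
    for (s, p, o) in graph:
        if p == "rdf:type" and o == "v2:SystolicBP":
            props = {p2: v for (s2, p2, v) in graph if s2 == s}
            if (props.get("v2:units") == "v2:mmHg"
                    and props.get("v2:bodyPosition") == "v2:sitting"
                    and "v2:pressure" in props):
                new = "ex1:from_" + s.split(":", 1)[1]
                derived |= {
                    (new, "rdf:type", "v1:SystolicBPSitting_mmHg"),
                    (new, "v1:value", props["v2:pressure"]),
                }
    return derived


v1_data = {
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
}
as_v2 = v1_to_v2(v1_data)
round_trip = v2_to_v1(as_v2)
```

Going from v1_data to as_v2 and back to round_trip recovers the same class and value, which is what makes this particular transform reversible; a standing or kPa reading would simply not convert back to v1.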
> >> > >
> >> > > This example only illustrated the case where the transformation from
> >> > > one model to the other is lossless and thus reversible.  Usually that
> >> > > isn't the case.  Relating models and transforming between them is
> >> > > *not* easy, but at least RDF makes it possible to explicitly indicate
> >> > > these relationships.
> >> > >
> >> > > Obviously some intelligence must be exercised to avoid, for example,
> >> > > accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two
> >> > > distinct blood pressure measurements, and thereby double counting them,
> >> > > but that's easy enough to do.
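One way to do that, sketched in Python under the assumption that whatever derived a node also recorded a prov:wasDerivedFrom link back to its source (prov:wasDerivedFrom is a real W3C PROV-O property; relying on it here is an assumption, not something proposed in this thread). Nodes connected by such links are counted once, using a small union-find. Data merged from independent sources without such links would need a different identity test, such as the unique patient identifier plus datetime stamp mentioned by Tim Cook.

```python
# Assumption: whatever derived one measurement node from another also
# recorded a prov:wasDerivedFrom link (a real W3C PROV-O property; using
# it this way is illustrative). Linked nodes count as one measurement.
MEASUREMENT_TYPES = {"v1:SystolicBPSitting_mmHg", "v2:SystolicBP"}


def count_measurements(graph):
    """Count distinct measurements, merging nodes joined by derivation links."""
    nodes = {s for (s, p, o) in graph if p == "rdf:type" and o in MEASUREMENT_TYPES}
    parent = {n: n for n in nodes}  # union-find forest over measurement nodes

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (s, p, o) in graph:
        if p == "prov:wasDerivedFrom" and s in parent and o in parent:
            parent[find(s)] = find(o)
    return len({find(n) for n in nodes})


merged_example = {
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "prov:wasDerivedFrom", "ex1:bp_023"),
    ("ex2:bp_777", "rdf:type", "v2:SystolicBP"),  # hypothetical unrelated reading
}
print(count_measurements(merged_example))  # 2
```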
> >> > >
> >> > > Also, there isn't always a desire to relate or transform between
> >> > > models.  Sometimes some data is related and other data is not, and it
> >> > > is all still merged into the same RDF graph.  In fact, the point may be
> >> > > to connect that part of the data that *is* related and let the rest
> >> > > coexist without being connected (or at least not *directly*
> >> > > connected).
> >> > >
> >> > > The point is that these data models can peacefully coexist in RDF
> >> > > data without conflict: applications using the v1 model against the
> >> > > merged data might only see v1 instance data, whereas applications using
> >> > > the v2 model might only see the v2 data.  That's qualitatively
> >> > > different than in the world of XML, for example, where one schema
> >> > > generally wants to be "on top", and when you merge XML of different
> >> > > schemas, you need to create a new "top" schema.  That is the difference
> >> > > that I have so often tried to explain to people outside the RDF
> >> > > community, and what I am trying to capture succinctly in a term or
> >> > > phrase.  It isn't an easy idea to convey to those who are accustomed to
> >> > > a schema-centric approach.  I think a catchy but descriptive term or
> >> > > phrase could help.
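A Python sketch of what "only seeing" one model's data could mean, under the simplifying assumption that a vocabulary is identified by its prefix string (real applications would match namespace IRIs, typically in SPARQL). The view function and the trimmed merged graph are illustrative. Each view picks its own model's triples out of the shared graph, and neither requires the other model's triples to be removed first.

```python
def view(graph, prefix):
    """Triples a consumer that only understands `prefix` terms would match.

    Stands in for a query written against one vocabulary: triples using
    other vocabularies are simply not matched, so they cause no conflict.
    """
    return {
        (s, p, o) for (s, p, o) in graph
        if p.startswith(prefix) or (p == "rdf:type" and str(o).startswith(prefix))
    }


# Trimmed merged graph holding both models' data side by side.
merged = {
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex:patient319", "v2:bps", "ex2:bp_409"),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
}

print(len(view(merged, "v1:")), len(view(merged, "v2:")))  # 3 3
```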
> >> > >
> >> > > Thanks,
> >> > > David
> >> > >
> >> > >
> >> > >> -Alan
> >> > >>
> >> > >>
> >> > >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
> >> > >> <mailto:david@dbooth.org>> wrote:
> >> > >>
> >> > >>     I -- and I'm sure many others -- have struggled for years trying
> >> > >>     to succinctly describe RDF's ability to allow multiple data models
> >> > >>     to peacefully coexist, interconnected, in the same data.  For data
> >> > >>     integration, this is a key strength of RDF that distinguishes it
> >> > >>     from other information representation languages such as XML.  I
> >> > >>     have tried various terms over the years -- most recently "schema
> >> > >>     promiscuous" -- but have not yet found one that I think really
> >> > >>     nails it, so I would love to get other people's thoughts.
> >> > >>
> >> > >>     This google doc lists several candidate terms, some pros and
> >> > >>     cons, and allows you to indicate which ones you like best:
> >> > >>     http://goo.gl/zrXQgj
> >> > >>
> >> > >>     Please have a look and indicate your favorite(s).  You may also
> >> > >>     add more ideas and comments to it.  The document can be edited by
> >> > >>     anyone with the URL.
> >> > >>
> >> > >>     Thanks!
> >> > >>     David Booth
> >> > >>
> >> > >>
> >> > >>
> >> >
> >> >
> >> > --
> >> > MLHIM VIP Signup: http://goo.gl/22B0U
> >> > ============================================
> >> > Timothy Cook, MSc           +55 21 994711995
> >> > MLHIM http://www.mlhim.org
> >> > Like Us on FB: https://www.facebook.com/mlhim2
> >> > Circle us on G+: http://goo.gl/44EV5
> >> > Google Scholar: http://goo.gl/MMZ1o
> >> > LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
> >>
> >> --
> >> ++  Michael Brunnbauer
> >> ++  netEstate GmbH
> >> ++  Geisenhausener Straße 11a
> >> ++  81379 München
> >> ++  Tel +49 89 32 19 77 80
> >> ++  Fax +49 89 32 19 77 89
> >> ++  E-Mail brunni@netestate.de
> >> ++  http://www.netestate.de/
> >> ++
> >> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> >> ++  USt-IdNr. DE221033342
> >> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> >> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
> >>
> >
> >
> >
> >

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Monday, 10 March 2014 07:47:05 UTC