- From: Martynas Jusevičius <martynas@graphity.org>
- Date: Sun, 9 Mar 2014 22:12:33 +0100
- To: "Timothy W. Cook" <tim@mlhim.org>
- Cc: semantic-web <semantic-web@w3.org>, Michael Brunnbauer <brunni@netestate.de>
- Message-ID: <CAE35Vmzok-q=3o0ZitF7bENPWPYC3jaxMuOm_5RNMrNxaBA3HA@mail.gmail.com>
Hey all,

Regarding RDF validation - I guess you all know about SPIN constraints, right? They're SPARQL-based.
http://spinrdf.org/spin.html#spin-constraints

Martynas
graphityhq.com

On Mar 9, 2014 10:03 PM, "Timothy W. Cook" <tim@mlhim.org> wrote:
> On Sun, Mar 9, 2014 at 11:48 AM, Michael Brunnbauer <brunni@netestate.de> wrote:
>
>>
>> Hello Timothy,
>>
>> MLHIM seems to be annotated data models - with optional RDF annotations.
>>
>
> Somewhat, but the models are restrictions of a common reference model. Each model represents a concept that is as broad or narrow as the modeller chooses. The annotations must be optional. It is up to the domain experts/knowledge modellers to determine the resultant quality.
>
>> The claims regarding interoperability and semantics are a bit exaggerated, IMO.
>>
>
> I suppose your opinion will change when you decide to put some study into the matter.
>
>> If we had something like annotated portable RDB schemas, would they carry less meaning and would applications built with them be less interoperable than with MLHIM?
>>
>
> If you were able to share those concept models between applications, and they were restrictions of a common reference model, then yes, they would be the same.
>
>> In order to make applications completely interoperable and remove all implicit semantics from their code, you have to abolish them - replacing them with some standard component. This is probably as futile as the ontology/data model to rule them all.
>>
>
> Further study will show that there are paths to operate along in the interim. But yes, the eventual goal would be a common healthcare reference model.
>
>>
>> I agree that the proposition of XML Schema is alluring: The information about the data model used and how to validate the data is always present and the tools for validation are already there.
>>
>> You did not use RDF because it has no standard way to do this - which is unfortunate.
>>
>
> It is unfortunate. After working with the openEHR Foundation on multi-level modelling for a decade using a domain-specific language, it was an easy realization that a relatively small group of people could not create the high-quality tools needed for a DSL in any reasonable amount of time.
> I began looking for alternatives. OWL and RDF would have been my first choices for implementation. They just weren't, and still aren't, mature enough to do everything needed. Remember, as I stated before, the MLHIM reference model is a conceptual information model. I chose XML because I did not see anything with that capability and widespread adoption. I knew very little about XML Schema prior to this, so I did not choose it because it was already my hammer. I spent a lot of time on a long learning curve and had to wait for tools to catch up to XML Schema 1.1.
>
>>
>> You could have created a way and tools to do this in RDF. Did you fear the necessary effort or the risk to adoption?
>>
>
> (see above)
> Given time, talent and money, openEHR could do it with the Archetype Definition Language. But it would never be as ubiquitous as XML.
>
>> It seems that XML Schema allows vocabulary reuse down to the property/attribute level - but the temptation to create one's own terms instead of reusing others seems to be greater than with RDF. Having some of the semantics in the XML Schema layer and more of it in the RDF layer on top of it definitely is a drawback.
>>
>
> There may be other/additional approaches that may help improve MLHIM. I am certainly open to, and welcome, dialog about it. The specifications (such as they are at this point) are openly available under a Creative Commons license. Feel free to join the discussion on social media (Google Plus preferred).
>
>> How many implementors will just ignore the optional RDF layer?
>>
>
> You must realize that software developers do not have control of the models in this approach. Domain experts who understand a little about how to use the CCD-Gen are the ones responsible for building the models. In the process of being taught this activity, they also learn the importance of the quality of their models, which ultimately determines the quality of their data.
>
> The MLHIM ecosystem allows for closed-loop concept models (CCDs) to be developed as well as openly licensed CCDs. There may eventually be 10,000 blood pressure CCDs in the open. But, as with most things, we predict that most people will reuse a model that is good and openly available instead of building their own.
>
> I can't decide for the experts, nor do I want to control what is or is not a good model for any particular implementation. All I can do is offer them a real solution that is bottom-up and under their control, instead of slow-moving international standards bodies that can't keep up with the changing science.
>
> Thanks for your feedback. Explaining MLHIM in words is always a learning experience for me.
>
> Regards,
> Tim
>
>>
>> Regards,
>>
>> Michael Brunnbauer
>>
>> On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
>> > A very interesting and, I think, foundational discussion. David, thanks for bringing it up.
>> > Below is a discussion of why I believe that RDF should be considered a layer over data models, or maybe 'semantic glue'.
>> >
>> > David, we are working on the same type of problem but from slightly different perspectives. The presentation on KnowMED that you linked to is very important, and I recall seeing it before. I'll take this opportunity to comment on it since it is in the context of this discussion. It indicates that you propose RDF as a language to be used in the exchange of healthcare data.
>> > Then on slide #5 you say it isn't enough to 'get us there'. So I am not sure how much of this is marketing swagger and how much is hard fact.
>> >
>> > On slide #8, item #2, we are 100% in agreement. But then on slide #9 you are mixing apples and oranges. XML and RDF have two different purposes that work well together.
>> >
>> > On further slides, your Blue, Green and Red customers indicate exactly what I mean by RDF being an essential layer on top of multiple models.
>> >
>> > What happens further in the presentation is where we disagree. You assert that RDF should be the language used to actually 'exchange' data. This is where RDF and the tools around it (AFAIK) are not mature enough to perform. Several times you have mentioned 'semantics and not syntax'. This is a huge mistake. You must have both in order to ensure data quality and meaning. Secondly, we know from history that top-down consensus in healthcare concept modelling is an impossibility.[1]
>> >
>> > In your post describing the BP screenshot you said:
>> > "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood pressure information, they represent that information differently. Nonetheless, both representations can peacefully coexist in the same merged RDF data without conflict, which might happen, for example, if one is derived from the other through inference."
>> > I take this to mean that you are representing the exact same BP measurement data in two different ways? Your use case, 'by inference', is a little fuzzy for me. If it is derivation by inference, it will just be an in-memory representation and not persisted, correct? Regardless, having the same data instance twice in the same application completely contradicts good data quality management.
>> > As you go on to explain, you must now add application intelligence to analyze whether two data instances are the same, to avoid counting them as two separate instances. This approach is very dangerous, in addition to adding complexity and cost to the applications. However, having the ability to determine whether two different data instances exactly match the same concept is essential. Minor differences, such as the position of the patient (sitting or prone), the type of instrument used to perform the measurement, or the location on the body where the measurement was taken (left upper arm or right thigh, etc.), are all important. They may or may not rule specific measurements in or out, based on the intended use of the query results. This is where RDF is essential: do these two instances point to exactly the same code in a controlled vocabulary, etc.? These questions are essential to being able to perform machine-based reasoning over the data repository, whether at the point of care or for research purposes.
>> >
>> > Referring back for a moment to the 'same data instance' situation: it is essential to have additional information (metadata) to determine whether two instances are exactly the same. This can legitimately occur during aggregation for research or systemic quality analysis. Unique patient identifiers along with datetime stamps are ideal. However, the patient identifier issue is an ongoing problem that is actually specific to the implementation context and application. It is outside the context of data quality and management.
>> >
>> > Slide #22 clearly indicates that there is an expectation that RDF is used as a common format.
>> > However, as I said earlier, the current implementation of RDF is not robust enough to perform this function UNLESS there is a global expert consensus on all healthcare concepts, so that models may be created and distributed from a central authority. This is simply unrealistic, as history has shown and as is formalized in the Cavalini-Cook theory [1].
>> >
>> > The reason I state that RDF is not capable, at this point of maturity, is that it doesn't support the ability to represent syntactic structures in a multi-level model environment. IOW: there is no ability (AFAIK) to express a common reference model and then derive concept models that impose further constraints. A multi-level model approach is essential in order to abstract the syntax and semantics of each concept out of the application source code and repository schemas so that they can be shared between disparate applications. This is what provides for full syntactic and semantic interoperability.
>> >
>> > A multi-level model approach may or may not be useful in many domains. Specifically, human-engineered domains that we fully understand can be modeled as one-level representations. However, biological domains that involve evolutionary complexity are quite different, primarily because we do not fully understand them, so our science and understanding are constantly changing. Additionally, it appears that the data has a much longer lifetime of significance than in other domains. Therefore, the data should be initially captured and represented in a manner that makes it as future-proof and reusable as possible. In healthcare, the most semantically rich point of any information is at the point of care. Every point of transition/translation after that will most assuredly lose context. As a brief example, reference ranges for conditions change over time.
>> > It is essential that data captured today be expressed in the context of today's knowledge, even 20 or more years from now. The concept model around high blood pressure is different than it was 10 years ago.
>> >
>> > Where RDF shines is that, in a syntactic model of a concept designed to capture reference ranges and other metadata, it can be used to provide external semantic context to that model, whether that context exists in a controlled vocabulary or even in free-text documents such as clinical guidelines.
>> >
>> > In the Multi-Level Healthcare Information Modelling (MLHIM) approach we developed a conceptual reference model to provide a basis for software implementations. While the MLHIM model doesn't preclude other serializations, we found that XML Schema 1.1 does provide the prerequisites for implementing both a reference model and concept models. This means that we can have full validation of instance data back to the W3C specifications. We mark up the concept models (XML Schema 1.1 annotations) with RDF, providing the computable semantic links for each model as defined by the modeller. These models can now be created by domain experts (with additional knowledge modelling training), so that software developers do not have to interpret the meanings.
>> >
>> > The concept models are now fully detached from any specific implementation and can be shared and used to validate instance data in the context in which it was recorded. I believe that this is the closest we have come to semantic interoperability to date. I am, of course, open to discussion and debate on the issue. I used the acronym 'AFAIK' a few times above because my last serious attempt to use RDF for this purpose was in 2010/2011. I know that there is a continuous maturing process going on.
>> > I believe that there may come a day when RDF and OWL can be used exclusively for syntactic and semantic representation and reasoning. But AFAIK, not today.
>> >
>> > We have a significant number of peer-reviewed publications about MLHIM, and academic as well as other implementations. I am happy to share those with the group, or you may peruse the links in my signature line as well as www.mlhim.org. The specs are openly downloadable as a package from here [2] and as source from here [3].
>> >
>> > We also have almost 2000 datatypes converted from other modeling approaches (such as the NIH CDE browser and HL7 FHIR) into reusable complexTypes to be used in concept models. You can review those, as well as download some example concept models, from here [4]. Free registration is required to download the models.
>> >
>> > Kind Regards,
>> > Tim
>> >
>> > [1] https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
>> > [2] https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
>> > [3] https://github.com/mlhim/
>> > [4] http://www.ccdgen.com
>> >
>> > On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:
>> >
>> > > Hi Alan,
>> > >
>> > > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
>> > >
>> > >> Can you explain what you mean by "RDF's ability to allow multiple data models to peacefully coexist, interconnected, in the same data"?
>> > >>
>> > >
>> > > Yes. Here is an imprecise illustration, on slides 10-17:
>> > > http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
>> > > (I took some artistic liberties blurring class/instance distinctions in that diagram.)
>> > >
>> > > And here is a more precise example that cleanly distinguishes classes from instances:
>> > > http://tinyurl.com/pzsgf7f
>> > > (I've also attached the same illustration, for offline readers.)
>> > >
>> > > In this latter example (of a hypothetical systolic blood pressure measurement), the same information is represented according to two different models/schemas/vocabularies/ontologies, v1 (green) and v2 (red). (I am using the terms model, schema, vocabulary and ontology loosely and somewhat interchangeably here.)
>> > >
>> > > In the v1 model, the systolic blood pressure is indicated in RDF like this:
>> > >
>> > >   ex:patient319 foaf:name "John Doe" ;
>> > >       v1:bps ex1:bp_023 .
>> > >
>> > >   ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
>> > >       v1:value 120 .
>> > >
>> > > Whereas in the v2 model, the same information is represented differently, in RDF like this:
>> > >
>> > >   ex:patient319 foaf:name "John Doe" ;
>> > >       v2:bps ex2:bp_409 .
>> > >
>> > >   ex2:bp_409 a v2:SystolicBP ;
>> > >       v2:pressure 120 ;
>> > >       v2:units v2:mmHg ;
>> > >       v2:bodyPosition v2:sitting .
>> > >
>> > > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood pressure information, they represent that information differently. Nonetheless, both representations can peacefully coexist in the same merged RDF data without conflict, which might happen, for example, if one is derived from the other through inference.
>> > >
>> > > Furthermore, the relationship between these classes, v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the relationship between the corresponding v1 and v2 instance data, can also be explicitly captured in RDF, as the v1v2:SystolicBP_Transform (yellow) relationship:
>> > >
>> > >   v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
>> > >
>> > > Inference rules for v1v2:SystolicBP_Transform could therefore convert a v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement or vice versa.
>> > >
>> > > This example only illustrated the case where the transformation from one model to the other is lossless and thus reversible. Usually that isn't the case. Relating models and transforming between them is *not* easy, but at least RDF makes it possible to explicitly indicate these relationships.
>> > >
>> > > Obviously some intelligence must be exercised to avoid, for example, accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two distinct blood pressure measurements, and thereby double counting them, but that's easy enough to do.
>> > >
>> > > Also, there isn't always a desire to relate or transform between models. Sometimes some data is related and other data is not, and it is all still merged into the same RDF graph. In fact, the point may be to connect the part of the data that *is* related and let the rest coexist without being connected (or at least not *directly* connected).
>> > >
>> > > The point is that these data models can peacefully coexist in RDF data without conflict: applications using the v1 model against the merged data might only see v1 instance data, whereas applications using the v2 model might only see the v2 data. That's qualitatively different from the world of XML, for example, where one schema generally wants to be "on top", and when you merge XML of different schemas, you need to create a new "top" schema. That is the difference that I have so often tried to explain to people outside the RDF community, and what I am trying to capture succinctly in a term or phrase.
>> > > It isn't an easy idea to convey to those who are accustomed to a schema-centric approach. I think a catchy but descriptive term or phrase could help.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > >> -Alan
>> > >>
>> > >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org <mailto:david@dbooth.org>> wrote:
>> > >>
>> > >> I -- and I'm sure many others -- have struggled for years trying to succinctly describe RDF's ability to allow multiple data models to peacefully coexist, interconnected, in the same data. For data integration, this is a key strength of RDF that distinguishes it from other information representation languages such as XML. I have tried various terms over the years -- most recently "schema promiscuous" -- but have not yet found one that I think really nails it, so I would love to get other people's thoughts.
>> > >>
>> > >> This Google doc lists several candidate terms, some pros and cons, and allows you to indicate which ones you like best:
>> > >> http://goo.gl/zrXQgj
>> > >>
>> > >> Please have a look and indicate your favorite(s). You may also add more ideas and comments to it. The document can be edited by anyone with the URL.
>> > >>
>> > >> Thanks!
>> > >> David Booth
>> > >>
>> >
>> > --
>> > MLHIM VIP Signup: http://goo.gl/22B0U
>> > ============================================
>> > Timothy Cook, MSc +55 21 994711995
>> > MLHIM http://www.mlhim.org
>> > Like Us on FB: https://www.facebook.com/mlhim2
>> > Circle us on G+: http://goo.gl/44EV5
>> > Google Scholar: http://goo.gl/MMZ1o
>> > LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
>>
>> --
>> ++ Michael Brunnbauer
>> ++ netEstate GmbH
>> ++ Geisenhausener Straße 11a
>> ++ 81379 München
>> ++ Tel +49 89 32 19 77 80
>> ++ Fax +49 89 32 19 77 89
>> ++ E-Mail brunni@netestate.de
>> ++ http://www.netestate.de/
>> ++
>> ++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
>> ++ USt-IdNr. DE221033342
>> ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>> ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>
> --
> MLHIM VIP Signup: http://goo.gl/22B0U
> ============================================
> Timothy Cook, MSc +55 21 994711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
>
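A minimal illustrative sketch of David Booth's v1/v2 systolic blood pressure example, assuming the merged graph is held as a plain Python set of (subject, predicate, object) tuples rather than in a real RDF store. The function names, the "ex2:derived_..." naming scheme for inferred nodes, and the use of prov:wasDerivedFrom (from W3C PROV-O) as a provenance link are all assumptions made for this sketch; the thread does not specify any of them. The code fires v1v2:SystolicBP_Transform as a forward-chaining rule, then counts measurements while skipping derived copies, which is one way to avoid the double counting discussed above.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples and
# prefixed names are plain strings; no real RDF library is used.
from typing import Set, Tuple

Triple = Tuple[str, str, object]

RDF_TYPE = "rdf:type"
V1_CLASS = "v1:SystolicBPSitting_mmHg"
V2_CLASS = "v2:SystolicBP"
TRANSFORM = "v1v2:SystolicBP_Transform"
# Assumption: provenance link recorded on inferred nodes (term from W3C PROV-O).
DERIVED_FROM = "prov:wasDerivedFrom"

# The v1 instance data plus the class-level transform relationship from the example.
merged: Set[Triple] = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", RDF_TYPE, V1_CLASS),
    ("ex1:bp_023", "v1:value", 120),
    (V1_CLASS, TRANSFORM, V2_CLASS),
}


def instances_of(graph: Set[Triple], cls: str) -> Set[str]:
    """Return every subject typed as cls (what a single-model application would query)."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


def v1_to_v2(graph: Set[Triple]) -> Set[Triple]:
    """Apply the v1 -> v2 transform once; return only the newly inferred triples."""
    inferred: Set[Triple] = set()
    if (V1_CLASS, TRANSFORM, V2_CLASS) not in graph:
        return inferred  # the rule fires only when the models are explicitly related
    for v1_node in instances_of(graph, V1_CLASS):
        # Hypothetical naming scheme for the inferred v2 node.
        v2_node = "ex2:derived_" + v1_node.split(":", 1)[1]
        for s, p, value in graph:
            if s == v1_node and p == "v1:value":
                # The v1 class name encodes units and position; v2 makes them explicit.
                inferred |= {
                    (v2_node, RDF_TYPE, V2_CLASS),
                    (v2_node, "v2:pressure", value),
                    (v2_node, "v2:units", "v2:mmHg"),
                    (v2_node, "v2:bodyPosition", "v2:sitting"),
                    (v2_node, DERIVED_FROM, v1_node),
                }
        # Re-attach the patient through the v2 property, if a v2 node was produced.
        if (v2_node, RDF_TYPE, V2_CLASS) in inferred:
            for s, p, o in graph:
                if p == "v1:bps" and o == v1_node:
                    inferred.add((s, "v2:bps", v2_node))
    return inferred


def count_distinct_measurements(graph: Set[Triple]) -> int:
    """Count v1/v2 measurements, skipping nodes derived from another measurement."""
    measurements = instances_of(graph, V1_CLASS) | instances_of(graph, V2_CLASS)
    derived = {s for s, p, o in graph
               if p == DERIVED_FROM and s in measurements and o in measurements}
    return len(measurements - derived)


graph = merged | v1_to_v2(merged)
print(len(instances_of(graph, V1_CLASS) | instances_of(graph, V2_CLASS)))  # 2 nodes
print(count_distinct_measurements(graph))  # 1 measurement
```

Because the inferred v2 node carries a provenance link back to its v1 source, a v1-only application can keep querying instances_of(graph, V1_CLASS) and ignore the v2 triples, while a counting query sees one measurement rather than two.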
Received on Sunday, 9 March 2014 21:13:02 UTC