- From: conor dowling <conor-dowling@caregraf.com>
- Date: Sun, 28 Aug 2011 10:26:57 -0700
- To: "M. Scott Marshall" <mscottmarshall@gmail.com>
- Cc: Michael Miller <Michael.Miller@systemsbiology.org>, "Hau, Dave (NIH/NCI) [E]" <haudt@mail.nih.gov>, Jim McCusker <james.mccusker@yale.edu>, John Madden <john.madden@duke.edu>, public-semweb-lifesci@w3.org
- Message-ID: <CALfFB18LosAZhVNtno8Hj2RYYfiYdTsdATxR3BN7C2BNuGQVXw@mail.gmail.com>
Scott,

> Dave Hau wrote:
>> "EHRs make great data warehouses for phenotype mining, for correlating with genotype. I think there are a lot of incentives for people to work together."
>
> Conor Dowling wrote:
>> To me, all of this information goes into one "soup" - in linked data, you have *one big graph of* medical expression. I don't see the point in separate *media* for "statements about conditions" and "statements about condition types".
>
> I think that you guys are getting to the crux of the matter. Linked data can help to refine molecular medicine as it's applied in the clinic. That fusion will help to redefine medicine as 'personalized'.

and this is the goal, right? We're not just moving bits around from one patient data silo to another. Any data exported needs to link into a meaningful concept scheme. Linked data makes it easy to test "linkage": if all you've got is patient data in terms of local codes, then it won't link anywhere.

Speaking of meaningful: the sad thing now is that the US effort to export patient data from clinical data silos ("meaningful use") is largely meaningless: you get paid to export "unlinked"/local-only data. There was a lot of lobbying to let as-is patient data exports count as "meaningful", effectively to let data-dumps pass as interpretable patient records (I wrote a bit on this around "Dr Seuss passes meaningful use" <http://www.caregraf.com/blog/dr-seuss-passes-meaningful-use> and "Dr Seuss translates nothing at all" <http://www.caregraf.com/blog/dr-seuss-translates-nothing-at-all>). So when you hear hospital X successfully passed "meaningful use", that their patient data is available beyond their walls, that this represents a brave new world ... it means nothing if your need is clinical data linked to standard schemes for full analysis. Even if you're allowed to access it, the export/data-dump is academic. The shame is, many people think they're doing something meaningful.

Conor

> --
> M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> http://staff.science.uva.nl/~marshall
>
> On Sat, Aug 27, 2011 at 6:42 AM, conor dowling <conor-dowling@caregraf.com> wrote:

>>> "I think a SNOMED capable DAM should limit the coordination allowed."
>>>
>>> ... using SNOMED as your terminology is an implementation detail.

>> Michael,

>> one problem with leaving it to implementation is the variety allowed in a concept scheme like SNOMED. Take a disorder like Hypercholesterolemia <http://datasets.caregraf.org/snomed#!13644009>, and a patient record with ...

>> :finding snomed:13644009 # Hypercholesterolemia

>> another description of the same thing has ...

>> :finding snomed:166830008 # Serum cholesterol raised

>> which is effectively equivalent. The "bridge" is ...

>> snomed:13644009 snomed:363705008 snomed:166830008 # *Hypercholesterolemia* *has definitional manifestation* *Serum cholesterol raised* (more here: <http://www.caregraf.com/blog/the-problem-with-picking-problems>)

>> the question is where the bridge goes. Is "has definitional manifestation" defined consistently with the predicate "finding", or is it part of a completely separate concept model and never brought into play by one application?

>> To me, all of this information goes into one "soup" - in linked data, you have *one big graph of* medical expression. I don't see the point in separate *media* for "statements about conditions" and "statements about condition types".
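A minimal sketch of that "one big graph" point, assuming Python's rdflib and hypothetical example.org namespaces (only the SNOMED codes are the ones quoted above). Once the patient data and the terminology bridge sit in the same graph, a single SPARQL property-path query picks up both codings:

    # Sketch only: namespaces are hypothetical; codes are from the discussion above.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/ehr#")      # hypothetical patient-data vocabulary
    SN = Namespace("http://example.org/snomed#")   # hypothetical SNOMED namespace

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix ex: <http://example.org/ehr#> .
        @prefix sn: <http://example.org/snomed#> .

        ex:pt1 ex:finding sn:13644009 .    # Hypercholesterolemia
        ex:pt2 ex:finding sn:166830008 .   # Serum cholesterol raised

        # the terminology "bridge", loaded into the same soup
        sn:13644009 sn:363705008 sn:166830008 .   # has definitional manifestation
    """)

    # the zero-or-one path marker ? walks the bridge, so both patients match
    q = """
        SELECT ?pt WHERE {
          ?pt ex:finding ?code .
          sn:13644009 sn:363705008? ?code .
        }"""
    for row in g.query(q, initNs={"ex": EX, "sn": SN}):
        print(row.pt)   # ex:pt1 and ex:pt2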
>> If in practice - well, it's recommended - patient records use SNOMED, then tying down that expression should be front and center of any clinical-care modeling effort. To be useful and implementable, we can't say "use any scheme you want" because that's equivalent to saying "you can only do trivial reasoning on this information".

>> Conor

>>> *From:* conor dowling [mailto:conor-dowling@caregraf.com]
>>> *Sent:* Wednesday, August 24, 2011 3:26 PM
>>> *To:* Hau, Dave (NIH/NCI) [E]
>>> *Cc:* Michael Miller; Jim McCusker; John Madden; public-semweb-lifesci@w3.org
>>> *Subject:* Re: A Fresh Look Proposal (HL7)

>>> DAM: it's good to have a name. Were OWL to be used for them and then other forms derived from that, you'd get the best of both worlds - get into semantics and move on.

>>> One other nuance to throw in for the "model-terminology" match-up. SNOMED raises a concern about the degree of "concept coordination" you should or should not do, about what load the terminology should take and what should be left to the model. A simple example: do you allow "disorder: allergy to strawberry", or do you make the model carry "disorder: allergy + allergen: strawberry", or do you allow both expressions? (see: http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy)

>>> I think a SNOMED-capable DAM should limit the coordination allowed. It should make the model carry qualifiers for severity, for progression, for allergen ... To use it, you would need to normalize these "adjectives" out of any concept.

>>> I suppose what I'm saying is that any useful DAM should severely limit alternatives, in a way that goes beyond simple enumerations of permitted values, and the nice thing about concept schemes like SNOMED is that this shouldn't be hard to do - crudely, in SNOMED it would mean only allowing primitive concepts, the atoms from which compound concepts are made.

>>> BTW, this doesn't affect what a doctor sees on a screen - it's a matter of what expressions to use for interoperability. The two issues need to be strictly separated and right now, if you look at how CCDs are viewed, they're thick as thieves.

>>> Conor

>>> On Wed, Aug 24, 2011 at 2:49 PM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov> wrote:

>>> > the kind of reasoning, i think, that you want to do, conor, would run on top of the information in the HL7 v3 formatted documents to take advantage of, among other things, the linked data cloud.

>>> Agree. Earlier there was a discussion in HL7 on their Domain Analysis Model (DAM) effort - what exactly a DAM is and what it's supposed to do. I think one possible approach would be to consider these DAMs as ontologies (i.e. conceptual models, knowledge), use OWL in the normative version of these DAMs, then develop UML models and XSDs from the DAMs to use in applications. The DAMs can be harmonized with other domain ontologies out there, and promoted for global adoption. The UML models can be encouraged but not as strictly enforced, while alternatively allowing people to use RDF to tie data directly to concepts in the ontologies / DAMs.
>>> - Dave

>>> *From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org]
>>> *Sent:* Wednesday, August 24, 2011 11:12 AM
>>> *To:* conor dowling; Hau, Dave (NIH/NCI) [E]
>>> *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
>>> *Subject:* RE: A Fresh Look Proposal (HL7)

>>> hi all,

>>> john, very well laid out argument in your email, and what i've found in practice (and didn't think that consciously about until i read your email).

>>> conor, i agree with your points. but i find it interesting that OWL is expressed as XML for communication reasons. XML has become pretty much the de facto standard for 'trading' information. it's how MAGE-ML was used by the gene expression application i worked on at Rosetta to do import and export. but the storage and presentation of the information was certainly not XML; the analysis of the data would take forever. the trick is to make very clear what the extra semantics are, and that is well understood for OWL as XML. when someone wants to use an ontology they've received as an XML document, the first thing to do is transform the information in the XML so that the logic can be run easily (this gets back to john's points).

>>> one thing the clinical genomics group has talked about is that with the HL7 specs expressed in XML, the important part is that canonical validation applications are written that verify whether a document is conformant with the additional semantics, plus provide boilerplate examples. this allows the developers not to read the docs too closely but understand when they've done something wrong! (not ideal, but works; that's why OWL in XML works - there's a great body of tools)

>>> (from dave)

>>> "One way would be as Michael suggested, to use ODM for mapping UML to OWL. But is this mapping to OWL Full or to a more computable dialect of OWL? And would there be notions in UML that are not expressible in OWL and vice versa? Should we maintain both the UML model and the OWL ontology as normative, or one of the two, and if so, which one?"

>>> i think where things get dicey is in the business logic (there's a good discussion in the spec), so it is probably to a more computable dialect of OWL. but in practice, the type of information that needs to be 'traded' by HL7 specs tends to be straightforward information, with the controlled vocabularies contributing the extra semantics of how a particular code relates to the patient and the report in the document and also connects out to the larger world. one thing the clinical genomics group has tried to do is leave open what controlled vocabulary to use (this is something that i think MAGE-OM was one of the first to get right). normally LOINC is recommended but, in the genomics world, it is true things become out of date, so to get the right term may require a new CV. the kind of reasoning, i think, that you want to do, conor, would run on top of the information in the HL7 v3 formatted documents to take advantage of, among other things, the linked data cloud.

>>> so i guess what i'm saying here is that using XML as the language of interchange is not a bad thing, but it is expected - and this needs to be made clear - that the XML is almost certainly not the best storage mechanism for the data.
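Michael's "trade in XML, work on the graph" point in miniature - a sketch assuming rdflib, with an invented RDF/XML payload: the XML is only the wire form, and the first move on receipt is to parse it into a graph that queries and logic can run over.

    # Sketch only: the payload and names are illustrative.
    from rdflib import Graph

    wire_xml = """<?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/ehr#">
      <rdf:Description rdf:about="http://example.org/ehr#pt1">
        <ex:finding rdf:resource="http://example.org/snomed#13644009"/>
      </rdf:Description>
    </rdf:RDF>"""

    g = Graph()
    g.parse(data=wire_xml, format="xml")   # XML was just the interchange format...
    for s, p, o in g:                      # ...the work happens on the parsed graph
        print(s, p, o)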
>>> cheers,
>>> michael

>>> *From:* public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] *On Behalf Of* conor dowling
>>> *Sent:* Tuesday, August 23, 2011 5:22 PM
>>> *To:* Hau, Dave (NIH/NCI) [E]
>>> *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
>>> *Subject:* Re: A Fresh Look Proposal (HL7)

>>> So Conor if I understand you correctly, you're saying that the current gap that should be addressed in Fresh Look is that the current HL7 v3 models are not specified in a language that can be used for reasoning, i.e. they are not OWL ontologies; otherwise publishing value sets would not be necessary, because the reasoning could determine whether a particular value (i.e. "object" in your email) would be valid for a particular observation (i.e. "verb"). Is that what you're saying?

>>> Dave,

>>> exactly - that the patient information model and any recommended terminologies be defined in the same medium, and that the medium be capable of capturing permitted ranges, appropriate domains etc. for all predicates. I think a flavor of OWL with a closed-world assumption is the only real game in town but ...

>>> One goal (always easier to agree on goals than technologies!) is that an "allergic to allergy" misstep wouldn't happen - there would be no need to read guidance, and coders don't read! A meaningful use test would assert permitted ranges (e.g. allergen class http://datasets.caregraf.org/snomed#!406455002 for a property "allergen").

>>> Of course, 'correctness' isn't the only goal or result: transforming between equivalent expressions supported by model+terminology should be possible and promoted (take: http://www.caregraf.com/blog/good-son-jones-diabetic-ma). And then there's the direct path to decision-support, which you mention above.

>>> The focus on enforcing syntactic correctness would fade away, and the model specifier's demand for greater precision from terminologies should drive improvements there. This is far from new: some HL7 and SNOMED documents identify the need to marry model and terminology but go no further.

>>> I think the current meaningful-use CCD has six areas - allergies, problems, procedures ... It would be interesting to try one or two: say, look at Kaiser's problem subset from SNOMED and see how it and an HL7-based OWL patient model could work together. There are a lot of pieces in the wild now; they just need a forum to play in.

>>> One last thing, slightly off the thread but still on topic, I think. I don't see any reason to mix up "human readable" and "machine processable". One possibility for a patient model update, one that bypasses the need for buy-in by everyone, irrespective of use case, is to call out the need for a model of description purely for machine processing, one without the "we'll XSLT the patient record in the doctor's browser". While the current standards lead to human-readable data-dumps, a stricter parallel track could take the best of current standards and re-state them in OWL to deliver machine-processable health data exchange.

>>> Conor
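The "permitted ranges" test above is cheap to sketch in a closed-world way. A hand-rolled version with rdflib follows; namespaces and the strawberry code are invented for illustration (406455002 is the allergen class quoted above), and a real system might use an integrity-constraint validator such as Pellet ICV instead of this query:

    # Sketch only: flag any allergen value outside the permitted class.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/ehr#")
    SN = Namespace("http://example.org/snomed#")

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix ex: <http://example.org/ehr#> .
        @prefix sn: <http://example.org/snomed#> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

        sn:111111111 rdfs:subClassOf sn:406455002 .   # hypothetical strawberry code, under allergen class
        ex:a1 ex:allergen sn:111111111 .              # ok: in the permitted range
        ex:a2 ex:allergen sn:13644009 .               # wrong: a disorder code, not an allergen
    """)

    # closed world: anything not provably under the allergen root is a violation
    violations = g.query("""
        SELECT ?s ?v WHERE {
          ?s ex:allergen ?v .
          FILTER NOT EXISTS { ?v rdfs:subClassOf* sn:406455002 }
        }""", initNs={"ex": EX, "sn": SN})
    for s, v in violations:
        print(s, "uses", v, "- outside the permitted allergen range")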
>>> I agree OWL ontologies are useful in health informatics because reasoning can be used for better validation, decision support etc. I'm wondering, is there a need for both a UML-type modeling language and OWL (or another logic-based language) to be used simultaneously? If so, how? Should OWL be used for representing knowledge, and UML for representing application models?

>>> One way would be as Michael suggested, to use ODM for mapping UML to OWL. But is this mapping to OWL Full or to a more computable dialect of OWL? And would there be notions in UML that are not expressible in OWL and vice versa? Should we maintain both the UML model and the OWL ontology as normative, or one of the two, and if so, which one?

>>> - Dave

>>> ps. Michael, nice meeting you at the caBIG F2F too!

>>> *From:* conor dowling [mailto:conor-dowling@caregraf.com]
>>> *Sent:* Monday, August 22, 2011 12:28 PM
>>> *To:* John Madden
>>> *Cc:* Jim McCusker; Hau, Dave (NIH/NCI) [E]; public-semweb-lifesci@w3.org
>>> *Subject:* Re: A Fresh Look Proposal (HL7)

>>> >> for each tool-chain, there are some kinds of content that are natural and easy to express, and other kinds of content that are difficult and imperspicuous to express

>>> it's the old "the medium is the message" and, as you say John, it's somewhat unavoidable. But this connection doesn't imply all media are equally expressive.

>>> Making XSD/XML the focus for definition, rather than seeing it as just one end-of-a-road serialization, is limiting because as a medium it puts the focus on syntax, not semantics. That can't be said of OWL/SKOS/RDFS ...

>>> By way of example: you could have a patient data ontology, one that works with a KOS like SNOMED, and if an implementor likes XML, there's nothing to stop ...

>>> RDF (Turtle) conformant to ontologies/KOS --> RDF/XML --XSLT--> CCD (for example)

>>> as a chain. It's trivial. But if you start with lots of XSL, well, you get only what that medium permits and promotes, which is a focus on syntax, on the presence or absence of fields, as opposed to guidance on the correct concept to use with this or that verb.

>>> Of course, a verb-object split is comfortable because those building information models can work independently of those creating terminologies, but is such separation a good thing? Now, were both to work in a common medium then inevitably ...

>>> Conor

>>> p.s. the public release by Kaiser of their subsets of SNOMED (CMT) is the kind of thing that will make that KOS more practical. Now what's needed is tighter definition of the model to use with that and similar sub-schemes.

>>> On Mon, Aug 22, 2011 at 9:03 AM, John Madden <john.madden@duke.edu> wrote:

>>> I agree 95% with Jim and Conor.

>>> My 5% reservation is that for each tool-chain, there are some kinds of content that are natural and easy to express, and other kinds of content that are difficult and imperspicuous to express (is that a word?).

>>> Even this is not in itself a problem, except that it tends to make architects favor some kinds of conceptualization and shun other kinds of conceptualization, not on the merits, but because that's what's easy to express in the given tool.
>>> For example, the fact that the decision was made to serialize all RIM-based artifacts as XSD-valid XML meant that hierarchical modeling rather than directed-graph modeling tended to be used in practice. (Even though the RIM expressed as a Visio model has more in common with a directed graph.) It meant that derivation by restriction was made the favored extensibility mechanism.

>>> These choices may not have been the most auspicious for the kind of conceptualizations that needed to be expressed. None of these things are "necessary" consequences of using XSD-valid XML as your language. Rather, they are the results that you tend to get in practice, because the tool has so much influence on the style that ends up, in practice, being used. (id/idref and key/keyref are certainly part of XSD/XML, and make it possible to express non-hierarchical relations, but where in any HL7 artifact do you ever see key/keyref being used? Similarly, it is possible to derive by extension in XSD, but the spec makes it less easy than deriving by restriction.)

>>> Or again, the fact that OIDs rather than http URLs were chosen as the identifier of choice isn't in any way dispositive of whether you will tend to architect with RPC or REST principles in mind. (OIDs and http URLs are actually quite interconvertible.) But I'd argue that if you are a person who tends to think using http URLs, you'll more likely gravitate to REST solutions out of the gate.

>>> So, I agree, what's important is the deep content, not the choice of serialization of that content. But a bad serialization choice, coupled with bad tools, can leave architects wandering in the wilderness for a long time. So long, sometimes, that they lose track of what the deep conceptualization was supposed to have been in the first place.

>>> On Aug 22, 2011, at 9:39 AM, Jim McCusker wrote:

>>> I was just crafting a mail about how our investment in XML technologies hasn't paid off when this came in. What he said. :-)

>>> On Mon, Aug 22, 2011 at 9:33 AM, conor dowling <conor-dowling@caregraf.com> wrote:

>>> >> The content matters, the format does not.

>>> should be front and center. Talk of XML that or JSON this, of RDF as XML in a chain, is a distraction - it's just plumbing. There are many tool-chains, and implementors are big boys - they can graze the buffet themselves.

>>> Central to any patient model rework (I hope) would be the interplay of formal specifications for terminologies like SNOMED along with any patient data information model. What should go in the terminology concept (the "object" in RDF terms) - what is left in the model (the "predicate")? Right now, this interplay is woefully under-specified, and implementors throw just about any old concept into "appropriate" slots in RIM (I know this from doing meaningful use tests: http://www.caregraf.com/blog/being-allergic-to-allergies, http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy). BTW, if SNOMED is the terminology of choice (for most), then the dance of it and any RIM-2 should drive much of RIM-2's form.
>>> This is a chance to get away from a fixation on formats/plumbing/"the trucks for data" and focus on content, and in that focus to consider every aspect of expression - not just the verbs (RIM) or the objects (SNOMED) but both.

>>> Back to "forget the plumbing": if you want to publish a patient's data as an RDF graph or relational tables, or you want a "document" to send on a wire, if you want to query with a custom protocol or use SPARQL or SQL, you should be able to, and not be seen as an outlier. Each can be reduced to equivalents in other formats for particular interoperability. The problem right now is that so much time is spent talking about these containers and working between them, and too little time is given over to what they contain.

>>> Conor

>>> On Mon, Aug 22, 2011 at 6:01 AM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov> wrote:

>>> I see what you're saying and I agree.

>>> The appeal of XML (i.e. XML used with an XSD representing model syntactics, not XML used as a serialization as in RDF/XML) is due in part to:

>>> - XML schema validation APIs are available on virtually all platforms, e.g. Java, Javascript, Google Web Toolkit, Android etc.
>>> - XML schema validation is relatively lightweight computationally. Pellet ICV and similar mechanisms are more complete in their validation against the model, but much more computationally expensive unless you restrict yourself to a small subset of OWL, which then limits the expressiveness of the modeling language.
>>> - XML provides a convenient bridge from models such as OWL to relational databases, e.g. via JAXB or Castor to Java objects to Hibernate to any RDB.
>>> - Relational querying and XML manipulation skills are much more plentiful in the market than SPARQL skills currently.
>>> - Some of the current HL7 artifacts are expressed in XSD format, such as their datatypes (ISO 21090; although there are alternative representations such as UML, and there is an abstract spec too from HL7). If we operate with OWL and RDF exclusively, we would need to convert these datatypes into OWL.

>>> Maybe it'd be worthwhile to get a few of us who are interested in this topic together, with some of the HL7 folks interested, and have a few calls to flesh this out and maybe write something up?

>>> - Dave

>>> *From:* Jim McCusker [mailto:james.mccusker@yale.edu]
>>> *Sent:* Sunday, August 21, 2011 6:12 PM
>>> *To:* Hau, Dave (NIH/NCI) [E]
>>> *Cc:* public-semweb-lifesci@w3.org
>>> *Subject:* Re: FW: A Fresh Look Proposal (HL7)

>>> I feel I need to cut to the chase with this one: XML Schema cannot validate semantic correctness.

>>> It can validate that XML conforms to a particular schema, but that is syntactic. The OWL validator is nothing like a schema validator: first, it produces a closure of all statements that can be inferred from the asserted information. This means that if a secondary ontology is used to describe some data, and that ontology integrates with the ontology that you're attempting to validate against, you will get a valid result. An XML schema can only work with what's in front of it.
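One way to see Jim's closure point, assuming rdflib plus the owlrl package (the vocabularies and the "secondary" ontology are invented): after expansion, data described with one vocabulary becomes visible through another it has been integrated with - something no XSD check can reproduce.

    # Sketch only: vocabularies are hypothetical.
    import owlrl
    from rdflib import Graph, Namespace, RDF

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix ex: <http://example.org/ehr#> .
        @prefix owl: <http://www.w3.org/2002/07/owl#> .

        ex:obs1 a ex:CholesterolFinding .                            # data, secondary vocabulary
        ex:CholesterolFinding owl:equivalentClass ex:LipidFinding .  # integrating ontology
    """)

    owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)   # compute the inferred closure

    EX = Namespace("http://example.org/ehr#")
    print((EX.obs1, RDF.type, EX.LipidFinding) in g)          # True, by inference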
>>> Two, there are many different representations of information that go beyond XML, and it should be possible to validate that information without anything other than a mechanical, universal translation. For instance, there are a few mappings of RDF into JSON, including JSON-LD, which looks the most promising at the moment. Since RDF/XML and JSON-LD both parse to the same abstract graph, there is a mechanical transformation between them. When dealing with semantic validity, you want to check the graph that is parsed from the document, not the document itself.

>>> The content matters, the format does not. For instance, let me define a new RDF format called RDF/CSV:

>>> First column is the subject. First row is the predicate. All other cell values are objects. URIs that are relative are relative to the document, as in RDF/XML.

>>> I can write a parser for that in 1 hour and publish it. It's genuinely useful, and all you would have to do to read and write it is to use my parser or write one yourself. I can then use the parser, paired with Pellet ICV, and validate the information in the file without any additional work from anyone.

>>> Maybe we need a simplified XML representation for RDF that looks more like regular XML. But to make a schema for an OWL ontology is too much work for too little payoff.

>>> Jim
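Jim's RDF/CSV is indeed quick to make concrete. A rough cut at such a parser, assuming rdflib (the base URI and sample are invented, and for brevity every cell is treated as a URI relative to the document, per the rules above):

    # Sketch only: first column = subject, first row = predicates, cells = objects.
    import csv, io
    from rdflib import Graph, URIRef

    def parse_rdf_csv(text, base="http://example.org/doc#"):
        g = Graph()
        rows = list(csv.reader(io.StringIO(text)))
        predicates = [URIRef(p, base) for p in rows[0][1:]]   # header row
        for row in rows[1:]:
            subject = URIRef(row[0], base)
            for pred, obj in zip(predicates, row[1:]):
                if obj:                                       # empty cell = no statement
                    g.add((subject, pred, URIRef(obj, base)))
        return g

    sample = "id,finding\npt1,snomed13644009\npt2,snomed166830008\n"
    print(len(parse_rdf_csv(sample)))   # 2 triples

Feeding the resulting graph to an integrity-constraint checker is then the whole tool-chain Jim describes.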
>>> On Sun, Aug 21, 2011 at 5:45 PM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov> wrote:

>>> Hi all,

>>> As some of you may have read, HL7 is rethinking their v3 and doing some brainstorming on what would be a good replacement for a data exchange paradigm grounded in robust semantic modeling.

>>> On the following email exchange, I was wondering, if OWL is used for semantic modeling, are there good ways to accomplish the following:

>>> 1. Generate a wire format schema (for a subset of the model, the subset they call a "resource"), e.g. XSD

>>> 2. Validate XML instances for conformance to the semantic model. (Here I'm reminded of Clark and Parsia's work on their Integrity Constraint Validator: http://clarkparsia.com/pellet/icv)

>>> 3. Map an XML instance conformant to an earlier version of the "resource" to the current version of the "resource" via the OWL semantic model

>>> I think it'd be great to get a semantic web perspective on this fresh look effort.

>>> Cheers,
>>> Dave

>>> Dave Hau
>>> National Cancer Institute
>>> Tel: 301-443-2545
>>> Dave.Hau@nih.gov

>>> *From:* owner-its@lists.hl7.org [mailto:owner-its@lists.hl7.org] *On Behalf Of* Lloyd McKenzie
>>> *Sent:* Sunday, August 21, 2011 12:07 PM
>>> *To:* Andrew McIntyre
>>> *Cc:* Grahame Grieve; Eliot Muir; Zel, M van der; HL7-MnM; RIMBAA; HL7 ITS
>>> *Subject:* Re: A Fresh Look Proposal

>>> Hi Andrew,

>>> Tacking stuff on the end simply doesn't work if you're planning to use XML Schema for validation. (Putting new stuff in the middle or the beginning has the same effect - it's an unrecognized element.) The only alternative is to say that all changes after "version 1" of the specification will be done using the extension mechanism. That will create tremendous analysis paralysis as we try to get things "right" for that first version, and will result in increasing clunkiness in future versions. Furthermore, the extension mechanism only works for the wire format. For the RIM-based description, we still need proper modeling, and that can't work with "stick it on the end" no matter what.

>>> That said, I'm not advocating for the nightmare we currently have with v3 right now.

>>> I think the problem has three parts: how to manage changes to the wire format, how to version resource definitions, and how to manage changes to the semantic model.

>>> Wire format:
>>> If we're using schema for validation, we really can't change anything without breaking validation. Even making an existing non-repeating element repeat is going to cause schema validation issues. That leaves us with two options (if we discount the previously discussed option of "get it right the first time and be locked there forever"):

>>> 1. Don't use schema
>>> - Using Schematron or something else could easily allow validation of the elements that are present, but ignore all "unexpected" elements
>>> - This would cause significant pain for implementers who like to use schema to help generate code though

>>> 2. Add some sort of a version indicator on new content that allows a pre-processor to remove all "new" content if processing using an "old" handler
>>> - Unpleasant in that it involves a pre-processing step and adds extra "bulk" to the instances, but other than that, quite workable

>>> I think we're going to have to go with option #2. It's not ideal, but is still relatively painless for implementers. The biggest thing is that we can insist on "no breaking XPath changes": we don't move stuff between levels in a resource wire format definition or rename elements in a resource wire format definition. In the unlikely event we have to, we deprecate the entire resource and create a new version.
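A rough illustration of that option #2 pre-processor, sketched with Python's standard library; the "since" attribute and its namespace are invented for illustration, not proposed syntax:

    # Sketch only: strip content marked as newer than the handler understands.
    import xml.etree.ElementTree as ET

    NS = "http://example.org/freshlook#"   # hypothetical versioning namespace

    def strip_newer(elem, supported):
        """Recursively drop children introduced after the supported version."""
        for child in list(elem):
            if float(child.get("{%s}since" % NS, "1")) > supported:
                elem.remove(child)
            else:
                strip_newer(child, supported)

    doc = ET.fromstring(
        '<Person xmlns:fl="http://example.org/freshlook#">'
        '<name>Ann</name>'
        '<genome fl:since="2">...</genome>'   # element added in version 2
        '</Person>')
    strip_newer(doc, supported=1.0)
    print(ET.tostring(doc, encoding="unicode"))   # only <name> survives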
>>> Resource versioning:
>>> At some point, HL7 is going to find at least one resource where we blew it with the original design and the only way to create a coherent wire format is to break compatibility with the old one. This will then require definition of a new resource, with a new name, that occupies the same semantic space as the original. I.e. we'll end up introducing "overlap". Because overlap will happen, we need to figure out how we're going to deal with it. I actually think we may want to introduce overlap in some places from the beginning. Otherwise we're going to force on implementers of simple community EMRs a wire format that can handle prescriptions for fully-encoded chemotherapy protocols. (They can ignore some of the data elements, but they'd still have to support the full complexity of the nested data structures.)

>>> I don't have a clear answer here, but I think we need to have a serious discussion about how we'll handle overlap in those cases where it's necessary, because at some point it'll be necessary. If we don't figure out the approach before we start, we can't allow for it in the design.

>>> All that said, I agree with the approach of avoiding overlap as much as humanly possible. For that reason, I don't advocate calling the Person resource "Person_v1" or something that telegraphs we're going to have new versions of each resource eventually (let alone frequent changes). Introduction of a new version of a resource should only be done when the pain of doing so is outweighed by the pain of trying to fit new content in an old version, or of requiring implementers of the simple to support the structural complexity of our most complex use-cases.

>>> Semantic model versioning:
>>> This is the space where "getting it right" the first time is the most challenging. (I think we've done that with fewer than half of the normative specifications we've published so far.) V3 modeling is hard. The positive thing about the RFH approach is that very few people need to care. We could totally refactor every single resource's RIM-based model (or even remove them entirely), and the bulk of implementers would go on merrily exchanging wire syntax instances. However, that doesn't mean the RIM-based representations aren't important. They're the foundation for the meaning of what's being shared. And if you want to start sharing at a deeper level, such as RIMBAA-based designs, they're critical. This is the level where OWL would come in. If you have one RIM-based model structure and then need to refactor and move to a different RIM-based model structure, you're going to want to map the semantics between the two structures so that anyone who was using the old structure can manage instances that come in with the new structure (or vice versa). OWL can do that. And anyone who's got a complex enough implementation to parse the wire format and trace the elements through their underlying RIM semantic model will likely be capable of managing the OWL mapping component as well.

>>> In short, I think we're in agreement that separation of wire syntax and semantic model is needed. That will make model refactoring much easier. However, we do have to address how we're going to handle wire-side and resource refactoring too.

>>> Lloyd

>>> --------------------------------------
>>> Lloyd McKenzie
>>> +1-780-993-9501

>>> Note: Unless explicitly stated otherwise, the opinions and positions expressed in this e-mail do not necessarily reflect those of my clients nor those of the organizations with whom I hold governance positions.

>>> On Sun, Aug 21, 2011 at 7:53 AM, Andrew McIntyre <andrew@medical-objects.com.au> wrote:

>>> Hello Lloyd,

>>> While "tacking stuff on the end" in V2 may not at first glance seem like an elegant solution, I wonder if it isn't actually the best solution, and one that has stood the test of time. The parsing rules in V2 do make version updates quite robust w.r.t. backward and forward interoperability.

>>> I am sure it could be done with OWL, but I doubt we can switch the world to using OWL in any reasonable time frame, and we probably need a less abstract representation for commonly used things. In V2, OBX segments used in a hierarchy can create an OWL-like object-attribute structure for information that is not modeled by the standard itself.
>>> I do think the wire format and any overlying model should be distinct entities, so that the model can evolve and the wire format can be changed in a backward-compatible way, at least for close versions.

>>> I also think that the concept of templates/archetypes to extend the model should not invalidate the wire format, but be a metadata layer over the wire format. This is what we have done in Australia with ISO 13606 Archetypes in V2 projects. I think we do need a mechanism for people to develop templates to describe hierarchical data and encode that in the wire format in a way that does not invalidate its vanilla semantics (i.e. non-templated V2 semantics) when the template mechanism is unknown or not implemented.

>>> In a way, the V2 specification does hint at underlying objects/interfaces, and there is a V2 model, but it is not prescriptive, and there is no requirement for systems to use the same internal model as long as they use the bare-bones V2 model in the same way. Obviously this does not always work as well as we would like, even in V2, but it does work well enough to use it for quite complex data when there are good implementation guides.

>>> If we could separate the wire format from the clinical models, then the two can evolve in their own ways. We have done several trial implementations of Virtual Medical Record models (vMR) which used V3 datatypes and RIM-like classes, and we could build those models from V2 messages, or in some cases non-standard Web Services, although for specific clinical classes we did use ISO 13606 archetypes to structure the data in V2 messages.

>>> I think the dream of having direct model serializations as messages is flawed, for all the reasons that have made V3 impossible to implement in the wider world. While the "tack it on the end, lots of optionality" rationale might seem clunky, maybe it's the best solution to a difficult problem. If we define tight SOAP web services for everything, we will end up with thousands of slightly different SOAP calls for every minor variation, and I am not sure this is the path to enlightenment either.

>>> I am looking at Grahame's proposal now, but I do wonder if the start-again-from-scratch mentality is not part of the problem. Perhaps that is a lesson to be learned from the V3 process. Maybe the problem is too complex to solve from scratch and, like nature, we have to evolve and accept there is lots of junk DNA, but maintaining a working standard at all times is the only way to avoid extinction.

>>> I do like the idea of a cohesive model for use in decision support, and that's what vMR/GELLO is about, but I doubt there will ever be a one-size-fits-all model, and any model will need to evolve. Disconnecting the model from the messaging, with all the pain that involves, might create a layered approach that allows the HL7 organism to evolve gracefully. I do think part of the fresh look should be education on what V2 actually offers, and can offer, and I suspect many people in HL7 have never seriously looked at it in any depth.

>>> Andrew McIntyre

>>> Saturday, August 20, 2011, 4:37:37 AM, you wrote:

>>> Hi Grahame,

>>> Going to throw some things into the mix from our previous discussions because I don't see them addressed yet.
>>> (Though I admit I haven't reread the whole thing, so if you've addressed something and I haven't seen it, just point me at the proper location.)

>>> One of the challenges that has bogged down much of the v3 work at the international level (and which causes a great deal of pain at the project/implementation level) is the issue of refactoring. The pain at the UV level comes from the fact that we have a real/perceived obligation to meet all known and conceivable use-cases for a particular domain. For example, the pharmacy domain model needs to meet the needs of clinics, hospitals, veterinarians, and chemotherapy protocols, and must support the needs of the U.S., the Soviet Union and Botswana. To make matters more interesting, participation from the USSR and Botswana is a tad light. However, the fear is that if all of these needs aren't taken into account, then when someone with those needs shows up at the door, the model will need to undergo substantive change, and that will break all of the existing systems.

>>> The result is a great deal of time spent gathering requirements and refactoring and re-refactoring the model as part of the design process, together with a tendency to make most, if not all, data elements optional at the UV level. A corollary is that the UV specs are totally unimplementable in an interoperable fashion. The evil of optionality that manifested in v2, and that v3 was going to banish, turned out not to be an issue of the standard, but rather of the difficulty of creating a generic specification that satisfies global needs and a variety of use-cases.

>>> The problem at the implementer/project level is that when you take the UV model and tightly constrain it to fit your exact requirements, you discover 6 months down the road that one or more of your constraints was wrong and you need to loosen it, or you have a new requirement that wasn't thought of, and this too requires refactoring and often results in wire-level incompatibilities.

>>> One of the things that needs to be addressed if we're really going to eliminate one of the major issues with v3 is a way to reduce the fear of refactoring. Specifically, it should be possible to totally refactor the model and have implementations and designs work seamlessly across versions.

>>> I think putting OWL under the covers should allow for this. If we can assert equivalencies between data elements in old and new models, or even just map the wire syntaxes of old versions to new versions of the definition models, then this issue would be significantly addressed (a sketch of the idea follows below):
>>> - Committees wouldn't have to worry about satisfying absolutely every use-case to get something useful out, because they know they can make changes later without breaking everything. (They wouldn't even necessarily have to meet all the use-cases of the people in the room! :>)
>>> - Realms and other implementers would be able to have an interoperability path that allowed old wire formats to interoperate with new wire formats through the aid of appropriate tooling that could leverage the OWL under the covers. (I think creating such tooling is *really* important, because version management is a significant issue with v3. And with XML and schemas, the whole "ignore everything on the end you don't recognize" from v2 isn't a terribly reasonable way forward.)
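The "assert equivalencies" move is cheap to sketch. Assuming rdflib plus owlrl again, with invented old/new model namespaces: data captured against an old wire element stays reachable through the new one once the OWL mapping is in the graph.

    # Sketch only: old/new vocabularies are hypothetical.
    import owlrl
    from rdflib import Graph, URIRef

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix old: <http://example.org/model-v1#> .
        @prefix new: <http://example.org/model-v2#> .
        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        @prefix ex:  <http://example.org/ehr#> .

        ex:pt1 old:patientName "Ann" .                     # instance data, old wire element
        old:patientName owl:equivalentProperty new:name .  # the refactoring map
    """)

    owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

    # an application written against the new model still finds the old data
    for name in g.objects(None, URIRef("http://example.org/model-v2#name")):
        print(name)   # "Ann"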
>>> I think it's important to figure out exactly how refactoring and version management will work in this new approach. The currently proposed approach of "you can add stuff, but you can't change what's there" only scales so far.

>>> I think we *will* need to significantly increase the number of Resources (from 30-odd to a couple of hundred). V3 supports things like invoices, clinical study design, outbreak tracking and a whole bunch of other healthcare-related topics that may not be primary-care centric but are still healthcare centric. That doesn't mean all (or even most) systems will need to deal with them, but the systems that care will definitely need them. The good news is that most of these more esoteric areas have responsible committees that can manage the definition of these resources, and, as you mention, we can leverage the RMIMs and DMIMs we already have in defining these structures.

>>> The specification talks about robust capturing of requirements and traceability to them, but gives no insight into how this will occur. It's something we've done a lousy job of with v3, but part of the reason for that is it's not exactly an easy thing to do. The solution needs to flesh out exactly how this will happen.

>>> We need a mapping that explains exactly what's changed in the datatypes (and, for stuff that's been removed, how to handle that use-case).

>>> There could still be a challenge around granularity of text. As I understand it, you can have a text representation for an attribute, or for any XML element. However, what happens if you have a text blob in your interface that covers 3 of 7 attributes inside a given XML element? You can't use the text property of the element, because the text only covers 3 of 7. You can't use the text property of one of the attributes, because it covers 3 separate attributes. You could put the same text in each of the 3 attributes, but that's somewhat redundant and is going to result in rendering issues. One solution might be to allow the text specified at the element level to identify which of the attributes the text covers. A rendering system could then use that text for those attributes, and then render the discrete values of the remaining specified attributes. What this would mean is that an attribute might be marked as "text" but not have text content directly, if the parent element had a text blob that covered that attribute.

>>> New (to Grahame) comments:

>>> I didn't see anything in the HTML section or the transaction section on how collisions are managed for updates. A simple requirement (possibly optional) to include the version id of the resource being updated or deleted should work.

>>> To my knowledge, v3 (and possibly v2) has never supported true "deletes". At best, we do an update and change the status to nullified. Is that the intention of the "Delete" transaction, or do we really mean a true "Delete"? Do we have any use-cases for true deletes?

>>> I wasn't totally clear on the context for uniqueness of ids. Is it within a given resource or within a given base URL? What is the mechanism for referencing resources from other base URLs? (We're likely to have networks of systems that play together.)

>>> Nitpick: I think "id" might better be named "resourceId" to avoid any possible confusion with "identifier".
>>> I recognize that from a coding perspective, shorter is better. However, I think that's outweighed by the importance of avoiding confusion.

>>> In the resource definitions, you repeated definitions for resources inherited from parent resources, e.g. Person.created inherited from Resource.Base.created. Why? That's a lot of extra maintenance and potential for inconsistency. It also adds unnecessary volume.

>>> Suggest adding a caveat to the draft that the definitions are placeholders and will need significant work. (Many are tautological, and none meet the Vocab WG's guidelines for quality definitions.)

>>> Why is Person.identifier mandatory?

>>> You've copied "an element from Resource.Base.???" to all of the Person attributes, including those that don't come from Resource.Base.

>>> Obviously the workflow piece and the conformance rules that go along with it need some fleshing out. (Looks like this may be as much fun in v4 as it has been in v3 :>)

>>> The list of identifier types makes me queasy. It looks like we're reintroducing the mess that was in v2. Why? Trying to maintain an ontology of identifier types is a lost cause. There will be a wide range of granularity requirements, and at fine granularity there will be tens of thousands. The starter list is pretty incoherent. If you're going to have types at all, the vocabulary should be constrained to a set of codes based on the context in which the real-world identifier is present. If there's no vocabulary defined for the property in that context, then you can use text for a label, and that's it.

>>> I didn't see anything on conformance around datatypes. Are we going to have datatype flavors? How is conformance stated for datatype properties?

>>> I didn't see templateId or flavorId or any equivalent. How do instances (or portions thereof) declare conformance to "additional" constraint specifications/conformance profiles beyond the base one for that particular server?

>>> We need to beef up the RIM mapping portion considerably. Mapping to a single RIM class or attribute isn't sufficient. Most of the time, we're going to need to map to a full context model that talks about the classCodes, moodCodes and relationships. Also, you need to relate attributes to the context of the RIM location of your parent.

>>> There's no talk about context conduction, which from an implementation perspective is a good thing. However, I think it's still needed behind the scenes. Presumably this would be covered as part of the RIM semantics layer?

>>> In terms of the "validate" transaction, we do a pseudo-validate in pharmacy, but a 200 response isn't sufficient. We can submit a draft prescription and say "is this ok?". The response might be as simple as "yes" (i.e. a 200). However, it could also be a "no" or "maybe" with a list of possible contraindications, dosage issues, allergy alerts and other detected issues. How would such a use-case be met in this paradigm?

>>> At the risk of over-complicating things, it might be useful to think about data properties as being identifying or not, to aid in exposing resources in a de-identified way. (Not critical, just wanted to plant the seed in your head about if or how this might be done.)

>>> All questions and comments aside, I'm definitely in favour of fleshing out this approach and looking seriously at moving to it.
>>> To that end, I think we need a few things:
>>> - A list of the open issues that need to be resolved in the new approach. (You have "todo"s scattered throughout. A consolidated list of the "big" things would be useful.)
>>> - An analysis of how we move from existing v3 to the new approach, both in terms of leveraging existing artifacts and providing a migration path for existing solutions, as well as what tools etc. we need.
>>> - A plan for how to engage the broader community for review. (Should ideally do this earlier rather than later.)

>>> Thanks to you, Rene and others for all the work you've done.

>>> Lloyd

>>> --------------------------------------
>>> Lloyd McKenzie
>>> +1-780-993-9501

>>> Note: Unless explicitly stated otherwise, the opinions and positions expressed in this e-mail do not necessarily reflect those of my clients nor those of the organizations with whom I hold governance positions.

>>> On Fri, Aug 19, 2011 at 9:08 AM, Grahame Grieve <grahame@kestral.com.au> wrote:

>>> hi All

>>> Responses to comments

>>> #Michael

>>> > 1. I would expect more functional interface to use these resources.

>>> as you noted later, this is there, but I definitely needed to make more of it. That's where I ran out of steam

>>> > 2. One of the things that was mentioned (e.g. at the Orlando WGM RIMBAA Fresh Look discussion) is that we want to use industry standard tooling, right? Are there enough libraries that implement REST?

>>> this doesn't need tooling. There's schemas if you want to bind to them

>>> > 2b. A lot of vendors now implement WebServices. I think we should go for something vendors already have or will easily adopt. Is that the case with REST?

>>> Speaking as a vendor/programmer/writer of an open source web services toolkit, I prefer REST. Way prefer REST

>>> > Keep up the good work!

>>> ta

>>> #Mark

>>> > I very much like the direction of this discussion towards web services and in particular RESTful web services.

>>> yes, though note that REST is a place to start, not a place to finish.

>>> > At MITRE we have been advocating this approach for some time with our hData initiative.

>>> yes. you'll note my to-do: how does this relate to hData, which is a higher-level specification than the CRUD stuff here.

>>> #Eliot

>>> > Hats off - I think it's an excellent piece of work and definitely a step in the right direction.

>>> thanks.

>>> > I didn't know other people in the HL7 world other than me were talking about (highrise). Who are they?

>>> not in HL7. you were one. it came up in some other purely IT places that I play

>>> > 5) Build it up by hand with a wiki - it is more scalable really since you

>>> wikis have their problems, though I'm not against them.

>>> > 1) I think it would be better not to use inheritance to define a patient as a sub-type of a person. The trouble with that approach is that people can

>>> On the wire, a patient is not a sub-type of person. The relationship between the two is defined in the definitions.

>>> > A simpler approach is to associate additional data with a person if and when they become a patient.

>>> in one way, this is exactly what RFH does. On the other hand, it creates a new identity for the notion of patient (for integrity).
>>> We can discuss whether that's good or bad.

>>> > 2) I'd avoid language that speaks down to 'implementers'. It's enterprise

>>> really? Because I'm one. down the bottom of your enterprise pole. And I'm happy to be one of those stinking implementers down in the mud. I wrote it first for me. But obviously we wouldn't want to cause offense. I'm sure I haven't caused any of that this week ;-)

>>> > 3) If you want to reach a broader audience, then simplify the language.

>>> argh, and I thought I had. how can we not use the right terms? But I agree that the introduction is not yet direct enough - and that's after 4 rewrites to try and make it so....

>>> Grahame

>>> ************************************************
>>> To access the Archives of this or other lists or change your list settings and information, go to: http://www.hl7.org/listservice

>>> --
>>> Best regards,
>>> Andrew
>>> mailto:andrew@Medical-Objects.com.au
>>> *sent from a real computer*

>>> --
>>> Jim McCusker
>>> Programmer Analyst
>>> Krauthammer Lab, Pathology Informatics
>>> Yale School of Medicine
>>> james.mccusker@yale.edu | (203) 785-6330
>>> http://krauthammerlab.med.yale.edu
>>>
>>> PhD Student
>>> Tetherless World Constellation
>>> Rensselaer Polytechnic Institute
>>> mccusj@cs.rpi.edu
>>> http://tw.rpi.edu
Received on Sunday, 28 August 2011 17:27:32 UTC