- From: Helena Deus <helenadeus@gmail.com>
- Date: Fri, 2 Sep 2011 15:40:55 +0100
- To: Michael Miller <Michael.Miller@systemsbiology.org>
- Cc: "Hau, Dave (NIH/NCI) [E]" <haudt@mail.nih.gov>, Jim McCusker <james.mccusker@yale.edu>, John Madden <john.madden@duke.edu>, public-semweb-lifesci@w3.org, conor dowling <conor-dowling@caregraf.com>
- Message-ID: <CAPkJ_9=L5d5qyYgbygL+709xm5CxwyP6jARZAE4EQ3fBqvR1oQ@mail.gmail.com>
Hi Mike, all,

The tcga.s3db.org website is down because I moved out of MD Anderson - they shut down our servers a few months ago and I have not had time to replace it yet ... sorry about that :-(

Best,
Lena

On Fri, Sep 2, 2011 at 3:25 PM, Michael Miller <Michael.Miller@systemsbiology.org> wrote:

> hi dave,
>
> thanks for the info.
>
> 'http://tcga.s3db.org'
>
> the web site no longer seems viable. it looks like it went up in 2008 so could be out of date. it just spins on 'loading domain description' and eventually errors out. i tried the links and they didn't bring anything up.
>
> lena, you are listed as one of the co-authors of the page, any clue?
>
> cheers,
> michael
>
> *From:* Hau, Dave (NIH/NCI) [E] [mailto:haudt@mail.nih.gov]
> *Sent:* Thursday, September 01, 2011 10:00 AM
> *To:* Michael Miller; conor dowling
> *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
> *Subject:* RE: A Fresh Look Proposal (HL7)
>
> Last week NCI published a list of 24 provocative questions and corresponding RFAs (funding announcements):
>
> http://provocativequestions.nci.nih.gov/rfa
>
> I'd like to encourage everyone to review this list, and see if there's any question(s) we could work on collaboratively in the coming year, to see how well we could tackle them from an informatics perspective using semantic web technology, esp. in the context of our discussion on integrating semantics between life science and clinical research and care.
>
> In the process, if we could make use of the TCGA data set (e.g. via the SPARQL endpoint: http://tcga.s3db.org), or other datasets or reference domain ontologies, that would be great.
>
> I think in harmonizing life science and clinical semantics, focusing on such "rubber meets the road" kind of use cases would help ground our discussion solidly in real science.
> > > > - Dave > > > > > > > > > > *From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org] > *Sent:* Wednesday, August 31, 2011 3:25 PM > *To:* Hau, Dave (NIH/NCI) [E]; conor dowling > *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org > *Subject:* RE: A Fresh Look Proposal (HL7) > > > > hi all, > > > > conor, excellent points in your last email. > > > > "...is an increase in the interdependencies and overlaps between the > information model and the terminology " > > > > my experience has been that this doesn't necessarily have to happen. just > as linked data experience has shown, one can reason across ontology > boundaries without prior knowledge of the links. > > > > "...to reason on the information model and the value set together to > determine the right values for a particular field..." > > > > one can do this without the information model having knowledge of the value > set. the information model sets the meta-expectation of what is expected > and then the value set can be examined for the best possible fit without > there needing to be knowledge of the value set by the information model. > one can, of course, couple an information model to a terminology but i > believe that is bad modeling and with a little more effort can be avoided. > > > > "If the information model could be expressed in a language that supports > reasoning..." > > > > yes, this would allow the statement above to be computable. > > > > cheers, > > michael > > > > > > *From:* Hau, Dave (NIH/NCI) [E] [mailto:haudt@mail.nih.gov] > *Sent:* Wednesday, August 31, 2011 11:58 AM > *To:* Michael Miller; conor dowling > *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org > *Subject:* RE: A Fresh Look Proposal (HL7) > > > > Michael I agree with you and I see where Conor is coming from too. I agree > the information model should be decoupled from the value set to a certain > extent, so each can evolve on its own and sustain over time. 
>
> OTOH, I see what Conor meant by being able to reason on the information model and the value set together to determine the right values for a particular field. SNOMED in particular has a flexible grammar that allows a wide variety of post-coordinated expressions, so it would be quite impossible to exhaustively list out all allowed values as in an extensional definition of a value set.
>
> I think this is exactly where CTS2 comes in, in terms of improving CTS so that value sets can be more computable with reasoning. The distinction between the intensional and extensional definition of a value set would be very useful in this regard, because the intensional definition, if defined in a computable way, can certainly be used to accomplish the above.
>
> If the information model could be expressed in a language that supports reasoning, that would be even better because now you can reason across the field, the intensional value set, and the particular value a user has chosen.
>
> - Dave
>
> *From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org]
> *Sent:* Tuesday, August 30, 2011 4:56 PM
> *To:* conor dowling
> *Cc:* Hau, Dave (NIH/NCI) [E]; Jim McCusker; John Madden; public-semweb-lifesci@w3.org
> *Subject:* RE: A Fresh Look Proposal (HL7)
>
> hi conor,
>
> i think this discussion has been missing the point about how a standard is developed and its relationship to ontologies/vocabularies that will be used for it.
>
> for an EHR, for instance, when a DAM is developed, what is important are the high level details such as 'patient', 'illness', 'disease state', not how one will record those details. more important is the relationship between the high level details. currently, in HL7, a flavor of UML is used, that's not to say an ontology couldn't equally well be used but it would still be at this higher level. and even tho an ontology could be used for the modeling, the amount of impedance and change to the bylaws of HL7 probably precludes that, altho a companion ontology to the UML could be, but it would not be normative.
>
> even when one goes to the implementation, RMIM, level, it is still important that the specific ontologies/vocabularies like SNOMED, gene symbols, etc, are loosely coupled to the standard. it's been shown that this makes the standard more robust because as time moves on, new vocabularies are created or the standard is used in another area where there are more appropriate vocabularies the original creators of the standard weren't aware of. so the paradigm of having a place for the term is always accompanied by a place to say what vocabulary the term came from. (there are some places in the CG standards that specify LOINC codes but if you look, there is always an out to use some other vocabulary if desired)
>
> that said, this discussion to me isn't about HL7, it's about the proper way to use SNOMED regardless of where it is used. it's just that Dave is interested in HL7 and so HL7 was the example at hand. right now there are some v2 standards from CG that are being used (by harvard medical amongst others) so i agree it is important to make these issues with the use of ontologies/vocabularies part of the discussion now but it is important to understand its place in the discussion. i would hazard a guess that the applications that produce HL7 formatted documents, for the most part, do not deal directly with the vocabularies but are reading the values from a database where when the test was entered into the database, the term came from a drop down list or some such. so it's not clear to me where the target audience for getting SNOMED right is other than it's probably not in HL7 standards.
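The "place for the term plus a place for its vocabulary" paradigm described above can be pictured with a minimal sketch. This is only an illustration: the class name `CodedValue`, its field names, and the "SNOMED-CT" / "LOCAL" labels are assumptions made for the example, not part of any HL7 specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedValue:
    """A term together with the vocabulary it was drawn from (hypothetical shape)."""
    code: str          # the term's identifier within its vocabulary
    code_system: str   # hypothetical label naming the vocabulary
    display: str = ""  # optional human-readable text


def describe(value: CodedValue) -> str:
    """Render the value without assuming any particular vocabulary."""
    label = value.display or value.code
    return f"{label} [{value.code_system}:{value.code}]"


# The standard only fixes the slot; the vocabulary is named per value.
finding = CodedValue("13644009", "SNOMED-CT", "Hypercholesterolemia")
print(describe(finding))
```

Because the vocabulary travels with each value, a later vocabulary can be used in the same slot without changing the structure that carries it.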
>
> cheers,
> michael
>
> *From:* conor dowling [mailto:conor-dowling@caregraf.com]
> *Sent:* Friday, August 26, 2011 2:43 PM
> *To:* Michael Miller
> *Cc:* Hau, Dave (NIH/NCI) [E]; Jim McCusker; John Madden; public-semweb-lifesci@w3.org
> *Subject:* Re: A Fresh Look Proposal (HL7)
>
> "I think a SNOMED capable DAM should limit the coordination allowed."
>
> ... using SNOMED as your terminology is an implementation detail.
>
> Michael,
>
> one problem with leaving it to implementation is the variety allowed in a concept scheme like SNOMED. Take a disorder like Hypercholesterolemia <http://datasets.caregraf.org/snomed#!13644009> and a patient record with ...
>
> :finding snomed:13644009 # Hypercholesterolemia
>
> another description of the same thing has ...
>
> :finding snomed:166830008 # Serum cholesterol raised
>
> which is effectively equivalent. The "bridge" is ...
>
> snomed:13644009 snomed:363705008 snomed:166830008 (More here <http://www.caregraf.com/blog/the-problem-with-picking-problems>)
>
> # *Hypercholesterolemia* *has definitional manifestation* *Serum cholesterol raised*.
>
> the question is where the bridge goes. Is "has definitional manifestation" defined consistently with the predicate "finding" or is it part of a completely separate concept model and never brought into play by one application?
>
> To me, all of this information goes into one "soup" - in linked data, you have *one big graph of* medical expression. I don't see the point in separate *media* for "statements about conditions" and "statements about condition types".
>
> If in practice - well it's recommended - patient records use SNOMED then tying down that expression should be front and center of any clinical-care modeling effort. To be useful and implementable, we can't say "use any scheme you want" because that's equivalent to saying "you can only do trivial reasoning on this information".
>
> Conor
>
> *From:* conor dowling [mailto:conor-dowling@caregraf.com]
> *Sent:* Wednesday, August 24, 2011 3:26 PM
> *To:* Hau, Dave (NIH/NCI) [E]
> *Cc:* Michael Miller; Jim McCusker; John Madden; public-semweb-lifesci@w3.org
> *Subject:* Re: A Fresh Look Proposal (HL7)
>
> DAM: it's good to have a name. Were OWL to be used for them and then other forms derived from that, you'd get the best of both worlds - get into Semantics and move on.
>
> One other nuance to throw in for the "model-terminology" match up. SNOMED raises a concern about the degree of "concept coordination" you should or should not do, about what load the terminology should take and what should be left to the model. A simple example is do you allow "disorder: allergy to strawberry" or do you make the model carry "disorder: allergy + allergin: strawberry" or do you allow both expressions? (see: http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy)
>
> I think a SNOMED capable DAM should limit the coordination allowed. It should make the model carry qualifiers for severity, for progression, for allergin ... To use it, you would need to normalize these "adjectives" out of any concept.
>
> I suppose what I'm saying is that any useful DAM should severely limit alternatives, in a way that goes beyond simple enumerations of permitted values and the nice thing about concept schemes like SNOMED is that this shouldn't be hard to do - crudely in SNOMED it would mean only allowing primitive concepts, the atoms from which compound concepts are made.
>
> BTW, this doesn't affect what a doctor sees on a screen - it's a matter of what expressions to use for interoperability.
The two issues need to be > strictly separated and right now, if you look at how CCDs are viewed, > they're thick as thieves, > > > > Conor > > On Wed, Aug 24, 2011 at 2:49 PM, Hau, Dave (NIH/NCI) [E] < > haudt@mail.nih.gov> wrote: > > > the kind of reasoning, i think, that you want to do, conor, would run on > top of the information in the HL7 v3 formatted documents to take advantage > of, among other things, the linked data cloud. > > > > Agree. Earlier there was a discussion in HL7 on their Domain Analysis > Model (DAM) effort - what exactly is a DAM and what it's supposed to do. I > think one possible approach would be to consider these DAMs as ontologies > (i.e. conceptual models, knowledge), use OWL in the normative version of > these DAMs, then to develop UML models and XSDs from the DAMs to use in > applications. The DAMs can be harmonized with other domain ontologies out > there, and promoted for global adoption. The UML models can be encouraged > but not as strictly enforced, while alternatively allowing people to use RDF > to tie data directly to concepts in the ontologies / DAMs. > > > > - Dave > > > > > > > > > > *From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org] > *Sent:* Wednesday, August 24, 2011 11:12 AM > *To:* conor dowling; Hau, Dave (NIH/NCI) [E] > > > *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org > > *Subject:* RE: A Fresh Look Proposal (HL7) > > > > hi all, > > > > john, very well laid out argument in your email and what i've found in > practice (and didn't think that consciously about until i read your email). > > > > conor, i agree with your points. but i find it interesting that OWL is > expressed as XML for communication reasons. XML has become pretty much the > de facto standard for 'trading' information. it's how MAGE-ML was used by > the gene expression application i worked on at Rosetta to do import and > export. 
but the storage and presentation of the information was certainly > not XML, the analysis of the data would take forever. the trick is to make > very clear what the extra semantics are and that is well understood for OWL > as XML. when someone wants to use an ontology they've received as an XML > document, the first thing to do is transform the information in the XML so > that the logic can be run easily (this gets back to john's points) > > > > one thing the clinical genomics group has talked about is that with the > HL7 specs expressed in XML, the important part is that canonical validation > applications are written that verify whether a document is conformant with > the additional semantics plus provide boiler plate examples. this allows > the developers not to read the docs too closely but understand when they've > done something wrong! (not ideal but works, that's why OWL in XML works, > there's a great body of tools) > > > > (from dave) > > > > "One way would be as Michael suggested, to use ODM for mapping UML to OWL. > But is this mapping to OWL full or to a more computable dialect of OWL? And > would there be notions in UML that are not expressible in OWL and vice > versa? Should we maintain both the UML model and the OWL ontology as > normative, or one of the two, and if so, which one?" > > > > i think where things get dicey is in the business/logic (there's a good > discussion in the spec), so it is probably to a more computable dialect of > OWL. but in practice, the type of information that needs to be 'traded' by > HL7 specs tends to be straight-forward information with the controlled > vocabularies contributing the extra semantics of how a particular code > relates to the patient and the report in the document and also connects out > to the larger world. one thing the clinical genomics group has tried to do > is leave open what controlled vocabulary to use (this is something that i > think MAGE-OM was one of the first to get right). 
> normally LOINC is recommended but, in the genomics world it is true things become out of date so to get the right term may require a new CV. the kind of reasoning, i think, that you want to do, conor, would run on top of the information in the HL7 v3 formatted documents to take advantage of, among other things, the linked data cloud.
>
> so i guess what i'm saying here is that using XML as the language of interchange is not a bad thing but that it is expected, and this needs to be made clear, that the XML is almost certainly not the best storage mechanism for the data.
>
> cheers,
> michael
>
> *From:* public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] *On Behalf Of *conor dowling
> *Sent:* Tuesday, August 23, 2011 5:22 PM
> *To:* Hau, Dave (NIH/NCI) [E]
> *Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
> *Subject:* Re: A Fresh Look Proposal (HL7)
>
> So Conor if I understand you correctly, you're saying that the current gap that should be addressed in Fresh Look is that the current HL7 v3 models are not specified in a language that can be used for reasoning, i.e. they are not OWL ontologies, otherwise publishing value sets would not be necessary because the reasoning could determine whether a particular value (i.e. "object" in your email) would be valid for a particular observation (i.e. "verb"). Is that what you're saying?
>
> Dave,
>
> exactly - that the patient information model and any recommended terminologies be defined in the same medium and that the medium be capable of capturing permitted ranges, appropriate domains etc. for all predicates: I think a flavor of OWL with a closed-world assumption is the only real game in town but ...
>
> One goal (always easier to agree on goals than technologies!) is that an "allergic to allergy" misstep wouldn't happen - there would be no need to read guidance and coders don't read!
A meaningful use test would assert > permitted ranges (ex/ allergin class: > http://datasets.caregraf.org/snomed#!406455002 for a property "allergin"). > > > > Of course, 'correctness' isn't the only goal or result: transforming > between equivalent expressions supported by model+terminology should be > possible and promoted (take: > http://www.caregraf.com/blog/good-son-jones-diabetic-ma ). And then > there's the direct path to decision-support which you mention above. > > > > The focus on enforcing syntactic correctness would fade away and the model > specifier's demand for greater precision from terminologies should drive > improvements there. This is far from new: some HL7 and SNOMED documents > identify the need to marry model and terminology but go no further. > > > > I think the current meaningful-use CCD has six areas - allergies, problems, > procedures ... It would be interesting to try one or two, say look at > Kaiser's problem subset from SNOMED and see how a HL7-based OWL patient > model and that could work together. There are a lot of pieces in the wild > now: they just need a forum to play in. > > > > One last thing, slightly off the thread but still on topic I think. I don't > see any reason to mix up "human readable" and "machine processable". One > possibility for a patient model update, one that bypasses the need for > buy-in by everyone, irrespective of use case, is to call out the need for a > model of description purely for machine processing, one without the "we'll > XSLT the patient record in the doctor's browser". While the current > standards lead to human-readable data-dumps, a stricter parallel track could > take the best of current standards and re-state them in OWL to deliver > machine-processable health data exchange, > > > > Conor > > > > > > I agree OWL ontologies are useful in health informatics because reasoning > can be used for better validation, decision support etc. 
I'm wondering, is > there a need for both a UML type modeling language and OWL (or other > logic-based language) to be used simultaneously? If so, how? Should OWL be > used for representing knowledge, and UML be used for representing > application models? > > > > One way would be as Michael suggested, to use ODM for mapping UML to OWL. > But is this mapping to OWL full or to a more computable dialect of OWL? And > would there be notions in UML that are not expressible in OWL and vice > versa? Should we maintain both the UML model and the OWL ontology as > normative, or one of the two, and if so, which one? > > > > - Dave > > > > ps. Michael, nice meeting you at the caBIG F2F too! > > > > > > > > *From:* conor dowling [mailto:conor-dowling@caregraf.com] > *Sent:* Monday, August 22, 2011 12:28 PM > *To:* John Madden > *Cc:* Jim McCusker; Hau, Dave (NIH/NCI) [E]; public-semweb-lifesci@w3.org > *Subject:* Re: A Fresh Look Proposal (HL7) > > > > >> for each tool-chain, there are some kinds of content that are natural > and easy to express, and other kinds of content that are difficult and > imperspicuous to express > > > > it's the old "medium is the message" and as you say John, it's somewhat > unavoidable, But this connection doesn't imply all media are equally > expressive. > > > > Making XSD/XML the focus for definition rather than seeing it as just one > end-of-a-road serialization is limiting because as a medium, it puts the > focus on syntax, not semantics. That can't be said of OWL/SKOS/RDFS ... > > > > By way of example: you could have a patient data ontology, one that works > with a KOS like SNOMED and if an implementor likes XML, there's nothing to > stop ... > > > > RDF (turtle) conformant to ontologies/KOS --> RDF/XML ---- > XSLT ----> CCD (ex) > > > > as a chain. It's trivial. 
> But if you start with lots of XSL, well you get only what that medium permits and promotes, which is a focus on syntax, on the presence or absence of fields, as opposed to guidance on the correct concept to use with this or that verb.
>
> Of course, a verb-object split is comfortable because those building information models can work independently of those creating terminologies but is such separation a good thing? Now, were both to work in a common medium then inevitably ...
>
> Conor
>
> p.s. the public release by Kaiser of their subsets of SNOMED (CMT) is the kind of thing that will make that KOS more practical. Now what's needed is tighter definition of the model to use with that and similar sub schemes.
>
> On Mon, Aug 22, 2011 at 9:03 AM, John Madden <john.madden@duke.edu> wrote:
>
> I agree 95% with Jim and Conor.
>
> My 5% reservation is that for each tool-chain, there are some kinds of content that are natural and easy to express, and other kinds of content that are difficult and imperspicuous to express (is that a word?).
>
> Even this is not in itself a problem, except that it tends to make architects favor some kinds of conceptualization and shun other kinds of conceptualization, not on the merits, but because that's what's easy to express in the given tool.
>
> For example, the fact that the decision was made to serialize all RIM-based artifacts as XSD-valid XML meant that hierarchical modeling rather than directed-graph modeling tended to be used in practice. (Even though the RIM expressed as a Visio model has more in common with a directed-graph.) It meant that derivation by restriction was made the favored extensibility mechanism.
>
> These choices may not have been the most auspicious for the kind of conceptualizations that needed to be expressed. None of these things are "necessary" consequences of using XSD-valid XML as your language. Rather, they are the results that you tend to get in practice because the tool has so much influence on the style that ends up, in practice, being used. (id/idref and key/keyref are certainly part of XSD/XML, and make it possible to express non-hierarchical relations, but where in any HL7 artifact do you ever see key/keyref being used?? Similarly, it is possible to derive by extension in XSD, but the spec makes it less easy than deriving by restriction).
>
> Or again, the fact that OIDs rather than http URLs were chosen as the identifier of choice isn't in any way dispositive of whether you will tend to architect with RPC or REST principles in mind. (OIDs and http URLs are actually quite interconvertible.) But I'd argue that if you are a person who tends to think using http URLs, you'll more likely gravitate to REST solutions out of the gate.
>
> So, I agree, what's important is the deep content, not the choice of serialization of that content. But a bad serialization choice, coupled with bad tools, can leave architects wandering in the wilderness for a long time. So long, sometimes, that they lose track of what the deep conceptualization was supposed to have been in the first place.
>
> On Aug 22, 2011, at 9:39 AM, Jim McCusker wrote:
>
> I was just crafting a mail about how our investment in XML technologies hasn't paid off when this came in. What he said. :-)
>
> On Mon, Aug 22, 2011 at 9:33 AM, conor dowling <conor-dowling@caregraf.com> wrote:
>
> >> The content matters, the format does not.
>
> should be front and center. Talk of XML that or JSON this, of RDF as XML in a chain is a distraction - it's just plumbing. There are many tool-chains and implementors are big boys - they can graze the buffet themselves.
> > > > Central to any patient model rework (I hope) would be the interplay of > formal specifications for terminologies like SNOMED along with any patient > data information model. What should go in the terminology concept (the > "object" in RDF terms) - what is left in the model (the "predicate"). Right > now, this interplay is woefully under specified and implementors throw just > about any old concept into "appropriate" slots in RIM (I know this from > doing meaningful use tests: > http://www.caregraf.com/blog/being-allergic-to-allergies, > http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy ) BTW, if > SNOMED is the terminology of choice (for most) then the dance of it and any > RIM-2 should drive much of RIM-2's form. > > > > This is a chance to get away from a fixation on formats/plumbing/"the > trucks for data" and focus on content and in that focus to consider every > aspect of expression, not just the verbs (RIM) or the objects (SNOMED) but > both. > > > > Back to "forget the plumbing": if you want to publish a patient's data as > an RDF graph or relational tables or you want a "document" to send on a > wire, if you want to query with a custom protocol or use SPARQL or SQL, you > should be able to and not be seen as an outlier. Each can be reduced to > equivalents in other formats for particular interoperability. The problem > right now is that so much time is spent talking about these containers and > working between them and too little time is given over to what they contain, > > > > Conor > > > > On Mon, Aug 22, 2011 at 6:01 AM, Hau, Dave (NIH/NCI) [E] < > haudt@mail.nih.gov> wrote: > > I see what you're saying and I agree. > > > > The appeal of XML (i.e. XML used with an XSD representing model syntactics, > not XML used as a serialization as in RDF/XML) is due in part to: > > > > - XML schema validation API is available on virtually all platforms e.g. > Java, Javascript, Google Web Toolkit, Android etc. 
>
> - XML schema validation is relatively lightweight computationally. Pellet ICV and similar mechanisms are more complete in their validation with the model, but much more computationally expensive unless you restrict yourself to a small subset of OWL which then limits the expressiveness of the modeling language.
>
> - XML provides a convenient bridge from models such as OWL to relational databases e.g. via JAXB or Castor to Java objects to Hibernate to any RDB.
>
> - Relational querying and XML manipulation skills are much more plentiful in the market than SPARQL skills currently.
>
> - Some of the current HL7 artifacts are expressed in XSD format, such as their datatypes (ISO 21090; although there are alternative representations such as UML, and there is an abstract spec too from HL7). If we operate with OWL and RDF exclusively, we would need to convert these datatypes into OWL.
>
> Maybe it'd be worthwhile to get a few of us who are interested in this topic together, with some of the HL7 folks interested, and have a few calls to flesh this out and maybe write something up?
>
> - Dave
>
> *From:* Jim McCusker [mailto:james.mccusker@yale.edu]
> *Sent:* Sunday, August 21, 2011 6:12 PM
> *To:* Hau, Dave (NIH/NCI) [E]
> *Cc:* public-semweb-lifesci@w3.org
> *Subject:* Re: FW: A Fresh Look Proposal (HL7)
>
> I feel I need to cut to the chase with this one: XML schema cannot validate semantic correctness.
>
> It can validate that XML conforms to a particular schema, but that is syntactic. The OWL validator is nothing like a schema validator. First, it produces a closure of all statements that can be inferred from the asserted information. This means that if a secondary ontology is used to describe some data, and that ontology integrates with the ontology that you're attempting to validate against, you will get a valid result. An XML schema can only work with what's in front of it.
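The closure argument above can be illustrated with a toy sketch. It is not how Pellet ICV or any OWL reasoner is implemented: it only computes a transitive subclass closure over plain dictionaries, and every class name (`ex:Allergen`, `ex:FoodAllergen`, `lab:Strawberry`) is a made-up placeholder.

```python
def superclasses(cls, subclass_of):
    """All classes that `cls` is asserted or inferred to belong to, itself included."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Follow every asserted subclass link one step further.
            stack.extend(subclass_of.get(current, ()))
    return seen


def valid_for_range(value_class, required_class, subclass_of):
    """A value is acceptable if the closure places it under the required class."""
    return required_class in superclasses(value_class, subclass_of)


# Primary ontology: the one being validated against (hypothetical names).
primary = {"ex:FoodAllergen": ["ex:Allergen"]}
# Secondary ontology: describes the data and links itself to the primary one.
secondary = {"lab:Strawberry": ["ex:FoodAllergen"]}
merged = {**primary, **secondary}

print(valid_for_range("lab:Strawberry", "ex:Allergen", merged))   # True
print(valid_for_range("lab:Strawberry", "ex:Allergen", primary))  # False
```

With the secondary ontology merged in, the value validates even though the primary ontology never mentions it. With only the primary ontology, which is all a schema-style check can see, it does not.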
> > > > Two, there are many different representations of information that go beyond > XML, and it should be possible to validate that information without anything > other than a mechanical, universal translation. For instance, there are a > few mappings of RDF into JSON, including JSON-LD, which looks the most > promising at the moment. Since RDF/XML and JSON-LD both parse to the same > abstract graph, there is a mechanical transformation between them. When > dealing with semantic validity, you want to check the graph that is parsed > from the document, not the document itself. > > > > The content matters, the format does not. For instance, let me define a new > RDF format called RDF/CSV: > > > > First column is the subject. First row is the predicate. All other cell > values are objects. URIs that are relative are relative to the document, as > in RDF/XML. > > > > I can write a parser for that in 1 hour and publish it. It's genuinely > useful, and all you would have to do to read and write it is to use my > parser or write one yourself. I can then use the parser, paired with Pellet > ICV, and validate the information in the file without any additional work > from anyone. > > > > Maybe we need a simplified XML representation for RDF that looks more like > regular XML. But to make a schema for an OWL ontology is too much work for > too little payoff. > > > > Jim > > On Sun, Aug 21, 2011 at 5:45 PM, Hau, Dave (NIH/NCI) [E] < > haudt@mail.nih.gov> wrote: > > Hi all, > > > > As some of you may have read, HL7 is rethinking their v3 and doing some > brainstorming on what would be a good replacement for a data exchange > paradigm grounded in robust semantic modeling. > > > > On the following email exchange, I was wondering, if OWL is used for > semantic modeling, are there good ways to accomplish the following: > > > > 1. Generate a wire format schema (for a subset of the model, the subset > they call a "resource"), e.g. XSD > > > > 2. 
Validate XML instances for conformance to the semantic model. (Here > I'm reminded of Clark and Parsia's work on their Integrity Constraint > Validator: http://clarkparsia.com/pellet/icv ) > > > > 3. Map an XML instance conformant to an earlier version of the "resource" > to the current version of the "resource" via the OWL semantic model > > > > I think it'd be great to get a semantic web perspective on this fresh look > effort. > > > > Cheers, > > Dave > > > > > > > > Dave Hau > > National Cancer Institute > > Tel: 301-443-2545 > > Dave.Hau@nih.gov > > > > > > > > *From:* owner-its@lists.hl7.org [mailto:owner-its@lists.hl7.org] *On > Behalf Of *Lloyd McKenzie > *Sent:* Sunday, August 21, 2011 12:07 PM > *To:* Andrew McIntyre > *Cc:* Grahame Grieve; Eliot Muir; Zel, M van der; HL7-MnM; RIMBAA; HL7 ITS > *Subject:* Re: A Fresh Look Proposal > > > > Hi Andrew, > > > > Tacking stuff on the end simply doesn't work if you're planning to use XML > Schema for validation. (Putting new stuff in the middle or the beginning > has the same effect - it's an unrecognized element.) The only alternative > is to say that all changes after "version 1" of the specification will be > done using the extension mechanism. That will create tremendous analysis > paralysis as we try to get things "right" for that first version, and will > result in increasing clunkiness in future versions. Furthermore, the > extension mechanism only works for the wire format. For the RIM-based > description, we still need proper modeling, and that can't work with "stick > it on the end" no matter what. > > > > That said, I'm not advocating for the nightmare we currently have with v3 > right now. > > > > I think the problem has three parts - how to manage changes to the wire > format, how to version resource definitions and how to manage changes to the > semantic model. > > > > Wire format: > > If we're using schema for validation, we really can't change anything > without breaking validation. 
Even making an existing non-repeating element > repeat is going to cause schema validation issues. That leaves us with two > options (if we discount the previously discussed option of "get it right the > first time and be locked there forever"): > > 1. Don't use schema > > - Using Schematron or something else could easily allow validation of the > elements that are present, but ignore all "unexpected" elements > > - This would cause significant pain for implementers who like to use schema > to help generate code though > > > > 2. Add some sort of a version indicator on new content that allows a > pre-processor to remove all "new" content if processing using an "old" > handler > > - Unpleasant in that it involves a pre-processing step and adds extra > "bulk" to the instances, but other than that, quite workable > > > > I think we're going to have to go with option #2. It's not ideal, but is > still relatively painless for implementers. The biggest thing is that we > can insist on "no breaking XPath changes". We don't move stuff between > levels in a resource wire format definition or rename elements in a resource > wire format definition. In the unlikely event we have to, we deprecate the > entire resource and create a new version. > > > > Resource versioning: > > At some point, HL7 is going to find at least one resource where we blew it > with the original design and the only way to create a coherent wire format > is to break compatibility with the old one. This will then require > definition of a new resource, with a new name that occupies the same > semantic space as the original. I.e., we'll end up introducing "overlap". > Because overlap will happen, we need to figure out how we're going to deal > with it. I actually think we may want to introduce overlap in some places > from the beginning. Otherwise we're going to force a wire format on > implementers of simple community EMRs that can handle prescriptions for > fully-encoded chemo-therapy protocols. 
(They can ignore some of the data > elements, but they'd still have to support the full complexity of the nested > data structures.) > > > > I don't have a clear answer here, but I think we need to have a serious > discussion about how we'll handle overlap in those cases where it's > necessary, because at some point it'll be necessary. If we don't figure out > the approach before we start, we can't allow for it in the design. > > > > All that said, I agree with the approach of avoiding overlap as much as > humanly possible. For that reason, I don't advocate calling the Person > resource "Person_v1" or something that telegraphs we're going to have new > versions of each resource eventually (let alone frequent changes). > Introduction of a new version of a resource should only be done when the > pain of doing so is outweighed by the pain of trying to fit new content in > an old version, or requiring implementers of the simple to support the > structural complexity of our most complex use-cases. > > > > > > Semantic model versioning: > > This is the space where "getting it right" the first time is the most > challenging. (I think we've done that with fewer than half of the normative > specifications we've published so far.) V3 modeling is hard. The positive > thing about the RFH approach is that very few people need to care. We could > totally refactor every single resource's RIM-based model (or even remove > them entirely), and the bulk of implementers would go on merrily exchanging > wire syntax instances. However, that doesn't mean the RIM-based > representations aren't important. They're the foundation for the meaning of > what's being shared. And if you want to start sharing at a deeper level > such as RIMBAA-based designs, they're critical. This is the level where OWL > would come in. 
If you have one RIM-based model structure, and then need to > refactor and move to a different RIM-based model structure, you're going to > want to map the semantics between the two structures so that anyone who was > using the old structure can manage instances that come in with the new > structure (or vice versa). OWL can do that. And anyone who's got a complex > enough implementation to parse the wire format and trace the elements > through to their underlying RIM semantic model will likely be capable of > managing the OWL mapping component as well. > > > > > > In short, I think we're in agreement that separation of wire syntax and > semantic model is needed. That will make model refactoring much easier. > However we do have to address how we're going to handle wire-side and > resource refactoring too. > > > > > > Lloyd > > -------------------------------------- > Lloyd McKenzie > > +1-780-993-9501 > > > > Note: Unless explicitly stated otherwise, the opinions and positions > expressed in this e-mail do not necessarily reflect those of my clients nor > those of the organizations with whom I hold governance positions. > > On Sun, Aug 21, 2011 at 7:53 AM, Andrew McIntyre < > andrew@medical-objects.com.au> wrote: > > Hello Lloyd, > > While "tacking stuff on the end" in V2 may not at first glance seem like an > elegant solution I wonder if it isn't actually the best solution, and one > that has stood the test of time. The parsing rules in V2 do make version > updates quite robust wrt backward and forward inter-operability. > > I am sure it could be done with OWL but I doubt we can switch the world to > using OWL in any reasonable time frame and we probably need a less abstract > representation for commonly used things. In V2, OBX segments used in a > hierarchy can create an OWL-like object-attribute structure for information > that is not modeled by the standard itself. 
> > I do think the wire format and any overlying model should be distinct > entities so that the model can be evolved and the wire format be changed in > a backward compatible way, at least for close versions. > > I also think that the concept of templates/archetypes to extend the model > should not invalidate the wire format, but be a metadata layer over the wire > format. This is what we have done in Australia with an ISO 13606 Archetypes > in V2 projects. I think we do need a mechanism for people to develop > templates to describe hierarchical data and encode that in the wire format > in a way that does not invalidate its vanilla semantics (ie non templated V2 > semantics) when the template mechanism is unknown or not implemented. > > In a way the V2 specification does hit at underlying objects/Interfaces, > and there is a V2 model, but it is not prescriptive and there is no > requirement for systems to use the same internal model as long as they use > the bare bones V2 model in the same way. Obviously this does not always work > as well as we would like, even in V2, but it does work well enough to use it > for quite complex data when there are good implementation guides. > > If we could separate the wire format from the clinical models then the 2 > can evolve in their own way. We have done several trial implementations of > Virtual Medical Record Models (vMR) which used V3 datatypes and RIM like > classes and could build those models from V2 messages, or in some cases non > standard Web Services, although for specific clinical classes did use ISO > 13606 archetypes to structure the data in V2 messages. > > I think the dream of having direct model serializations as messages is > flawed for all the reasons that have made V3 impossible to implement in the > wider world. While the tack it on the end, lots of optionality rationale > might seem clunky, maybe its the best solution to a difficult problem. 
If we > define tight SOAP web services for everything we will end up with thousands > of slightly different SOAP calls for every minor variation and I am not sure > this is the path to enlightenment either. > > I am looking at Grahame's proposal now, but I do wonder if the start-again- > from-scratch mentality is not part of the problem. Perhaps that is a lesson > to be learned from the V3 process. Maybe the problem is too complex to solve > from scratch and like nature we have to evolve and accept there is lots of > junk DNA, but maintaining a working standard at all times is the only way to > avoid extinction. > > I do like the idea of a cohesive model for use in decision support, and > that's what the vMR/GELLO is about, but I doubt there will ever be a > one-size-fits-all model and any model will need to evolve. Disconnecting the > model from the messaging, with all the pain that involves, might create a > layered approach that might allow the HL7 organism to evolve gracefully. I > do think part of the fresh look should be education on what V2 actually > offers, and can offer, and I suspect many people in HL7 have never seriously > looked at it in any depth. > > Andrew McIntyre > > > > Saturday, August 20, 2011, 4:37:37 AM, you wrote: > > Hi Grahame, > > Going to throw some things into the mix from our previous discussions > because I don't see them addressed yet. (Though I admit I haven't reread > the whole thing, so if you've addressed and I haven't seen, just point me at > the proper location.) > > One of the challenges that has bogged down much of the v3 work at the > international level (and which causes a great deal of pain at the > project/implementation level) is the issue of refactoring. The pain at the > UV level comes from the fact that we have a real/perceived obligation to > meet all known and conceivable use-cases for a particular domain. 
For > example, the pharmacy domain model needs to meet the needs of clinics, > hospitals, veterinarians, and chemotherapy protocols and must support the > needs of the U.S., Soviet Union and Botswana. To make matters more > interesting, participation from the USSR and Botswana is a tad light. > However the fear is that if all of these needs aren't taken into account, > then when someone with those needs shows up at the door, the model will need > to undergo substantive change, and that will break all of the existing > systems. > > The result is a great deal of time spent gathering requirements and > refactoring and re-refactoring the model as part of the design process, > together with a tendency to make most, if not all, data elements optional at > the UV level. A corollary is that the UV specs are totally unimplementable > in an interoperable fashion. The evil of optionality that manifested in v2 > that v3 was going to banish turned out to not be an issue of the standard, > but rather of the issues with creating a generic specification that > satisfies global needs and a variety of use-cases. > > The problem at the implementer/project level is that when you take the UV > model and tightly constrain it to fit your exact requirements, you discover > 6 months down the road that one or more of your constraints was wrong and > you need to loosen it, or you have a new requirement that wasn't thought of, > and this too requires refactoring and often results in wire-level > incompatibilities. > > One of the things that needs to be addressed if we're really going to > eliminate one of the major issues with v3 is a way to reduce the fear of > refactoring. Specifically, it should be possible to totally refactor the > model and have implementations and designs work seamlessly across versions. > > I think putting OWL under the covers should allow for this. 
If we can > assert equivalencies between data elements in old and new models, or even > just map the wire syntaxes of old versions to new versions of the definition > models, then this issue would be significantly addressed: > - Committees wouldn't have to worry about satisfying absolutely every > use-case to get something useful out because they know they can make changes > later without breaking everything. (They wouldn't even necessarily have to > meet all the use-cases of the people in the room! :>) > - Realms and other implementers would be able to have an interoperability > path that allowed old wire formats to interoperate with new wire formats > through the aid of appropriate tooling that could leverage the OWL under the > covers. (I think creating such tooling is *really* important because > version management is a significant issue with v3. And with XML and > schemas, the whole "ignore everything on the end you don't recognize" from > v2 isn't a terribly reasonable way forward.) > > I think it's important to figure out exactly how refactoring and version > management will work in this new approach. The currently proposed approach > of "you can add stuff, but you can't change what's there" only scales so > far. > > > I think we *will* need to significantly increase the number of Resources > (from 30 odd to a couple of hundred). V3 supports things like invoices, > clinical study design, outbreak tracking and a whole bunch of other > healthcare-related topics that may not be primary-care centric but are still > healthcare centric. That doesn't mean all (or even most) systems will need > to deal with them, but the systems that care will definitely need them. The > good news is that most of these more esoteric areas have responsible > committees that can manage the definition of these resources, and as you > mention, we can leverage the RMIMs and DMIMs we already have in defining > these structures. 
> > > The specification talks about robust capturing of requirements and > traceability to them, but gives no insight into how this will occur. It's > something we've done a lousy job of with v3, but part of the reason for that > is it's not exactly an easy thing to do. The solution needs to flesh out > exactly how this will happen. > > > We need a mapping that explains exactly what's changed in the datatypes > (and for stuff that's been removed, how to handle that use-case). > > There could still be a challenge around granularity of text. As I > understand it, you can have a text representation for an attribute, or for > any XML element. However, what happens if you have a text blob in your > interface that covers 3 of 7 attributes inside a given XML element. You > can't use the text property of the element, because the text only covers 3 > of 7. You can't use the text property of one of the attributes because it > covers 3 separate attributes. You could put the same text in each of the 3 > attributes, but that's somewhat redundant and is going to result in > rendering issues. One solution might be to allow the text specified at the > element level to identify which of the attributes the text covers. A > rendering system could then use that text for those attributes, and then > render the discrete values of the remaining specified attributes. What this > would mean is that an attribute might be marked as "text" but not have text > content directly if the parent element had a text blob that covered that > attribute. > > > > New (to Grahame) comments: > > I didn't see anything in the HTML section or the transaction section on how > collisions are managed for updates. A simple requirement (possibly > optional) to include the version id of the resource being updated or deleted > should work. > > To my knowledge, v3 (and possibly v2) has never supported true "deletes". > At best, we do an update and change the status to nullified. 
Is that the > intention of the "Delete" transaction, or do we really mean a true "Delete"? > Do we have any use-cases for true deletes? > > I wasn't totally clear on the context for uniqueness of ids. Is it within > a given resource or within a given base URL? What is the mechanism for > referencing resources from other base URLs? (We're likely to have networks > of systems that play together.) > > Nitpick: I think "id" might better be named "resourceId" to avoid any > possible confusion with "identifier". I recognize that from a coding > perspective, shorter is better. However, I think that's outweighed by the > importance of avoiding confusion. > > In the resource definitions, you repeated definitions for resources > inherited from parent resources. E.g. Person.created inherited from > Resource.Base.created. Why? That's a lot of extra maintenance and > potential for inconsistency. It also adds unnecessary volume. > > Suggest adding a caveat to the draft that the definitions are placeholders > and will need significant work. (Many are tautological and none meet the > Vocab WG's guidelines for quality definitions.) > > Why is Person.identifier mandatory? > > You've copied "an element from Resource.Base.???" to all of the Person > attributes, including those that don't come from Resource.Base. > > Obviously the workflow piece and the conformance rules that go along with > it need some fleshing out. (Looks like this may be as much fun in v4 as it > has been in v3 :>) > > The list of identifier types makes me queasy. It looks like we're > reintroducing the mess that was in v2. Why? Trying to maintain an ontology > of identifier types is a lost cause. There will be a wide range of > granularity requirements and at fine granularity, there will be 10s of > thousands. The starter list is pretty incoherent. If you're going to have > types at all, the vocabulary should be constrained to a set of codes based > on the context in which the real-world identifier is present. 
If there's no > vocabulary defined for the property in that context, then you can use text > for a label and that's it. > > I didn't see anything on conformance around datatypes. Are we going to > have datatype flavors? How is conformance stated for datatype properties? > > I didn't see templateId or flavorId or any equivalent. How do instances > (or portions thereof) declare conformance to "additional" constraint > specifications/conformance profiles beyond the base one for that particular > server? > > We need to beef up the RIM mapping portion considerably. Mapping to a > single RIM class or attribute isn't sufficient. Most of the time, we're > going to need to map to a full context model that talks about the > classCodes, moodCodes and relationships. Also, you need to relate > attributes to the context of the RIM location of your parent. > > There's no talk about context conduction, which from an implementation > perspective is a good thing. However, I think it's still needed behind the > scenes. Presumably this would be covered as part of the RIM semantics > layer? > > In terms of the "validate" transaction, we do a pseudo-validate in > pharmacy, but a 200 response isn't sufficient. We can submit a draft > prescription and say "is this ok?". The response might be as simple as > "yes" (i.e. a 200). However, it could also be a "no" or "maybe" with a list > of possible contraindications, dosage issues, allergy alerts and other > detected issues. How would such a use-case be met in this paradigm? > > At the risk of over-complicating things, it might be useful to think about > data properties as being identifying or not to aid in exposing resources in > a de-identified way. (Not critical, just wanted to plant the seed in your > head about if or how this might be done.) > > > All questions and comments aside, I'm definitely in favour of fleshing out > this approach and looking seriously at moving to it. 
To that end, I think > we need a few things: > - A list of the open issues that need to be resolved in the new approach. > (You have "todo"s scattered throughout. A consolidated list of the "big" > things would be useful.) > - An analysis of how we move from existing v3 to the new approach, both in > terms of leveraging existing artifacts and providing a migration path for > existing solutions as well as what tools, etc. we need. > - A plan for how to engage the broader community for review. (Should > ideally do this earlier rather than later.) > > Thanks to you, Rene and others for all the work you've done. > > > Lloyd > > -------------------------------------- > Lloyd McKenzie > > +1-780-993-9501 > > > > Note: Unless explicitly stated otherwise, the opinions and positions > expressed in this e-mail do not necessarily reflect those of my clients nor > those of the organizations with whom I hold governance positions. > > > On Fri, Aug 19, 2011 at 9:08 AM, Grahame Grieve <grahame@kestral.com.au > > > wrote: > > > hi All > > Responses to comments > > #Michael > > > 1. I would expect more functional interface to use these resources. > > as you noted later, this is there, but I definitely needed to make > more of it. That's where I ran out of steam > > > 2. One of the things that was mentioned (e.g. at the Orlando > > WGM RIMBAA Fresh Look discussion) is that we want to use > > industry standard tooling, right? Are there enough libraries that > > implement REST? > > this doesn't need tooling. There's schemas if you want to bind to them > > > 2b. A lot of vendors now implement WebServices. I think we should > > go for something vendors already have or will easily adopt. Is that the > case with REST? > > Speaking as a vendor/programmer/writer of an open source web services > toolkit, I prefer REST. Way prefer REST > > > Keep up the good work! 
> > ta > > #Mark > > > I very much like the direction of this discussion towards web services > > and in particular RESTful web services. > > yes, though note that REST is a place to start, not a place to finish. > > > At MITRE we have been advocating this approach for some time with our > hData initiative. > > yes. you'll note my to do: how does this relate to hData, which is a > higher level > specification than the CRUD stuff here. > > #Eliot > > > Hats off - I think it's an excellent piece of work and definitely a step > in right direction. > > thanks. > > > I didn't know other people in the HL7 world other than me were talking > about > > (highrise). Who are they? > > not in Hl7. you were one. it came up in some other purely IT places that I > play > > > 5) Build it up by hand with a wiki - it is more scalable really since > you > > wiki's have their problems, though I'm not against them. > > > 1) I think it would be better not to use inheritance to define a patient > as > > a sub type of a person. The trouble with that approach is that people > can > > On the wire, a patient is not a sub type of person. The relationship > between the two is defined in the definitions. > > > A simpler approach is associate additional data with a person if and when > > they become a patient. > > in one way, this is exactly what RFH does. On the other hand, it creates a > new identity for the notion of patient (for integrity). We can discuss > whether that's good or bad. > > > 2) I'd avoid language that speaks down to 'implementers'. It's > enterprise > > really? Because I'm one. down the bottom of your enterprise pole. And > I'm happy to be one of those stinking implementers down in the mud. > I wrote it first for me. But obviously we wouldn't want to cause offense. > I'm sure I haven't caused any of that this week ;-) > > > 3) If you want to reach a broader audience, then simplify the language. > > argh, and I thought I had. how can we not use the right terms? 
But I > agree that the introduction is not yet direct enough - and that's after > 4 rewrites to try and make it so.... > > Grahame > > > ************************************************ > To access the Archives of this or other lists or change your list settings > and information, go to: > > http://www.hl7.org/listservice > > > > > > *-- > Best regards, > Andrew *mailto:andrew@Medical-Objects.com.au<andrew@Medical-Objects.com.au> > > *sent from a real computer* > > > > > > -- > Jim McCusker > Programmer Analyst > Krauthammer Lab, Pathology Informatics > Yale School of Medicine > james.mccusker@yale.edu | (203) 785-6330 > http://krauthammerlab.med.yale.edu > > PhD Student > Tetherless World Constellation > Rensselaer Polytechnic Institute > mccusj@cs.rpi.edu > http://tw.rpi.edu > > > > > > > > > > > -- Helena F. Deus Post-Doctoral Researcher at DERI/NUIG http://lenadeus.info/
Received on Friday, 2 September 2011 14:41:49 UTC