RE: A Fresh Look Proposal (HL7) from Michael Miller on 2011-09-02 (public-semweb-lifesci@w3.org from September 2011)

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Fri, 2 Sep 2011 07:25:30 -0700
To: "Hau, Dave (NIH/NCI) [E]" <haudt@mail.nih.gov>, Helena Deus <helenadeus@gmail.com>
Cc: Jim McCusker <james.mccusker@yale.edu>, John Madden <john.madden@duke.edu>, public-semweb-lifesci@w3.org, conor dowling <conor-dowling@caregraf.com>
Message-ID: <68ce26741f588facfba76fa9bddc3734@mail.gmail.com>
hi dave,



thanks for the info.



'http://tcga.s3db.org'



the web site no longer seems viable.  it looks like it went up in 2008 so
could be out of date.  it just spins on 'loading domain description' and
eventually errors out.  i tried the links and they didn't bring anything up.



lena, you are listed as one of the co-authors of the page, any clue?



cheers,

michael





*From:* Hau, Dave (NIH/NCI) [E] [mailto:haudt@mail.nih.gov]
*Sent:* Thursday, September 01, 2011 10:00 AM
*To:* Michael Miller; conor dowling
*Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
*Subject:* RE: A Fresh Look Proposal (HL7)



Last week NCI published a list of 24 provocative questions and corresponding
RFA's (funding announcements):



http://provocativequestions.nci.nih.gov/rfa



I'd like to encourage everyone to review this list, and see if there's any
question(s) we could work on collaboratively in the coming year, to see how
well we could tackle them from an informatics perspective using semantic web
technology, esp. in the context of our discussion on integrating semantics
between life science and clinical research and care.



In the process, if we could make use of the TCGA data set (e.g. via the
SPARQL endpoint: http://tcga.s3db.org), or other datasets or reference
domain ontologies, that would be great.



I think in harmonizing life science and clinical semantics, focusing on such
"rubber meets the road" kind of use cases would help ground our discussion
solidly in real science.



- Dave









*From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org]
*Sent:* Wednesday, August 31, 2011 3:25 PM
*To:* Hau, Dave (NIH/NCI) [E]; conor dowling
*Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
*Subject:* RE: A Fresh Look Proposal (HL7)



hi all,



conor, excellent points in your last email.



"...is an increase in the interdependencies and overlaps between the
information model and the terminology "



my experience has been that this doesn't necessarily have to happen.  just
as linked data experience has shown, one can reason across ontology
boundaries without prior knowledge of the links.



"...to reason on the information model and the value set together to
determine the right values for a particular field..."



one can do this without the information model having knowledge of the value
set.  the information model sets the meta-expectation of what is expected
and then the value set can be examined for the best possible fit without
there needing to be knowledge of the value set by the information model.
one can, of course, couple an information model to a terminology but i
believe that is bad modeling and with a little more effort can be avoided.



"If the information model could be expressed in a language that supports
reasoning..."



yes, this would allow the statement above to be computable.



cheers,

michael





*From:* Hau, Dave (NIH/NCI) [E] [mailto:haudt@mail.nih.gov]
*Sent:* Wednesday, August 31, 2011 11:58 AM
*To:* Michael Miller; conor dowling
*Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
*Subject:* RE: A Fresh Look Proposal (HL7)



Michael I agree with you and I see where Conor is coming from too.  I agree
the information model should be decoupled from the value set to a certain
extent, so each can evolve on its own and sustain over time.



OTOH, I see what Conor meant by being able to reason on the information
model and the value set together to determine the right values for a
particular field.  SNOMED in particular, has a flexible grammar that allows
a wide variety of post-coordinated expressions, so it would be quite
impossible to exhaustively list out all allowed values as in an extensional
definition of a value set.



I think this is exactly where CTS2 comes in, in terms of improving CTS so
that value sets can be more computable with reasoning.  The distinction
between the intensional and extensional definition of a value set would be
very useful in this regard, because the intensional definition if defined in
a computable way, can certainly be used to accomplish the above.
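
The intensional/extensional distinction can be sketched in a few lines. This is a minimal illustration, not how CTS2 defines value sets: the is-a table and every `ex:` concept id below are hypothetical.

```python
# Toy is-a hierarchy (child -> parent). All ids are hypothetical.
IS_A = {
    "ex:FoodAllergen": "ex:Allergen",
    "ex:Strawberry": "ex:FoodAllergen",
    "ex:Peanut": "ex:FoodAllergen",
    "ex:Fracture": "ex:Disorder",
}


def ancestors(concept):
    """Yield every ancestor of a concept by walking the is-a links."""
    while concept in IS_A:
        concept = IS_A[concept]
        yield concept


# Extensional definition: the members are listed one by one.
EXTENSIONAL_ALLERGENS = {"ex:Strawberry", "ex:Peanut"}


def in_extensional(concept):
    return concept in EXTENSIONAL_ALLERGENS


# Intensional definition: the members are described by a rule,
# here "the root concept or anything below it".
def in_intensional(concept, root="ex:Allergen"):
    return concept == root or root in ancestors(concept)
```

A concept added to the hierarchy later is picked up by the intensional rule automatically, while the extensional list has to be republished.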



If the information model could be expressed in a language that supports
reasoning, that would be even better because now you can reason across the
field, the intensional value set, and the particular value a user has
chosen.



- Dave









*From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org]
*Sent:* Tuesday, August 30, 2011 4:56 PM
*To:* conor dowling
*Cc:* Hau, Dave (NIH/NCI) [E]; Jim McCusker; John Madden;
public-semweb-lifesci@w3.org
*Subject:* RE: A Fresh Look Proposal (HL7)



hi conor,



i think this discussion has been missing the point about how a standard is
developed and its relationship to ontologies/vocabularies that will be used
for it.



for an EHR, for instance, when a DAM is developed, what is important are the
high level details such as 'patient', 'illness', 'disease state', not how
one will record those details.  more important is the relationship between
the high level details.  currently, in HL7, a flavor of UML is used.  that's
not to say an ontology couldn't equally well be used but it would still be at
this higher level.  and even tho an ontology could be used for the modeling,
the amount of impedance and change to the bylaws of HL7 probably precludes
that, altho a companion ontology to the UML could be, but it would not be
normative.



even when one goes to the implementation, RMIM, level, it is still important
that the specific ontologies/vocabularies like SNOMED, gene symbols, etc,
are loosely coupled to the standard.  it's been shown that this makes the
standard more robust because as time moves on, new vocabularies are created
or the standard is used in another area where there are more appropriate
vocabularies the original creators of the standard weren't aware of.  so the
paradigm of having a place for the term is always accompanied by a place to
say what vocabulary the term came from.  (there are some places in the
CG standards that specify LOINC codes but if you look, there is always an
out to use some other vocabulary if desired)
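
As a rough sketch of that "term plus its vocabulary" pattern (not the HL7 datatype itself; the class, field names and the local vocabulary label are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedValue:
    """A term together with the vocabulary it was drawn from."""
    code: str
    code_system: str  # label of the vocabulary, e.g. "SNOMED CT"
    display: str = ""


def systems_used(values):
    """List the vocabularies a set of coded values draws on."""
    return sorted({value.code_system for value in values})


# The structure does not care which vocabulary supplies the term.
findings = [
    CodedValue("13644009", "SNOMED CT", "Hypercholesterolemia"),
    CodedValue("chol-high", "ex-local-vocab", "High cholesterol"),
]
```

Because the vocabulary travels with each term, a new code system can be introduced without changing the structure that carries it.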



that said, this discussion to me isn't about HL7, it's about the proper way
to use SNOMED regardless of where it is used.  it's just that Dave is
interested in HL7 and so HL7 was the example at hand.  right now there are
some v2 standards from CG that are being used (by harvard medical amongst
others) so i agree it is important to make these issues with the use of
ontologies/vocabularies part of the discussion now but it is important to
understand its place in the discussion.  i would hazard a guess that the
applications that produce HL7 formatted documents, for the most part, do not
deal directly with the vocabularies but are reading the values from a
database where when the test was entered into the database, the term came
from a drop down list or some such.  so it's not clear to me where the
target audience for getting SNOMED right is other than it's probably not in
HL7 standards.



cheers,

michael







*From:* conor dowling [mailto:conor-dowling@caregraf.com]
*Sent:* Friday, August 26, 2011 2:43 PM
*To:* Michael Miller
*Cc:* Hau, Dave (NIH/NCI) [E]; Jim McCusker; John Madden;
public-semweb-lifesci@w3.org
*Subject:* Re: A Fresh Look Proposal (HL7)





 "I think a SNOMED capable DAM should limit the coordination allowed."



... using SNOMED as your terminology is an implementation detail.



Michael,



one problem with leaving it to implementation is the variety allowed in a
concept scheme like SNOMED. Take a disorder like
Hypercholesterolemia<http://datasets.caregraf.org/snomed#!13644009>:
and a patient record with ...



              :finding snomed:13644009 # Hypercholesterolemia



another description of the same thing has ...



              :finding snomed:166830008 # Serum cholesterol raised



which is effectively equivalent. The "bridge" is ...



              snomed:13644009 snomed:363705008 snomed:166830008

              # *Hypercholesterolemia* *has definitional manifestation* *Serum cholesterol raised*.

(More here<http://www.caregraf.com/blog/the-problem-with-picking-problems>)



the question is where the bridge goes. Is "has definitional manifestation"
defined consistently with the predicate "finding" or is it part of a
completely separate concept model and never brought into play by one
application?
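
A minimal sketch of what bringing the bridge into play could look like when patient statements and terminology statements sit in one graph. The three SNOMED ids are the ones quoted above; the `ex:` patient ids and the `ex:finding` predicate are hypothetical, and treating the bridge as a two-way equivalence is a simplifying assumption.

```python
HAS_DEFINITIONAL_MANIFESTATION = "snomed:363705008"

# Terminology statements and patient statements share one graph.
GRAPH = {
    ("snomed:13644009", HAS_DEFINITIONAL_MANIFESTATION, "snomed:166830008"),
    ("ex:patientA", "ex:finding", "snomed:13644009"),   # Hypercholesterolemia
    ("ex:patientB", "ex:finding", "snomed:166830008"),  # Serum cholesterol raised
}


def equivalents(concept, graph):
    """Return the concept plus anything bridged to it in either direction."""
    result = {concept}
    for s, p, o in graph:
        if p == HAS_DEFINITIONAL_MANIFESTATION:
            if s == concept:
                result.add(o)
            if o == concept:
                result.add(s)
    return result


def patients_with(concept, graph):
    """Find patients whose finding matches the concept or a bridged one."""
    targets = equivalents(concept, graph)
    return sorted(s for s, p, o in graph if p == "ex:finding" and o in targets)
```

With the bridge triple in the graph, a query for either concept returns both patients; remove it and each query sees only the literal match.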



To me, all of this information goes into one "soup" - in linked data, you
have *one big graph of* medical expression. I don't see the point in
separate *media* for "statements about conditions" and "statements about
condition types".



If in practice - well it's recommended - patient records use SNOMED then
tying down that expression should be front and center of any clinical-care
modeling effort. To be useful and implementable, we can't say "use any
scheme you want" because that's equivalent to saying "you can only do
trivial reasoning on this information".



Conor





*From:* conor dowling [mailto:conor-dowling@caregraf.com]
*Sent:* Wednesday, August 24, 2011 3:26 PM


*To:* Hau, Dave (NIH/NCI) [E]

*Cc:* Michael Miller; Jim McCusker; John Madden;
public-semweb-lifesci@w3.org


*Subject:* Re: A Fresh Look Proposal (HL7)



DAM: it's good to have a name. Were OWL to be used for them and then other
forms derived from that, you'd get the best of both worlds - get into
Semantics and move on.



One other nuance to throw in for the "model-terminology" match up. SNOMED
raises a concern about the degree of "concept coordination" you should or
should not do, about what load the terminology should take and what should
be left to the model. A simple example is do you allow "disorder: allergy to
strawberry" or do you make the model carry "disorder: allergy + allergen:
strawberry" or do you allow both expressions? (see:
http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy)



I think a SNOMED capable DAM should limit the coordination allowed. It
should make the model carry qualifiers for severity, for progression, for
allergen ... To use it, you would need to normalize these "adjectives" out
of any concept.



I suppose what I'm saying is that any useful DAM should severely limit
alternatives, in a way that goes beyond simple enumerations of permitted
values and the nice thing about concept schemes like SNOMED is that this
shouldn't be hard to do - crudely in SNOMED it would mean only allowing
primitive concepts, the atoms from which compound concepts are made.
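
A rough sketch of that normalization step, assuming a lookup table that says how each compound concept decomposes. The table, the slot names and all `ex:` ids are hypothetical; a real implementation would derive the decomposition from SNOMED's own concept definitions.

```python
# Hypothetical decomposition table: compound concept -> (base, qualifiers).
DECOMPOSITION = {
    "ex:AllergyToStrawberry": ("ex:Allergy", {"allergen": "ex:Strawberry"}),
    "ex:SevereAsthma": ("ex:Asthma", {"severity": "ex:Severe"}),
}


def normalize(statement):
    """Move qualifiers out of the concept and into model slots."""
    concept = statement["disorder"]
    base, qualifiers = DECOMPOSITION.get(concept, (concept, {}))
    normalized = {"disorder": base}
    normalized.update(qualifiers)
    # Qualifiers the model already carries take precedence over derived ones.
    normalized.update({k: v for k, v in statement.items() if k != "disorder"})
    return normalized
```

Both ways of stating the strawberry allergy then reduce to the same record, which is the property an interoperability format needs.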



BTW, this doesn't affect what a doctor sees on a screen - it's a matter of
what expressions to use for interoperability. The two issues need to be
strictly separated and right now, if you look at how CCDs are viewed,
they're thick as thieves,



Conor

On Wed, Aug 24, 2011 at 2:49 PM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov>
wrote:

> the kind of reasoning, i think, that you want to do, conor, would run on
> top of the information in the HL7 v3 formatted documents to take advantage
> of, among other things, the linked data cloud.



Agree.  Earlier there was a discussion in HL7 on their Domain Analysis Model
(DAM) effort - what exactly is a DAM and what it's supposed to do.  I think
one possible approach would be to consider these DAMs as ontologies (i.e.
conceptual models, knowledge), use OWL in the normative version of these
DAMs, then to develop UML models and XSDs from the DAMs to use in
applications.  The DAMs can be harmonized with other domain ontologies out
there, and promoted for global adoption.  The UML models can be encouraged
but not as strictly enforced, while alternatively allowing people to use RDF
to tie data directly to concepts in the ontologies / DAMs.



- Dave









*From:* Michael Miller [mailto:Michael.Miller@systemsbiology.org]
*Sent:* Wednesday, August 24, 2011 11:12 AM
*To:* conor dowling; Hau, Dave (NIH/NCI) [E]


*Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org

*Subject:* RE: A Fresh Look Proposal (HL7)



hi all,



john, very well laid out argument in your email and what i've found in
practice (and didn't think that consciously about until i read your email).



conor, i agree with your points.  but i find it interesting that OWL is
expressed as XML for communication reasons.  XML has become pretty much the
de facto standard for 'trading' information.  it's how MAGE-ML was used by
the gene expression application i worked on at Rosetta to do import and
export.  but the storage and presentation of the information was certainly
not XML, the analysis of the data would take forever.  the trick is to make
very clear what the extra semantics are and that is well understood for OWL
as XML.  when someone wants to use an ontology they've received as an XML
document, the first thing to do is transform the information in the XML so
that the logic can be run easily (this gets back to john's points)



one thing the clinical genomics group has talked about is that with the
HL7 specs expressed in XML, the important part is that canonical validation
applications are written that verify whether a document is conformant with
the additional semantics plus provide boiler plate examples.  this allows
the developers not to read the docs too closely but understand when they've
done something wrong!  (not ideal but works, that's why OWL in XML works,
there's a great body of tools)



(from dave)



"One way would be as Michael suggested, to use ODM for mapping UML to OWL.
But is this mapping to OWL full or to a more computable dialect of OWL?  And
would there be notions in UML that are not expressible in OWL and vice
versa?  Should we maintain both the UML model and the OWL ontology as
normative, or one of the two, and if so, which one?"



i think where things get dicey is in the business/logic (there's a good
discussion in the spec), so it is probably to a more computable dialect of
OWL.  but in practice, the type of information that needs to be 'traded' by
HL7 specs tends to be straight-forward information with the controlled
vocabularies contributing the extra semantics of how a particular code
relates to the patient and the report in the document and also connects out
to the larger world.  one thing the clinical genomics group has tried to do
is leave open what controlled vocabulary to use (this is something that i
think MAGE-OM was one of the first to get right).  normally LOINC is
recommended but, in the genomics world it is true things become out of date
so to get the right term may require a new CV.  the kind of reasoning, i
think, that you want to do, conor, would run on top of the information in
the HL7 v3 formatted documents to take advantage of, among other things, the
linked data cloud.



so i guess what i'm saying here is that using XML as the language of
interchange is not a bad thing but that it is expected, and this needs to be
made clear, that the XML is almost certainly not the best storage mechanism
for the data.



cheers,

michael



*From:* public-semweb-lifesci-request@w3.org [mailto:
public-semweb-lifesci-request@w3.org] *On Behalf Of *conor dowling
*Sent:* Tuesday, August 23, 2011 5:22 PM
*To:* Hau, Dave (NIH/NCI) [E]
*Cc:* Jim McCusker; John Madden; public-semweb-lifesci@w3.org
*Subject:* Re: A Fresh Look Proposal (HL7)



So Conor if I understand you correctly, you're saying that the current gap
that should be addressed in Fresh Look is that the current HL7 v3 models are
not specified in a language that can be used for reasoning, i.e. they are
not OWL ontologies, otherwise publishing value sets would not be necessary
because the reasoning could determine whether a particular value (i.e.
"object" in your email) would be valid for a particular observation (i.e.
"verb").  Is that what you're saying?



Dave,



exactly - that the patient information model and any recommended
terminologies be defined in the same medium and that the medium be capable
of capturing permitted ranges, appropriate domains etc. for all predicates:
I think a flavor of OWL with a closed-world assumption is the only real game
in town but ...



One goal (always easier to agree on goals than technologies!) is that an
"allergic to allergy" misstep wouldn't happen - there would be no need to
read guidance and coders don't read! A meaningful use test would assert
permitted ranges (ex/ allergen class:
http://datasets.caregraf.org/snomed#!406455002 for a property "allergen").
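
A small sketch of such a range assertion, checked in a closed-world way: a value that is not known to fall under the permitted class counts as a violation. The class id 406455002 is the one cited above; the property name, the toy is-a table and the other `ex:` ids are hypothetical.

```python
ALLERGEN_CLASS = "snomed:406455002"  # allergen class cited in the message

# Toy is-a links (child -> parent). The ex: concepts are hypothetical.
IS_A = {
    "ex:Strawberry": ALLERGEN_CLASS,
    "ex:Allergy": "ex:Disorder",
}

# Permitted range for each constrained property.
RANGES = {"ex:allergen": ALLERGEN_CLASS}


def is_a(concept, ancestor):
    """True if concept equals ancestor or sits below it in IS_A."""
    while concept != ancestor and concept in IS_A:
        concept = IS_A[concept]
    return concept == ancestor


def violations(triples):
    """Closed-world check: values not known to be in range are errors."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p in RANGES and not is_a(o, RANGES[p])
    ]
```

A record stating that a patient's allergen is "allergy" is flagged by the check itself, with no guidance document to read.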



Of course, 'correctness' isn't the only goal or result: transforming between
equivalent expressions supported by model+terminology should be possible and
promoted (take: http://www.caregraf.com/blog/good-son-jones-diabetic-ma ).
And then there's the direct path to decision-support which you mention
above.



The focus on enforcing syntactic correctness would fade away and the model
specifier's demand for greater precision from terminologies should drive
improvements there. This is far from new: some HL7 and SNOMED documents
identify the need to marry model and terminology but go no further.



I think the current meaningful-use CCD has six areas - allergies, problems,
procedures ... It would be interesting to try one or two, say look at
Kaiser's problem subset from SNOMED and see how a HL7-based OWL patient
model and that could work together. There are a lot of pieces in the wild
now: they just need a forum to play in.



One last thing, slightly off the thread but still on topic I think. I don't
see any reason to mix up "human readable" and "machine processable". One
possibility for a patient model update, one that bypasses the need for
buy-in by everyone, irrespective of use case, is to call out the need for a
model of description purely for machine processing, one without the "we'll
XSLT the patient record in the doctor's browser". While the current
standards lead to human-readable data-dumps, a stricter parallel track could
take the best of current standards and re-state them in OWL to deliver
machine-processable health data exchange,



Conor





I agree OWL ontologies are useful in health informatics because reasoning
can be used for better validation, decision support etc.  I'm wondering, is
there a need for both a UML type modeling language and OWL (or other
logic-based language) to be used simultaneously?  If so, how?  Should OWL be
used for representing knowledge, and UML be used for representing
application models?



One way would be as Michael suggested, to use ODM for mapping UML to OWL.
But is this mapping to OWL full or to a more computable dialect of OWL?  And
would there be notions in UML that are not expressible in OWL and vice
versa?  Should we maintain both the UML model and the OWL ontology as
normative, or one of the two, and if so, which one?



- Dave



ps.  Michael, nice meeting you at the caBIG F2F too!







*From:* conor dowling [mailto:conor-dowling@caregraf.com]
*Sent:* Monday, August 22, 2011 12:28 PM
*To:* John Madden
*Cc:* Jim McCusker; Hau, Dave (NIH/NCI) [E]; public-semweb-lifesci@w3.org
*Subject:* Re: A Fresh Look Proposal (HL7)



>> for each tool-chain, there are some kinds of content that are natural and
>> easy to express, and other kinds of content that are difficult and
>> imperspicuous to express



it's the old "medium is the message" and as you say John, it's somewhat
unavoidable. But this connection doesn't imply all media are equally
expressive.



Making XSD/XML the focus for definition rather than seeing it as just one
end-of-a-road serialization is limiting because as a medium, it puts the
focus on syntax, not semantics. That can't be said of OWL/SKOS/RDFS ...



By way of example: you could have a patient data ontology, one that works
with a KOS like SNOMED and if an implementor likes XML, there's nothing to
stop ...



              RDF (turtle) conformant to ontologies/KOS --> RDF/XML ---- XSLT ----> CCD (ex)



as a chain. It's trivial. But if you start with lots of XSL, well you get
only what that medium permits and promotes, which is a focus on syntax, on
the presence or absence of fields, as opposed to guidance on the correct
concept to use with this or that verb.



Of course, a verb-object split is comfortable because those building
information models can work independently of those creating terminologies
but is such separation a good thing? Now, were both to work in a common
medium then inevitably ...



Conor



p.s. the public release by Kaiser of their subsets of SNOMED (CMT) is the
kind of thing that will make that KOS more practical. Now what's needed is
tighter definition of the model to use with that and similar sub schemes.

On Mon, Aug 22, 2011 at 9:03 AM, John Madden <john.madden@duke.edu> wrote:

I agree 95% with Jim and Conor.



My 5% reservation is that for each tool-chain, there are some kinds of
content that are natural and easy to express, and other kinds of content
that are difficult and imperspicuous to express (is that a word?).



Even this is not in itself a problem, except that it tends to make
architects favor some kinds of conceptualization and shun other kinds of
conceptualization, not on the merits, but because that's what's easy to
express in the given tool.



For example, the fact that the decision was made to serialize all RIM-based
artifacts as XSD-valid XML meant that hierarchical modeling rather than
directed-graph modeling tended to be used in practice. (Even though the RIM
expressed as a Visio model has more in common with a directed-graph.) It
meant that derivation by restriction was made the favored extensibility
mechanism.



These choices may not have been the most auspicious for the kind of
conceptualizations that needed to be expressed. None of these things are
"necessary" consequences of using XSD-valid XML as your language. Rather,
they are the results that you tend to get in practice because the tool has
so much influence on the style that ends up, in practice, being
used. (id/idref and key/keyref are certainly part of XSD/XML, and make it
possible to express non-hierarchical relations, but where in any HL7
artifact do you ever see key/keyref being used?? Similarly, it is possible
to derive by extension in XSD, but the spec makes it less easy than deriving
by restriction).



Or again, the fact that OIDs rather than http URLs were chosen as the
identifier of choice isn't in any way dispositive of whether you will
tend to architect with RPC or REST principles in mind. (OIDs and http URLs
are actually quite interconvertible.) But I'd argue that if you are a person
who tends to think using http URLs, you'll more likely gravitate to REST
solutions out of the gate.
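
To make the interconvertibility concrete, here is a minimal sketch. The `urn:oid:` form is the URN namespace defined in RFC 3061; the http base below is a hypothetical resolver prefix, not an HL7-defined one.

```python
import re

# Dotted-decimal OID: first arc is 0-2, later arcs are non-negative integers.
OID_PATTERN = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")

# Hypothetical resolver base; any stable http prefix would do.
HTTP_BASE = "http://example.org/oid/"


def _check(oid):
    if not OID_PATTERN.fullmatch(oid):
        raise ValueError(f"not a dotted-decimal OID: {oid!r}")
    return oid


def oid_to_urn(oid):
    """Wrap an OID in the urn:oid namespace."""
    return "urn:oid:" + _check(oid)


def oid_to_http(oid):
    """Map an OID onto an http URL under the assumed base."""
    return HTTP_BASE + _check(oid)


def http_to_oid(url):
    """Recover the OID from a URL produced by oid_to_http."""
    if not url.startswith(HTTP_BASE):
        raise ValueError(f"not under {HTTP_BASE}: {url!r}")
    return _check(url[len(HTTP_BASE):])
```

The round trip loses nothing, which supports the point that the identifier syntax does not force an architecture; it only nudges one.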



So, I agree, what's important is the deep content, not the choice of
serialization of that content. But a bad serialization choice, coupled with
bad tools, can leave architects wandering in the wilderness for a long time.
So long, sometimes, that they lose track of what the deep conceptualization
was supposed to have been in the first place.







On Aug 22, 2011, at 9:39 AM, Jim McCusker wrote:



I was just crafting a mail about how our investment in XML technologies
hasn't paid off when this came in. What he said. :-)

On Mon, Aug 22, 2011 at 9:33 AM, conor dowling <conor-dowling@caregraf.com>
wrote:

>> The content matters, the format does not.



should be front and center. Talk of XML that or JSON this, of RDF as XML in
a chain is a distraction - it's just plumbing. There are many tool-chains
and implementors are big boys - they can graze the buffet themselves.



Central to any patient model rework (I hope) would be the interplay of
formal specifications for terminologies like SNOMED along with any patient
data information model. What should go in the terminology concept (the
"object" in RDF terms) - what is left in the model (the "predicate"). Right
now, this interplay is woefully under specified and implementors throw just
about any old concept into "appropriate" slots in RIM (I know this from
doing meaningful use tests:
http://www.caregraf.com/blog/being-allergic-to-allergies,
http://www.caregraf.com/blog/there-once-was-a-strawberry-allergy ) BTW, if
SNOMED is the terminology of choice (for most) then the dance of it and any
RIM-2 should drive much of RIM-2's form.



This is a chance to get away from a fixation on formats/plumbing/"the trucks
for data" and focus on content and in that focus to consider every aspect of
expression, not just the verbs (RIM) or the objects (SNOMED) but both.



Back to "forget the plumbing": if you want to publish a patient's data as an
RDF graph or relational tables or you want a "document" to send on a wire,
if you want to query with a custom protocol or use SPARQL or SQL, you should
be able to and not be seen as an outlier. Each can be reduced to equivalents
in other formats for particular interoperability. The problem right now is
that so much time is spent talking about these containers and working
between them and too little time is given over to what they contain,



Conor



On Mon, Aug 22, 2011 at 6:01 AM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov>
wrote:

I see what you're saying and I agree.



The appeal of XML (i.e. XML used with an XSD representing model syntactics,
not XML used as a serialization as in RDF/XML) is due in part to:



- XML schema validation API is available on virtually all platforms e.g.
Java, Javascript, Google Web Toolkit, Android etc.

- XML schema validation is relatively lightweight computationally.  Pellet
ICV and similar mechanisms are more complete in their validation with the
model, but much more computationally expensive unless you restrict yourself
to a small subset of OWL which then limits the expressiveness of the
modeling language.

- XML provides a convenient bridge from models such as OWL to relational
databases e.g. via JAXB or Castor to Java objects to Hibernate to any RDB.

- Relational querying and XML manipulation skills are much more plentiful in
the market than SPARQL skills currently.

- Some of the current HL7 artifacts are expressed in XSD format, such as
their datatypes (ISO 21090; although there are alternative representations
such as UML, and there is an abstract spec too from HL7).  If we operate
with OWL and RDF exclusively, we would need to convert these datatypes into
OWL.



Maybe it'd be worthwhile to get a few of us who are interested in this topic
together, with some of the HL7 folks interested, and have a few calls to
flesh this out and maybe write something up?



- Dave









*From:* Jim McCusker [mailto:james.mccusker@yale.edu]
*Sent:* Sunday, August 21, 2011 6:12 PM
*To:* Hau, Dave (NIH/NCI) [E]
*Cc:* public-semweb-lifesci@w3.org
*Subject:* Re: FW: A Fresh Look Proposal (HL7)



I feel I need to cut to the chase with this one: XML schema cannot validate
semantic correctness.



It can validate that XML conforms to a particular schema, but that is
syntactic. The OWL validator is nothing like a schema validator. First, it
produces a closure of all statements that can be inferred from the asserted
information. This means that if a secondary ontology is used to describe
some data, and that ontology integrates with the ontology that you're
attempting to validate against, you will get a valid result. An XML schema
can only work with what's in front of it.
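
A toy illustration of the "closure first, then validate" idea. It only follows rdfs:subClassOf links, which is far less than an OWL reasoner or Pellet ICV does, and the `ex:` and `lab:` names and the single constraint are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Add inferred rdf:type statements by following rdfs:subClassOf links."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == SUBCLASS and s2 == o and (s, TYPE, o2) not in inferred:
                    inferred.add((s, TYPE, o2))
                    changed = True
    return inferred


def is_valid(triples, required_type="ex:Observation"):
    """Constraint: every subject carrying ex:value must have required_type."""
    return all(
        (s, TYPE, required_type) in triples
        for s, p, o in triples
        if p == "ex:value"
    )


# Data described with a secondary vocabulary (lab:Result) ...
data = {("ex:r1", TYPE, "lab:Result"), ("ex:r1", "ex:value", "5.2")}
# ... which is integrated with the target ontology by one statement.
bridge = {("lab:Result", SUBCLASS, "ex:Observation")}
```

Checked as asserted, the data fails the constraint; checked after closure over the data plus the bridging statement, it passes.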



Second, there are many different representations of information that go beyond
XML, and it should be possible to validate that information without anything
other than a mechanical, universal translation. For instance, there are a
few mappings of RDF into JSON, including JSON-LD, which looks the most
promising at the moment. Since RDF/XML and JSON-LD both parse to the same
abstract graph, there is a mechanical transformation between them. When
dealing with semantic validity, you want to check the graph that is parsed
from the document, not the document itself.



The content matters, the format does not. For instance, let me define a new
RDF format called RDF/CSV:



First column is the subject. First row is the predicate. All other cell
values are objects. URIs that are relative are relative to the document, as
in RDF/XML.



I can write a parser for that in 1 hour and publish it. It's genuinely
useful, and all you would have to do to read and write it is to use my
parser or write one yourself. I can then use the parser, paired with Pellet
ICV, and validate the information in the file without any additional work
from anyone.
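
For illustration, a parser along those lines might look like the sketch below. It fills in details the description leaves open, so treat them as assumptions: every non-empty cell is read as a URI reference (literals are not handled), empty cells produce no triple, the top-left cell is ignored, and the document URI is passed in as `base`.

```python
import csv
import io
from urllib.parse import urljoin


def parse_rdf_csv(text, base):
    """Parse 'RDF/CSV' text into a list of (subject, predicate, object)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    # First row: predicates (the top-left cell is ignored).
    predicates = [urljoin(base, cell) for cell in rows[0][1:]]
    triples = []
    for row in rows[1:]:
        if not row or not row[0]:
            continue  # skip blank lines and rows without a subject
        subject = urljoin(base, row[0])  # first column: subject
        for predicate, cell in zip(predicates, row[1:]):
            if cell:  # empty cell: no statement
                triples.append((subject, predicate, urljoin(base, cell)))
    return triples
```

The resulting triples can be loaded into whatever graph store the validator reads, so the check runs on the graph rather than on the CSV text.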



Maybe we need a simplified XML representation for RDF that looks more like
regular XML. But to make a schema for an OWL ontology is too much work for
too little payoff.



Jim

On Sun, Aug 21, 2011 at 5:45 PM, Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov>
wrote:

Hi all,



As some of you may have read, HL7 is rethinking their v3 and doing some
brainstorming on what would be a good replacement for a data exchange
paradigm grounded in robust semantic modeling.



On the following email exchange, I was wondering, if OWL is used for
semantic modeling, are there good ways to accomplish the following:



1.  Generate a wire format schema (for a subset of the model, the subset
they call a "resource"), e.g. XSD



2.  Validate XML instances for conformance to the semantic model.  (Here I'm
reminded of Clark and Parsia's work on their Integrity Constraint
Validator:  http://clarkparsia.com/pellet/icv )



3.  Map an XML instance conformant to an earlier version of the "resource"
to the current version of the "resource" via the OWL semantic model



I think it'd be great to get a semantic web perspective on this fresh look
effort.



Cheers,

Dave







Dave Hau

National Cancer Institute

Tel: 301-443-2545

Dave.Hau@nih.gov







*From:* owner-its@lists.hl7.org [mailto:owner-its@lists.hl7.org] *On Behalf
Of *Lloyd McKenzie
*Sent:* Sunday, August 21, 2011 12:07 PM
*To:* Andrew McIntyre
*Cc:* Grahame Grieve; Eliot Muir; Zel, M van der; HL7-MnM; RIMBAA; HL7 ITS
*Subject:* Re: A Fresh Look Proposal



Hi Andrew,



Tacking stuff on the end simply doesn't work if you're planning to use XML
Schema for validation.  (Putting new stuff in the middle or the beginning
has the same effect - it's an unrecognized element.)  The only alternative
is to say that all changes after "version 1" of the specification will be
done using the extension mechanism.  That will create tremendous analysis
paralysis as we try to get things "right" for that first version, and will
result in increasing clunkiness in future versions.  Furthermore, the
extension mechanism only works for the wire format.  For the RIM-based
description, we still need proper modeling, and that can't work with "stick
it on the end" no matter what.



That said, I'm not advocating for the nightmare we currently have with v3
right now.



I think the problem has three parts - how to manage changes to the wire
format, how to version resource definitions and how to manage changes to the
semantic model.



Wire format:

If we're using schema for validation, we really can't change anything
without breaking validation.  Even making an existing non-repeating element
repeat is going to cause schema validation issues.  That leaves us with two
options (if we discount the previously discussed option of "get it right the
first time and be locked there forever"):

1. Don't use schema

- Using Schematron or something else could easily allow validation of the
elements that are present, but ignore all "unexpected" elements

- This would cause significant pain for implementers who like to use schema
to help generate code though



2. Add some sort of a version indicator on new content that allows a
pre-processor to remove all "new" content if processing using an "old"
handler

- Unpleasant in that it involves a pre-processing step and adds extra "bulk"
to the instances, but other than that, quite workable
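
A minimal sketch of the pre-processing step in option 2, assuming new
content carries a hypothetical `sinceVersion` attribute (the attribute name
and the element names are invented for illustration and do not come from any
HL7 specification). An "old" handler strips everything newer than the
version it understands before validating or processing the instance:

```python
import xml.etree.ElementTree as ET


def strip_newer_content(xml_text, handler_version):
    """Remove every element introduced after handler_version.

    'sinceVersion' is a hypothetical marker attribute assumed for this
    sketch; unmarked elements are treated as version 1 content.
    """
    root = ET.fromstring(xml_text)

    def prune(parent):
        # Iterate over a copy because children are removed while looping.
        for child in list(parent):
            since = int(child.get("sinceVersion", "1"))
            if since > handler_version:
                parent.remove(child)
            else:
                prune(child)

    prune(root)
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    "<Person><name>Ann</name>"
    '<nickname sinceVersion="2">Annie</nickname>'
    "<birthDate>1970-01-01</birthDate></Person>"
)
```

Calling `strip_newer_content(SAMPLE, 1)` drops the `nickname` element, so
the result could then be validated against the version 1 schema.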



I think we're going to have to go with option #2.  It's not ideal, but is
still relatively painless for implementers.  The biggest thing is that we
can insist on "no breaking x-path changes".  We don't move stuff between
levels in a resource wire format definition or rename elements in a resource
wire format definition.  In the unlikely event we have to, we deprecate the
entire resource and create a new version.
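
The "no breaking x-path changes" rule lends itself to a mechanical check.
The following is only a sketch, assuming each version of a wire format can
be represented by a sample instance (the element names are invented for
illustration): every element path present in the old version must still be
present in the new one.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return the set of slash-separated element paths in a document."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def broken_paths(old_xml, new_xml):
    """Paths that exist in the old version but are missing from the new one."""
    return sorted(element_paths(old_xml) - element_paths(new_xml))


OLD = "<Person><name><family>Smith</family></name></Person>"
ADDED = "<Person><name><family>Smith</family><given>Ann</given></name></Person>"
RENAMED = "<Person><name><surname>Smith</surname></name></Person>"
```

Adding `given` leaves `broken_paths(OLD, ADDED)` empty, while renaming
`family` to `surname` reports the path that old handlers would no longer
find.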



Resource versioning:

At some point, HL7 is going to find at least one resource where we blew it
with the original design and the only way to create a coherent wire format
is to break compatibility with the old one.  This will then require
definition of a new resource, with a new name that occupies the same
semantic space as the original.  I.e. We'll end up introducing "overlap".
 Because overlap will happen, we need to figure out how we're going to deal
with it.  I actually think we may want to introduce overlap in some places
from the beginning.  Otherwise we're going to force a wire format on
implementers of simple community EMRs that can handle prescriptions for
fully-encoded chemo-therapy protocols.  (They can ignore some of the data
elements, but they'd still have to support the full complexity of the nested
data structures.)



I don't have a clear answer here, but I think we need to have a serious
discussion about how we'll handle overlap in those cases where it's
necessary, because at some point it'll be necessary.  If we don't figure out
the approach before we start, we can't allow for it in the design.



All that said, I agree with the approach of avoiding overlap as much as
humanly possible.  For that reason, I don't advocate calling the Person
resource "Person_v1" or something that telegraphs we're going to have new
versions of each resource eventually (let alone frequent changes).
 Introduction of a new version of a resource should only be done when the
pain of doing so is outweighed by the pain of trying to fit new content in
an old version, or requiring implementers of the simple to support the
structural complexity of our most complex use-cases.





Semantic model versioning:

This is the space where "getting it right" the first time is the most
challenging.  (I think we've done that with fewer than half of the normative
specifications we've published so far.)  V3 modeling is hard.  The positive
thing about the RFH approach is that very few people need to care.  We could
totally refactor every single resource's RIM-based model (or even remove
them entirely), and the bulk of implementers would go on merrily exchanging
wire syntax instances.  However, that doesn't mean the RIM-based
representations aren't important.  They're the foundation for the meaning of
what's being shared.  And if you want to start sharing at a deeper level
such as RIMBAA-based designs, they're critical.  This is the level where OWL
would come in.  If you have one RIM-based model structure, and then need to
refactor and move to a different RIM-based model structure, you're going to
want to map the semantics between the two structures so that anyone who was
using the old structure can manage instances that come in with the new
structure (or vice versa).  OWL can do that.  And anyone who's got a complex
enough implementation to parse the wire format and trace the elements
through their underlying RIM semantic model will likely be capable of
managing the OWL mapping component as well.
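
To make the idea of mapping between two model structures concrete, here is
a deliberately simplified sketch. It uses a plain Python table as a stand-in
for equivalence statements of the kind OWL expresses with
owl:equivalentProperty; it does no OWL reasoning, and the property names are
invented for illustration rather than taken from the RIM.

```python
# Each pair states that an old-model property and a new-model property mean
# the same thing. The names are hypothetical, not real RIM attributes.
EQUIVALENT = [
    ("old:Person.familyName", "new:Person.name.family"),
    ("old:Person.dob", "new:Person.birthDate"),
]


def translate(instance, equivalences, to_new=True):
    """Rewrite property keys from one model version to the other.

    Properties with no stated equivalence are passed through unchanged.
    """
    if to_new:
        table = dict(equivalences)
    else:
        table = {new: old for old, new in equivalences}
    return {table.get(key, key): value for key, value in instance.items()}
```

Because the equivalences are symmetric, the same table serves a system on
the old structure receiving new instances and the reverse
(`to_new=False`).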





In short, I think we're in agreement that separation of wire syntax and
semantic model is needed.  That will make model refactoring much easier.
However, we do have to address how we're going to handle wire-side and
resource refactoring too.





Lloyd

--------------------------------------
Lloyd McKenzie

+1-780-993-9501



Note: Unless explicitly stated otherwise, the opinions and positions
expressed in this e-mail do not necessarily reflect those of my clients nor
those of the organizations with whom I hold governance positions.

On Sun, Aug 21, 2011 at 7:53 AM, Andrew McIntyre
<andrew@medical-objects.com.au> wrote:

Hello Lloyd,

While "tacking stuff on the end" in V2 may not at first glance seem like an
elegant solution I wonder if it isn't actually the best solution, and one
that has stood the test of time. The parsing rules in V2 do make version
updates quite robust wrt backward and forward inter-operability.
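
As an illustration of why positional parsing tolerates fields "tacked on the
end", here is a much simplified sketch. It handles only the pipe field
separator (no components, repetitions or escaping), and the `ZPN` segment
and its field names are hypothetical.

```python
def parse_segment(line, known_fields):
    """Read a pipe-delimited segment by position.

    Trailing fields the receiver does not know are ignored; trailing fields
    the sender did not supply come back as empty strings.
    """
    parts = line.split("|")
    segment_id, values = parts[0], parts[1:]
    padded = values + [""] * (len(known_fields) - len(values))
    return segment_id, dict(zip(known_fields, padded))


# Hypothetical field layouts known to an older and a newer receiver.
V1_FIELDS = ["id", "family_name"]
V2_FIELDS = ["id", "family_name", "given_name"]
```

An older receiver reading `ZPN|42|Smith|Ann` with `V1_FIELDS` simply never
looks at the third field, and a newer receiver reading `ZPN|42|Smith` with
`V2_FIELDS` sees an empty `given_name`.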

I am sure it could be done with OWL but I doubt we can switch the world to
using OWL in any reasonable time frame and we probably need a less abstract
representation for commonly used things. In V2, OBX segments used in a
hierarchy can create an OWL-like object-attribute structure for information
that is not modeled by the standard itself.

I do think the wire format and any overlying model should be distinct
entities so that the model can be evolved and the wire format be changed in
a backward compatible way, at least for close versions.

I also think that the concept of templates/archetypes to extend the model
should not invalidate the wire format, but be a metadata layer over the wire
format. This is what we have done in Australia with ISO 13606 Archetypes
in V2 projects. I think we do need a mechanism for people to develop
templates to describe hierarchical data and encode that in the wire format
in a way that does not invalidate its vanilla semantics (ie non templated V2
semantics) when the template mechanism is unknown or not implemented.

In a way the V2 specification does hint at underlying objects/Interfaces, and
there is a V2 model, but it is not prescriptive and there is no requirement
for systems to use the same internal model as long as they use the bare
bones V2 model in the same way. Obviously this does not always work as well
as we would like, even in V2, but it does work well enough to use it for
quite complex data when there are good implementation guides.

If we could separate the wire format from the clinical models then the 2 can
evolve in their own way. We have done several trial implementations of
Virtual Medical Record Models (vMR) which used V3 datatypes and RIM like
classes and could build those models from V2 messages, or in some cases non
standard Web Services, although for specific clinical classes did use ISO
13606 archetypes to structure the data in V2 messages.

I think the dream of having direct model serializations as messages is
flawed for all the reasons that have made V3 impossible to implement in the
wider world. While the tack it on the end, lots of optionality rationale
might seem clunky, maybe it's the best solution to a difficult problem. If we
define tight SOAP web services for everything we will end up with thousands
of slightly different SOAP calls for every minor variation and I am not sure
this is the path to enlightenment either.

I am looking at Grahame's proposal now, but I do wonder if the start again from
scratch mentality is not part of the problem. Perhaps that is a lesson to be
learned from the V3 process. Maybe the problem is too complex to solve from
scratch and like nature we have to evolve and accept there is lots of junk
DNA, but maintaining a working standard at all times is the only way to
avoid extinction.

I do like the idea of a cohesive model for use in decision support, and
that's what the vMR/GELLO is about, but I doubt there will ever be a one
size fits all model and any model will need to evolve. Disconnecting the
model from the messaging, with all the pain that involves, might create a
layered approach that might allow the HL7 organism to evolve gracefully. I
do think part of the fresh look should be education on what V2 actually
offers, and can offer, and I suspect many people in HL7 have never seriously
looked at it in any depth.

Andrew McIntyre



Saturday, August 20, 2011, 4:37:37 AM, you wrote:

Hi Grahame,

Going to throw some things into the mix from our previous discussions
because I don't see them addressed yet.  (Though I admit I haven't reread
the whole thing, so if you've addressed and I haven't seen, just point me at
the proper location.)

One of the challenges that has bogged down much of the v3 work at the
international level (and which causes a great deal of pain at the
project/implementation level) is the issue of refactoring.  The pain at the
UV level comes from the fact that we have a real/perceived obligation to
meet all known and conceivable use-cases for a particular domain.  For
example, the pharmacy domain model needs to meet the needs of clinics,
hospitals, veterinarians, and chemotherapy protocols and must support the
needs of the U.S., Soviet Union and Botswana.  To make matters more
interesting, participation from the USSR and Botswana is a tad light.
 However the fear is that if all of these needs aren't taken into account,
then when someone with those needs shows up at the door, the model will need
to undergo substantive change, and that will break all of the existing
systems.

The result is a great deal of time spent gathering requirements and
refactoring and re-refactoring the model as part of the design process,
together with a tendency to make most, if not all data elements optional at
the UV level.  A corollary is that the UV specs are totally unimplementable
in an interoperable fashion.  The evil of optionality that manifested in v2
that v3 was going to banish turned out to not be an issue of the standard,
but rather of the issues with creating a generic specification that
satisfies global needs and a variety of use-cases.

The problem at the implementer/project level is that when you take the UV
model and tightly constrain it to fit your exact requirements, you discover
6 months down the road that one or more of your constraints was wrong and
you need to loosen it, or you have a new requirement that wasn't thought of,
and this too requires refactoring and often results in wire-level
incompatibilities.

One of the things that needs to be addressed if we're really going to
eliminate one of the major issues with v3 is a way to reduce the fear of
refactoring.  Specifically, it should be possible to totally refactor the
model and have implementations and designs work seamlessly across versions.

I think putting OWL under the covers should allow for this.  If we can
assert equivalencies between data elements in old and new models, or even
just map the wire syntaxes of old versions to new versions of the definition
models, then this issue would be significantly addressed:
- Committees wouldn't have to worry about satisfying absolutely every
use-case to get something useful out because they know they can make changes
later without breaking everything.  (They wouldn't even necessarily have to
meet all the use-cases of the people in the room! :>)
- Realms and other implementers would be able to have an interoperability
path that allowed old wire formats to interoperate with new wire formats
through the aid of appropriate tooling that could leverage the OWL under the
covers.  (I think creating such tooling is *really* important because
version management is a significant issue with v3.  And with XML and
schemas, the whole "ignore everything on the end you don't recognize" from
v2 isn't a terribly reasonable way forward.)

I think it's important to figure out exactly how refactoring and version
management will work in this new approach.  The currently proposed approach
of "you can add stuff, but you can't change what's there" only scales so
far.


I think we *will* need to significantly increase the number of Resources
(from 30 odd to a couple of hundred).  V3 supports things like invoices,
clinical study design, outbreak tracking and a whole bunch of other
healthcare-related topics that may not be primary-care centric but are still
healthcare centric.  That doesn't mean all (or even most) systems will need
to deal with them, but the systems that care will definitely need them.  The
good news is that most of these more esoteric areas have responsible
committees that can manage the definition of these resources, and as you
mention, we can leverage the RMIMs and DMIMs we already have in defining
these structures.


The specification talks about robust capturing of requirements and
traceability to them, but gives no insight into how this will occur.  It's
something we've done a lousy job of with v3, but part of the reason for that
is it's not exactly an easy thing to do.  The solution needs to flesh out
exactly how this will happen.


We need a mapping that explains exactly what's changed in the datatypes (and
for stuff that's been removed, how to handle that use-case).

There could still be a challenge around granularity of text.  As I
understand it, you can have a text representation for an attribute, or for
any XML element.  However, what happens if you have a text blob in your
interface that covers 3 of 7 attributes inside a given XML element.  You
can't use the text property of the element, because the text only covers 3
of 7.  You can't use the text property of one of the attributes because it
covers 3 separate attributes.  You could put the same text in each of the 3
attributes, but that's somewhat redundant and is going to result in
rendering issues.  One solution might be to allow the text specified at the
element level to identify which of the attributes the text covers.  A
rendering system could then use that text for those attributes, and then
render the discrete values of the remaining specified attributes.  What this
would mean is that an attribute might be marked as "text" but not have text
content directly if the parent element had a text blob that covered that
attribute.
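
The proposed solution can be sketched as follows. This is an illustration
only: `text`, `text_covers` and the dosage attribute names are hypothetical
and are not taken from the draft specification.

```python
def render(element):
    """Render an element whose text blob covers only some of its attributes.

    'text' and 'text_covers' are hypothetical names for the element-level
    text and the list of attributes that the text stands in for.
    """
    covered = set(element.get("text_covers", []))
    lines = []
    if element.get("text"):
        lines.append(element["text"])
    # Attributes not covered by the text are rendered as discrete values.
    for name, value in element["attributes"].items():
        if name not in covered:
            lines.append(f"{name}: {value}")
    return lines


DOSAGE = {
    "text": "Take 1 tablet twice daily with food",
    "text_covers": ["quantity", "frequency", "instructions"],
    "attributes": {
        "quantity": "1",
        "frequency": "BID",
        "instructions": "with food",
        "route": "oral",
        "start": "2011-08-20",
    },
}
```

Here `render(DOSAGE)` shows the text blob once, followed by the discrete
values for `route` and `start`, so nothing is duplicated.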



New (to Grahame) comments:

I didn't see anything in the HTML section or the transaction section on how
collisions are managed for updates.  A simple requirement (possibly
optional) to include the version id of the resource being updated or deleted
should work.
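
A minimal sketch of that requirement, using an in-memory stand-in for a
server (the class and method names are hypothetical): an update must name
the version it was based on, and is rejected if that version is no longer
current.

```python
class VersionConflict(Exception):
    """Raised when an update names a version that is no longer current."""


class ResourceStore:
    """In-memory stand-in for a server holding versioned resources."""

    def __init__(self):
        self._items = {}  # resource id -> (version id, content)

    def create(self, resource_id, content):
        self._items[resource_id] = (1, content)
        return 1

    def update(self, resource_id, expected_version, content):
        current_version, _ = self._items[resource_id]
        if expected_version != current_version:
            raise VersionConflict(
                f"{resource_id}: expected v{expected_version}, "
                f"current is v{current_version}"
            )
        self._items[resource_id] = (current_version + 1, content)
        return current_version + 1
```

Two clients that both read version 1 cannot silently overwrite each other:
the first update succeeds and returns version 2, and the second, still
naming version 1, raises `VersionConflict`.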

To my knowledge, v3 (and possibly v2) has never supported true "deletes".
 At best, we do an update and change the status to nullified.  Is that the
intention of the "Delete" transaction, or do we really mean a true "Delete"?
 Do we have any use-cases for true deletes?

I wasn't totally clear on the context for uniqueness of ids.  Is it within a
given resource or within a given base URL?  What is the mechanism for
referencing resources from other base URLs?  (We're likely to have networks
of systems that play together.)

Nitpick: I think "id" might better be named "resourceId" to avoid any
possible confusion with "identifier".  I recognize that from a coding
perspective, shorter is better.  However, I think that's outweighed by the
importance of avoiding confusion.

In the resource definitions, you repeated definitions for resources
inherited from parent resources.  E.g. Person.created inherited from
Resource.Base.created.  Why?  That's a lot of extra maintenance and
potential for inconsistency.  It also adds unnecessary volume.

Suggest adding a caveat to the draft that the definitions are placeholders
and will need significant work.  (Many are tautological and none meet the
Vocab WG's guidelines for quality definitions.)

Why is Person.identifier mandatory?

You've copied "an element from Resource.Base.???" to all of the Person
attributes, including those that don't come from Resource.Base.

Obviously the workflow piece and the conformance rules that go along with it
need some fleshing out.  (Looks like this may be as much fun in v4 as it has
been in v3 :>)

The list of identifier types makes me queasy.  It looks like we're
reintroducing the mess that was in v2.  Why?  Trying to maintain an ontology
of identifier types is a lost cause.  There will be a wide range of
granularity requirements and at fine granularity, there will be 10s of
thousands.  The starter list is pretty incoherent.  If you're going to have
types at all, the vocabulary should be constrained to a set of codes based
on the context in which the real-world identifier is present.  If there's no
vocabulary defined for the property in that context, then you can use text
for a label and that's it.

I didn't see anything on conformance around datatypes.  Are we going to have
datatype flavors?  How is conformance stated for datatype properties?

I didn't see templateId or flavorId or any equivalent.  How do instances (or
portions there-of) declare conformance to "additional" constraint
specifications/conformance profiles than the base one for that particular
server?

We need to beef up the RIM mapping portion considerably.  Mapping to a
single RIM class or attribute isn't sufficient.  Most of the time, we're
going to need to map to a full context model that talks about the
classCodes, moodCodes and relationships.  Also, you need to relate
attributes to the context of the RIM location of your parent.

There's no talk about context conduction, which from an implementation
perspective is a good thing.  However, I think it's still needed behind the
scenes.  Presumably this would be covered as part of the RIM semantics
layer?

In terms of the "validate" transaction, we do a pseudo-validate in pharmacy,
but a 200 response isn't sufficient.  We can submit a draft prescription and
say "is this ok?".  The response might be as simple as "yes" (i.e. a 200).
 However, it could also be a "no" or "maybe" with a list of possible
contraindications, dosage issues, allergy alerts and other detected issues.
 How would such a use-case be met in this paradigm?

At the risk of over-complicating things, it might be useful to think about
data properties as being identifying or not to aid in exposing resources in
a de-identified way.  (Not critical, just wanted to plant the seed in your
head about if or how this might be done.)
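
One way this might look, purely as a sketch with hypothetical property names
and flags: each property is marked as identifying or not, and a
de-identified view keeps only the properties explicitly marked as
non-identifying.

```python
# Hypothetical flags saying which properties of a resource are identifying.
IDENTIFYING = {
    "name": True,
    "identifier": True,
    "birthDate": True,
    "gender": False,
    "bloodType": False,
}


def de_identify(resource, identifying):
    """Keep only properties explicitly flagged as non-identifying.

    Properties with no flag at all are dropped, which errs on the safe side.
    """
    return {k: v for k, v in resource.items() if identifying.get(k) is False}
```

Dropping unflagged properties by default is a design choice of this sketch;
the opposite default would expose any property someone forgot to classify.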


All questions and comments aside, I'm definitely in favour of fleshing out
this approach and looking seriously at moving to it.  To that end, I think
we need a few things:
- A list of the open issues that need to be resolved in the new approach.
 (You have "todo"s scattered throughout.  A consolidated list of the "big"
things would be useful.)
- An analysis of how we move from existing v3 to the new approach, both in
terms of leveraging existing artifacts and providing a migration path for
existing solutions as well as what tools, etc. we need.
- A plan for how to engage the broader community for review.  (Should
ideally do this earlier rather than later.)

Thanks to you, Rene and others for all the work you've done.


Lloyd

--------------------------------------
Lloyd McKenzie

+1-780-993-9501



Note: Unless explicitly stated otherwise, the opinions and positions
expressed in this e-mail do not necessarily reflect those of my clients nor
those of the organizations with whom I hold governance positions.


On Fri, Aug 19, 2011 at 9:08 AM, Grahame Grieve <grahame@kestral.com.au> wrote:


hi All

Responses to comments

#Michael

> 1. I would expect more functional interface to use these resources.

as you noted in later, this is there, but I definitely needed to make
more of it. That's where I ran out of steam

> 2. One of the things that was mentioned (e.g. at the Orlando
> WGM RIMBAA Fresh Look discussion) is that we want to use
> industry standard tooling, right? Are there enough libraries that
> implement REST?

this doesn't need tooling. There's schemas if you want to bind to them

> 2b. A lot of vendors now implement WebServices. I think we should
> go for something vendors already have or will easily adopt. Is that the
> case with REST?

Speaking as a vendor/programmer/writer of an open source web services
toolkit, I prefer REST. Way prefer REST

> Keep up the good work!

ta

#Mark

> I very much like the direction of this discussion towards web services
> and in particular RESTful web services.

yes, though note that REST is a place to start, not a place to finish.

> At MITRE we have been advocating this approach for some time with our
> hData initiative.

yes. you'll note my to do: how does this relate to hData, which is a
higher level specification than the CRUD stuff here.

#Eliot

> Hats off - I think it's an excellent piece of work and definitely a step
> in the right direction.

thanks.

> I didn't know other people in the HL7 world other than me were talking
> about (highrise).  Who are they?

not in HL7. you were one. it came up in some other purely IT places that
I play

>  5) Build it up by hand with a wiki - it is more scalable really since you

wikis have their problems, though I'm not against them.

> 1) I think it would be better not to use inheritance to define a patient
> as a sub type of a person.  The trouble with that approach is that people can

On the wire, a patient is not a sub type of person. The relationship
between the two is defined in the definitions.

> A simpler approach is associate additional data with a person if and when
> they become a patient.

in one way, this is exactly what RFH does. On the other hand, it creates a
new identity for the notion of patient (for integrity). We can discuss
whether that's good or bad.

> 2) I'd avoid language that speaks down to 'implementers'.  It's enterprise

really? Because I'm one. down the bottom of your enterprise pole. And
I'm happy to be one of those stinking implementers down in the mud.
I wrote it first for me. But obviously we wouldn't want to cause offense.
I'm sure I haven't caused any of that this week ;-)

> 3) If you want to reach a broader audience, then simplify the language.

argh, and I thought I had. how can we not use the right terms? But I
agree that the introduction is not yet direct enough - and that's after
4 rewrites to try and make it so....

Grahame


************************************************
To access the Archives of this or other lists or change your list settings
and information, go to:

http://www.hl7.org/listservice








*--
Best regards,
 Andrew
*mailto:andrew@Medical-Objects.com.au<andrew@Medical-Objects.com.au>

*sent from a real computer*










-- 
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu

Received on Friday, 2 September 2011 14:26:04 UTC