Re: [TMO] patient record normalization from Chimezie Ogbuji on 2010-09-10 (public-semweb-lifesci@w3.org from September 2010)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Fri, 10 Sep 2010 13:45:41 -0400
To: "Eric Prud'hommeaux" <eric@w3.org>, Michel_Dumontier <Michel_Dumontier@carleton.ca>
cc: "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <C8AFE705.135D2%ogbujic@ccf.org>
Hello.  Very interesting thread =).  My $0.02.  You say in your original
email:

>>> This greatly simplifies our life as we are otherwise likely to have a
>>> variety of e.g. BP data in the database: 120/80 mmHg, 12/8 DmHg,
>>> 16000/10667 Pa,
>>> 16/11 MPa, 13 (PAM)

I'm not sure the idea that databases with measurement data are likely
to have mixed units is very compelling in the realm of patient data.
Patient data is more often than not local to a particular institution and its
conventions, so I would think you are more likely to find a
more homogeneous combination of units (where BP, for instance, is primarily
measured in one unit or another depending on the institution).  Certainly,
if you have an integrated dataset this assumption is less likely to hold,
but even then 1) I don't think the range of units you are likely to see in
such combined datasets will be that diverse - international conventions
notwithstanding - and 2) normalization into a canonical set (Vipul's
suggestion) seems a reasonable approach to adopt as part of the integration,
rather than delaying this normalization until the point when you query
the data (which makes the query and any reasoning involved more complex).
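As an illustration of normalizing at integration time, here is a minimal sketch that converts incoming blood-pressure readings to a single canonical unit before they are loaded. It assumes mmHg as the canonical unit; the conversion table and the `normalize_bp` function are hypothetical, not part of any TMO or CPR specification.

```python
PA_PER_MMHG = 133.322387415  # pascals per conventional millimetre of mercury

# Hypothetical conversion table: factor that takes a value in the given
# unit to mmHg, the assumed canonical unit.
TO_MMHG = {
    "mmHg": 1.0,
    "cmHg": 10.0,
    "Pa": 1.0 / PA_PER_MMHG,
    "kPa": 1000.0 / PA_PER_MMHG,
}

def normalize_bp(value: float, unit: str) -> float:
    """Convert one blood-pressure reading to mmHg during integration."""
    try:
        factor = TO_MMHG[unit]
    except KeyError:
        # Refuse to load readings whose unit we cannot convert.
        raise ValueError(f"no conversion defined for unit {unit!r}") from None
    return value * factor

# Systolic readings as they might arrive from different source systems.
incoming = [(120, "mmHg"), (12, "cmHg"), (16000, "Pa"), (16, "kPa")]
normalized = [round(normalize_bp(v, u)) for v, u in incoming]
print(normalized)  # [120, 120, 120, 120]
```

Once loaded this way, queries only ever see mmHg, and the conversion logic lives in one place: the integration step.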

Personally, the idea of 'embedding' the units into the predicate doesn't
appeal to me, mainly because the predicate (for example):

trans:systolicMmHg

is overloaded to capture the meaning associated with systolic pressure *and*
the particular unit in which it was captured or represented.  The former is
ontological and the latter is epistemic.  However, the more practical issue
is that the set of terms for measurement will grow (quite rapidly) with the
number of different units you want to be able to represent in your dataset.
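The growth is multiplicative. With M measures and U units, the overloaded style needs M × U predicates, while the generic style needs M measure terms plus two unit-independent predicates (the units themselves would come from a shared units ontology). A small sketch, using hypothetical measure and unit names:

```python
from itertools import product

# Hypothetical measures and units; real datasets would have many more.
measures = ["systolic", "diastolic", "meanArterial"]
units = ["MmHg", "CmHg", "Pa", "KPa"]

# Overloaded style: one predicate per (measure, unit) pair,
# e.g. trans:systolicMmHg.
overloaded = {f"trans:{m}{u}" for m, u in product(measures, units)}

# Generic style: one term per measure, plus two predicates that relate
# any quality value to its number and its unit.
generic = {f"trans:{m}" for m in measures} | {"muo:numericalValue", "muo:measuredIn"}

print(len(overloaded), len(generic))  # 12 5
```

Adding a fifth unit would add three more overloaded predicates but no new generic ones.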

On 9/10/10 12:53 PM, "Eric Prud'hommeaux" <eric@w3.org> wrote:
> At W3, standardization includes detecting and eliminating redundant
> flexibility. If someone says "<img src='X'/> == <img href='X'/>", we
> say "pick exactly one or there will be bugs and inefficiency". To that
> end, I'd like the TMO task force to have exactly one format for the tests
> worth standardizing, e.g. blood pressure. Further, I'd like users of
> the TMO to benefit from this stake in the ground; specifically, I don't
> want them to query data that's half in MPa and half in mmHg. Voila my
> desire for one inflexible representation.

So, isn't this an argument for normalization, rather than for any particular
way of representing measured values and their units?

> ..snip..
> Normalization can also be enforced in the choice of
> predicate; we can say that the object of cpr:systolicBpMPa¹ is in MPa.
> We can write this down in the schema, and also as an OWL restriction.
> This moves the burden of inference from users of the standard to those
> who are mixing with data which has other units (a shrinking group when
> standardization is successful).

I'm not sure I follow this rationale.  If you implement normalization in this
way, with such (overloaded) predicates, then determining the relationship
between the value and its unit becomes a reasoning problem (i.e., you need to
'interpret' the predicate WRT the ontology to determine the appropriate
units).  It just seems more straightforward to have generic,
domain-independent predicates that directly relate a 'quality value' to
its units and scalar value.  Transformations and normalizations can then
happen at the point when data is being integrated, and the semantics of the
measured value are still understood.
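To make the contrast concrete, here is a sketch using plain Python tuples as triples. Only muo:measuredIn and muo:numericalValue come from this thread; the `ex:`, `unit:` and `trans:` identifiers, and the `PREDICATE_UNIT` table standing in for the schema or OWL restriction, are hypothetical.

```python
# Overloaded style: the unit is implied by the predicate name.
overloaded_data = [("ex:obs1", "trans:systolicMmHg", 120)]

# Stand-in for the schema/OWL restriction a consumer must consult
# to learn which unit each overloaded predicate implies.
PREDICATE_UNIT = {"trans:systolicMmHg": "unit:mmHg"}

# Generic style: a quality-value node carries its number and its unit.
generic_data = [
    ("ex:obs1", "trans:systolicBP", "ex:qv1"),
    ("ex:qv1", "muo:numericalValue", 120),
    ("ex:qv1", "muo:measuredIn", "unit:mmHg"),
]

def value_and_unit_overloaded(triples, subject):
    """Recover (value, unit) by interpreting the predicate via the schema."""
    for s, p, o in triples:
        if s == subject and p in PREDICATE_UNIT:
            return o, PREDICATE_UNIT[p]
    return None

def value_and_unit_generic(triples, qv):
    """Recover (value, unit) directly from the data."""
    props = {p: o for s, p, o in triples if s == qv}
    return props.get("muo:numericalValue"), props.get("muo:measuredIn")

print(value_and_unit_overloaded(overloaded_data, "ex:obs1"))  # (120, 'unit:mmHg')
print(value_and_unit_generic(generic_data, "ex:qv1"))         # (120, 'unit:mmHg')
```

The overloaded version only works if `PREDICATE_UNIT` is extended for every new predicate, whereas the generic version reads the unit straight from the data.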

Also, having domain-independent predicates makes it easier to render a view
of the data (for human consumption) that includes visual cues about the
units of measure associated with values, directly from the data, since such
tools can always expect the same set of terms to capture a value and its
unit of measurement.
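A sketch of such a rendering step, assuming the generic pattern: the formatter never needs to know what is being measured, only the two generic predicates. Identifiers other than muo:measuredIn and muo:numericalValue are hypothetical, and using the local name of the unit identifier as its display label is a simplification.

```python
# Hypothetical quality values recorded in different units.
triples = [
    ("ex:qv1", "muo:numericalValue", 120),
    ("ex:qv1", "muo:measuredIn", "unit:mmHg"),
    ("ex:qv2", "muo:numericalValue", 16),
    ("ex:qv2", "muo:measuredIn", "unit:kPa"),
]

def render(triples, qv):
    """Format a quality value for display using only the two generic predicates."""
    props = {p: o for s, p, o in triples if s == qv}
    value = props["muo:numericalValue"]
    unit = props["muo:measuredIn"]
    # Crude display label: the local name after the prefix, e.g. "mmHg".
    label = unit.split(":", 1)[-1]
    return f"{value} {label}"

for qv in ("ex:qv1", "ex:qv2"):
    print(render(triples, qv))  # "120 mmHg", then "16 kPa"
```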
 
> I believe the principal counterargument to normalization is that this
> would be an obstacle to adoption; that e.g. clinics or pharmas who
> would otherwise be tempted to express their clinical data in CPR would
> be discouraged by the requirement of input normalization.

Unless I have misunderstood you, it sounds like you think that the use of
muo:measuredIn and muo:numericalValue *requires* input normalization.  I
don't think this is the case.  These predicates say nothing about whether
the units in use are homogeneous.

> I think that
> group is vanishingly small, especially if they face heterogeneous data
> and have to normalize anyways. It's possible that the arguments for
> homogeneous data (no query/inference-time normalization, trivial
> federation, etc.) are too subtle to persuade the above group, but I
> think the clinical web will be much better off if we can eliminate
> redundant flexibility.
> 
> ¹ Chimezie, what do you think of this imposition on CPR?

I don't think there is any imposition at all, especially if you use the
convention where separate predicates relate the unit and the
value.  If anything, the approach using overloaded predicates discourages
heterogeneous use of units, because people who query such datasets and
compose ontologies using these terms will then be faced with a proliferation
of terms.  By contrast, even in a dataset with a heterogeneous set of units (for
the same kinds of measures), the way you write queries involving measured
data, and the inferences involved, stay the same.
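A sketch of that point: the query below matches systolic readings and reads their value and unit through the same two predicates, whether the dataset uses one unit or several; only the conversion table grows. The identifiers, the conversion table and the threshold are hypothetical, chosen for illustration.

```python
PA_PER_MMHG = 133.322387415  # pascals per conventional millimetre of mercury

# Hypothetical conversion factors to mmHg, keyed by unit identifier.
TO_MMHG = {"unit:mmHg": 1.0, "unit:kPa": 1000.0 / PA_PER_MMHG}

# A heterogeneous dataset: systolic readings in both mmHg and kPa.
triples = [
    ("ex:obs1", "trans:systolicBP", "ex:qv1"),
    ("ex:qv1", "muo:numericalValue", 150),
    ("ex:qv1", "muo:measuredIn", "unit:mmHg"),
    ("ex:obs2", "trans:systolicBP", "ex:qv2"),
    ("ex:qv2", "muo:numericalValue", 16),
    ("ex:qv2", "muo:measuredIn", "unit:kPa"),
    ("ex:obs3", "trans:systolicBP", "ex:qv3"),
    ("ex:qv3", "muo:numericalValue", 20),
    ("ex:qv3", "muo:measuredIn", "unit:kPa"),
]

def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def systolic_above(triples, threshold_mmhg):
    """Observations whose systolic reading exceeds the threshold, in mmHg."""
    hits = []
    for obs, pred, qv in triples:
        if pred != "trans:systolicBP":
            continue
        # Same access pattern for every reading, regardless of its unit.
        (value,) = objects(triples, qv, "muo:numericalValue")
        (unit,) = objects(triples, qv, "muo:measuredIn")
        if value * TO_MMHG[unit] > threshold_mmhg:
            hits.append(obs)
    return hits

print(systolic_above(triples, 140))  # ['ex:obs1', 'ex:obs3']
```

Here 16 kPa is about 120 mmHg and 20 kPa about 150 mmHg, so only the first and third observations pass; a dataset recorded entirely in mmHg would be queried with exactly the same function.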

-- Chime


Received on Friday, 10 September 2010 17:47:19 UTC