RE: [TMO] patient record normalization

* Michel_Dumontier <Michel_Dumontier@carleton.ca> [2010-09-09 14:52-0400]
> Hi,
>   I think the model where we separate the unit from the value is preferred (as per CPR) because it is highly flexible. You are right however that people could then refer to any unit, although an ontology could specify the unit involved (universal restriction). The alternative, which Bijan worked on was a stylesheet to convert values in different units to a common unit.

At W3, standardization includes detecting and eliminating redundant
flexibility. If someone says "<img src='X'/> == <img href='X'/>", we
say "pick exactly one or there will be bugs and inefficiency". To that
end, I'd like the TMO task force to have exactly one format for the tests
worth standardizing, e.g. blood pressure. Further, I'd like users of
the TMO to benefit from this stake in the ground; specifically, I don't
want them to query data that's half in MPa and half in mmHg. Voila my
desire for one inflexible representation.

This draconian measure could be enforced by OWL restrictions, if folks
chose to run consistency checkers on their data, but some folks won't,
and some folks won't in time, and some folks will resent us for
imposing on their pipeline, and some folks won't even descry this
imposition. Normalization can also be enforced in the choice of
predicate; we can say that the object of cpr:systolicBpMPa¹ is in MPa.
We can write this down in the schema, and also as an OWL restriction.
This moves the burden of inference from users of the standard to those
who are mixing with data which has other units (a shrinking group when
standardization is successful).
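
A minimal sketch of what that burden looks like for someone importing
data in other units, assuming they convert once at input time. The
function names and the unit table are hypothetical; the target predicates
trans:systolicMmHg and trans:diastolicMmHg are taken from the quoted
message below (an MPa-based predicate would work the same way with a
different table), and the kPa and cmHg entries are my own illustrative
choices:

```python
# Sketch only: normalize blood-pressure readings to mmHg before emitting RDF.
# The unit labels and helper names are assumptions, not part of TMO or CPR.

PA_PER_MMHG = 133.322387415  # conventional millimetre of mercury, in pascals

# Multiplicative factor from each accepted input unit to mmHg.
TO_MMHG = {
    "mmHg": 1.0,
    "cmHg": 10.0,
    "Pa": 1.0 / PA_PER_MMHG,
    "kPa": 1000.0 / PA_PER_MMHG,
}


def to_mmhg(value, unit):
    """Convert a pressure reading to mmHg, rejecting units not in the table."""
    try:
        factor = TO_MMHG[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor


def bp_triples(subject, systolic, diastolic, unit):
    """Return Turtle text using the unit-specific predicates, rounded to whole mmHg."""
    sys_mmhg = round(to_mmhg(systolic, unit))
    dia_mmhg = round(to_mmhg(diastolic, unit))
    return (
        f"{subject} trans:bloodPressure\n"
        f'  [ trans:systolicMmHg "{sys_mmhg}" ;\n'
        f'    trans:diastolicMmHg "{dia_mmhg}" ] .'
    )


if __name__ == "__main__":
    # A reading recorded in pascals is stored in the single agreed unit.
    print(bp_triples(":X", 16000, 10667, "Pa"))
```

The conversion happens once, on the way in, so nobody querying the
normalized data has to know the reading was ever in another unit.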

I believe the principal counterargument to normalization is that this
would be an obstacle to adoption; that e.g. clinics or pharmas who
would otherwise be tempted to express their clinical data in CPR would
be discouraged by the requirement of input normalization. I think that
group is vanishingly small, especially if they face heterogeneous data
and have to normalize anyway. It's possible that the arguments for
homogeneous data (no query/inference-time normalization, trivial
federation, etc.) are too subtle to persuade the above group, but I
think the clinical web will be much better off if we can eliminate
redundant flexibility.

¹ Chimezie, what do you think of this imposition on CPR?

> m.
> 
> > -----Original Message-----
> > From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-
> > lifesci-request@w3.org] On Behalf Of Eric Prud'hommeaux
> > Sent: Thursday, September 09, 2010 1:06 PM
> > To: public-semweb-lifesci@w3.org
> > Subject: [TMO] patient record normalization
> > 
> > We have choices about how to model units. per the first TMO RDF
> > patient data, we can keep the units as datatypes:
> > 
> > :X trans:bloodPressure
> >   [ trans:systolic "120"^^u:mmHg ;
> >     trans:diastolic "80"^^u:mmHg ] .
> > 
> > per CPR, as a pair of value and datatype:
> > 
> > … [ trans:systolic [ muo:measuredIn trans1:mmHg ; muo:numericalValue "120" ] ;
> >     trans:diastolic [ muo:measuredIn trans1:mmHg ; muo:numericalValue "80" ] ] .
> > 
> > Another, potentially more attractive option, is to model units in the
> > predicate:
> > 
> > :X trans:bloodPressure
> >   [ trans:systolicMmHg "120" ;
> >     trans:diastolicMmHg "80" ] .
> > 
> > This greatly simplifies our life as we are otherwise likely to have a
> > variety of e.g. BP data in the database: 120/80 mmHg, 12/8 DmHg,
> > 16000/10667 Pa,
> > 16/11 MPa, 13 (PAM)
> > 
> > which would lead to ridiculous queries when we want to use the data:
> > 
> > SELECT (?sys as ?sysM) (?dia as ?diaM) {
> >           ?x trans:bloodPressure [ trans:systolic ?sys ; trans:diastolic ?dia ]
> >           FILTER (datatype(?sys) = u:mmHg && datatype(?dia) = u:mmHg)
> > }
> >   UNION SELECT (?sys*10 as ?sysM) (?dia*10 as ?diaM) {
> >           ?x trans:bloodPressure [ trans:systolic ?sys ; trans:diastolic ?dia ]
> >           FILTER (datatype(?sys) = u:dmHg && datatype(?dia) = u:dmHg)
> > }
> >   UNION SELECT (?sys*133 as ?sysM) (?dia*133 as ?diaM) {
> >           ?x trans:bloodPressure [ trans:systolic ?sys ; trans:diastolic ?dia ]
> >           FILTER (datatype(?sys) = u:MPa && datatype(?dia) = u:MPa) }
> > … }
> > 
> > 
> > 
> > --
> > -ericP
> 

-- 
-ericP

Received on Friday, 10 September 2010 16:54:05 UTC