- From: Chimezie Ogbuji <ogbujic@ccf.org>
- Date: Fri, 10 Sep 2010 13:45:41 -0400
- To: "Eric Prud'hommeaux" <eric@w3.org>, Michel_Dumontier <Michel_Dumontier@carleton.ca>
- cc: "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Hello. Very interesting thread =). My $0.02.

You say in your original email:

>>> This greatly simplifies our life as we are otherwise likely to have a
>>> variety of e.g. BP data in the database: 120/80 mmHg, 12/8 DmHg,
>>> 16000/10667 Pa, 16/11 MPa, 13 (PAM)

I'm not so sure that the idea that databases with measurement data are likely to have mixed units is very compelling in the realm of patient data. Patient data is more often than not local to a particular institution and its conventions, so I would think you are more likely to find a homogeneous combination of units (where BP, for instance, is primarily measured in one unit or another depending on the institution).

Certainly, if you have an integrated dataset this assumption is less likely to hold, but even then:

1) I don't think the range of units you are likely to see in such combined datasets will be that diverse, international conventions notwithstanding, and
2) normalization into a canonical set (Vipul's suggestion) seems a reasonable approach to adopt as part of the integration, rather than delaying this normalization until the point when you query the data (which makes the query and any reasoning involved more complex).

Personally, the idea of 'embedding' the units into the predicate doesn't appeal to me, mainly because the predicate (for example) trans:systolicMmHg is overloaded to capture the meaning associated with systolic pressure *and* the particular unit in which it was captured or represented. The former is ontological and the latter is epistemic. However, the more practical issue is that the set of terms for measurement will grow (quite rapidly) with the number of different units you want to be able to represent in your dataset.

On 9/10/10 12:53 PM, "Eric Prud'hommeaux" <eric@w3.org> wrote:

> At W3, standardization includes detecting and eliminating redundant
> flexibility.
> If someone says "<img src='X'/> == <img href='X'/>", we
> say "pick exactly one or there will be bugs and inefficiency". To that
> end, I'd like the TMO task force to have exactly one format for the tests
> worth standardizing, e.g. blood pressure. Further, I'd like users of
> the TMO to benefit from this stake in the ground; specifically, I don't
> want them to query data that's half in MPa and half in mmHg. Voila my
> desire for one inflexible representation.

So, isn't this an argument for normalization, not so much for how you represent measured values and their units?

> ..snip..
> Normalization can also be enforced in the choice of
> predicate; we can say that the object of cpr:systolicBpMPa¹ is in MPa.
> We can write this down in the schema, and also as an OWL restriction.
> This moves the burden of inference from users of the standard to those
> who are mixing with data which has other units (a shrinking group when
> standardization is successful).

I'm not sure I follow this rationale. If you implement normalization in this way, with such (overloaded) predicates, then determining the relationship between the value and its unit becomes a reasoning problem (i.e., you need to 'interpret' the predicate with respect to the ontology to determine the appropriate units). It just seems more straightforward to have generic, domain-independent predicates that directly relate a 'quality value' to its unit and scalar value. Transformations and normalizations can then happen at the point when data is being integrated, and the semantics of the measured value is still understood.

Also, having domain-independent predicates makes it easier to render a view of the data (for human consumption) that includes visual cues about the units of measure associated with values directly from the data, since such tools can always expect the same set of terms to capture a value and its unit of measurement.
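To make the distinction concrete, here is a minimal sketch in Python, assuming plain (subject, predicate, object) tuples instead of a real triple store. The ex: prefix, ex:systolicBp, and the conversion table are hypothetical; only muo:numericalValue and muo:measuredIn come from this thread. One reader handles mixed mmHg and kPa data, rendering uses the same two predicates for any measure, and normalization into a canonical unit happens once, at integration time:

```python
from collections import defaultdict

# Hypothetical RDF-style triples (subject, predicate, object). The ex: terms are
# illustrative; muo:numericalValue and muo:measuredIn are the generic predicates
# discussed in the thread. Units are deliberately mixed (mmHg and kPa).
TRIPLES = [
    ("ex:obs1", "ex:systolicBp", "_:v1"),
    ("_:v1", "muo:numericalValue", 120.0),
    ("_:v1", "muo:measuredIn", "ex:mmHg"),
    ("ex:obs2", "ex:systolicBp", "_:v2"),
    ("_:v2", "muo:numericalValue", 16.0),
    ("_:v2", "muo:measuredIn", "ex:kPa"),
]

# Assumed conversion factors to the canonical unit (mmHg); 1 mmHg ~= 133.322 Pa.
PA_PER_MMHG = 133.322387415
TO_MMHG = {
    "ex:mmHg": 1.0,
    "ex:cmHg": 10.0,
    "ex:Pa": 1.0 / PA_PER_MMHG,
    "ex:kPa": 1000.0 / PA_PER_MMHG,
}


def measured_values(triples, quality_predicate):
    """Yield (subject, value, unit) for any quality predicate.

    Only the domain-independent muo predicates are used to read the value
    and its unit, so the same code works for any measure and any unit.
    """
    props = defaultdict(dict)  # node -> {predicate: object}
    for s, p, o in triples:
        props[s][p] = o
    for s, p, o in triples:
        if p == quality_predicate:
            yield s, props[o]["muo:numericalValue"], props[o]["muo:measuredIn"]


def normalize(triples, quality_predicate):
    """Integration-time normalization: rewrite every value into mmHg."""
    return [
        (s, round(value * TO_MMHG[unit], 1), "ex:mmHg")
        for s, value, unit in measured_values(triples, quality_predicate)
    ]


def render(triples, quality_predicate):
    """Human-readable view showing each value next to its unit."""
    return [
        f"{s}: {v:g} {u.split(':', 1)[1]}"
        for s, v, u in measured_values(triples, quality_predicate)
    ]


print(render(TRIPLES, "ex:systolicBp"))
# ['ex:obs1: 120 mmHg', 'ex:obs2: 16 kPa']
print(normalize(TRIPLES, "ex:systolicBp"))
# [('ex:obs1', 120.0, 'ex:mmHg'), ('ex:obs2', 120.0, 'ex:mmHg')]
```

With an overloaded predicate such as trans:systolicMmHg, the unit would live only in the predicate's name, so a reader like measured_values would need a per-predicate lookup table, and that table grows with every measure/unit pair.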
> I believe the principal counterargument to normalization is that this
> would be an obstacle to adoption; that e.g. clinics or pharmas who
> would otherwise be tempted to express their clinical data in CPR would
> be discouraged by the requirement of input normalization.

Unless I have misunderstood you, it sounds like you think that the use of muo:measuredIn and muo:numericalValue *requires* input normalization. I don't think this is the case. These predicates say nothing about whether or not the units used are homogeneous.

> I think that
> group is vanishingly small, especially if they face heterogeneous data
> and have to normalize anyways. It's possible that the arguments for
> homogeneous data (no query/inference-time normalization, trivial
> federation, etc.) are too subtle to persuade the above group, but I
> think the clinical web will be much better off if we can eliminate
> redundant flexibility.
>
> ¹ Chimezie, what do you think of this imposition on CPR?

I don't think there is any imposition at all, especially if you use the convention where separate predicates relate the unit and the value. If anything, the approach using overloaded predicates discourages heterogeneous use of units, because people who query such datasets and compose ontologies using these terms will then be faced with a proliferation of terms. By contrast, even in a dataset with a heterogeneous set of units (for the same kinds of measures), the way you write queries involving measured data, and the inferences involved, stay the same.

-- Chime
Received on Friday, 10 September 2010 17:47:19 UTC