RE: [TMO] patient record normalization from Mark Wilkinson on 2010-09-12 (public-semweb-lifesci@w3.org from September 2010)

From: Mark Wilkinson <markw@illuminae.com>
Date: Sun, 12 Sep 2010 15:09:07 -0700
To: Eric Prud'hommeaux <eric@w3.org>,Michel_Dumontier <Michel_Dumontier@carleton.ca>
CC: Lee Feigenbaum <lee@thefigtrees.net>,Chimezie Ogbuji <ogbujic@ccf.org>,"public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <6a7dc784-bf12-4fd2-bd4b-4de2b3b3e5c8@email.android.com>
I think there's a delicate balance between doing things "the right way" (whatever we, as the "expert community", decide that means for us) and making things so simplistic that we then have to bolt on a more correct solution in the future, potentially over thousands of resources.  We (the HCLS community) are going to be the ones that everyone looks to for guidance when the late adopters come on-stream.  As such, I tend towards making the hard choices that are robust and scalable rather than making things too simple for the sake of increasing the rate of adoption... though I may well regret taking that position in the future!

What we've been trying to do with our SADI service ontologies is to create sub-predicates of "hasattribute" that are more descriptive, yet the underlying data structure that Michel proposes is intact and "pure".  So far, this seems to be a good compromise ("good" meaning that it accomplishes what we need it to accomplish given our set of use-cases).
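
A minimal sketch of the sub-predicate idea, assuming plain Python tuples as triples. All "ex:" names are hypothetical illustrations, not actual SADI or TMO terms, and each predicate is assumed to have at most one parent. It shows how a descriptive sub-predicate still yields the generic "has attribute" triple once the rdfs:subPropertyOf entailment is applied:

```python
# Hypothetical vocabulary: each descriptive predicate is declared a
# sub-property of the generic ex:hasAttribute (one parent per predicate).
SUBPROPERTY_OF = {
    "ex:hasSystolicPressure": "ex:hasAttribute",
    "ex:hasDiastolicPressure": "ex:hasAttribute",
}


def entail_superproperties(triples, subproperty_of):
    """Return the triples plus those implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = subproperty_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            # Re-examine the new triple in case its predicate has a parent too.
            frontier.append((s, parent, o))
    return inferred


# Data authored with the descriptive predicate only.
data = {
    ("ex:patient1", "ex:hasSystolicPressure", "_:m1"),
    ("_:m1", "ex:hasValue", 120),
    ("_:m1", "ex:hasUnit", "unit:mmHg"),
}

closure = entail_superproperties(data, SUBPROPERTY_OF)
```

After this step, a query written against the generic predicate finds the measurement even though the author only used the descriptive one.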

Granted, we're in the Semantic Web camp more than we are in the Linked Data camp; we don't envision a world where there are no reasoners to sort out these problems for us...  that may be a dangerous assumption!  ...but it's the choice we have made :-)
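
For those without a reasoner in the tool chain, here is a rough sketch of the kind of unit check discussed further down the thread. It is a closed-world lookup, not OWL semantics (under OWL's open-world assumption a reasoner would only report a clash if the two units are known to be distinct individuals). The class and unit names are hypothetical:

```python
# Hypothetical table: the single unit an ontology fixes for a measurement class.
ALLOWED_UNIT = {"ex:SystolicBloodPressure": "unit:mmHg"}


def nonconforming(measurements, allowed_unit):
    """Return measurements whose unit differs from the one fixed for their class."""
    return [
        m
        for m in measurements
        if m["type"] in allowed_unit and m["unit"] != allowed_unit[m["type"]]
    ]


readings = [
    {"id": "_:m1", "type": "ex:SystolicBloodPressure", "value": 120, "unit": "unit:mmHg"},
    {"id": "_:m2", "type": "ex:SystolicBloodPressure", "value": 16, "unit": "unit:kPa"},
]

# Only the kPa reading is reported; classes with no fixed unit are ignored.
bad = nonconforming(readings, ALLOWED_UNIT)
```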

I also agree with Hilmar that the data structures we are supporting have a fairly good history of success in other projects, and haven't caused any painful backlash from the community...

M




"Eric Prud'hommeaux" <eric@w3.org> wrote:

>* Michel_Dumontier <Michel_Dumontier@carleton.ca> [2010-09-12 12:52-0400]
>> 
>> 
>> > -----Original Message-----
>> > From: Eric Prud'hommeaux [mailto:ericw3c@gmail.com] On Behalf Of Eric
>> > Prud'hommeaux
>> > Sent: Saturday, September 11, 2010 5:13 PM
>> > To: Michel_Dumontier
>> > Cc: Lee Feigenbaum; Chimezie Ogbuji; public-semweb-lifesci@w3.org
>> > Subject: RE: [TMO] patient record normalization
>> > 
>> > * Michel_Dumontier <Michel_Dumontier@carleton.ca> [2010-09-11 11:31-
>> > 0400]
>> > > Hi Lee!
>> > >
>> > > > -----Original Message-----
>> > > > From: Lee Feigenbaum [mailto:figtree@gmail.com] On Behalf Of Lee
>> > > > Feigenbaum
>> > > > Sent: Saturday, September 11, 2010 2:54 AM
>> > > > To: Michel_Dumontier
>> > > > Cc: Eric Prud'hommeaux; Chimezie Ogbuji; public-semweb-
>> > lifesci@w3.org
>> > > > Subject: Re: [TMO] patient record normalization
>> > > >
>> > > > On 9/11/2010 2:04 AM, Michel_Dumontier wrote:
>> > > > >>> It's not a restriction on the predicates - it's a restriction
>> > on
>> > > > >> instances of a certain class - like that of blood pressure
>> > > > >> measurements. Checking consistency would tell you whether your
>> > data
>> > > > >> conforms to the specification described by the ontology
>> > document.
>> > > > >>
>> > > > >> Right, but tells whom, and when? including :measuredInUnits
>> > > > advertises
>> > > > >> a flexibility which you do not intend to honor.
>> > > > >
>> > > > > The predicate would only advertise that the domain would be a
>> > > > quantity and the range a unit.
>> > > >
>> > > > Speaking as someone just browsing this discussion (so take my
>> > comments
>> > > > for what they're worth, which isn't much), I'd tend to agree with
>> > Eric
>> > > > here. If I (as a human) saw this in an ontology, I'd expect that I
>> > can
>> > > > freely mix and match units in my data and that any software
>> > processing
>> > > > the data will cope with it or raise a reasonable error.
>> > >
>> > > You are most certainly able to use a variety of units - but if an
>> > ontology specifies the unit, and a dataset imports the ontology, then a
>> > valid dataset would conform to this specification.
>> > 
>> > As several have pointed out, that "dataset imports the ontology"
>> > implies an OWL reasoner. This imposition on authoring or consumption
>> > is an expensive presumption and likely to lead to non-conformant data,
>> > which will impair our ability to query in the bazaar.
>> 
>> Non-conforming data will always occur. The point is, that if you want to make it compatible with the data model, you have to follow the spec.  This will occur independent of our specific discussion here about specialized or generic predicates.
>
>It is not at all "independent of our specific discussion"; the
>simplicity and intuitiveness of specs has a huge impact on
>conformance. We should do what we can to make this spec meet the
>most realistic use cases.
>
>Everybody here who has an A-box consistency checker, raise your hand.
>
>> > > > >> If I dereference
>> > > > >> :systolicMPa, I learn that the units are exactly MPa. If I
>> > > > dereference
>> > > > >> muo:numericalValue and muo:measuredUnits, I learn that I can use
>> > any
>> > > > >> units (misleading).
>> > > > >
>> > > > > It isn't misleading, it's exactly as advertised.
>> > > >
>> > > > Would you expect my above assumption to be accurate? It sounded
>> > from
>> > > > some other messages in the thread that there's a thought that even
>> > with
>> > > > the "generic" approach that systems would in general handle data in
>> > > > homogeneous units?
>> > >
>> > > The requirement here is that any and all units can be specified using
>> > the relation, but an ontology can restrict the number, *kinds* of
>> > units, or specific units applicable.
>> > 
>> > > > >> If I wade through the OWL for TMO, I learn that
>> > > > >> there's a restriction for say:
>> > > > >>
>> > > > >>    Class: tmo:BloodSystolicPressureReading EquivalentTo:
>> > > > >>          (:value exactly 1)
>> > > > >>           and (:measuredInUnits exactly u:mmHg)
>> > > > >>
>> > > > >   and (:measuredInUnits only u:mmHg)
>> > > > >
>> > > > >> which, if I think hard, tells me that I must normalize my data,
>> > but
>> > > > >> this is pretty far from follow-your-nose semantics.
>> > > > >
>> > > > > There's no thinking required - the semantics are clearly spelled
>> > out
>> > > > in the axioms. Instances of this class refer to mmHg as the unit.
>> > Any
>> > > > instance that refers to a different unit is not a member of this
>> > class.
>> > > >
>> > > > There's no thinking required if you have an OWL reasoner as an
>> > integral
>> > > > part of your tool chain.
>> > >
>> > > I think, given that the TMO *is* an OWL2 ontology, that use of the
>> > toolchain *is* a requirement.
>> > 
>> > I don't see any benefit to imposing that requirement on the use of
>> > what we'd like to be an adopted ontology. We can describe it in OWL,
>> > but to require OWL to use it will alienate most of the world.
>> 
>> Well, this is kinda like saying - I'm going to make an XML schema, but you can put whatever you want in the XML without validating it. I'm having a hard time believing that this is your position.
>
>If I had a choice between two validatable XML schemas, one of which
>was more intuitive, I would choose the intuitive one because it would
>lead to less invalid data. Offering people a units knob which they are
>forbidden to turn is an invitation for well-intentioned invalid data.
>
>> > > > Otherwise, there is thinking required. And
>> > > > even
>> > > > if you have an OWL reasoner in your tool chain, you'd probably have
>> > to
>> > > > be doing something clever with integrity constraints a la Clark &
>> > > > Parsia
>> > > > to catch errors this way, rather than just to end up asserting
>> > bogus
>> > > > data.
>> > >
>> > > No, I don't believe that is the case.
>> > >
>> > > m.
>> > 
>> > Regardless, you'd have to have it and you'd have to be motivated to use
>> > it.
>> 
>> Integrity constraints? Or the tool chain? 
>
>The potential author of the data would have to be motivated to acquire
>and implement the OWL constraints. I'm not saying no one will; I'm
>saying that not everyone will, and we should increase the chances that
>the others will intuitively produce valid data.
>
>> > > > Again, apologies if my comments are off-base as I'm mainly just
>> > passing
>> > > > through here!
>> > > >
>> > > > Lee
>> > > >
>> > > > >> I think I have described why authoring is less fault-prone if
>> > the
>> > > > >> normalized data in TMO uses precise predicates. Do you have
>> > other
>> > > > use
>> > > > >> cases which override that one?
>> > 
>> > Let's keep the concrete propositions around so we can test these
>> > theses:
>> > 
>> > single-unit predicate:
>> > :X trans:bloodPressure
>> >   [ trans:systolicMPa 120 ;
>> >     trans:diastolicMPa 80 ] .
>> > 
>> > generic-unit predicate:
>> > :X trans:bloodPressure
>> >   [ trans:systolic [ muo:measuredIn trans1:MPa ; muo:numericalValue 120 ] ;
>> >     trans:diastolic [ muo:measuredIn trans1:MPa ; muo:numericalValue 80 ] ] .
>> 
>> or
>> 
>> :x :has-attribute
>>   [ a :systolic-blood-pressure;  :has-value 120; :has-unit unit:mPa ] ,
>>   [ a :diastolic-blood-pressure; :has-value 80; :has-unit unit:mPa ] .
>> 
>> So we have 3 generic predicates; has-attribute, :has-value, :has-unit, and now, as a general design pattern, all we do is specify the kind of measurement value, for which there are thousands. Each of those types can be further described, in terms of the qualities or dispositions they measure, or the material parts they enumerate, or whatever. 
>> 
>> In contrast, the specialized-predicate approach means that every value in a test panel would require a predicate between the individual and the test value, and then a predicate for each of the components of a test value. 
>
>Both approaches require a specialized name to differentiate each test
>in the panel (class for yours, predicate for mine). Both also work
>with either :bloodPressure or :has-attribute as the link from the
>individual, so we can factor that out as well. Both can be implied by
>the other, so the choice is effectively, which surface syntax is more
>useful; which do we want in people's faces in the absence of inference.
>I prefer the terser one with fewer opportunities for misinterpretation
>in the absence of OWL.
>
>
>> An ontology that means to specify which unit for use with a given measurement value can do so, by adding the axiom
>> 
>> rdfs:subClassOf :has-unit only unit:mPa;
>> rdfs:subClassOf :has-value only xsd:int; (or whatever)
>> 
>> 
>> > > > > The counter argument to using a specialized predicate is that
>> > > > > 1) we cannot describe a unit
>> > 
>> > I'm not sure what the use case is, but we can say that the set of
>> > things with a trans:systolicMPa->X is equivalent to the set of things
>> > with muo:measuredIn->trans1:MPa, muo:numericalValue X . I don't think
>> > generic-unit predicates buy us any more than that.
>> 
>> Rather, I mean to develop an ontology of units - that a unit is a unit for a certain kind of quality, and how the units are related to one another.
>
>I'm excited about that as well. I'm keen for the OWL to say that
>systolicMPa is in u:MPa so when we need to interface TMO with non-TMO,
>we can make good use of the closure.
>
>
>> > > > > 2) there is a proliferation of relations as there are countless
>> > > > quantities multiplied by each of their respective units.
>> > 
>> > I see 3 in either case.
>> > 
>> > > > > 3) relations can only be weakly described (they do not have the
>> > class
>> > > > constructors available to describe them)
>> > 
>> > Sorry, I don't follow this one. Can you describe in terms of the
>> > proposed vocabulary?
>> 
>> In OWL2, object properties can be said to be functional, inverse functional, transitive, symmetric, anti-symmetric, reflexive, irreflexive, disjoint, inverse, equivalent to another relation or composed (role chain). What we can't say is that a relation is equivalent to the composition of a relation and a type. That said, what you write 
>> 
>> > things with a trans:systolicMPa->X 
>> > is equivalent to 
>> >   things with muo:measuredIn->trans1:MPa, muo:numericalValue X . 
>> 
>> which, using OWL, would only get you to the class equivalence, but not the transfer of type and "X" value. We would need at least 
>> 
>>  things with a trans:systolicMPa some xsd:int
>>   is equivalent to
>>       systolic-blood-pressure 
>>          and muo:measuredIn some trans1:MPa 
>>          and muo:numericalValue some xsd:int
>> 
>> and then to transfer the value
>>   trans:systolicMPa rdfs:subPropertyOf muo:numericalValue 
>
>ahh, tx.
>
>I'm trying to work out specific use cases to test the effectiveness of
>this. With RIF, we could say (pseudo-N3 notation):
>  { ?x muo:numericalValue ?v ; muo:measuredIn u:mmHg }
>  => { ?x muo:numericalValue ?v ; muo:measuredIn u:MPa }
>meeting some TMO to non-TMO interface use cases.
>
>With just OWL, I guess the most practical would be
>  u:N_per_mm_squared = u:MPa
>, enabling a pretty restricted set of use cases.
>What windfalls have you got in mind for the units ontology? Any use
>cases we should model here?
>
>> > > > > 4) requires one to query the labels instead of the semantics to
>> > find
>> > > > the appropriate relation.
>> > 
>> > Can you give an example here as well?
>> 
>> - e.g. how do I report systolic blood pressure
>
>yeah, for that you'd have to use some inference.
>
>
>> > > > > 5) requires one to parse the label for the intended unit.
>> > 
>> > I'm not sure of the practicality of querying for everything in the
>> > database which is in MPa, but if you're motivated to do inference,
>> > it's in the OWL.
>> 
>> - e.g. in what unit should I report blood pressure?
>
>this is essentially the same question as above, and similarly requires
>some inference.
>
>
>> > > > > It's a shortcut that makes linked data prettier, but weakens
>> > formal
>> > > > knowledge representation.
>> > > >
>> > > >
>> > > >  > m.
>> > > > >
>> > > > >
>> > > > >
>> > 
>> > --
>> > -ericP
>
>-- 
>-ericP
>
Received on Sunday, 12 September 2010 22:09:48 UTC