RE: [TMO] patient record normalization

* Michel_Dumontier <Michel_Dumontier@carleton.ca> [2010-09-12 12:52-0400]
> 
> 
> > -----Original Message-----
> > From: Eric Prud'hommeaux [mailto:ericw3c@gmail.com] On Behalf Of Eric
> > Prud'hommeaux
> > Sent: Saturday, September 11, 2010 5:13 PM
> > To: Michel_Dumontier
> > Cc: Lee Feigenbaum; Chimezie Ogbuji; public-semweb-lifesci@w3.org
> > Subject: RE: [TMO] patient record normalization
> > 
> > * Michel_Dumontier <Michel_Dumontier@carleton.ca> [2010-09-11 11:31-0400]
> > > Hi Lee!
> > >
> > > > -----Original Message-----
> > > > From: Lee Feigenbaum [mailto:figtree@gmail.com] On Behalf Of Lee
> > > > Feigenbaum
> > > > Sent: Saturday, September 11, 2010 2:54 AM
> > > > To: Michel_Dumontier
> > > > Cc: Eric Prud'hommeaux; Chimezie Ogbuji; public-semweb-lifesci@w3.org
> > > > Subject: Re: [TMO] patient record normalization
> > > >
> > > > On 9/11/2010 2:04 AM, Michel_Dumontier wrote:
> > > > >>> It's not a restriction on the predicates - it's a restriction
> > on
> > > > >> instances of a certain class - like that of blood pressure
> > > > >> measurements. Checking consistency would tell you whether your
> > data
> > > > >> conforms to the specification described by the ontology
> > document.
> > > > >>
> > > > >> Right, but tells whom, and when? including :measuredInUnits
> > > > advertises
> > > > >> a flexibility which you do not intend to honor.
> > > > >
> > > > > The predicate would only advertise that the domain would be a
> > > > quantity and the range a unit.
> > > >
> > > > Speaking as someone just browsing this discussion (so take my
> > comments
> > > > for what they're worth, which isn't much), I'd tend to agree with
> > Eric
> > > > here. If I (as a human) saw this in an ontology, I'd expect that I
> > can
> > > > freely mix and match units in my data and that any software
> > processing
> > > > the data will cope with it or raise a reasonable error.
> > >
> > > You are most certainly able to use a variety of units - but if an
> > ontology specifies the unit, and a dataset imports the ontology, then a
> > valid dataset would conform to this specification.
> > 
> > As several have pointed out, that "dataset imports the ontology"
> > implies an OWL reasoner. This imposition on authoring or consumption
> > is an expensive presumption and likely to lead to non-conformant data,
> > which will impair our ability to query in the bazaar.
> 
> Non-conforming data will always occur. The point is, that if you want to make it compatible with the data model, you have to follow the spec.  This will occur independent of our specific discussion here about specialized or generic predicates.

It is not at all "independent of our specific discussion"; the
simplicity and intuitiveness of specs has a huge impact on
conformance. We should do what we can to make this spec meet the
most realistic use cases.

Everybody here who has an A-box consistency checker, raise your hand.

> > > > >> If I dereference
> > > > >> :systolicMPa, I learn that the units are exactly MPa. If I
> > > > dereference
> > > > >> muo:numericalValue and muo:measuredUnits, I learn that I can use
> > any
> > > > >> units (misleading).
> > > > >
> > > > > It isn't misleading, it's exactly as advertised.
> > > >
> > > > Would you expect my above assumption to be accurate? It sounded
> > from
> > > > some other messages in the thread that there's a thought that even
> > with
> > > > the "generic" approach that systems would in general handle data in
> > > > homogeneous units?
> > >
> > > The requirement here is that any and all units can be specified using
> > the relation, but an ontology can restrict the number, *kinds* of
> > units, or specific units applicable.
> > 
> > > > >> If I wade through the OWL for TMO, I learn that
> > > > >> there's a restriction for say:
> > > > >>
> > > > >>    Class: tmo:BloodSystolicPressureReading EquivalentTo:
> > > > >>          (:value exactly 1)
> > > > >>           and (:measuredInUnits exactly u:mmHg)
> > > > >>
> > > > > 		and (:measuredInUnits only u:mmHg)
> > > > >
> > > > >> which, if I think hard, tells me that I must normalize my data,
> > but
> > > > >> this is pretty far from follow-your-nose semantics.
> > > > >
> > > > > There's no thinking required - the semantics are clearly spelled
> > out
> > > > in the axioms. Instances of this class refer to mmHg as the unit.
> > Any
> > > > instance that refers to a different unit is not a member of this
> > class.
> > > >
> > > > There's no thinking required if you have an OWL reasoner as an
> > integral
> > > > part of your tool chain.
> > >
> > > I think, given that the TMO *is* an OWL2 ontology, that use of the
> > toolchain *is* a requirement.
> > 
> > I don't see any benefit to imposing that requirement on the use of
> > what we'd like to be an adopted ontology. We can describe it in OWL,
> > but to require OWL to use it will alienate most of the world.
> 
> Well, this is kinda like saying - I'm going to make an XML schema, but you can put whatever you want in the XML without validating. I'm having a hard time believing that this is your position.

If I had a choice between two validatable XML schemas, one of which
was more intuitive, I would choose the intuitive one because it would
lead to less invalid data. Offering people a units knob which they are
forbidden to turn is an invitation for well-intentioned invalid data.

> > > > Otherwise, there is thinking required. And
> > > > even
> > > > if you have an OWL reasoner in your tool chain, you'd probably have
> > to
> > > > be doing something clever with integrity constraints a la Clark &
> > > > Parsia
> > > > to catch errors this way, rather than just to end up asserting
> > bogus
> > > > data.
> > >
> > > No, I don't believe that is the case.
> > >
> > > m.
> > 
> > Regardless, you'd have to have it and you'd have to be motivated to use
> > it.
> 
> Integrity constraints? Or the tool chain? 

The potential author of the data would have to be motivated to acquire
and implement the OWL constraints. I'm not saying no one will; I'm
saying that not everyone will, and we should increase the chances that
the others will intuitively produce valid data.

> > > > Again, apologies if my comments are off-base as I'm mainly just
> > passing
> > > > through here!
> > > >
> > > > Lee
> > > >
> > > > >> I think I have described why authoring is less fault-prone if
> > the
> > > > >> normalized data in TMO uses precise predicates. Do you have
> > other
> > > > use
> > > > >> cases which override that one?
> > 
> > Let's keep the concrete propositions around so we can test these
> > theses:
> > 
> > single-unit predicate:
> > :X trans:bloodPressure
> >   [ trans:systolicMPa 120 ;
> >     trans:diastolicMPa 80 ] .
> > 
> > generic-unit predicate:
> > :X trans:bloodPressure
> >   [ trans:systolic [ muo:measuredIn trans1:MPa ; muo:numericalValue 120
> > ] ;
> >     trans:diastolic [ muo:measuredIn trans1:MPa ; muo:numericalValue 80
> > ] ] .
> 
> or
> 
> :x :has-attribute
>   [ a :systolic-blood-pressure;  :has-value 120; :has-unit unit:mPa ] ,
>   [ a :diastolic-blood-pressure; :has-value 80; :has-unit unit:mPa ] .
> 
> So we have 3 generic predicates; has-attribute, :has-value, :has-unit, and now, as a general design pattern, all we do is specify the kind of measurement value, for which there are thousands. Each of those types can be further described, in terms of the qualities or dispositions they measure, or the material parts they enumerate, or whatever. 
> 
> In contrast, the specialized-predicate approach means that every value in a test panel would require a predicate between the individual and the test value, and then a predicate for each of the components of a test value. 

Both approaches require a specialized name to differentiate each test
in the panel (class for yours, predicate for mine). Both also work
with either :bloodPressure or :has-attribute as the link from the
individual, so we can factor that out as well. Both can be implied by
the other, so the choice is effectively, which surface syntax is more
useful; which do we want in people's faces in the absence of inference.
I prefer the terser one with fewer opportunities for misinterpretation
in the absence of OWL.
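
To make "both can be implied by the other" concrete, here is a minimal
sketch in plain Python (no RDF library). It assumes a hand-written
mapping table; the table and the function names expand/collapse are
hypothetical, and the predicate names are just the illustrative ones
from this thread, not a published TMO vocabulary.

```python
# Hypothetical mapping: single-unit predicate -> (generic predicate, unit).
# These names mirror the examples in this thread; they are assumptions.
SINGLE_TO_GENERIC = {
    "trans:systolicMPa": ("trans:systolic", "trans1:MPa"),
    "trans:diastolicMPa": ("trans:diastolic", "trans1:MPa"),
}


def expand(reading):
    """Rewrite {single-unit predicate: value} into the generic-unit shape."""
    out = {}
    for pred, value in reading.items():
        generic, unit = SINGLE_TO_GENERIC[pred]
        out[generic] = {"muo:measuredIn": unit, "muo:numericalValue": value}
    return out


def collapse(reading):
    """Inverse of expand(); raises KeyError when no single-unit
    predicate exists for the (generic predicate, unit) pair."""
    generic_to_single = {v: k for k, v in SINGLE_TO_GENERIC.items()}
    out = {}
    for generic, node in reading.items():
        single = generic_to_single[(generic, node["muo:measuredIn"])]
        out[single] = node["muo:numericalValue"]
    return out


terse = {"trans:systolicMPa": 120, "trans:diastolicMPa": 80}
verbose = expand(terse)
assert collapse(verbose) == terse  # the round trip loses nothing
```

Note that collapse() fails on any unit without a matching single-unit
predicate, which is exactly the case where the generic form carries
data the terse form cannot express.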


> An ontology that means to specify which unit to use with a given measurement value can do so by adding the axioms
> 
> rdfs:subClassOf :has-unit only unit:mPa;
> rdfs:subClassOf :has-value only xsd:int; (or whatever)
> 
> 
> > > > > The counter argument to using a specialized predicate is that
> > > > > 1) we cannot describe a unit
> > 
> > I'm not sure what the use case is, but we can say that the set of
> > things with a trans:systolicMPa->X is equivalent to the set of things
> > with muo:measuredIn->trans1:MPa, muo:numericalValue X . I don't think
> > generic-unit predicates buy us any more than that.
> 
> Rather, I mean to develop an ontology of units - that a unit is a unit for a certain kind of quality, and how the units are related to one another.

I'm excited about that as well. I'm keen for the OWL to say that
systolicMPa is in u:MPa so when we need to interface TMO with non-TMO,
we can make good use of the closure.


> > > > > 2) there is a proliferation of relations as there are countless
> > > > quantities multiplied by each of their respective units.
> > 
> > I see 3 in either case.
> > 
> > > > > 3) relations can only be weakly described (they do not have the
> > class
> > > > constructors available to describe them)
> > 
> > Sorry, I don't follow this one. Can you describe in terms of the
> > proposed vocabulary?
> 
> In OWL2, object properties can be said to be functional, inverse functional, transitive, symmetric, asymmetric, reflexive, irreflexive, disjoint, inverse, equivalent to another relation or composed (role chain). What we can't say is that a relation is equivalent to the composition of a relation and a type. That said, what you write 
> 
> > things with a trans:systolicMPa->X 
> > is equivalent to 
> >   things with muo:measuredIn->trans1:MPa, muo:numericalValue X . 
> 
> which, using OWL, would only get you to the class equivalence, but not the transfer of type and "X" value. We would need at least 
> 
>  things with a trans:systolicMPa some xsd:int
>   is equivalent to
>       systolic-blood-pressure 
>          and muo:measuredIn some trans1:MPa 
>          and muo:numericalValue some xsd:int
> 
> and then to transfer the value
>   trans:systolicMPa rdfs:subPropertyOf muo:numericalValue 

ahh, tx.

I'm trying to work out specific use cases to test the effectiveness of
this. With RIF, we could say (pseudo-N3 notation):
  { ?x muo:numericalValue ?v ; muo:measuredIn u:mmHg }
  => { ?x muo:numericalValue ?v ; muo:measuredIn u:MPa }
meeting some TMO to non-TMO interface use cases.

With just OWL, I guess the most practical would be
  u:N_per_mm_squared = u:MPa
, enabling a pretty restricted set of use cases.
What windfalls have you got in mind for the units ontology? Any use
cases we should model here?
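
One caveat on the mmHg-to-MPa rule above: a working bridge would also
have to rescale the value, not just swap the unit. A rough Python
sketch of that step, assuming a small table of pascals per unit (the
table, the function names and the unit identifiers are hypothetical;
133.322387415 Pa is the conventional definition of 1 mmHg):

```python
# Pascals per unit. The unit identifiers are the illustrative ones used
# in this thread, not terms from a published units ontology.
PA_PER_UNIT = {
    "u:mmHg": 133.322387415,
    "u:MPa": 1_000_000.0,
    "u:N_per_mm_squared": 1_000_000.0,  # same magnitude as MPa
}


def convert(value, from_unit, to_unit):
    """Rescale a numerical value between two pressure units via pascals."""
    return value * PA_PER_UNIT[from_unit] / PA_PER_UNIT[to_unit]


def rewrite(measurement, to_unit):
    """Return a copy of a generic-unit node expressed in another unit."""
    return {
        "muo:numericalValue": convert(
            measurement["muo:numericalValue"],
            measurement["muo:measuredIn"],
            to_unit,
        ),
        "muo:measuredIn": to_unit,
    }


reading = {"muo:numericalValue": 120, "muo:measuredIn": "u:mmHg"}
print(rewrite(reading, "u:MPa"))
```

The u:N_per_mm_squared entry is the degenerate case that plain OWL
equivalence can already cover, since the scale factor is 1.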

> > > > > 4) requires one to query the labels instead of the semantics to
> > find
> > > > the appropriate relation.
> > 
> > Can you give an example here as well?
> 
> - e.g. how do I report systolic blood pressure

yeah, for that you'd have to use some inference.


> > > > > 5) requires one to parse the label for the intended unit.
> > 
> > I'm not sure of the practicality of querying for everything in the
> > database which is in MPa, but if you're motivated to do inference,
> > it's in the OWL.
> 
> - e.g. in what unit should I report blood pressure?

this is essentially the same question as above, and similarly requires
some inference.
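
As a rough illustration of what "some inference" buys for points 4 and
5: if the quantity and unit of each predicate are stated as data, both
questions become lookups rather than label parsing. A sketch in plain
Python, where the PREDICATE_AXIOMS table stands in for whatever the OWL
would assert (the table and function names are hypothetical):

```python
# Hypothetical stand-in for axioms the ontology would state about each
# single-unit predicate; identifiers follow the examples in this thread.
PREDICATE_AXIOMS = [
    {"predicate": "trans:systolicMPa",
     "quantity": ":systolic-blood-pressure", "unit": "u:MPa"},
    {"predicate": "trans:diastolicMPa",
     "quantity": ":diastolic-blood-pressure", "unit": "u:MPa"},
]


def predicates_for(quantity):
    """Point 4: which predicate reports a given kind of quantity?"""
    return [a["predicate"] for a in PREDICATE_AXIOMS
            if a["quantity"] == quantity]


def unit_of(predicate):
    """Point 5: which unit does a predicate imply? None if undeclared."""
    for axiom in PREDICATE_AXIOMS:
        if axiom["predicate"] == predicate:
            return axiom["unit"]
    return None


assert predicates_for(":systolic-blood-pressure") == ["trans:systolicMPa"]
assert unit_of("trans:systolicMPa") == "u:MPa"
```

Neither lookup inspects the string "MPa" inside the predicate name; the
answer comes only from the declared axioms.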


> > > > > It's a shortcut that makes linked data prettier, but weakens
> > formal
> > > > knowledge representation.
> > > >
> > > >
> > > >  > m.
> > > > >
> > > > >
> > > > >
> > 
> > --
> > -ericP

-- 
-ericP

Received on Sunday, 12 September 2010 21:05:24 UTC