Re: [TMO] patient record normalization from Mark on 2010-09-10 (public-semweb-lifesci@w3.org from September 2010)

From: Mark <markw@illuminae.com>
Date: Fri, 10 Sep 2010 15:26:00 -0700
To: "Eric Prud'hommeaux" <eric@w3.org>, "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: "Chimezie Ogbuji" <ogbujic@ccf.org>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>, Michel_Dumontier <Michel_Dumontier@carleton.ca>
Message-ID: <op.vit8lmfzdeqt07@bighome>
As usual, I agree with Scott (this is becoming a habit!  LOL!  Scott, we  
should really try to work together more closely!)

It speaks to a conversation that I had with my review committee this  
morning about how The Web was built by simply being completely open.   
Anyone could (can) publish anything in any way they want, so long as they  
adhere to the simple rules of HTML.  I am very concerned that the Semantic  
Web is not learning its lessons from the WWW.  We are trying to  
institutionalize everything, and that simply doesn't work (it doesn't  
scale!).

If we, as the HCLS community, simply say "it is best practice to be
explicit about your units when you publish a value", and create the
ontologies that make it easy and obvious how to do so, then we let the
Semantic Web grow organically... people will eventually see the wisdom of
doing it "our way".  Moreover, we can add a semantic layer on top of that
ontology that easily allows us to switch between different units.  (We do
this in SADI when calculating Body Mass Index through a SADI Web Service.
Luke pointed out to the review committee this morning that we even accept
the British "stone" as a unit of weight in our calculations, because any
unit is legitimate and we have to accept that different people prefer
different units!  The semantic layer ensures that we don't make Mars
Climate Orbiter kinds of errors.)
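To make the "semantic layer" idea concrete, here is a minimal Python sketch of a unit-aware BMI calculation. It is not the SADI service's code: the Measurement type, the unit labels and the conversion tables are assumptions made up for illustration. The only rule it enforces is the one argued for here: every value carries an explicit unit, and an unrecognized unit is rejected rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical conversion tables (not taken from SADI): factor that turns
# one unit into the SI base unit for that quantity.
TO_KILOGRAMS = {
    "kg": 1.0,
    "lb": 0.45359237,
    "stone": 6.35029318,  # 1 stone = 14 lb
}
TO_METRES = {
    "m": 1.0,
    "cm": 0.01,
    "in": 0.0254,
}


@dataclass(frozen=True)
class Measurement:
    """A published value that always states its unit explicitly."""
    value: float
    unit: str


def _normalize(m: Measurement, table: dict[str, float], kind: str) -> float:
    # Refuse to guess: an unknown unit is an error, not a silent default.
    if m.unit not in table:
        raise ValueError(f"unknown {kind} unit: {m.unit!r}")
    return m.value * table[m.unit]


def body_mass_index(weight: Measurement, height: Measurement) -> float:
    """BMI = mass in kg / (height in m) squared, after normalizing both."""
    kg = _normalize(weight, TO_KILOGRAMS, "mass")
    m = _normalize(height, TO_METRES, "length")
    return kg / (m * m)


# Usage: the same person described in two different unit systems.
imperial = body_mass_index(Measurement(11, "stone"), Measurement(70, "in"))
metric = body_mass_index(Measurement(69.85322498, "kg"), Measurement(177.8, "cm"))
print(round(imperial, 1), round(metric, 1))
```

Both calls land on the same BMI (about 22.1), which is the point: the publisher picks the unit, and the layer on top does the normalization.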

Trying to impose rules on a global population is simply a non-starter.  We  
really need to be prepared to deal with (in our semantic layer) any  
possibility... the only "rule" on the Semantic Web (IMO) is that you  
*must* be explicit about everything you publish.

...but as Chris Mungall said to me at ISMB - I am a Centralization  
Skeptic... and he's right! :-)  (he actually said that I am a "big  
ontology skeptic", but I have generalized his statement)

I'm a Semantic Web Libertarian!  ...and I agree with Scott!

M



On Fri, 10 Sep 2010 14:42:02 -0700, M. Scott Marshall  
<mscottmarshall@gmail.com> wrote:

> Hi Eric,
>
> The business of standardizing units reminds me of:
>
> http://science.nasa.gov/science-news/science-at-nasa/2007/08jan_metricmoon/
> followed by:
> http://news.bbc.co.uk/2/hi/science/nature/462264.stm
>
> For me, the story of losing an orbiter because of an accidental clash
> between imperial and metric units was a poster child for the Semantic
> Web, as well as for the problem you describe. You see, the machines will never
> know what the numbers mean unless we use a Semantic layer as well as a
> syntactic layer. The problem with units is that they seem to be
> somehow both semantic and syntactic, somewhere in between.
>
> Hard as I try, I don't understand why you want to change the way that
> you describe data to constrain the data that is being described. Well,
> actually I do. You want to force anyone annotating or publishing data
> in the TMO vocabulary to use a single set of units (right?). It could
> be an effective way to achieve the goal, but it seems rather
> heavy-handed. Overloading a predicate and adding English parameters to
> it might make it obvious to people that they are only supposed to use
> your units (because you provide no others) when they use your ontology,
> but it doesn't solve the problem. Yes,
> normalization of units is necessary in order to integrate data. But
> the problem of normalization won't go away if you glob two semantic
> aspects together in the *description of the data* (i.e. blood pressure
> measurement type and units). I see from your language that you think
> that it will force users to "inject" data into the data model with the
> preferred units when publishing data in the TMO vocabulary but doesn't
> this just point to the processing that is unavoidable for
> integrating/comparing data? We will always need to get data into the
> same units in order to integrate it. I feel your pain as you try to
> solve it in SPARQL (and I see that it can be a very real problem), but
> I think there must be a better way than to overload a predicate and
> thereby obfuscate the data model. If nothing else, let's depend on
> consistency checks and good documentation, as already suggested. We
> can't expect to accomplish *everything* in SPARQL.
>
> Actually, isn't this a data publishing issue? If someone publishes
> systolic blood pressure values as linked data using TMO, shouldn't
> they refer to the TMO ontology and the units that they used in the
> provenance of the named graph containing it? If we know from the
> provenance about the named graph that it uses TMO [<graphURI>
> void:usesVocabulary TMO] and MmHg [bloodPressureMeasurements hasUnits
> MmHg] to describe blood pressure, then we can use that information in
> order to pre-select the graph during federation (in a world of
> abundance and sloppy units). In this way, we could automatically
> convert values as needed, presumably based on conversions that derive
> from the unit ontology (no?). Such a software feat might require
> coding or reasoning outside SPARQL, but integrating data already does.
>
> Clear tagging of the data with units should be a best practice in and
> outside the Semantic Web. I am in favor of a two-component approach,
> complemented by good provenance practice.
>
> -Scott
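Scott's provenance-based federation can be sketched in a few lines of Python. This is a hypothetical illustration, not a SPARQL or VoID implementation: NamedGraph stands in for a named graph plus its provenance (the void:usesVocabulary and hasUnits statements), the example.org URIs are placeholders, and the pressure conversion table (1 mmHg = 133.322387415 Pa) plays the role of the unit ontology.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the unit ontology: pascals per unit of pressure.
PASCALS_PER = {"mmHg": 133.322387415, "kPa": 1_000.0, "MPa": 1_000_000.0}


@dataclass
class NamedGraph:
    """A named graph plus the provenance Scott describes (hypothetical shape)."""
    uri: str
    uses_vocabulary: set[str]   # like [<graphURI> void:usesVocabulary TMO]
    bp_unit: str                # like [bloodPressureMeasurements hasUnits MmHg]
    systolic_values: list[float] = field(default_factory=list)


def convert(value: float, src: str, dst: str) -> float:
    """Convert a pressure from src to dst units by going through pascals."""
    return value * PASCALS_PER[src] / PASCALS_PER[dst]


def federate_systolic(graphs, vocabulary="TMO", target_unit="mmHg"):
    """Pre-select graphs from their provenance, then normalize their values."""
    results = []
    for g in graphs:
        if vocabulary not in g.uses_vocabulary:
            continue  # provenance says this graph does not use the vocabulary
        if g.bp_unit not in PASCALS_PER:
            continue  # no known conversion, so it cannot be compared safely
        for v in g.systolic_values:
            results.append((g.uri, convert(v, g.bp_unit, target_unit)))
    return results


graphs = [
    NamedGraph("http://example.org/g1", {"TMO"}, "mmHg", [120.0]),
    NamedGraph("http://example.org/g2", {"TMO"}, "kPa", [16.0]),
    NamedGraph("http://example.org/g3", {"OtherVocab"}, "mmHg", [999.0]),
]
for uri, mmhg in federate_systolic(graphs):
    print(uri, round(mmhg, 1))
```

The graph that does not declare the vocabulary is never read, and values published in kPa come out in mmHg: the "convert as needed" step, done outside SPARQL.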
>
> On Fri, Sep 10, 2010 at 10:30 PM, Michel_Dumontier
> <Michel_Dumontier@carleton.ca> wrote:
>>
>>> But then anyone merging two TMO documents with different units has the
>>> normalization burden. If we pick a unit and annotate the predicates,
>>> then the folks who would have to do the work of merging with non-TMO
>>> documents (who would have to introduce some rules/canonicalization
>>> pipeline anyways) have the OWL hooks to automate that merging.
>>
>> Again, if we are considering TMO, then we can impose a restriction to  
>> specify the unit - we can also make this clear in documentation  
>> relating to the measurements with units.
>>
>>> > Also, having domain-independent predicates makes it easier to render a
>>> > view of the data (for human consumption) that includes visual cues
>>> > regarding the units of measure associated with values directly from
>>> > the data, since such tools will always expect the same set of terms to
>>> > capture a value and its unit of measurement.
>>>
>>> If you've bought the argument for early normalization, isn't it
>>> needlessly dangerous to offer the freedom to express BP in mmHg in an
>>> ontology that's required to have BP in MPa? It does put more burden on
>>> the use of generic data browsers (they'd have to read the OWL in order
>>> to present the user with units), but I think that use case is small
>>> compared to the cost of data consumption.
>>
>> I don't think we should tailor our data model to generic data browsers  
>> - they are far too simple for the complex knowledge that we have to  
>> represent.
>>
>> m.
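The alternative raised in this thread, keeping a value and its unit as separate statements and letting TMO restrict which unit is allowed, amounts to a consistency check rather than an overloaded predicate. A minimal Python sketch follows; the predicate names and the DECLARED_UNIT table are hypothetical, and in practice the restriction would live in the OWL ontology and be checked by a reasoner, not by a dictionary lookup.

```python
# Hypothetical: the unit each (made-up) TMO predicate is restricted to,
# playing the role of an OWL restriction in the ontology.
DECLARED_UNIT = {
    "systolicBloodPressure": "mmHg",
    "diastolicBloodPressure": "mmHg",
    "bodyWeight": "kg",
}


def check_units(records):
    """Return (subject, predicate, found, expected) for each unit mismatch.

    Each record keeps value and unit as separate components, so the data
    model stays generic; predicates without a declared unit are not checked.
    """
    violations = []
    for subject, predicate, _value, unit in records:
        expected = DECLARED_UNIT.get(predicate)
        if expected is not None and unit != expected:
            violations.append((subject, predicate, unit, expected))
    return violations


records = [
    ("patient1", "systolicBloodPressure", 120, "mmHg"),
    ("patient2", "systolicBloodPressure", 16, "kPa"),
    ("patient3", "heartRate", 70, "bpm"),
]
print(check_units(records))
```

Because the unit is always present as data, a generic browser can still show it next to each value, while the check flags the kPa reading published against a predicate restricted to mmHg.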
Received on Friday, 10 September 2010 22:26:50 UTC