Re: [TMO] patient record normalization from Matthias Samwald on 2010-09-11 (public-semweb-lifesci@w3.org from September 2010)

From: Matthias Samwald <samwald@gmx.at>
Date: Sat, 11 Sep 2010 12:04:07 +0200
To: "Lee Feigenbaum" <lee@thefigtrees.net>, "Michel_Dumontier" <Michel_Dumontier@carleton.ca>
Cc: "Eric Prud'hommeaux" <eric@w3.org>, "Chimezie Ogbuji" <ogbujic@ccf.org>, <public-semweb-lifesci@w3.org>
Message-ID: <061105541D7442419768E0CF9F195B8D@zetsu>

I guess we should keep in mind that this discussion was (at least 
originally) not about how units are represented on the Semantic Web, but 
about how they should be represented for a specific project: the TMO. Different 
people, projects and communities will have different needs, and we will not 
be able to achieve a consensus that will make everyone happy. Therefore, it 
might be reasonable to focus on the specific case of TMO -- and maybe some 
of the consensus we reach there can be generalized to other areas.

David wrote:
> the Mars Climate Orbiter was famously lost because one team assumed Metric 
> units and another team assumed English units

It is silly not to include explicit information about units, but it might be 
equally silly not to use SI units in a science or technology environment. I 
guess it might be easy to say this as a continental European, but non-SI 
units should be eradicated from sci/tech data. That might have more impact 
on interoperability than any standardized vocabularies, mapping algorithms 
etc., and it might be simpler to implement in the long run.

However, I see one problem with requiring data providers to convert their 
units to standard units (besides the extra effort involved): in some 
settings it might be important to capture the _original_ value and unit of 
the measurement, just for the sake of knowing the original datum. This might 
even be a legal requirement in some clinical settings. In my understanding, 
the goal of TMO is to be used in translational research, not clinical 
practice, and therefore this will probably not be an issue.
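
To make the trade-off concrete, here is a minimal sketch (in Python, not part of the TMO itself) of a record that stores the normalized SI value next to the original value and unit, so the original datum is never lost. The `Measurement` class, the `normalize` function and the `TO_SI` table are hypothetical names chosen for illustration; the table covers only a few units and is not a proposal for a real vocabulary.

```python
from dataclasses import dataclass

# Hypothetical lookup table: unit symbol -> (SI unit, multiplicative factor).
# Illustrative only; a real system would use a maintained unit vocabulary.
TO_SI = {
    "kg": ("kg", 1.0),
    "lb": ("kg", 0.45359237),  # international avoirdupois pound
    "m": ("m", 1.0),
    "cm": ("m", 0.01),
    "in": ("m", 0.0254),       # international inch
}


@dataclass(frozen=True)
class Measurement:
    """A value normalized to SI, with the original datum kept alongside."""
    original_value: float
    original_unit: str
    si_value: float
    si_unit: str


def normalize(value: float, unit: str) -> Measurement:
    """Convert to SI, refusing to guess when the unit is unknown."""
    if unit not in TO_SI:
        raise ValueError(f"no conversion known for unit {unit!r}")
    si_unit, factor = TO_SI[unit]
    return Measurement(value, unit, value * factor, si_unit)


weight = normalize(150.0, "lb")
print(weight.original_value, weight.original_unit)  # the datum as recorded
print(weight.si_value, weight.si_unit)              # the normalized value
```

Raising an error on an unknown unit, instead of passing the number through unchanged, is a deliberate choice in this sketch: a silent assumption about units is exactly the failure mode in David's Mars Climate Orbiter example.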

Mark wrote:
> It speaks to a conversation that I had with my review committee this 
> morning about how The Web was built by simply being completely open. 
> Anyone could (can) publish anything in any way they want, so long as they 
> adhere to the simple rules of HTML.  I am very concerned that the Semantic 
> Web is not learning its lessons from the WWW.  We are trying to 
> institutionalize everything, and that simply doesn't work (it doesn't 
> scale!).

I guess the classic web and its tremendous global success are a good 
inspiration, but I am not sure how easily the principles of the web can be 
translated into principles of the web of data. The 'anything goes' approach 
might just shift the problem from the data publishing phase to the data 
consumption phase, which could create the temporary illusion that the 
problem has been solved.

Let me make a bold statement: there is no lack of biomedical RDF data 
anymore. In fact, we are now in a situation where the same open dataset is 
often RDFized several times by different groups. This growing number of 
duplicated efforts is an interesting new development, and I might try to 
document and analyze this trend when I find the time.

Still, it is far from trivial to actually query these datasets, because of 
their heterogeneity. The answer is not to institutionalize everything, but 
simply to make RDF publishers more aware of the problems caused by excessive 
heterogeneity and lack of transparency. And it could be a good reason to 
reduce sources of heterogeneity in a project that is under our control, such 
as the TMO.

Cheers,
Matthias Samwald

// DERI Galway, Ireland
// Konrad Lorenz Institute for Evolution and Cognition Research, Austria
// http://samwald.info

Received on Saturday, 11 September 2010 10:04:47 UTC