- From: M. Scott Marshall <mscottmarshall@gmail.com>
- Date: Fri, 10 Sep 2010 23:42:02 +0200
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: Chimezie Ogbuji <ogbujic@ccf.org>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>, Michel_Dumontier <Michel_Dumontier@carleton.ca>
Hi Eric,

The business of standardizing units reminds me of:
http://science.nasa.gov/science-news/science-at-nasa/2007/08jan_metricmoon/
followed by:
http://news.bbc.co.uk/2/hi/science/nature/462264.stm

For me, the story of losing an orbiter because of an accidental clash between imperial and metric units was a poster child for the Semantic Web, as is the problem you describe. You see, the machines will never know what the numbers mean unless we use a semantic layer as well as a syntactic layer. The problem with units is that they seem to be somehow both semantic and syntactic, somewhere in between.

Hard as I try, I don't understand why you want to change the way that you describe data in order to constrain the data that is being described. Well, actually I do. You want to force anyone annotating or publishing data in the TMO vocabulary to use a single set of units (right?). It could be an effective way to achieve the goal, but it seems rather heavy-handed. Overloading a predicate and adding English parameters to it might make it obvious to people that they're only supposed to use your units (because you provide no others) when they use your ontology, but it doesn't solve the problem.

Yes, normalization of units is necessary in order to integrate data. But the problem of normalization won't go away if you glob two semantic aspects together in the *description of the data* (i.e. blood pressure measurement type and units). I see from your language that you think that it will force users to "inject" data into the data model with the preferred units when publishing data in the TMO vocabulary, but doesn't this just point to the processing that is unavoidable for integrating/comparing data? We will always need to get data into the same units in order to integrate it. I feel your pain as you try to solve it in SPARQL (and I see that it can be a very real problem), but I think there must be a better way than to overload a predicate and thereby obfuscate the data model.
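
To make the normalization step above concrete, here is a minimal sketch in Python of keeping the measurement type, the value and the unit as separate components and converting only when data from different sources has to be compared. The `Measurement` class, the `convert` and `normalize` helpers, and the unit labels are all made up for illustration; none of them come from TMO or from any unit ontology, and a real pipeline would presumably draw its conversion factors from such an ontology rather than from a hard-coded table.

```python
from dataclasses import dataclass

# Conversion factors to pascal. The unit labels are illustrative strings
# (an assumption of this sketch), not identifiers from TMO or a unit ontology.
PASCAL_PER_UNIT = {
    "Pa": 1.0,
    "kPa": 1_000.0,
    "MPa": 1_000_000.0,
    "mmHg": 133.322387415,  # conventional millimetre of mercury
}


@dataclass(frozen=True)
class Measurement:
    """A value kept together with its explicitly stated unit."""
    kind: str     # what was measured, e.g. "systolic blood pressure"
    value: float
    unit: str     # one of the keys of PASCAL_PER_UNIT


def convert(m: Measurement, target_unit: str) -> Measurement:
    """Return the same measurement expressed in target_unit."""
    if m.unit not in PASCAL_PER_UNIT or target_unit not in PASCAL_PER_UNIT:
        raise ValueError(
            f"no conversion known between {m.unit!r} and {target_unit!r}"
        )
    pascals = m.value * PASCAL_PER_UNIT[m.unit]
    return Measurement(m.kind, pascals / PASCAL_PER_UNIT[target_unit], target_unit)


def normalize(measurements, target_unit):
    """Bring measurements tagged with different units into one unit."""
    return [convert(m, target_unit) for m in measurements]


if __name__ == "__main__":
    readings = [
        Measurement("systolic blood pressure", 120.0, "mmHg"),
        Measurement("systolic blood pressure", 16.0, "kPa"),
    ]
    for r in normalize(readings, "kPa"):
        print(f"{r.kind}: {r.value:.2f} {r.unit}")
```

The point of the sketch is only that the description of the data (what was measured) stays independent of the unit, while a value whose unit is unknown fails loudly instead of being silently compared.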
If nothing else, let's depend on consistency checks and good documentation, as already suggested. We can't expect to accomplish *everything* in SPARQL.

Actually, isn't this a data publishing issue? If someone publishes systolic blood pressure values as linked data using TMO, shouldn't they refer to the TMO ontology and the units that they used in the provenance of the named graph containing it? If we know from the provenance about the named graph that it uses TMO [<graphURI> void:usesVocabulary TMO] and mmHg [bloodPressureMeasurements hasUnits MmHg] to describe blood pressure, then we can use that information in order to pre-select the graph during federation (in a world of abundance and sloppy units). In this way, we could automatically convert values as needed, presumably based on conversions that derive from the unit ontology (non?). Although such a software feat might require coding or reasoning outside SPARQL, it already does.

Clear tagging of the data with units should be a best practice in and outside the Semantic Web. I am in favor of a two-component approach, complemented by good provenance practice.

-Scott

On Fri, Sep 10, 2010 at 10:30 PM, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:
>
>> But then anyone merging two TMO documents with different units has the
>> normalization burden. If we pick a unit and annotate the predicates,
>> then the folks who would have to do the work of merging with non-TMO
>> documents (who would have to introduce some rules/canonicalization
>> pipeline anyways) have the OWL hooks to automate that merging.
>
> Again, if we are considering TMO, then we can impose a restriction to specify the unit - we can also make this clear in documentation relating to the measurements with units.
>
>> > Also, having domain-independent predicates makes it easier to render a view
>> > of the data (for human consumption) that includes visual cues regarding the
>> > units of measures associated with values directly from the data since such
>> > tools will always expect the same set of terms to capture a value and its
>> > unit of measurement.
>>
>> If you've bought the argument for early normalization, isn't it
>> needlessly dangerous to offer the freedom to express BP in mmHg in an
>> ontology that's required to have BP in MPa? It does put more burden on
>> the use of generic data browsers (they'd have to read the OWL in order
>> to present the user with units), but I think that use case is small
>> compared to the cost of data consumption.
>
> I don't think we should tailor our data model to generic data browsers - they are far too simple for the complex knowledge that we have to represent.
>
> m.
Received on Friday, 10 September 2010 21:42:31 UTC