- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 10 Sep 2009 13:29:43 -0400
- To: public-semweb-lifesci@w3.org
- Cc: Oshani Seneviratne <oshani@csail.mit.edu>
There's a one-day CSAIL (comp sci and AI -- the lab which hosts W3C) workshop where all the grad students and professors get together and talk about their work. Oshani (Cc'd) is organizing this event, and I told her I'd stand up for 20 or 30 mins (I forget) to talk about HCLS with the goal of enticing students to work with us. Following is a brief outline (expected to be fleshed out to two pages by the end of the day) of what we can write in the proceedings. Any help or prepared material is greatly appreciated. Likewise, guidance from Oshani on what would be useful to have in the proceedings. @@ indicates that I don't just want input, I neeeed it.

Intro: W3C is an international industrial standards organization. Where IETF standardizes internet wire protocols, we standardize web payloads. You may have heard of HTML, XML, Semantic Web... We cover work in a broad set of domains: Interaction, Ubiquitous Web, Accessibility, and the catch-all, Technology and Society.

Following is an introduction to the work of one group, the Semantic Web in Health Care and Life Sciences Interest Group (HCLS IG). As the name suggests, the folks in the HCLS group are focused on applying Semantic Web technologies to the challenges in their domains. The participants come from:

  life sciences: proteomics, neurology, genetics
  health care: hospitals, clinics, insurance companies
  and everything in between: pharmaceuticals, clinical research organizations

Each of these concentrations incurs large costs when classifying and sharing its copious knowledge, and when integrating its data with conceptually adjacent concentrations. This leads to losses in money, productivity and satisfaction (no one enjoys using post-its to do work that a computer should do).

One of the principal obstacles to sharing knowledge is sharing a coding system. If I say

  \1People\04\0ID\0FN\0LN\0ADR\0\212817\0Eric\0Prud'hommeaux\023\0

you might, with sufficient interest and patience, guess that I'm dumping some binary database.
A form like

  <People> <ID>12817</ID><FN>Eric</FN><LN>Prud'hommeaux</LN><ADR href="#a23"> ...

is more human-parsable, but it doesn't tell you whether this is the same Eric as any other Eric on the web ("12817" is still ambiguous). The Semantic Web combines simple declarations, using URIs for disambiguation, with a culture of schema re-use and extension for maximum interoperability. Here is how the HCLS IG is applying these ideas:

Terminology: The foundation of unambiguous statements is unambiguous terms. Consistent, sharable identifiers connect the vertexes of conceptually adjacent graphs, providing a spanning schema with very little coding effort. That's the ideal world. In reality, any coding system takes effort, and the real engineering lies in building a system where, by carrot or stick, people or systems are incented to find the correct term for, e.g., a protein receptor or an increased pulmonary edema due to a failing atrioventricular valve. Because we want to use the existing infrastructure and corpus of data, we need to re-use existing term sets and extend them to give us unambiguous semantics that machines can use. There are about 20 @@JohnM? medical and anatomical term sets in popular use in clinics today. They've largely grown organically, with insufficient mechanisms to prevent either duplication or ambiguous definitions. Given different use cases, they've captured different levels of formal relationships between the terms. For instance, many SNOMED terms are related by an |isa| relationship, but that stands for both |type| and |sub class| (as well as a few other terms). Different use cases motivate different intimacies of models. For instance, SNOMED can be expressed in the Semantic Web by simply quoting this noncommittal |isa|, or we can express *some* of these isa relationships as inherently transitive subclass relationships.
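The following minimal Python sketch illustrates the difference between quoting SNOMED's noncommittal |isa| and committing some of those edges to transitive subclass relationships. The term names are hypothetical placeholders, not real SNOMED concepts. The two edge labels are an assumption for illustration: "isa" means quoted as-is, and "subClassOf" means reviewed and committed. Only committed edges are followed transitively.

```python
from collections import defaultdict

# Hypothetical (child, relation, parent) edges; these are NOT real SNOMED concepts.
# "isa" is the noncommittal relation quoted verbatim from the source terminology.
# "subClassOf" marks edges a modeler has reviewed and committed to as transitive.
EDGES = [
    ("ex:MitralRegurgitation", "subClassOf", "ex:HeartValveDisorder"),
    ("ex:HeartValveDisorder", "subClassOf", "ex:HeartDisease"),
    ("ex:HeartDisease", "isa", "ex:CardiovascularDisease"),
]


def asserted_parents(term, edges):
    """Direct parents under any relation, exactly as stated; no inference."""
    return {parent for child, _rel, parent in edges if child == term}


def inferred_superclasses(term, edges):
    """Transitive closure over the committed subClassOf edges only."""
    index = defaultdict(set)
    for child, rel, parent in edges:
        if rel == "subClassOf":
            index[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        # .get() on a defaultdict does not insert missing keys.
        for parent in index.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    term = "ex:MitralRegurgitation"
    print("asserted:", sorted(asserted_parents(term, EDGES)))
    print("inferred:", sorted(inferred_superclasses(term, EDGES)))
```

Here `inferred_superclasses("ex:MitralRegurgitation", EDGES)` reaches `ex:HeartDisease` through two committed edges. It stops short of `ex:CardiovascularDisease` because that last link was left as a quoted |isa| whose meaning (type or subclass) hasn't been decided. In practice this entailment would come from an RDFS/OWL reasoner rather than hand-written closure.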
SNOMED has been expressed in very general non-transitive languages and in intimate description logic languages, which can help you debug your model by discovering inconsistencies and unsatisfiable classes.

BioRDF: This task force started by contributing neurology and micro-anatomical data to a large data warehouse with the goal of answering drug discovery queries. This work involved modeling existing databases as RDF and the mechanics of converting and dumping that data into this materialized view of the Semantic Web. The group is continuing with the modeling aspect, though the conversion is now done by Semantic Web query wrappers around existing databases, reducing storage and latency. This work has inspired extensions to the SPARQL query language, which should be incorporated into the standard within a year.

Linking Open Drug Data: Where the BioRDF Task Force focuses on neuroscience queries, the LODD Task Force focuses on expressing the masses of publicly available (pharmaceutical) drug data in the Semantic Web. While the FDA has collected this data as part of the drug approval process, the data has never been collated in a consistent form and has seen very little use beyond providing a paper trail for liability cases. Building on the Linked Open Data project, LODD extends this large crystal to enable use cases like longitudinal studies of drug safety and efficacy. The data comes from public sources like |clinicaltrials.gov| as well as private contributions like |@@lilly's data@@|. It depends on the LOD cloud for terms for, e.g., drugs, drug classes and chemical compounds.

Clinical Observations Interoperability: Selecting the correct patients for a clinical study is critical to measuring the safety and efficacy of the drug.
While hospitals and clinics have most of the data needed to find candidates, conventional approaches are hampered by the diversity of schemas and by insufficiently intimate security models (often no access, full access, or access to expensive anonymized dumps). This task force has used a simple language to translate real hospital data into SemWeb-friendly views, mapping from the relational database to a shared ontology based on the HL7 RIM standards. The mapping enables query rewriting, which creates a virtual view of the database that is available on the Semantic Web in a number of popular shared schemas. This task force produced a pipeline in which a researcher composed a query in researcher-speak. A rule translated that query into hospital-speak, another rule translated it into the schema of an individual hospital, and finally the query was expressed and executed as SQL. The group is now developing a security model using the same mapping language. It provides a correspondence between security levels in the virtual views and those mandated by law and conventionally enforced in XACML.

Translational Medicine Ontology: Translational medicine is an area of pharmacology that incorporates data from a wide set of sources. The goal of "getting the right medication to the right person at the right time" requires access to many kinds of data:

  the patient's health, physiology and behavior
  possible chemical and biological reactions associated with the candidate medication
  the patient's diet and metabolism
  the history of data gathered during drug studies and post-market data acquisition

Translational medicine is perhaps the ultimate data integration use case. This task force is drawing on expertise from several pharmaceutical companies to create a network of ontologies. Starting from a set of health care roles and the questions they would ask, the group is creating the infrastructure, both conceptual and programmatic, to answer these life-saving questions.
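The following minimal Python sketch makes the last hop of the Clinical Observations Interoperability pipeline more concrete. A declarative mapping from a relational table to shared-ontology predicates does two things. First, it exposes rows as triples on demand, giving a virtual rather than materialized view. Second, it rewrites a single triple pattern into SQL. Python's built-in sqlite3 stands in for a hospital database. The mapping format and the table, column and URI names are all hypothetical. This is not the task force's actual mapping language, nor the HL7 RIM.

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from one hospital's table to shared-ontology predicates.
# All table, column and URI names are illustrative, not the HL7 RIM or the
# task force's actual mapping language.
MAPPING: Dict = {
    "table": "patients",
    "key": "pid",
    "subject_prefix": "http://hospital.example/patient/",
    "columns": {
        "sex": "http://shared.example/onto#gender",
        "dob": "http://shared.example/onto#birthDate",
    },
}


def virtual_triples(conn: sqlite3.Connection, mapping: Dict) -> Iterator[Triple]:
    """Yield triples on demand from the live table; nothing is copied or stored."""
    cols = ", ".join([mapping["key"], *mapping["columns"]])
    # Identifiers come from the trusted mapping, never from user input.
    sql = f"SELECT {cols} FROM {mapping['table']} ORDER BY {mapping['key']}"
    for row in conn.execute(sql):
        subject = mapping["subject_prefix"] + str(row[0])
        for value, predicate in zip(row[1:], mapping["columns"].values()):
            if value is not None:  # SQL NULL produces no triple
                yield (subject, predicate, str(value))


def rewrite_pattern_to_sql(predicate: str, value: str, mapping: Dict) -> Tuple[str, tuple]:
    """Rewrite the pattern (?s, predicate, value) into a parameterized SQL query."""
    reverse = {p: c for c, p in mapping["columns"].items()}
    column = reverse[predicate]  # raises KeyError for an unmapped predicate
    sql = f"SELECT {mapping['key']} FROM {mapping['table']} WHERE {column} = ?"
    return sql, (value,)


def find_subjects(conn: sqlite3.Connection, predicate: str, value: str, mapping: Dict) -> List[str]:
    """Answer the pattern by running the rewritten SQL, then mint subject URIs."""
    sql, params = rewrite_pattern_to_sql(predicate, value, mapping)
    return [mapping["subject_prefix"] + str(row[0]) for row in conn.execute(sql, params)]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for a hospital database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (pid INTEGER PRIMARY KEY, sex TEXT, dob TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, "F", "1970-01-01"), (2, "M", None)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for triple in virtual_triples(conn, MAPPING):
        print(triple)
    print(find_subjects(conn, "http://shared.example/onto#gender", "F", MAPPING))
```

A real pipeline would chain several such rewrites (researcher-speak to hospital-speak to an individual hospital's schema) and handle joins and multiple patterns. This sketch covers only one pattern against one table.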

Scientific Discourse: The Alzforum <http://www.alzforum.org/> has provided researchers with a gathering and dissemination point that has become the focal point (@@too strong?) of Alzheimer's research. The Drupal plugin Science Collaboration Framework provides the core functionality for that system. It also serves as a testbed for how increased coding can improve its utility and user experience. At the core is a scientific discourse ontology that describes theories, citations, hypotheses and evidence. This is tied to a popular Semantic Web schema for associating people with publications, including modern variants like blogs and wiki articles. The product is a representation of supporting and conflicting theories, chains of evidence, etc. Use cases of course include finding scientists with certain areas of interest/expertise. There are also surprising ones, like identifying research areas that need work based on conflicting theories.

Invitation: You've seen a taste of one group at W3C, and perhaps have a taste of what else we do. We have no shortage of interesting research ideas. I invite any of you to contribute your own expertise and insights. You can take formal steps by filling in <http://www.w3.org/2002/09/wbs/1/ieapp/> and joining a working group. Or just come to the W3C ghetto to talk with us and get an idea of what we do. We look forward to working with you.

-- 
-eric

office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than email address distribution.

There are subtle nuances encoded in font variation and clever layout which can only be seen by printing this message on high-clay paper.
Received on Thursday, 10 September 2009 17:30:25 UTC