Re: A survey of work done within HCLS

Hi Eric et al,
Thank you very much for agreeing to give a talk about the HCLS IG at the
CSAIL Student Workshop. I am sure it will be very useful for the student
community here at MIT/CSAIL.

As for the proceedings, it would be nice if you could send me a short paper
(no more than 2 pages long) covering:
1. a brief intro about the Semantic Web (as most of the attendees of the
workshop will not be from the Semweb community),
2. an overview of the work done by HCLS within the W3C,
3. any related research carried out by the members of this group,
and anything else you think might interest the CSAIL student
community.

While the short paper is not required, we need an abstract of the talk to be
published ASAP at [1]. So, I would really appreciate it if you could send me the
talk abstract and/or the short paper by noon on Friday, Sep 25th, at the latest.

Thanks!
Oshani

[1] http://projects.csail.mit.edu/csw/2009/index.php?page=sched

On Thu, Sep 10, 2009 at 1:29 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> There's a one-day CSAIL (Computer Science and Artificial Intelligence
> Laboratory, the lab which hosts W3C) workshop where all the grad students
> and professors get together and talk about their work. Oshani (Cc'd) is
> organizing this event, and I told her I'd stand up for 20 or 30 minutes
> (I forget which) to talk about HCLS, with the goal of enticing students
> to work with us. Following is a brief outline (expected to be fleshed
> out to two pages by the end of the day) of what we can write in the
> proceedings. Any help or prepared material is greatly appreciated, as is
> guidance from Oshani on what would be useful to have in the proceedings.
> @@ indicates that I don't just want input, I *need* it.
>
> Intro:
> W3C is an international industry standards organization. Where the IETF
> standardizes Internet wire protocols, we standardize web payloads. You
> may have heard of HTML, XML, or the Semantic Web.
>
> We cover work in a broad set of domains: Interaction, Ubiquitous Web,
> Accessibility, and the catch-all, Technology and Society. What follows is
> an introduction to the work of one group, the Semantic Web in Health
> Care and Life Sciences Interest Group (HCLS IG).
>
> As the name suggests, the folks in the HCLS group are focused on the
> application of Semantic Web technologies to the challenges in their
> domains. The participants come from:
>  life sciences: proteomics, neurology, genetics
>  health care: hospitals, clinics, insurance companies
>  and everything in between: pharmaceuticals, clinical research
> organizations
>
> Each of these concentrations incurs large costs when classifying and
> sharing its copious knowledge, and when integrating data with
> conceptually adjacent concentrations. This leads to losses in money,
> productivity and satisfaction (no one enjoys using Post-its to do work
> that a computer should do). One of the principal obstacles to sharing
> knowledge is sharing a coding system. If I say
>  \1People\04\0ID\0FN\0LN\0ADR\0\212817\0Eric\0Prud'hommeaux\023\0
> you might, with sufficient interest and patience, guess that I'm
> dumping some binary database. A form like
>  <People>
>    <ID>12817</ID><FN>Eric</FN><LN>Prud'hommeaux</LN><ADR href="#a23">
>  ...
> is more human-parsable, but doesn't tell you if this is the same Eric
> as any other Eric on the web ("12817" is still ambiguous). The
> Semantic Web combines simple declarations using URIs for
> disambiguation with a culture of schema re-use and extension for
> maximum interoperability. Here is how the HCLS IG is applying these
> ideas:
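A minimal sketch of the idea, assuming plain Python sets of (subject, predicate, object) triples rather than a real RDF library. The example.org and example.net URIs and the `EX` address property are hypothetical; the FOAF property URIs stand in for a re-used shared schema. Because each record is keyed by a URI rather than a bare "12817", graphs from different sources can be merged by simple set union without conflating two different people who happen to share a local key.

```python
# Hypothetical URIs: everything under example.org / example.net is made up;
# the FOAF namespace stands in for a widely re-used shared schema.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"          # hypothetical local extension
ERIC = "http://example.org/people/12817"  # a URI, not a bare "12817"

# Two independently published graphs of (subject, predicate, object) triples.
hr_db = {
    (ERIC, FOAF + "givenName", "Eric"),
    (ERIC, FOAF + "familyName", "Prud'hommeaux"),
}
office_db = {
    (ERIC, EX + "address", "http://example.org/addresses/23"),
}
# A third source that happens to use the same local key for someone else.
other_db = {
    ("http://example.net/staff/12817", FOAF + "givenName", "Erica"),
}

def describe(graph, subject):
    """Every (predicate, object) pair asserted about one subject URI."""
    return {(p, o) for (s, p, o) in graph if s == subject}

# With no blank nodes involved, merging these graphs is plain set union.
merged = hr_db | office_db | other_db
```

`describe(merged, ERIC)` returns the three statements about that URI, while the unrelated `.../staff/12817` record stays separate even though its local key is the same.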
>
> Terminology:
> The foundation of unambiguous statements is unambiguous
> terms. Consistent, sharable identifiers connect the vertices of
> conceptually adjacent graphs, providing a spanning schema with very
> little coding effort. That's the ideal world. In reality, any coding
> system takes effort, and the real engineering comes in making a system
> where, by carrot or stick, people or systems are motivated to find the
> correct term for, e.g., a protein receptor or an increased pulmonary
> edema due to a failing atrioventricular valve. Because we want to use
> the existing infrastructure and corpus of data, we need to re-use
> existing term sets and extend them to give us unambiguous semantics
> which machines can use.
>
> There are about 20 @@JohnM? medical and anatomical term sets in
> popular use in clinics today. They've largely grown organically, with
> insufficient mechanisms to prevent either duplication or ambiguous
> definitions. Given different use cases, they've captured different
> levels of formal relationships between the terms. For instance, many
> SNOMED terms are related by an |isa| relationship, but that single
> relationship stands for both |type| and |sub class| (as well as a few
> other relationships).
>
> Different use cases motivate different levels of modeling detail. For
> instance, SNOMED can be expressed in the Semantic Web by simply
> quoting this noncommittal |isa|, or we can express *some* of these isa
> relationships as inherently transitive subclass relationships. SNOMED
> has been expressed both in very general languages that make no
> transitivity commitments and in expressive description logic languages,
> which can help you debug your model by discovering inconsistencies and
> unsatisfiable classes.
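A minimal sketch of the difference between the two readings, using hypothetical SNOMED-style terms (not real SNOMED CT concepts or codes). The noncommittal reading returns only the asserted |isa| parents; reading |isa| as a transitive subclass relationship also yields every ancestor.

```python
from collections import defaultdict

# Hypothetical |isa| assertions as (child, parent); the term names are
# illustrative only, not real SNOMED CT concepts or identifiers.
ISA = [
    ("pulmonary_edema", "lung_disorder"),
    ("lung_disorder", "respiratory_disorder"),
    ("respiratory_disorder", "disorder"),
]

def direct_parents(term, edges):
    """Noncommittal reading: only the |isa| edges actually asserted."""
    return {parent for child, parent in edges if child == term}

def superclasses(term, edges):
    """Subclass reading: everything reachable by following |isa| edges,
    i.e. the transitive closure starting from `term`."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The closure is only valid if every edge it follows really is a subclass link. An |isa| that actually means |type| (instance-of) is not transitive, so chaining through it can produce wrong inferences; that is why only *some* |isa| links can be lifted to subclass.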
>
>
> BioRDF:
> This task force started by contributing neurology and micro-anatomical
> data to a large data warehouse with the goal of answering drug
> discovery queries. This work involved modeling existing databases as
> RDF and the mechanics of converting and dumping that data into this
> materialized view of the Semantic Web.
>
> The group is continuing with the modeling aspect, though now the
> conversion is done by having Semantic Web query wrappers around
> existing databases, reducing storage and latency. This work has
> inspired extensions to the SPARQL query language, which should be
> incorporated into the standard within a year.
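A minimal sketch of such a wrapper, assuming an in-memory SQLite table and hypothetical table, column and vocabulary names. Instead of converting and dumping rows into a warehouse, a triple pattern (?s, predicate, value) is answered by issuing SQL against the live table, so nothing is copied and new rows are visible immediately. It handles only single triple patterns; it is not a SPARQL engine.

```python
import sqlite3

# Hypothetical relational source; table, columns and URIs are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE neuron (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany("INSERT INTO neuron VALUES (?, ?, ?)",
                 [(1, "Purkinje cell", "cerebellum"),
                  (2, "pyramidal cell", "hippocampus")])

EX = "http://example.org/neuro#"
# Mapping from RDF predicates to columns of the wrapped table.
PRED_TO_COLUMN = {EX + "name": "name", EX + "region": "region"}

def subject_uri(row_id):
    """Mint a URI for a row from its primary key."""
    return f"http://example.org/neuron/{row_id}"

def match(predicate, obj=None):
    """Answer the pattern (?s, predicate, obj) by querying the live table
    rather than reading a materialized RDF dump."""
    column = PRED_TO_COLUMN[predicate]  # only mapped predicates are exposed
    sql = f"SELECT id, {column} FROM neuron"
    params = ()
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params = (obj,)
    sql += " ORDER BY id"
    return [(subject_uri(row_id), predicate, value)
            for row_id, value in conn.execute(sql, params)]
```

For example, `match(EX + "region", "cerebellum")` returns one triple for the Purkinje cell row. Column names come only from the fixed mapping table and values are passed as SQL parameters, so query values cannot inject SQL.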
>
> Linking Open Drug Data:
> Where the BioRDF Task Force focuses on neuroscience queries, the LODD
> Task Force focuses on expressing the masses of publicly available
> (pharmaceutical) drug data in the Semantic Web. While the FDA has
> collected this data as part of the drug approval process, the data has
> never been collated in a consistent form and has had very little use
> beyond providing a paper trail for liability cases.
>
> Building on the Linked Open Data (LOD) project, the LODD extends this
> large body of linked data to enable use cases like longitudinal studies
> of drug safety and efficacy. The data comes from public sources like
> |clinicaltrials.gov| as well as private contributions like |@@lilly's
> data@@|, and depends on the LOD cloud for terms for, e.g., drugs, drug
> classes, chemical compounds, etc.
>
> Clinical Observations Interoperability:
> Selecting the correct patients for a clinical study is critical to
> measuring the safety and efficacy of a drug. While hospitals and
> clinics have most of the data needed to find candidates, conventional
> approaches are hampered by the diversity of schemas and by
> insufficiently fine-grained security models (often the only choices are
> no access, full access, or access to expensive anonymized dumps). This
> task force has used a simple language to translate real hospital data
> to SemWeb-friendly views, mapping from the relational database to a
> shared ontology based on the HL7 RIM standard.
>
> The mapping language enables query rewriting, which creates a virtual
> view of the database that is available on the Semantic Web in a number
> of popular shared schemas. This task force produced a pipeline in which
> a researcher was able to compose a query in researcher-speak, a rule
> translated that to hospital-speak, another rule translated it to the
> schema for an individual hospital, and finally the query was expressed
> and executed as SQL.
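A minimal sketch of that layering, assuming hypothetical vocabularies and a hypothetical two-table hospital schema whose tables share a `patient_id` column. Plain dictionary lookups stand in for the task force's translation rules, and the last step compiles the rewritten patterns into one parameterized SQL query.

```python
# Hypothetical vocabularies and schema: none of these names come from
# HL7 RIM or a real hospital database; they only illustrate the layering.
RESEARCHER_TO_SHARED = {        # rule 1: researcher-speak -> hospital-speak
    "takes_drug": "medication",
    "has_condition": "diagnosis",
}
SHARED_TO_HOSPITAL = {          # rule 2: shared terms -> one hospital's tables
    "medication": ("meds", "drug_name"),
    "diagnosis": ("dx", "description"),
}

def to_shared(query):
    """Rewrite (predicate, value) patterns into the shared vocabulary."""
    return [(RESEARCHER_TO_SHARED[p], v) for p, v in query]

def to_hospital(query):
    """Rewrite shared patterns into (table, column, value) for one hospital."""
    return [(*SHARED_TO_HOSPITAL[p], v) for p, v in query]

def to_sql(patterns):
    """Compile to one SQL query, joining every table on patient_id
    (an assumed convention of the hypothetical schema)."""
    froms, wheres, params = [], [], []
    for i, (table, column, value) in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"{table} AS {alias}")
        wheres.append(f"{alias}.{column} = ?")
        params.append(value)
        if i > 0:
            wheres.append(f"{alias}.patient_id = t0.patient_id")
    sql = ("SELECT DISTINCT t0.patient_id FROM " + ", ".join(froms)
           + " WHERE " + " AND ".join(wheres))
    return sql, params

researcher_query = [("takes_drug", "warfarin"),
                    ("has_condition", "atrial fibrillation")]
sql, params = to_sql(to_hospital(to_shared(researcher_query)))
```

Supporting a second hospital would mean adding another `SHARED_TO_HOSPITAL`-style table; the researcher-side rules stay unchanged.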
>
> The group is now developing a security model using the same mapping
> language, providing a correspondence between security levels in the
> virtual views and those mandated by law and conventionally enforced
> in XACML.
>
> Translational Medicine Ontology:
> Translational medicine is an area of pharmacology which incorporates
> data from a wide set of sources. The goal of "getting the right
> medication to the right person at the right time" requires access to
> many aspects of the patient's health, physiology and behavior, the
> possible chemical and biological reactions associated with the
> candidate medication, the patient's diet and metabolism, and the history
> of data gathered during drug studies and post-market data
> acquisition. Translational medicine is perhaps the ultimate data
> integration use case.
>
>
> This task force is drawing on expertise from several pharmaceutical
> companies to create a network of ontologies. Starting from a set of
> health care roles and the questions they would ask, the group is
> creating the infrastructure, both conceptual and programmatic, to answer
> these life-saving questions.
>
> Scientific Discourse:
> The Alzforum <http://www.alzforum.org/> has provided researchers with
> a gathering and dissemination point which has become the focal point
> (@@too strong?) of Alzheimer's research. The Science Collaboration
> Framework, a Drupal plugin, provides the core functionality for that
> system, as well as a testbed for how richer semantic coding can
> improve the utility and user experience.
>
> At the core is a scientific discourse ontology which describes
> theories, citations, hypotheses and evidence. This is tied to a
> popular Semantic Web schema for associating people with publications,
> including modern variants like blogs and wiki articles. The product is
> a representation of supporting and conflicting theories, chains of
> evidence, etc. Use cases of course include finding scientists with
> certain areas of interest or expertise, as well as surprising ones like
> identifying research areas that need work, based on conflicting
> theories.
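A minimal sketch of that last use case, assuming hypothetical discourse statements and relation names ("supports", "contradicts") rather than terms from the actual discourse ontology. A hypothesis that has both supporting and contradicting evidence is flagged as a candidate area for further research.

```python
from collections import defaultdict

# Hypothetical (evidence, relation, hypothesis) statements; identifiers
# and relation names are placeholders, not terms from the real ontology.
CLAIMS = [
    ("evidence:E1", "supports", "hyp:H1"),
    ("evidence:E2", "contradicts", "hyp:H1"),
    ("evidence:E3", "supports", "hyp:H2"),
]

def contested(claims):
    """Return hypotheses with both supporting and contradicting evidence,
    sorted for stable output."""
    relations = defaultdict(set)
    for _evidence, relation, hypothesis in claims:
        relations[hypothesis].add(relation)
    return sorted(h for h, rels in relations.items()
                  if {"supports", "contradicts"} <= rels)
```

Here `contested(CLAIMS)` returns `["hyp:H1"]`, since only that hypothesis has evidence on both sides.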
>
> Invitation:
> You've had a taste of one group at W3C, and perhaps a sense of
> what else we do. We have no shortage of interesting research ideas. I
> invite any of you to contribute your own expertise and
> insights. You can take formal steps by filling in
> <http://www.w3.org/2002/09/wbs/1/ieapp/> and joining a working group,
> or just come to the W3C ghetto to talk with us and get an idea about
> what we do. We look forward to working with you.
> --
> -eric
>
> office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
> mobile: +1.617.599.3509
>
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
>
> There are subtle nuances encoded in font variation and clever layout
> which can only be seen by printing this message on high-clay paper.
>

Received on Wednesday, 23 September 2009 18:44:57 UTC