Data Wrangler position at Oregon Health & Science University, Portland, OR

The Oregon Health & Science University (OHSU) Library in Portland seeks a
skilled Data Wrangler to lead in data ingestion, transformation, and
quality assurance for a cutting-edge bioinformatics project.

Project Description:
Clinical and translational researchers face a daunting challenge in using
the vast amount of biomedical data to inform their understanding of human
disease mechanisms and develop new therapies. To address this challenge,
the Monarch project is aggregating information about model organisms, in
vitro models, genes, pathways, gene expression, protein and genetic
interactions, orthology, disease, phenotypes, publications, and authors.
The system we are building will provide an ability to navigate multi-scale
spatial and temporal phenotypes across in vivo and in vitro model systems
in the context of genetic and genomic data, using semantics and statistics.

Workplace Description:
OHSU is the state's only comprehensive academic health center and is made
up of the Schools of Dentistry, Medicine, and Nursing; College of Pharmacy;
OHSU Healthcare; and related programs. The OHSU Library, the largest health
sciences library in Oregon, serves the faculty, staff, and students of
OHSU, as well as health professionals and residents of the State of Oregon.
The Data Wrangler will be part of the Ontology Development Group (ODG) and
will work under the guidance of Dr. Carlo Torniai and Dr. Melissa Haendel,
but will also be expected to contribute to the library more generally on
committees, etc., based on the candidate’s experience and interest.

The Data Wrangler serves as a member of the OHSU Library Ontology
Development Group. This position works in the context of the Monarch
project to develop a research platform in support of investigations of
phenotype-genotype correlations across species. The Data Wrangler will work
with ontologists and bioinformaticians at OHSU and consortium sites to
design and implement tools and strategies for semantically mapping and
manipulating data.

The primary duty of the Data Wrangler will be to research and develop
automation for the ingestion and quality control of data coming from
several biomedical and informatics databases. This will involve the
development of custom scripts and ad-hoc SQL queries, semantic mapping, and
data normalization strategies. After ingestion, s/he will contribute to
optimization strategies for transforming these data sets into RDF triples
through D2RQ mapping, to be published via a Virtuoso Server instance. S/he
will also develop QA pipelines to ensure the consistency and accuracy of the
ingested data before and after transformation.
Moreover, s/he will provide feedback and change requests to the project's
ontologists to ensure a consistent and accurate representation of the data.
This position requires the ability to explore possible solutions and make
decisions that lead to the identification and implementation of effective
end-user displays of the data, novel approaches to data analysis, and
efficient testing to support data transformation.
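The QA step described above could be sketched as a simple consistency check that every source record survived the RDF transformation. This is only an illustrative sketch; the names (gene_records, the "ex:" prefix) are hypothetical and not taken from the Monarch project's actual schema or mappings:

```python
# Hypothetical QA check: verify that every source record produced at
# least one RDF triple after a D2RQ-style mapping. All identifiers here
# are illustrative stand-ins, not real Monarch data.

def find_dropped_records(source_records, triples, subject_prefix):
    """Return the IDs of source records that generated no RDF triple."""
    # Collect the subject URIs actually emitted by the mapping.
    emitted = {s for (s, p, o) in triples if s.startswith(subject_prefix)}
    # A record is "dropped" if no triple exists for its URI.
    return [r["id"] for r in source_records
            if subject_prefix + r["id"] not in emitted]

# Toy data standing in for a database extract and its RDF output.
gene_records = [{"id": "G1"}, {"id": "G2"}, {"id": "G3"}]
triples = [
    ("ex:G1", "rdfs:label", "gene one"),
    ("ex:G2", "rdfs:label", "gene two"),
]

missing = find_dropped_records(gene_records, triples, "ex:")
print(missing)  # G3 produced no triples, flagging a transformation gap
```

In practice a check like this would run both before and after transformation, with the "after" side querying the Virtuoso endpoint rather than an in-memory list.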

Position Conditions/Qualifications:

Required:
• Master’s degree with major courses in a relevant field; or Bachelor's
degree with major courses in the field of research plus 4 additional years
of related experience
• 3 years of relevant work experience
• Ability to perform research and make independent decisions about
approaches and tools to reach specific goals
• Experience with semantically annotated data
• Experience with Software Project Management tools (Jira, Confluence, SVN,
Git)
• Hands-on experience with one or more scripting languages (e.g. Perl,
Python, Ruby, Bash)
• Hands-on experience with SQL (Postgres preferred)
• Strong programming skills with a solid understanding of object oriented
languages and principles
• Experience in Java programming
• Strong verbal, written, and interpersonal communication skills
(especially via teleconferencing venues)

Preferred:
• Experience developing and evaluating data curation workflows
• Experience developing ontologies and data models
• Experience in developing Extract Transform Load (ETL) scripts
• Knowledge of SPARQL, RDF, OWL
• Experience in bioinformatics
• Experience in end-user usability for bioinformatics platforms

Duration of this appointment and indicated salary may be changed or
eliminated if gift, grant, or contract funds supporting this position
become unavailable.

Applications and Nominations: To apply please visit ohsujobs.com and search
for position IRC 40016. Applications should include a resume, a letter of
introduction, and contact information for three references. Screening of
applications will commence immediately and continue until the position is
filled. OHSU is an AA/EO employer.

Received on Thursday, 18 July 2013 23:48:04 UTC