Re: making statements on the semantic web from Greg Tyrelle on 2007-08-09 (public-semweb-lifesci@w3.org from August 2007)

From: Greg Tyrelle <gregtyrelle@phalanxbiotech.com>
Date: Thu, 9 Aug 2007 18:41:42 +0800
To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Cc: "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
Message-ID: <d152fd3d0708090341n70b08635n959b54d702d28b25@mail.gmail.com>

On 8/7/07, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:
>   So a key concern for me is how I, as a user of public resources,
> should make statements about them on the semantic web. While certain
> data providers might already be providing RDF/OWL data with some URI, what
> about those that have yet to do this? How should I reference a public
> resource provided by the SGD [1] or candidadb [2]? Moreover, what about
> the ~1000 databases [3] with valuable content, much of it locked away in
> relational databases or flat files? How do I make statements about these
> resources, without taking the responsibility of serving it up in my own
> namespace [4], which might ultimately not integrate with content from
> another 3rd party content provider.

Do you want to make statements about the HTML representation of the
database records in SGD? I will assume this is not the case, as these
records already have URL identifiers. Or do you want to make
statements about yeast proteins/genes, where SGD is likely to be the
authority for providing stable identifiers for said proteins/genes?

If it is the second case, and if I understand you correctly, then your
problem is that SGD does not currently provide stable URIs for yeast
genes themselves (non-information resources, as opposed to database
records). Nonetheless, you want to make statements about these
non-information resources now, without creating further data
integration hassles by minting your own identifiers, which would
ultimately be equivalent to the identifiers SGD provides if and when
it starts providing them. Is that right?

>   In line with my previous comments about the value of the semantic web
> for data integration, it would be of great value to have data providers
> _register_ the namespace of their resources. In fact, coupling the NAR
> database issue with base URI registration would open up entirely new
> worlds for data integration. Do you think this is worthwhile or
> feasible? What other approaches might be considered to alleviate this
> problem?

A centralized registry, PURL schemes etc. have been suggested, and
they will *potentially* solve this problem, but they don't help a
yeast biologist make statements about the yeast protein GCN4 right
now. Which stable URI should you use for that protein if one doesn't
already exist and you're not the authority? You don't want to wait
for one to be made available...

The zen moment is: you are an authority, just not the authority. In
which case it doesn't matter. Create URIs in your own namespace for
whatever non-information resources you want (proteins, genes etc.) and
worry about the data integration problem after the fact. After all, RDF
itself does not do data integration, it just facilitates data
integration. If your URIs contain SGD gene names or other database
identifiers, then direct identifier mapping should be feasible. If
not, various smushing [1] techniques could be employed.
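
To make that concrete, here is a minimal sketch in Python (standard
library only) of minting URIs in your own namespace and mapping them
later. Everything named here is an assumption for illustration: the
two base URIs are hypothetical placeholders (SGD does not publish the
second one), the helper names are made up, and using owl:sameAs to
record the equivalence is just one possible choice.

```python
from urllib.parse import quote, unquote

# Hypothetical namespaces, for illustration only.
MY_NS = "http://example.org/yeast/gene/"       # our own namespace
AUTHORITY_NS = "http://sgd.example.org/gene/"  # stand-in for a future official one

# Real OWL property URI; choosing it for the mapping is an assumption.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mint_uri(identifier, namespace=MY_NS):
    """Mint a URI in our namespace that embeds the database identifier."""
    return namespace + quote(identifier, safe="")


def extract_identifier(uri, namespace=MY_NS):
    """Recover the embedded identifier from a URI minted by mint_uri."""
    if not uri.startswith(namespace):
        raise ValueError("URI is not in namespace %s: %s" % (namespace, uri))
    return unquote(uri[len(namespace):])


def map_to_authority(uri, authority_ns=AUTHORITY_NS):
    """Return a (subject, predicate, object) triple linking our URI to the
    authority's URI for the same identifier."""
    identifier = extract_identifier(uri)
    return (uri, OWL_SAME_AS, authority_ns + quote(identifier, safe=""))


# Make statements now, using our own URI for the gene.
gcn4 = mint_uri("GCN4")

# Later, once the authority publishes URIs, emit the mapping triple.
mapping = map_to_authority(gcn4)
```

Because the identifier is embedded in the URI, the mapping step is a
plain string operation and needs no lookup table. URIs that do not
embed a shared identifier would need smushing instead.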

_greg

[1] http://esw.w3.org/topic/RdfSmushing

--
Greg Tyrelle, Ph.D.
Bioinformatics Department
Phalanx Biotech Group, Inc.
Hsinchu, Taiwan
Tel: 886-3-5781168 Ext.504

Received on Thursday, 9 August 2007 15:04:56 UTC