- From: Greg Tyrelle <gregtyrelle@phalanxbiotech.com>
- Date: Thu, 9 Aug 2007 18:41:42 +0800
- To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
- Cc: "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
On 8/7/07, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:
> So a key concern for me is how I, as a user of public resources,
> should make statements about them on the semantic web. While certain
> data providers might already be providing RDF/OWL data with some URI, what
> about those that have yet to do this? How should I reference a public
> resource provided by the SGD [1] or candidadb [2]? Moreover, what about
> the ~1000 databases [3] with valuable content, much of it locked away in
> relational databases or flat files? How do I make statements about these
> resources, without taking the responsibility of serving it up in my own
> namespace [4], which might ultimately not integrate with content from
> another 3rd party content provider.

Do you want to make statements about the HTML representation of the database records in SGD? I will assume this is not the case, as these records already have URL identifiers. Or do you want to make statements about yeast proteins/genes, where SGD is likely to be the authority for providing stable identifiers for said proteins/genes?

If it is the second case, and if I understand you correctly, then your problem is that SGD currently does not provide stable URIs for yeast genes (non-information resources, not database records). Nonetheless, you want to make statements about these non-information resources now, without creating further data integration hassles by minting your own identifiers for them, identifiers which will ultimately be equivalent to the ones provided by SGD if and when they do start providing stable identifiers?

> In line with my previous comments about the value of the semantic web
> for data integration, it would be of great value to have data providers
> _register_ the namespace of their resources. In fact, coupling the NAR
> database issue with base URI registration would open up entirely new
> worlds for data integration. Do you think this is worthwhile or
> feasible? What other approaches might be considered to alleviate this
> problem?

A centralized registry, PURL schemes, etc. have been suggested, and they will *potentially* solve this problem, but they don't help a yeast biologist make statements about the yeast protein GCN4 right now. Which stable URI should you use for that protein if one doesn't already exist and you're not the authority? You don't want to wait for one to be made available...

The zen moment is: you are an authority, just not *the* authority. In that case it doesn't matter. Create URIs in your own namespace for whatever non-information resources you want (proteins, genes, etc.) and worry about the data integration problem after the fact. After all, RDF itself does not do data integration; it just facilitates it. If your URIs contain SGD gene names or other database identifiers, then direct identifier mapping should be feasible. If not, various smushing [1] techniques could be employed.

_greg

[1] http://esw.w3.org/topic/RdfSmushing

-- 
Greg Tyrelle, Ph.D.
Bioinformatics Department
Phalanx Biotech Group, Inc.
Hsinchu, Taiwan
Tel: 886-3-5781168 Ext.504
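To make the closing advice concrete, here is a minimal Python sketch of both integration routes mentioned in the message: direct identifier mapping, where a gene name embedded in a locally minted URI is reused to emit an `owl:sameAs` link, and smushing, where nodes sharing a value for an inverse functional property are merged (here with a small union-find). All namespaces (`MY_NS`, `SGD_NS`) and the `hasGeneSymbol` and `note` properties are hypothetical placeholders under example.org; SGD does not publish these URIs. Triples are plain tuples to keep the sketch self-contained, rather than any particular RDF library's API.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces: neither is a real SGD or lab URI scheme.
MY_NS = "http://example.org/mylab/yeast/"
SGD_NS = "http://example.org/sgd/"  # stand-in for a future SGD namespace
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical property, ASSUMED inverse functional (one gene per symbol).
HAS_SYMBOL = "http://example.org/vocab/hasGeneSymbol"
NOTE = "http://example.org/vocab/note"  # hypothetical annotation property


def mint(symbol: str) -> str:
    """Mint a URI in our own namespace that embeds the SGD gene name."""
    return MY_NS + symbol


def map_to_sgd(uri: str) -> Optional[str]:
    """Direct identifier mapping: reuse the embedded gene name in the SGD namespace."""
    if uri.startswith(MY_NS):
        return SGD_NS + uri[len(MY_NS):]
    return None


def link_to_sgd(triples: Iterable[Triple]) -> Set[Triple]:
    """Emit owl:sameAs links from each locally minted URI to its SGD counterpart."""
    links = set()
    for s, _, o in triples:
        for term in (s, o):
            target = map_to_sgd(term)
            if target is not None:
                links.add((term, SAME_AS, target))
    return links


def smush(triples: Iterable[Triple], ifp: str) -> Set[Triple]:
    """Merge nodes that share a value for the inverse functional property `ifp`."""
    triples = list(triples)
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # ifp value -> first subject seen carrying it
    for s, p, o in triples:
        if p == ifp:
            if o in first_seen:
                root_a, root_b = find(first_seen[o]), find(s)
                if root_a != root_b:
                    parent[root_b] = root_a  # earliest node stays canonical
            else:
                first_seen[o] = s
    # Rewrite every subject and object to its canonical node.
    return {(find(s), p, find(o)) for s, p, o in triples}


# Usage: our URI for GCN4 and a third party's URI for the same protein.
mine = mint("GCN4")
theirs = "http://example.org/other-lab/protein/42"  # hypothetical
data = [
    (mine, HAS_SYMBOL, "GCN4"),
    (mine, NOTE, "our statement"),
    (theirs, HAS_SYMBOL, "GCN4"),
    (theirs, NOTE, "their statement"),
]
merged = smush(data, HAS_SYMBOL)  # three triples, all with subject `mine`
links = link_to_sgd(merged)       # one owl:sameAs link to SGD_NS + "GCN4"
```

Treating a gene symbol as inverse functional is itself an assumption: it holds only within one organism and one naming authority. Across species, symbols collide and would wrongly merge distinct genes, which is why embedding a genuine database identifier in the URI (the direct-mapping route) is the safer choice when one is available.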
Received on Thursday, 9 August 2007 15:04:56 UTC