- From: Jonathan Rees <jar@creativecommons.org>
- Date: Tue, 23 Oct 2007 10:03:16 -0400
- To: Peter Ansell <ansell.peter@gmail.com>
- Cc: public-semweb-lifesci@w3.org, p.roe@qut.edu.au, j.hogan@qut.edu.au
First let me thank you for taking a serious look at the requirements. I appreciate it.

On Oct 21, 2007, at 7:44 PM, Peter Ansell wrote:

> Hi all,
>
> I have been using the Bio2Rdf markup system and I personally do not
> see what all the fuss is about but there must be something so here are
> my opinions based solely on the requirements document
>
> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/Requirements
>
> # For our own resources, what URIs to mint and what contracts to
> adhere to regarding well-definedness and documentation
>
> Publically retrievable metadata for ones personally produced/published
> information (if not data as well) should be available using URI's
> matched to one's institution/organisation, with relevant owl:sameAs
> and rdfs:seeAlso tags to specify their relationships to other known
> uri's.

As Eric Jain pointed out, what HCLS has been trying to do (for the most part) is not to establish tools for private use, but rather to establish a *protocol* for clear communication of information (ideas, facts, observations, conclusions) in RDF over the Internet. Whether this protocol is a formal one involving published specifications or simply informal agreement about what we're going to do is not to the point.

My conjecture is that the central thing we need to agree on is the minimal meaning of terms (URIs): when you should use certain terms (in talking to me) and when you shouldn't. Absent common terms with common meanings there is little hope of communication. Maybe this sounds trivial, but the records of this mailing list indicate that it's not (record vs. protein, ontology boundary cases, time dependence, ontology versioning, points of view, document variants, etc.).

> Advantages: One does not need to negotiate with the original author in
> order to augment their definition, and people who actually want to
> know things have clear unambiguous ways of getting to their goal.

Advantages over what?
The requirements take no position on HTTP protocol vs. any other resolution method (including sameAs, DNS, LSID, handle, info:, CORBA, dewey decimal, ...). Clearly there are differences of opinion about definitions, so we need to be somewhat careful about this. You can define your internal terms as you like, but as soon as you use sameAs (or equivalentClass etc.) in order to communicate with me, you are making a very strong statement - that the things you've said about the thing you're talking about are true of the thing I would talk about using the shared term.

We have to have a shared understanding to communicate. We may not share a definition, but we have to share some aspect of meaning or use of the term, and I would say that has to be documented somewhere. When you assert sameAs, this is a matter of judgment or hypothesis, and you and I might argue about it as we'd argue about any kind of assertion in science; but if we don't have a starting point, an agreement on what is minimally required of *any* description (your "definition") you OR I make, we won't have any basis for disagreement. (Just as in this discussion!)

We agree that if definitions are different then different terms are needed; I think that's the effect of "matched to one's institution" which I agree is a good safe position. If we have a quarrel here it is probably tactical, or maybe about what "definition" means as opposed to description, not fundamental.

> Follows the process of how knowledge is developed, ie, someone comes
> up with an idea and develops it themselves with citations to outside
> publications.

Absolutely - we're on the same page. I'll look at your statements and see which can be added to a revision (or explanation) of the requirements.

> Disadvantages, sparql queries are not simple, but I use programmatic
> level access and enable the retrieval of sameAs items through code
> which then abstracts queries to utilise all known identifiers when
> querying.
> People don't actually want to write sparql queries
> themselves, they are biologists or doctors, who just want to click on
> a button and have it work for them, whether the program does one or
> three queries is basically inconsequential to them.

To me this is an acceptable use case. When you assert a sameAs you're saying that for the purposes of this query you would like to assume that these URIs all name the same thing. This can be hypothetical, just as any of your biological relationships are hypothetical, with sameness assertions relative to "trust" in sources for a particular application - roughly speaking, you might assert lots of sameAses for higher recall, or fewer for higher precision. SameAs is a very strong statement, so you might consider using a superproperty of it, but let's not get into that as Alan R and Richard C have already been over this territory recently.

>
> # What particular URI's to use for resources related to public
> databases (esp. database records) (>4 proposals on table)
>
> Admittedly this is an issue, but so far I like being able to have the
> best of lsid and http: uri's with the bio2rdf markup schemata. Simple
> text URI's not matching is inconsequential if one has metadata
> identifying two URI's as identical.

I think you try to match URIs or use aliased URIs depending on what you're trying to talk about. If you want to talk about what someone else is talking about, you have to use a shared term. But private terms are important if you don't know whether what you're talking about is the same as what they're talking about. If you discover some difference you can always retract the sameAs. The alternative is to retract all of your assertions about the thing and rewrite in terms of a new URI at that point. I think we're in agreement.

Personally I like to be able to have good recall using query engines that don't infer sameAs, without having to muddy up the query by making it look for sameAs relationships.
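
The trade-off between recall and precision when folding sameAs into client code, rather than into each query, can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a plain Python list of (subject, predicate, object) tuples rather than a real triple store, and the example.org URIs, the two vocabulary terms, and the function names are all hypothetical.

```python
from collections import defaultdict

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical vocabulary terms, used only for this illustration.
LABEL = "http://example.org/vocab#label"
ORGANISM = "http://example.org/vocab#organism"

# Hypothetical data: a local URI asserted to be the same as a remote one.
TRIPLES = [
    ("http://example.org/local/P1", SAME_AS, "http://example.org/remote/P1"),
    ("http://example.org/local/P1", LABEL, "protein one"),
    ("http://example.org/remote/P1", ORGANISM, "human"),
]


def same_as_closure(triples, start):
    """Return every URI reachable from `start` via sameAs, in either direction."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            stack.append(other)
    return seen


def describe(triples, uri, use_same_as=True):
    """Collect (predicate, object) pairs stated about `uri`.

    With use_same_as=True, statements made under any alias are included.
    """
    aliases = same_as_closure(triples, uri) if use_same_as else {uri}
    return {(p, o) for s, p, o in triples if s in aliases and p != SAME_AS}
```

With `use_same_as=True` the lookup returns statements made under either URI (higher recall); with `use_same_as=False` it returns only statements made under the URI that was asked for (higher precision). Retracting a sameAs assertion amounts to removing one tuple from the data, and the query itself never mentions sameAs.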
But I'm willing to allow that this point is not central to this discussion. What URI would *you* use were you to desire to talk to *me* (or my RDF-understanding agent) about (pick a favorite bioinformatical entity, among those I probably know about given that I have worked in bioinformatics)?

>
> * What entity is responsible for choosing and maintaining these
> URIs
>
> What is wrong with a simple scheme that "bio2rdf.org" uses? With my
> local "myBio2Rdf" installation I populate my database from the
> original supplier. Why do the metadata records need to be preprocessed
> and maintained by another entity?
>
> What is the difference between their scheme and any other, apart from
> prejudice against a particular opening identifer which people can
> translate and use without relying on the actual organisation to exist
> anyway.

This is treated above - the problem is communication, not use. Without sharing we're merely talking bioinformatics, not semantic web.

>
> # How to get stuff
>
> Personally, I would stick with HTTP GET here.

This answer is favored right now, with qualification, but apparently is still controversial, and the other side has to be heard out (assuming they're not too fatigued to speak up) or the whole effort to create a semantic web for health care & life sciences will be weaker. As editor I'm trying hard to stay neutral and to try, if possible, to make everyone happy. Of course each side of the http vs. lsid debate now thinks that I'm on the other side...

>
> * How to use a URI to get metadata (RDF) about an identified
> resource
>
> I have no problems with getting metadata using the explicit URI object
> reference and then having to follow another url to find the actual
> data.
> It is the way things in society pretty much work, you find the
> identifying information before you find the data, so when you find the
> data you know what you were looking for and that you actually wanted
> to expend resources to get the data
>
> Ie, I would never follow the following url's until I verified that
> http://bio2rdf.org/identifier described what I wanted to know.
>
> http://bio2rdf.org/data/identifier
> http://bio2rdf.org/html/identifier
> http://bio2rdf.org/image/identifier
>
> Where one knows about what html and image mean to them for their goal
> as basic information types.

I'm not sure what you're suggesting here but I think we're probably in agreement, with the detail (probably of no import to any existing client) that there should be a 303 redirect or #fragid-removal on the way to getting the "metadata". Get yourself some RDF first that tells you what documents are available and what their roles are. Then use that information to decide what documents to look at next.

>
> * How to use a URI to retrieve the bits of an information resource
>
> Not sure what the difficulties are here. I spent a week making up a
> perfectly good browser page for bio2rdf information using my local
> database which assumed that the browser already knew how to follow
> HTTP standards... and it works so far.
>
> Essentially, given all of that, I have an adaptable system which
> utilises what I see as the best of the distributed semantic web (Web
> 3.0) with personal touches (Web 2.0).
>
> What would change if people all decided for instance to only use lsid
> and deprecated http:// uri's? Essentially, I could continue my
> personal methods as lsid is included already in my rdf data.

I infer that your approach is to use sameAs to "redirect" a URI to a different one, maybe a local one, that you can use.
This is not too different, in practice, from the resolution ontology idea I've been working on (based on Alan R's work), in that it represents known tactics for getting to the data in RDF, rather than relegating them to some external API, web proxy, DNS configuration, or whatever.

> What would change if people decided to access data by default with
> object references instead of metadata? Bio2RDF already allows for this
> within itself (ie, http://bio2rdf.org#rdfdata, although it is designed
> with what I see to be a more intuitive metadata by default approach.
>
> Is there any other change that would break my way of doing things? And
> does everyone need to decide on one standard, as opposed to utilising
> common elements well enough to combine them. Personally I do not like
> the idea of anonymous elements, ie bnodes, in RDF describing realistic
> scientific or medical data, but that is a minor issue I guess.

Good! I also consider blank node notation to be a minor issue in this context. For the purposes of the URI note I don't want to take a stand on the issue of naming a record ("metadata" document) vs. naming the thing described by that record - that's a separate issue to be argued on its own merits - but I do want to make sure that if both have names, that the names are different.

Document requirements aren't supposed to imply any change; that's left for recommendations (advice, etc) to do. Requirements for a recommendation-producing process like this one are supposed to inspire you to say four kinds of things:

1. "Those requirements don't make sense to me"
2. "a recommendation meeting these requirements won't be complete" (meaning they're not going to ask enough of *other* people to satisfy me)
3. "a recommendation (etc) may end up being too strong" (meaning I fear it may ask too much of *me*).
4. "I know how those requirements can be met - here are my recommendations"

You have done some of these - I hope others will too.
I'll update the wiki page today with some clarifications. After I do this, I would be grateful if you would recast your message above as proposed recommendations (what you would do if others would) that meet the document requirements.

Best
Jonathan
Received on Tuesday, 23 October 2007 13:59:23 UTC