Re: what would change for me? from Jonathan Rees on 2007-10-23 (public-semweb-lifesci@w3.org from October 2007)

From: Jonathan Rees <jar@creativecommons.org>
Date: Tue, 23 Oct 2007 10:03:16 -0400
To: Peter Ansell <ansell.peter@gmail.com>
Cc: public-semweb-lifesci@w3.org, p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-Id: <6A070D0C-5415-4685-8AEA-6CFC9F822A8E@creativecommons.org>

First let me thank you for taking a serious look at the requirements.
I appreciate it.

On Oct 21, 2007, at 7:44 PM, Peter Ansell wrote:

> Hi all,
>
> I have been using the Bio2Rdf markup system and I personally do not
> see what all the fuss is about but there must be something so here are
> my opinions based solely on the requirements document
>
> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/Requirements
>
> # For our own resources, what URIs to mint and what contracts to
> adhere to regarding well-definedness and documentation
>
> Publically retrievable metadata for ones personally produced/published
> information (if not data as well) should be available using URI's
> matched to one's institution/organisation, with relevant owl:sameAs
> and rdfs:seeAlso tags to specify their relationships to other known
> uri's.

As Eric Jain pointed out, what HCLS has been trying to do (for the  
most part) is not to establish tools for private use, but rather to  
establish a *protocol* for clear communication of information (ideas,  
facts, observations, conclusions) in RDF over the Internet.  Whether  
this protocol is a formal one involving published specifications or  
simply informal agreement about what we're going to do is not to the  
point. My conjecture is that the central thing we need to agree on is  
the minimal meaning of terms (URIs): when you should use certain  
terms (in talking to me) and when you shouldn't. Absent common terms  
with common meanings there is little hope of communication.

Maybe this sounds trivial, but the records of this mailing list indicate
that it's not (record vs. protein, ontology boundary cases, time  
dependence, ontology versioning, points of view, document variants,  
etc.).

> Advantages: One does not need to negotiate with the original author in
> order to augment their definition, and people who actually want to
> know things have clear unambiguous ways of getting to their goal.

Advantages over what? The requirements take no position on HTTP
protocol vs. any other resolution method (including sameAs, DNS,  
LSID, handle, info:, CORBA, dewey decimal, ...).

Clearly there are differences of opinion about definitions, so we  
need to be somewhat careful about this. You can define your internal  
terms as you like, but as soon as you use sameAs (or equivalentClass  
etc.) in order to communicate with me, you are making a very strong  
statement - that the things you've said about the thing you're  
talking about are true of the thing I would talk about using the  
shared term. We have to have a shared understanding to communicate.  
We may not share a definition, but we have to share some aspect of  
meaning or use of the term, and I would say that has to be documented  
somewhere.  When you assert sameAs, this is a matter of judgment or  
hypothesis, and you and I might argue about it as we'd argue about  
any kind of assertion in science; but if we don't have a starting  
point, an agreement on what is minimally required of *any* description
(your "definition") you OR I make, we won't have any basis for  
disagreement.  (Just as in this discussion!)

We agree that if definitions are different then different terms are  
needed; I think that's the effect of "matched to one's institution"  
which I agree is a good safe position. If we have a quarrel here it  
is probably tactical, or maybe about what "definition" means as  
opposed to description, not fundamental.

> Follows the process of how knowledge is developed, ie, someone comes
> up with an idea and develops it themselves with citations to outside
> publications.

Absolutely - we're on the same page. I'll look at your statements and
see which can be added to a revision (or explanation) of the
requirements.

> Disadvantages, sparql queries are not simple, but I use programmatic
> level access and enable the retrieval of sameAs items through code
> which then abstracts queries to utilise all known identifiers when
> querying. People don't actually want to write sparql queries
> themselves, they are biologists or doctors, who just want to click on
> a button and have it work for them, whether the program does one or
> three queries is basically inconsequential to them.

To me this is an acceptable use case.  When you assert a sameAs
you're saying that for the purposes of this query you would like to  
assume that these URIs all name the same thing. This can be  
hypothetical, just as any of your biological relationships are  
hypothetical, with sameness assertions relative to "trust" in sources  
for a particular application - roughly speaking, you might assert  
lots of sameAses for higher recall, or fewer for higher precision.
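
The recall/precision trade-off described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the example.org URIs, the numeric trust scores, and the aliases()
function are all hypothetical, and sameAs is treated as symmetric and
transitive.

```python
from collections import defaultdict

# Hypothetical sameAs assertions: (uri_a, uri_b, trust in the asserting source).
SAME_AS = [
    ("http://example.org/a/P1", "urn:lsid:example.org:p:1", 0.9),
    ("urn:lsid:example.org:p:1", "http://example.org/b/protein-1", 0.4),
]


def aliases(uri, assertions, min_trust=0.0):
    """Return every URI reachable from `uri` through sameAs assertions
    whose trust score is at least `min_trust`."""
    graph = defaultdict(set)
    for a, b, trust in assertions:
        if trust >= min_trust:
            # sameAs is symmetric, so record the link in both directions.
            graph[a].add(b)
            graph[b].add(a)
    seen, stack = {uri}, [uri]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Accept every assertion: more aliases to query, higher recall.
high_recall = aliases("http://example.org/a/P1", SAME_AS)
# Accept only well-trusted assertions: fewer aliases, higher precision.
high_precision = aliases("http://example.org/a/P1", SAME_AS, min_trust=0.5)
```

A client could then issue one query per alias, or one query over the
whole set, without the user ever seeing the sameAs statements.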

SameAs is a very strong statement, so you might consider using a  
superproperty of it, but let's not get into that as Alan R and  
Richard C have already been over this territory recently.
>
> # What particular URI's to use for resources related to public
> databases (esp. database records) (>4 proposals on table)
>
> Admittedly this is an issue, but so far I like being able to have the
> best of lsid and http: uri's with the bio2rdf markup schemata. Simple
> text URI's not matching is inconsequential if one has metadata
> identifying two URI's as identical.

I think you try to match URIs or use aliased URIs depending on what
you're trying to talk about. If you want to talk about what someone  
else is talking about, you have to use a shared term. But private  
terms are important if you don't know whether what you're talking  
about is the same as what they're talking about. If you discover some  
difference you can always retract the sameAs.  The alternative is to  
retract all of your assertions about the thing and rewrite in terms  
of a new URI at that point. I think we're in agreement.

Personally I like to be able to have good recall using query engines  
that don't infer sameAs, without having to muddy up the query by  
making it look for sameAs relationships. But I'm willing to allow  
that this point is not central to this discussion.

What URI would *you* use were you to desire to talk to *me* (or my  
RDF-understanding agent) about (pick a favorite bioinformatical  
entity, among those I probably know about given that I have worked in  
bioinformatics)?
>
>     * What entity is responsible for choosing and maintaining these  
> URIs
>
> What is wrong with a simple scheme that "bio2rdf.org" uses? With my
> local "myBio2Rdf" installation I populate my database from the
> original supplier. Why do the metadata records need to be preprocessed
> and maintained by another entity?
>
> What is the difference between their scheme and any other, apart from
> prejudice against a particular opening identifer which people can
> translate and use without relying on the actual organisation to exist
> anyway.

This is treated above - the problem is communication, not use.  
Without sharing we're merely talking bioinformatics, not semantic web.
>
> # How to get stuff
>
> Personally, I would stick with HTTP GET here.

This answer is favored right now, with qualification, but apparently
is still controversial, and the other side has to be heard out  
(assuming they're not too fatigued to speak up) or the whole effort  
to create a semantic web for health care & life sciences will be  
weaker. As editor I'm trying hard to stay neutral and to try, if  
possible, to make everyone happy. Of course each side of the http vs.  
lsid debate now thinks that I'm on the other side...
>
>     * How to use a URI to get metadata (RDF) about an identified  
> resource
>
> I have no problems with getting metadata using the explicit URI object
> reference and then having to follow another url to find the actual
> data. It is the way things in society pretty much work, you find the
> identifying information before you find the data, so when you find the
> data you know what you were looking for and that you actually wanted
> to expend resources to get the data
>
> Ie, I would never follow the following url's until I verified that
> http://bio2rdf.org/identifier described what I wanted to know.
>
> http://bio2rdf.org/data/identifier
> http://bio2rdf.org/html/identifier
> http://bio2rdf.org/image/identifier
>
> Where one knows about what html and image mean to them for their goal
> as basic information types.

I'm not sure what you're suggesting here but I think we're probably  
in agreement, with the detail (probably of no import to any existing  
client) that there should be a 303 redirect or #fragid-removal on the  
way to getting the "metadata".  Get yourself some RDF first that  
tells you what documents are available and what their roles are.   
Then use that information to decide what documents to look at next.
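
As a rough sketch of the "303 redirect or #fragid-removal" step, the
Python below finds the document that describes a URI before anything
else is fetched. It makes no network calls: the `head` argument is a
hypothetical caller-supplied function returning a status code and a
Location value, and describing_document() is an illustrative name, not
part of any library.

```python
from urllib.parse import urldefrag


def describing_document(uri, head):
    """Return the URI of the document expected to describe `uri`.

    `head` is a hypothetical function supplied by the caller that takes
    a URI and returns (status_code, location_or_None).
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the description is in the document without the fragment.
        return base
    status, location = head(base)
    if status == 303 and location:
        # "See Other": the server names a separate describing document.
        return location
    # Anything else: treat the URI as naming the document itself.
    return base


def fake_head(uri):
    """Stand-in for a real HTTP request, for illustration only."""
    if uri == "http://example.org/id/thing":
        return 303, "http://example.org/doc/thing"
    return 200, None
```

With that stand-in, "http://example.org/onto#Protein" maps to
"http://example.org/onto" and "http://example.org/id/thing" maps to
"http://example.org/doc/thing". The RDF retrieved from the resulting
document would then say which further documents exist and what their
roles are.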
>
>     * How to use a URI to retrieve the bits of an information resource
>
> Not sure what the difficulties are here. I spent a week making up a
> perfectly good browser page for bio2rdf information using my local
> database which assumed that the browser already knew how to follow
> HTTP standards... and it works so far.
>
> Essentially, given all of that, I have an adaptable system which
> utilises what I see as the best of the distributed semantic web (Web
> 3.0) with personal touches (Web 2.0).
>
> What would change if people all decided for instance to only use lsid
> and deprecated http:// uri's? Essentially, I could continue my
> personal methods as lsid is included already in my rdf data.

I infer that your approach is to use sameAs to "redirect" a URI to a  
different one, maybe a local one, that you can use.  This is not too  
different, in practice, from the resolution ontology idea I've been  
working on (based on Alan R's work), in that it represents known  
tactics for getting to the data in RDF, rather than relegating them  
to some external API, web proxy, DNS configuration, or whatever.
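
To make the sameAs-as-"redirect" reading concrete, here is a small
Python sketch in which the alias statements are plain data and the
client picks whichever alias it can actually retrieve. The URIs, the
local prefix, and resolve_locally() are hypothetical, and only directly
asserted aliases (one hop) are considered.

```python
# Hypothetical sameAs statements held as data: (uri_a, uri_b).
SAME_AS_PAIRS = [
    ("urn:lsid:example.org:geneid:1", "http://localhost/mybio2rdf/geneid:1"),
]


def resolve_locally(uri, same_as, local_prefix):
    """Return `uri` or one of its directly asserted aliases that starts
    with `local_prefix`, or None if no such URI is known."""
    candidates = {uri}
    for a, b in same_as:
        if a == uri:
            candidates.add(b)
        elif b == uri:
            candidates.add(a)
    # Sort so the choice is deterministic when several aliases qualify.
    local = sorted(c for c in candidates if c.startswith(local_prefix))
    return local[0] if local else None
```

Because the mapping lives in the data rather than in a proxy or DNS
configuration, retracting a sameAs statement is enough to undo the
"redirect".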

> What would change if people decided to access data by default with
> object references instead of metadata? Bio2RDF already allows for this
> within itself (ie, http://bio2rdf.org#rdfdata, although it is designed
> with what I see to be a more intuitive metadata by default approach.
>
> Is there any other change that would break my way of doing things? And
> does everyone need to decide on one standard, as opposed to utilising
> common elements well enough to combine them. Personally I do not like
> the idea of anonymous elements, ie bnodes, in RDF describing realistic
> scientific or medical data, but that is a minor issue I guess.

Good! I also consider blank node notation to be a minor issue in this  
context.

For the purposes of the URI note I don't want to take a stand on the  
issue of naming a record ("metadata" document) vs. naming the thing  
described by that record - that's a separate issue to be argued on  
its own merits - but I do want to make sure that if both have names,  
that the names are different.

Document requirements aren't supposed to imply any change; that's  
left for recommendations (advice, etc) to do. Requirements for a  
recommendation-producing process like this one are supposed to  
inspire you to say four kinds of things:
   1. "Those requirements don't make sense to me"
   2. "a recommendation meeting these requirements won't be  
complete" (meaning they're not going to ask enough of *other* people  
to satisfy me)
   3. "a recommendation (etc) may end up being too strong" (meaning I
fear it may ask too much of *me*).
   4. "I know how those requirements can be met - here are my
recommendations"
You have done some of these - I hope others will too.  I'll update  
the wiki page today with some clarifications. After I do this, I  
would be grateful if you would recast your message above as proposed  
recommendations (what you would do if others would) that meet the  
document requirements.

Best
Jonathan
Received on Tuesday, 23 October 2007 13:59:23 UTC