Re: Nature: A call for a public gene Wiki from Matthew Cockerill on 2006-02-08 (public-semweb-lifesci@w3.org from February 2006)

From: Matthew Cockerill <matt@biomedcentral.com>
Date: Wed, 8 Feb 2006 21:22:19 +0000
To: public-semweb-lifesci@w3.org
Cc: Eric Jain <Eric.Jain@isb-sib.ch>
Message-Id: <470C549E-FEEF-4260-AF59-9331B5C057CD@biomedcentral.com>
Eric,
You raise important issues on this. But I take the converse position,  
I think:

My sense is that:
(a) wiki-style maintenance will prove to be just as valuable for
ontologies and semantically tagged information as it has proved to
be for the maintenance of hypertext encyclopedia entries
(b) attempts to capture the benefits of a wiki-style approach while
avoiding its core principle of 'anyone can contribute and see their
changes immediately without having to go through moderation' miss
the fact that this very principle is central to the motivation that
makes wikis (especially Wikipedia) work so well (and vastly better
than most people would guess they might).


To comment on some of your points:
>
> We are currently exploring various strategies to encourage people  
> to let us know when they find errors or omissions in UniProt, or  
> even to contribute data as they publish their research, rather than  
> waiting for a curator to pick up their results from a publication.
>
> In principle all of this has been possible for a long time: We have
> feedback forms etc, but people don’t make use of these often (or  
> not as often as we would like…). The most frequent requests are  
> from people who have published a paper and would like us to cite them.

I agree that people don't use feedback forms, and I believe the key
to that is the lack of motivation, and the lack of assurance that the
feedback they send will be listened to and acted on in a way that
justifies the time it takes.

>
> The most effective improvement, in my opinion, may be to allow  
> people to directly attach comments to each database entry, like in  
> a blog (simple, instant gratification etc). These comments could  
> then be reviewed by curators, and integrated into the database, if  
> appropriate.

Adding comments does not provide the same motivation as updating the  
core data.

The Encyclopaedia Britannica, with the ability to add comments, is
still the Encyclopaedia Britannica.
In many cases, no one will bother to make comments; in other cases
there may be pointless comment debates and comment spam which no one
has time to read. Comments are fine (you can make them on BioMed
Central articles, e.g. http://www.nutritionj.com/content/4/1/24/comments),
but they aren't paradigm-changing.


>
> Having a system where people could add and update data directly, on  
> the other hand, wouldn’t be practical: The amount of training  
> required to enter data in a consistent manner is considerable, and  
> being consistent is essential for large, highly structured data  
> sets like UniProt.
>

But what would motivate someone to take the time to update an entry
in an erroneous and inconsistent way?

Generally, the only reasons someone would do this would be that:
(a) the item was lacking in all annotation up to then, and so  
something is better than nothing
(b) the previous annotation was worse
(c) they are clueless, and have too much time on their hands

The way things seem to work on Wikipedia, in all but a few
pathological cases, is that annotations converge towards surprisingly
good quality.
The process by which all parties interested in a particular item
(or family of items) can, and do, sign up to be automatically notified
of changes to that item seems to be a highly effective way to make
sure that the changes that get made make sense (and that (c), when it
occurs, is rectified).
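A minimal sketch of that watch-and-notify loop, assuming a toy in-memory store: `WatchedWiki`, its methods and the entry title "Example_entry" are hypothetical illustrations, not how Wikipedia's watchlists are actually implemented. Edits take effect immediately, and everyone watching the entry except the editor is notified, so a bad change reaches the people best placed to undo it.

```python
from collections import defaultdict


class WatchedWiki:
    """Hypothetical toy wiki: edits apply immediately (no moderation queue)
    and every other user watching the edited entry gets a notification."""

    def __init__(self):
        self.pages = {}                   # entry title -> current text
        self.watchers = defaultdict(set)  # entry title -> users watching it
        self.outbox = []                  # (recipient, message) pairs

    def watch(self, user, title):
        self.watchers[title].add(user)

    def edit(self, user, title, new_text):
        old_text = self.pages.get(title, "")
        self.pages[title] = new_text      # visible to everyone at once
        for watcher in sorted(self.watchers[title]):
            if watcher != user:           # editors aren't told about their own change
                self.outbox.append(
                    (watcher, f"{user} changed {title!r}: {old_text!r} -> {new_text!r}")
                )


# Usage: a curator watches an entry, a careless edit arrives,
# the curator is notified and simply edits the entry back.
wiki = WatchedWiki()
wiki.watch("curator", "Example_entry")
wiki.edit("curator", "Example_entry", "careful annotation")
wiki.edit("newcomer", "Example_entry", "wrong annotation")
recipient, message = wiki.outbox[-1]
if recipient == "curator":
    wiki.edit("curator", "Example_entry", "careful annotation")
print(wiki.pages["Example_entry"])
```

The only quality control in this sketch is that interested parties see changes as they happen, which is the mechanism the paragraph above credits for correcting case (c).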

> Another approach is to have a hand-picked list of experts who are  
> responsible for certain database entries, according to their area  
> of expertise. These people would be responsible for letting us know  
> if something needs to be updated, though I wonder how many people  
> can be motivated to commit themselves to such a thing.
> The critical factor in the end may not just be how easy it is to  
> contribute, but also how much credit can be gained from doing so.  
> Contributors should be listed on each page. Should we go as far as  
> attributing individual facts to contributors? This would allow us  
> to also state who disagrees with something. Should we allow people  
> to rate the contributions of others? This way people could gain  
> reputation through our web site. Somehow I suspect that  
> contributing to public databases like UniProt won’t become common  
> practice until this is something that you can proudly mention in  
> your CV…


I agree that the motivation issue is absolutely key.
Central to what makes Wikipedia work is not simply that you can  
change it, but that it is a highly useful resource, used by millions  
of people. All the contributors are users too.

It seems to me that the key to motivation is being able to make
changes yourself that increase the usefulness of the resource to
you and to others.
If a resource you use contains an inaccuracy, you are motivated to
fix it, provided you can do so directly. To some extent Wikipedia
works because a significant fraction of the world has OCD tendencies.
If they see something out of place, and can change it, they will,
because it makes them feel better and leaves the world a tidier
place (sending a feedback message simply cannot provide this level
of reward).

But perhaps more importantly, there are practical motivations too.

Say that I want to link people from my website to a good, standard
explanation of what, for example, an Impact Factor is. I can link to
Wikipedia:
http://en.wikipedia.org/wiki/Impact_factor
But suppose the explanation there makes a mistake, or omits a key
aspect or an important link. Then I'm motivated to improve the wiki
entry before I link to it.

The same applies to biologists and bioinformaticians working with
UniProt-type data: if there is noise in the data (or vital
information missing from it), and this means that their automated
analyses are missing things, then, if it is possible to clean up the
data at source, there is an immediate motivation to do so.

As another example of how motivation can drive good curation from
the grassroots, on a scale that would be impractical if approached
from the top down, imagine we have wiki entries for all scientific
authors (not just http://en.wikipedia.org/wiki/Einstein but everyone
who has ever published a scientific article, generated from the
literature using automated statistical tools).
This could be a really handy resource, not least as a source of a URI
for any author for semantic web purposes.

And suppose that you are John Smith, and you discover that you've
been lumped in with another John Smith on the same URI because you
share a name, and the statistical analysis tools couldn't spot that
your work and career were distinct from those of the other JS (your
doppelganger).
Assuming that this wiki database of scientific authors and their
careers and bibliographies is heavily used, you (and/or the other John
Smith) would be strongly motivated for practical reasons to
disentangle your identities into separate wiki pages. And in doing
so, you would be providing additional training data for the
algorithms, which could then be used to improve the statistical
analysis next time around.

On a related issue:
Pierre wrote:
"a wiki is not a "semantic web" source of information"

My sense is that, to take one example, Wikipedia is a lot closer to a  
'semantic web' source of information than is commonly acknowledged.
For a start, unlike, say, the Gene Ontology, there is a clearly  
agreed URI for each concept/entry within Wikipedia.
e.g.
http://en.wikipedia.org/wiki/Gambia
http://en.wikipedia.org/wiki/France

Admittedly, although those entries link to:
http://en.wikipedia.org/wiki/Country
and
http://en.wikipedia.org/wiki/Population

Wikipedia (I think) currently lacks the means to express even
simple "is a" or "has a" relationships.
But it has the necessary building blocks to make such relationships,
and more complicated ontology management, possible.
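To make the "building blocks" point concrete: the existing article URIs can already act as the subjects and objects of simple relationships; what is missing is a way to name and store the relationships themselves. A minimal sketch, assuming plain Python tuples as triples and the hypothetical relation names `is_a` and `has_a` (these are not an existing Wikipedia or RDF vocabulary):

```python
WIKI = "http://en.wikipedia.org/wiki/"

# Hypothetical relation names; Wikipedia itself defines no such predicates.
IS_A = "is_a"
HAS_A = "has_a"

# Each statement is a (subject URI, relation, object URI) triple that reuses
# the article URIs as identifiers.
triples = {
    (WIKI + "Gambia", IS_A, WIKI + "Country"),
    (WIKI + "France", IS_A, WIKI + "Country"),
    (WIKI + "Gambia", HAS_A, WIKI + "Population"),
    (WIKI + "France", HAS_A, WIKI + "Population"),
}


def subjects(relation, obj):
    """Return every subject URI linked to `obj` by `relation`, sorted."""
    return sorted(s for (s, r, o) in triples if r == relation and o == obj)


# Prints the France and Gambia URIs (in that order).
print(subjects(IS_A, WIKI + "Country"))
```

Because every statement reuses the shared article URIs, anyone can ask "which entries are countries?" without first agreeing on identifiers; a real deployment would more likely use RDF, which this sketch does not attempt.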

Why do I keep mentioning Wikipedia, rather than proposing a new
Wiki-semanto-pedia?
Because I think that, just as with the success of Google and eBay,
motivating people to update content is an example of positive
feedback creating a winner-takes-all situation.
The more Wikipedia is used, the more people are motivated to update
it, and the more useful it gets.

If it is possible to give people comprehensible tools that allow them
to express (and manage in a wiki way) semantic web relationships
within Wikipedia alongside human-readable text, then I think there is
finally a chance to turn the whole semantic web dream into something
practically and realistically attainable, and Wikipedia itself may
play an important role in that. After all, there's no quicker way to
look up the URI for a given concept than to do a quick search of
Wikipedia from your Firefox search box...

Matt



On 8 Feb 2006, at 19:44, Eric Jain wrote:

>
> Pierre LINDENBAUM wrote:
>> I agree, a wiki would be great way for sharing
>> knowledge as it would allow experts of a protein, of a
>> gene to freely add, modify and share annotations. But
>> I fear it could also be a problem for knowledge
>> discovery  because a wiki is not a "semantic web"
>> source of information.
>
> I'm also a bit skeptical about how well a wiki would work here, see  
> http://eric.jain.name/2006/02/08/how-to-encourage-contributions/.
>
>
Received on Wednesday, 8 February 2006 21:20:22 UTC