Replies from Bio2RDF about contact with upstream data providers

Hi all,

Happy new year!

On Mon, Jan 4, 2010 at 3:44 PM, Susie Stephens <susie.stephens@gmail.com> wrote:
> == Agenda ==
>  * Open data follow up - all
>  * Data update - Anja, Jun, Matthias, Egon

I have exchanged email with Peter Ansell of the Bio2RDF project and
have copy/pasted his replies below.

***
Are you in contact with upstream providers? E.g. are they aware you
rdf-ied their data?
***

Only in cases where they do not offer licenses that we can use without
telling them as far as I know. Some, like the full NLM pubmed license
require that we ask, so in those cases they know.

***
How do you propagate licenses and copyright? I know you have the
data blobs nicely separated, so no problems with license
incompatibility, but I did not see copyright/license statements
mentioned on the RDF pages (or HTML conversion), nor in the list at [0].
Will copyright/license information be added to that list at [0]?

0. http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Namespace
***

Quite a few of the pages, but not all, have a triple added that
indicates where the license is to be found. We use the
<http://creativecommons.org/ns#license> predicate to indicate the
license, even if the license is not a CC license. It fits better IMO
than dc:license and definitely better than xhtml:license.
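
As a rough illustration of the triple described above (not the actual
Bio2RDF code), a license statement could be serialized as a single
N-Triples line. The function name license_triple is made up for this
sketch; the predicate and the example URIs are the ones quoted in this
mail.

```python
# Predicate named in the mail; used even when the target is not a CC license.
CC_LICENSE = "http://creativecommons.org/ns#license"


def license_triple(subject_uri: str, license_uri: str) -> str:
    """Return one N-Triples line linking a resource to its license document.

    Hypothetical helper for illustration only; it does no URI escaping.
    """
    return f"<{subject_uri}> <{CC_LICENSE}> <{license_uri}> ."


if __name__ == "__main__":
    print(license_triple(
        "http://bio2rdf.org/go:0000345",
        "http://bio2rdf.org/license/go:0000345",
    ))
```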

See http://bio2rdf.org/go:0000345 for an example, with the license
redirect URL <http://bio2rdf.org/license/go:0000345> redirecting in
this case to <http://www.geneontology.org/GO.cite.shtml>. We do a
redirect to the license because that is the easiest method, not that
we couldn't do it directly. I prefer to have the ability to redirect
licenses based on both the namespace and the identifier, particularly
in the case of SIDER for example, where there are two datasets with
different licenses in the same "namespace" because that is how it
works.
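
The namespace-plus-identifier redirect could be sketched roughly as
follows. This is only an assumption about how such a lookup might
work, not the Bio2RDF implementation: the table layout, the function
names, the "sider" identifier prefixes and the example.org targets are
all hypothetical. Only the GO target and the /license/ URL pattern
come from the example above.

```python
from typing import Optional

# Namespace-wide defaults. The GO entry is the redirect target quoted above.
NAMESPACE_LICENSES = {
    "go": "http://www.geneontology.org/GO.cite.shtml",
}

# Overrides keyed by (namespace, identifier prefix). The prefixes and the
# example.org targets are placeholders, not real SIDER identifiers or licenses.
IDENTIFIER_OVERRIDES = {
    ("sider", "drug_"): "http://example.org/license/sider-dataset-a",
    ("sider", "effect_"): "http://example.org/license/sider-dataset-b",
}


def license_redirect_url(curie: str) -> str:
    """Build the license redirect URL for a namespace:identifier pair."""
    return "http://bio2rdf.org/license/" + curie


def resolve_license(curie: str) -> Optional[str]:
    """Return the license target, preferring identifier-level overrides."""
    namespace, _, identifier = curie.partition(":")
    for (ns, prefix), target in IDENTIFIER_OVERRIDES.items():
        if ns == namespace and identifier.startswith(prefix):
            return target
    # Fall back to the namespace default; None means no license is configured.
    return NAMESPACE_LICENSES.get(namespace)


if __name__ == "__main__":
    print(license_redirect_url("go:0000345"), "->", resolve_license("go:0000345"))
```

Checking the identifier before the namespace is what lets two datasets
in the same namespace point at different licenses.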

The current list that is used to autogenerate the license triples,
although it should definitely be expanded, can be found in RDF at
<http://bio2rdf.svn.sourceforge.net/viewvc/bio2rdf/trunk/src/war/WEB-INF/base-bio2rdf-providers-licenses-config.n3?view=markup>
All of the providers there insert the static RDF/XML that is defined
at <http://qut.bio2rdf.org/query:license>, but another query could be
used if there were specific conditions for particular datasets, as
there will be with pubmed soon.

***
Do upstream providers have preferences regarding how you put in the license?
***

Not that I know of in most cases. The 2009 Pubmed License has a few
new provisions though, so there are some cases that have different
providers.

***
Have you talked with upstream providers about changing licenses to
reduce license conflicts?
***

I can understand providers not wanting you to provide their actual
information in RDF, but I can't understand them thinking that they can
have control over how people relate their personal datasets to their
information in small amounts. If the linking is major then we could be
in the situation that CAS tried to get into with Wikipedia, with CAS
giving Wikipedia a special agreement.
<http://www.cas.org/newsevents/caswikipedia.html> What they don't
realise is that Wikipedia releases the information under the same
license so it is totally free from that point on, and CAS cannot go
back on the agreement if anyone can prove that they helped with the
CAS number insertions on Wikipedia.

***
Do all upstream databases provide open/free licensing?
***

I only found three databases that we are currently offering for
download that I will have to check up with Marc-Alexandre about
the license conditions [...].

The majority of the databases seem to have the equivalent of CC-BY-NC
on them, although they don't actually use Creative Commons licenses.

------------------------------

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Received on Wednesday, 6 January 2010 13:56:49 UTC