- From: Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk>
- Date: Sat, 18 May 2013 20:48:25 +0100
- To: Denny Vrandečić <denny.vrandecic@wikimedia.de>
- CC: Pascal Hitzler <pascal.hitzler@wright.edu>, SW-forum <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Thanks, Denny. Well spoken. Everybody should read this. I know that I will want to point people to this text in the future. Can you make a permanent URL and title that can be used to cite this text? No need to make a PDF, but something more than mailing list archives would be nice.

-- Markus

On 18/05/13 10:06, Denny Vrandečić wrote:
> *tl;dr - If you publish data, attach the CC0 license to it, but that’s basically just advertising - don’t think it means anything.*
>
> *If you use data, you do not have to care much about the data license.*
>
> *If you republish data, it’s a bit more complicated, but not as horrible as you might think.*
>
> Imagine a student reading a CC-BY-SA published textbook on compilers. Next thing, based on that knowledge, he writes a parser and publishes the binary on the Web. Does he have to acknowledge the textbook? Does he have to publish his code under the same license?
>
> Imagine a designer creating an image with GIMP, a fantastic open source image processing tool, published under the GPL. Or a developer writing his code in Eclipse. Or a website being served from a Linux box. What legal implications does it have for the license of the image? For the source code? For the served page?
>
> Imagine a search engine that changes its background color depending on the type of thing you are searching for. You enter a city - it turns gray. You enter a person - red for females, blue for males, and purple for others. You enter a company - yellow. And so on. Let us assume that the search engine does that by figuring out the thing you are searching for and then asking DBpedia for its type. Since DBpedia is licensed under CC-BY-SA, does this mean we have to put a link on the search result acknowledging DBpedia? Does this mean we have to publish our search index under CC-BY-SA as well?
>
> Imagine Red Cross publishing pages about the countries they work in, and adding the population data to each of them from Freebase, the location from OpenStreetMap, the local name of the country from GeoNames, and the capital from DBpedia. What amount of legal disclaimer would need to be displayed on the page? Maybe some of the data items derive from another source? What about their licenses? What about this license stacking effect?
>
> There are some rather vague ideas floating around about how the whole intellectual property law apparatus works for data. I have mulled over this for a long time, and read more laws and court cases than I care to admit. I want to try to make a few points in the following.
>
> Let’s start with the basics. What laws actually apply?
>
> Copyright law protects the expression, not the idea - the form, not the content. You can watch the newest Iron Man movie, and you are legally allowed to annoy your friends with retellings of the movie as often as you want. But you are not allowed to film it with your phone camera in the theater and display it to your friends. If you learn something from a textbook, you are free to write your own textbook, adding other knowledge you have acquired, possibly from other textbooks and publications. Only if you start copying the original texts too closely will you get into legal trouble.
>
> Almost all of the above mentioned licenses - all Creative Commons licenses currently available, as well as the GFDL or the GPL - are based on copyright laws. The GPL started, as Stallman admits, as a legal hack of copyright law. This makes a lot of sense, since these licenses were not meant to cover data, but expressions: texts, music, and the like. This means that these licenses cannot extend beyond that. They only cover the expression. They cover the actual RDF/XML file, the string of characters. Not the content. Not the graph.
>
> (Note that ODbL and the current draft of the upcoming fourth revision of CC go beyond copyright and include database right where applicable, i.e. within the legislation of the EU. This extension is irrelevant for the US.)
>
> This means that such licenses, like GFDL for data, have no restricting effect if you want to use the data. Only if you want to republish the data files more or less verbatim (in whole or partially, standalone or as part of a bigger project) do you need to think about the original license. Merely including the data (not the files!) has no effect stemming from copyright.
>
> This also makes intuitive sense: if someone takes Wikipedia and counts the distribution of words and letters in Wikipedia, the subsequent publication of the results is not restricted by the original license under which Wikipedia was published. If someone takes the whole Web, and creates a graph of all links on the Web, and starts to apply some algorithms on this graph, the subsequent usage of the results of these algorithms is not subject to any of the licenses of the original texts published on the Web. Copyright simply does not extend this far. And that is good.
>
> So much for copyright. Unfortunately, the European Union went a step further. They recognized that copyright does not apply to databases. They also recognized that the EU was not doing well in its competition against the US with regard to publishing databases. So they decided to level the field by introducing a completely new right, the database right. This protects the effort that goes into creating databases - basically their schema (which columns should I have) and the coverage (which rows do I have in my database).
> Ten years later the EU made an evaluation of the effectiveness of the laws, and came to some interesting conclusions: first, technically the new database rights made things more complicated; second, most publishers obviously do not understand it, but are happy with what they think it means (which usually contradicts what it actually means); and third, it completely failed in its goal to advance the database publishing sector. The report offers options to drop the whole database rights thing again, but so far nothing has happened.
>
> Also, this novel database right suffered a few major blows from the European Court of Justice, which clearly stated that the right does not cover the creation of the database, merely the effort put into obtaining, selecting, and cleaning a database. This means, e.g., that the publication of match dates and fixtures by FIFA cannot be protected under the database right. On the other hand, if an external website keeps statistics of all FIFA players, how much they cost, where they currently are, etc., then their database as a whole could be protected.
>
> But to make it clear: the database right does not apply to single data items in the database. If I keep a database of all cities in the UK and their populations, and someone asks for the population of Oxford from my database, the database rights do not prevent them from republishing and using that data item as they like. Eurostat cannot sue you if you tell someone the population of France.
>
> To summarize on database rights: the EU, and only the EU, introduced the so-called database rights in 1996. They are independent of copyright, and cover a database as a whole in certain circumstances. If you are in the EU, and want to use the data, database right does not restrict you. It only restricts you from republishing the database as a whole or in relevant parts.
>
> Besides the legal foundations of the data licenses, one also has to consider that copyright law refers predominantly to the right to copy the data, not to use it: if you want to count how often certain explicit words are uttered in a movie like Pulp Fiction, you are free to do so. If you want to count and compare the death count in certain books and movies (like Rambo, War and Peace, and the Bible - the results might surprise you), you are free to do so. You are free to publish the results, and you are even more free to use them internally in your organization.
>
> Having said that, I still recommend adding the CC0 license to a dataset when you publish it. I grumble every time I do it, but it still makes sense. Not because I believe that it means much: as said, the data in it is free anyway. But because a lot of other people believe that it means a lot. They might believe that if they integrate a point of data from a CC-BY-SA licensed dataset in their own dataset, they have to publish it under CC-BY-SA as well. They might believe that mixing a CC-BY-SA dataset with an ODbL dataset and displaying the results is legally impossible. Maybe they don’t even believe it, but they are required to ask their lawyers, and their lawyers will prefer to play it safe for their clients (it is their job!) and advise them accordingly. And for all of these people, the CC0 license is an item of assurance. So if you want your dataset to be usable by them, just add a CC0 license to it. And grumble about it.
>
> There is a completely independent aspect of why it could make sense to cite your data sources, which is trust and provenance. Even if a dataset is not published under a CC-BY-like license (that is, one that requires attribution), it often makes sense to keep the provenance and attribution intact - simply because the user of your data might ask for the source themselves, and might want to check on their credibility.
> But attribution for increasing your credibility is something entirely different from attribution because you think the data you used legally obliges you to give it.
>
> If I were an organization or individual with sufficient financial backing, I would even offer to pick up your legal battles if a data publisher ever sues you for using their data (not for republishing it verbatim, though). I hope that maybe an organization or individual will step up at some point to do so, but I wouldn’t hold my breath for it. Both the US Supreme Court and the European Court of Justice have repeatedly decided in favour of the freedom of data, be it the results of games, be it telephone numbers, be it horse racing fixtures.
>
> So, as paradoxical as it sounds: Data is free. Free the data!
>
> There is a battle over minds going on. The one side fights for the establishment and extension of intellectual property rights. In the last decades, even years, they have achieved some considerable victories. Copyright law, as it was introduced in the United States, was meant to last 14 years, and had to be explicitly stated. Today it holds not only for the lifetime of the creator, but also an additional 70 years (to incentivize the creator to produce more, because an author would be much less motivated to write if they knew that half a century after their death their highly beloved publisher wouldn’t make a profit out of their work anymore). Today, copyright applies automatically, without any registration or statement. There is no need to put the little c in a circle anywhere. It is there, automatically, everywhere.
>
> The extension from works to content, from expression to ideas, is another dimension, this time in scope instead of time, in the continuous struggle to extend and expand intellectual property rights.
> It is not just a battle over the laws, but also, and more importantly, over our beliefs and minds, to make us more accepting of the notion that ideas and knowledge belong to companies and individuals, and are not part of our commons.
>
> Every time data is published under a restrictive license, “they” have managed to conquer another strategic piece of territory. Restrictive in this case includes CC-BY, CC-BY-SA, CC-BY-NC, GFDL, ODbL, and (god forbid!) CC-BY-SA-NC-ND, and many other such licenses.
>
> Every time you wonder what license some data has that you want to use, or whether you need to ask the data publisher if you can use it, “they” have won another battle.
>
> Every time you integrate two data sources and want to publish the results, and start to wonder how to fulfill your legal obligation towards the original dataset publishers, “they” laugh and welcome you as a member of their fifth column.
>
> Let them win, and some day you will be sued for mentioning a number.
>
> Links:
>
> I am not linking to the obvious texts, which are the actual laws. Read them. They are not as impenetrable as you think. I mean, heck, if you can make sense of an RDF/XML file, you shouldn’t be scared of some legal text.
>
> Evaluation of the European Commission on the effect of database rights
> http://ec.europa.eu/internal_market/copyright/docs/databases/evaluation_report_en.pdf
>
> US Supreme Court, Baker v. Selden - on the extent of copyright with regard to the expression, not the content
> http://www.justia.us/us/101/99/case.html
>
> Sorry for the far too long reply. It is not meant as a critical reply to Pascal and his colleagues’ text, but rather something that has been brooding in me for a while. This text triggered me to write it down, and in the framework of their text I would read it as a contribution to point 5 of their way forward.
>
> This text was written by me on a Saturday morning, as a completely personal opinion. It does not represent the official point of view of any current, former, or future employer, nor of any project I ever was, am, or will be affiliated with or am thought to be affiliated with.

--
Dr. Markus Kroetzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529
http://korrekt.org/
Received on Saturday, 18 May 2013 19:48:51 UTC