Re: LOD Data Sets, Licensing, and AWS from Eric Hellman on 2009-06-24 (public-lod@w3.org from June 2009)

From: Eric Hellman <openurl@gmail.com>
Date: Wed, 24 Jun 2009 14:41:35 -0400
To: Leigh Dodds <leigh.dodds@talis.com>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, Ian Davis <lists@iandavis.com>, public-lod@w3.org
Message-Id: <56C991B4-23B2-4899-BDBB-81F52182B3BD@gmail.com>
I'd like to step in here and add my 2 cents. My background in this is
that of having started a company that produced a knowledgebase
aggregation. We were a bit early to take advantage of most of the  
semantic web technologies, but we definitely made use of a lot of the  
early intellectual foundations.

I must agree with Kingsley that it's extremely important for the
success of the Linked Data meme that we enable an "economy of
attribution". I would argue, however, that the accidental weak
attribution provided by URIs is a sad excuse for a real provenance
infrastructure - after all, a site may have to assert a triple to be
able to say it's false. Or it may need different types of attribution
for different parts of its data space. We can do better, and I don't
mean by reifying everything. It's interesting that Freebase was
brought up, because it saves an identifier and an origin with every
tuple - tying attribution (and licensing, for that matter) to globally
identified tuples makes a lot more sense than attaching it to the
entities themselves; after all, we want to make the entities reusable
and not to be producing superfluous sameAs's just to make the
attribution economy work.
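
To make the tuple-level idea concrete, here is a minimal sketch in Python. It is not Freebase's actual data model; the class name, the field names, the example.org/example.net URIs and the license labels are all assumptions made up for illustration. It only shows an identifier, an origin and a license travelling with each tuple, so that two sources can describe the same entity URI without minting extra sameAs links.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class AttributedTuple:
    """One statement plus its own provenance (all field names are hypothetical)."""
    subject: str
    predicate: str
    obj: str
    origin: str    # who asserted this tuple
    license: str   # terms attached to this tuple, not to the entity
    # Globally unique identifier for the tuple itself.
    tuple_id: str = field(default_factory=lambda: f"urn:uuid:{uuid4()}")


def by_origin(tuples):
    """Group tuples by the source that asserted them."""
    grouped = defaultdict(list)
    for t in tuples:
        grouped[t.origin].append(t)
    return dict(grouped)


# Two sources make statements about the SAME entity URI (made-up data).
entity = "http://example.org/id/montclair"
data = [
    AttributedTuple(entity, "label", "Montclair", "http://example.org/", "CC-BY-SA"),
    AttributedTuple(entity, "state", "NJ", "http://example.net/", "PDDL"),
]

for origin, stmts in by_origin(data).items():
    print(origin, [(s.predicate, s.obj, s.license) for s in stmts])
```

The entity URI is shared, while attribution and licensing can still differ statement by statement.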

I must agree with Leigh that you really need concordance of intent with
the legal facts of licenses; you can't use copyright to protect facts.
However, it gets more complicated than that. In the US, it's also not
possible to copyright collections of facts, which can in fact be
copyrighted in Europe under the "Sweat of the Brow" doctrine. http://en.wikipedia.org/wiki/Sweat_of_the_brow

So copyright protection (and thus licenses including GPL and CC) can
be asserted on entire dataspaces, but that protection is invalid in  
the United States.

A statement of Ian Davis' caught my attention:
	"Having datasets require attribution will negate one of the linked  
data web's greatest strengths: the simplicity of remixing and reusing  
data."

I would argue that when data loses attribution, it becomes impossible
to judge the reliability of that data, and thus it loses most of its
worth. The lack of strong attribution is the linked data web's
greatest weakness, and while what he says is true, it doesn't
HAVE TO be true, as there are multiple approaches to addressing
attribution that don't impact remix or reuse simplicity.

I've blogged on some of these issues at http://go-to-hellman.blogspot.com/

Eric

On Jun 24, 2009, at 12:04 PM, Leigh Dodds wrote:

> 2009/6/24 Kingsley Idehen <kidehen@openlinksw.com>:
>> My comments are still fundamentally about my preference for CC-BY-SA.
>> Hence the transcopyright reference :-)
>
> Unfortunately your preference doesn't actually make it legally
> applicable to data and databases. The problem, as I see it, at the
> moment is that this is what the majority of people are doing: using a
> CC license to capture their desire or intent with respect to
> licensing, rights waivers, attribution, intended uses, etc. The
> disconnect is between what people want to do with the license, and
> what's actually supported in law.
>
>> I want Linked Data to have its GPL equivalent; a license scheme that:
>>
>> 1.  protects the rights of data contributors;
>> 2.  easy to express;
>> 3.  easy to adhere to;
>> 4.  easy to enforce.
>
> Then the best way to do this is to engage with the communities that
> are attempting to do exactly that: the open data commons and creative
> commons. We shouldn't be encouraging people to do the wrong thing and
> use licenses and waivers that don't actually do what they want them to
> do. The science commons protocol is a good example of best practices
> w.r.t data licensing that are being agreed to within a specific
> community; one that has a long-standing culture of citation and
> attribution.
>
> IMHO much of the advice and reasoning that has gone into the
> definition and publishing of the science commons protocol is
> applicable to the web of data as a whole. Convergence on a commons
> -- which can still support and encourage attribution through community
> norms -- is a Good Thing.
>
>> As I stated during one of the Semtech 2009 sessions, HTTP URIs provide a
>> closed loop re. the above. When you visit my data space you leave your
>> fingerprints in my HTTP logs. I can follow the log back to your resources
>> to see if you are conforming with my terms. I can compare the data in your
>> resource against mine and sniff out if you are attributing your data
>> sources (what you got from me) correctly.
>>
>> If all the major media companies grok the above, there will be far less
>> resistance to publishing linked data since they will actually have better
>> comprehension of its inherent virtues and positive impact on their bottom
>> line.
>
> I'm not sure that understanding the value of a unique uri for every
> resource, and the benefits of a larger surface area of their website,
> is the primary barrier to entry for those companies. One might build
> similar arguments around SEO and APIs. IMO, the understanding has to
> come through the network effects created by opening up the data for
> widest possible reuse. Clear and liberal licensing is a part of that.
>
> Cheers,
>
> L.
>
> -- 
> Leigh Dodds
> Programme Manager, Talis Platform
> Talis
> leigh.dodds@talis.com
> http://www.talis.com
>


Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

eric@hellman.net
http://go-to-hellman.blogspot.com/
Received on Thursday, 25 June 2009 06:25:53 UTC