RE: data.gov.* memo

As mashup activities increase and web-based data becomes more interoperable (a la the Internet as One Big Database), it becomes difficult for software to determine copyright infringements without a consistent machine-readable approach.

 

As is usually the case, http://www.loc.gov/homepage/legal.html#copyright places the responsibility for determining copyright status on the “researcher,” who is expected to look “in the catalog records, finding aids and other texts that accompany collections…It is the researcher's obligation to determine and satisfy copyright or other use restrictions when publishing or otherwise distributing materials found in the Library's collections.”

 

This seems like a good human-based approach, but for machine-to-machine access, I think we should consider recommending that governments:

(1) add copyright information to applicable electronic files (e.g., Dublin Core) and/or 

(2) organize files so that specific directories are flagged as copyright protected (e.g., including a copyright.xml file in the folder) and/or 

(3) use electronic catalog files that contain copyright information.  
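As a rough illustration of option (2), here is a minimal Python sketch of how a harvesting program might check whether a file sits under a flagged directory. The copyright.xml name comes from the suggestion above; its one-element layout, the status values, and the rights_for helper are assumptions made up for this example, not an existing standard.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

FLAG_NAME = "copyright.xml"  # name taken from option (2); the schema is assumed


def rights_for(path: Path, root: Path) -> Optional[str]:
    """Return the status from the nearest copyright.xml at or above the
    file's folder (stopping at root), or None if no folder is flagged."""
    root = root.resolve()
    folder = path.resolve().parent
    while True:
        flag = folder / FLAG_NAME
        if flag.is_file():
            # Assumed layout: <copyright status="protected|public-domain|unknown"/>
            return ET.parse(flag).getroot().get("status", "unknown")
        if folder == root or folder == folder.parent:
            return None  # reached the site root (or filesystem root) unflagged
        folder = folder.parent
```

Note that a None result only means "no flag found"; it does not by itself establish that a file is in the public domain.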

 

Creative Commons has also done a lot of work in this area (http://wiki.creativecommons.org/UsingMarkup), so they might be a good group to advise us on this matter. Unfortunately, even with Dublin Core, a machine could not “read” the dc:rights value and understand its meaning, since that value is typically free text.
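To make the dc:rights limitation concrete, here is a small Python sketch. It pulls the rights statement out of a sample Dublin Core record and then can only guess at its meaning by keyword. The sample record, the function names, and the keyword test are illustrative assumptions, not part of Dublin Core or any agreed convention.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dublinCore>
<dc:publisher>U.S. House of Representatives</dc:publisher>
<dc:rights>Pursuant to Title 17 Section 105 of the United States Code, this file is not subject to copyright protection and is in the public domain.</dc:rights>
</dublinCore>
</metadata>"""


def rights_statement(xml_text):
    """Return the dc:rights text, or None when the record has no rights element."""
    node = ET.fromstring(xml_text).find(f".//{DC}rights")
    return node.text.strip() if node is not None and node.text else None


def looks_public_domain(statement):
    """Naive keyword guess (an assumption, not a standard).

    Returns True when the phrase appears, otherwise None for "cannot tell".
    """
    if statement and "public domain" in statement.lower():
        return True
    return None
```

Extracting the statement is easy, but interpreting it is not: the keyword test would also answer True for a statement reading "this file is not in the public domain", which is why a controlled vocabulary or license URI would be needed for real machine-to-machine use.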

 

Looking at data.gov: (1) the metadata pages that describe data files do not include rights information (e.g., http://www.data.gov/details/292), and (2) the policy notice at http://www.data.gov/datapolicy seems to cover any potential issues where it states “nor does this Data Policy limit the protection afforded any information by other provisions of law.” What other protection could information be afforded other than copyright?

 

For an example of a gov site with copyrighted material, the images at http://bioguide.congress.gov may be protected per http://bioguide.congress.gov/copyright.htm, but there’s no mechanism to determine which ones are protected, which ones are not protected, and which ones have unknown protection. And here’s a tangential problem to ponder: when a photo identification needs to be corrected, how does the government alert the “re-users” that their info is incorrect?

 

As an advocate for the idea that re-using government data promotes improved capabilities and knowledge for society in general, I think the W3C eGovernment Interest Group should consider this issue for further action. Governments should also recognize the general perception that all files found on government sites are in the public domain (esp. in the US per 17 USC 105). Given our focus on electronic government, I think we should consider recommending that, to avoid copyright infringement by machines, governments provide copyright information in a machine-readable, standards-based format (which I don’t believe exists?), especially for the files on their sites that are protected by copyright.

 

Thanks,

Joe

 

From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Novak, Kevin
Sent: Tuesday, June 16, 2009 8:57 AM
To: Acar, Suzanne; daniel@citizencontact.com; jonathan.gray@okfn.org
Cc: josema@w3.org; public-egov-ig@w3.org; John.Sheridan@nationalarchives.gov.uk
Subject: RE: data.gov.* memo

 

All,

 

It is a complex issue even for the US government, though not so much for the general agencies, given Suzanne’s comments.

 

The Library of Congress, Smithsonian, NEH, National Gallery of Art, National Park Service and a few others have “collections” of material that have been digitized and made available on the web. Many result from agreements with trustees and custodians who have donated the materials to the institutions for some level of access. The challenge was and is ensuring that the materials are rights protected and making it clear that they do not fall under the normal regulations. Negotiating these agreements is quite an experience and always challenging when you don’t have a good policy basis to start with. Although this isn’t specifically a “data” issue under the current data.gov and UK efforts, it is a growing issue for agencies that deal with culturally significant materials that aren’t necessarily government produced and that want to host those materials on government websites.

 

Kevin

 

Kevin Novak

Vice President, Integrated Web Strategy and Technology

The American Institute of Architects

1735 New York Avenue, NW

Washington, DC 20006

 

Voice:   202-626-7303

Cell:       202-731-0037

Twitter: @novakkevin

Fax:        202-639-7606

Email:    kevinnovak@aia.org

Website: www.aia.org

 



AIA NAMED BEST ASSOCIATIONS WEBSITE FOR THE 12th ANNUAL WEBBY AWARDS!


America's Favorite Architecture <http://www.favoritearchitecture.org/> Tops the Shortlist for International Honor for the Web

 

The American Institute of Architects is the voice of the architectural profession and the resource for its members in service to society.

 

 

From: Acar, Suzanne [mailto:Suzanne.Acar@ic.fbi.gov] 
Sent: Tuesday, June 16, 2009 8:44 AM
To: 'daniel@citizencontact.com'; 'jonathan.gray@okfn.org'
Cc: 'josema@w3.org'; 'public-egov-ig@w3.org'; 'John.Sheridan@nationalarchives.gov.uk'; Novak, Kevin
Subject: Re: data.gov.* memo

 

Very interesting, Daniel. Will take a closer look.
Also, thank you, Jonathan, for the clarification on your statement.

Cheers
Suzanne

 

  _____  

From: Daniel Bennett <daniel@citizencontact.com> 
To: Jonathan Gray <jonathan.gray@okfn.org> 
Cc: Acar, Suzanne; josema@w3.org <josema@w3.org>; public-egov-ig@w3.org <public-egov-ig@w3.org>; John.Sheridan@nationalarchives.gov.uk <John.Sheridan@nationalarchives.gov.uk>; kevinnovak@aia.org <kevinnovak@aia.org> 
Sent: Tue Jun 16 08:44:28 2009
Subject: Re: data.gov.* memo 

A while ago, when some of the bills were starting to be introduced in XML, Congress decided to add some Dublin Core metadata so that issues such as rights would be made clear. See below.

And then there is the presumption that any person or organization that publishes raw data openly and without real applications intends for the data to be either used in place or copied. This is like having an RSS newsfeed and then claiming that the RSS newsfeed itself is copyrighted.

And then there is the issue of how data is used on the Internet with search engines essentially having a complete copy of almost everything internally in order to allow for search.   Hmmmmmm.



<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dublinCore>
<dc:title>111 HR 11 IH: Lilly Ledbetter Fair Pay Act of
</dc:title>
<dc:publisher>U.S. House of Representatives</dc:publisher>
<dc:date>2009-01-06</dc:date>
<dc:format>text/xml</dc:format>
<dc:language>EN</dc:language>
<dc:rights>Pursuant to Title 17 Section 105 of the United States Code, this file is not subject to copyright protection and is in the public domain.</dc:rights>
</dublinCore>
</metadata>

 Daniel



Jonathan Gray wrote: 

On Tue, Jun 16, 2009 at 2:13 PM, Acar, Suzanne <Suzanne.Acar@ic.fbi.gov> wrote:
  

US data.gov published a policy statement on the site. A copyright statement was not needed because government data, once released for sharing, is public domain.
    

 
While this is true for US Federal government material - this is
unfortunately not so clear outside the US.
 
In my experience of looking at the situation with data across Europe,
many government sites do not explicitly state what can and can't be
re-used. The EU PSI Directive broadly encourages member states to make
material available for re-use - but this is still being implemented,
and some feel there is ambiguity about its scope and strength. Also,
it's always helpful to know where rights are held by third parties!
 
  

 

Received on Tuesday, 16 June 2009 16:35:07 UTC