
IBM DataMag: Data Open to the Masses

From: Steven Adler <adler1@us.ibm.com>
Date: Mon, 3 Feb 2014 18:29:44 -0500
To: "DWBP Chairs" <member-dwbp-chairs@w3.org>, public-dwbp-wg@w3.org
Message-ID: <OF75C4A6A2.46D58ADC-ON85257C74.00807E8E-85257C74.00811108@us.ibm.com>

Data Open to the Masses
A recently launched W3C working group aims to establish best practices and 
vocabularies for an open data ecosystem

Open data is live data in a database or catalog outside a firewall that is 
open to every person and potential purpose. The open data movement began 
in 2008 with the International Aid Transparency Initiative (IATI). The
IATI seeks to publish data about all foreign aid donations, projects, 
people, resources, and impacts worldwide. Its goal is to inject 
transparency and information sharing into a realm that previously has been 
opaque and corrupt. The IATI inspired governments around the world to 
adopt transparency as a tool for improving civil services, and in 2009 the
US created data.gov, an open data portal through which US government
public data could be made available on the Internet for civic uses.

Gathering steam

Data.gov was a model for open data movements around the world, and today
city and state governments are publishing a dizzying array of public 
information in open data catalogs. Application developers now gather 
around local open data fountains of information to create applications 
that augment social services in their areas. They organize hackathons on 
workday evenings in which 20 to 40 developers meet to brainstorm uses for 
a type of data. The hackathons combine political town hall events with 
workshops and LAN parties. Developers bring their laptops and do rapid 
prototyping in Ruby, Python, Perl, and JavaScript.

In three hours, they develop rough interfaces and analytical designs. They
combine knowledge of local problems with civic pride, a desire to make a
difference, and ambition to develop an application that gets noticed. Some
seek social connections, business networking, and entrepreneurial 
opportunities, while others show up for food, free beer, and a challenging 
intellectual environment. Not every idea gets coded, not all code is good 
code, and many prototypes end up in the bin later.

But these hackathons are transforming enterprise IT in the cities where
they are organized on a weekly basis. Just a few years ago, city IT 
departments could not imagine publishing data without a purpose or 
application that used it. Now they are putting everything out there with 
the hope that it inspires others to develop uses that IT could not 
imagine. Sometimes developers help municipalities understand things about 
themselves that their IT staffs didn’t know because governmental IT groups 
can be compartmentalized just like the departments they serve. Open data 
allows people outside the internal structures to identify patterns the 
people inside were unable to discover.

Some cities such as Chicago, New York, and San Francisco have aggressive
agendas to publish all their data in open data formats by 2015, and these 
new data sets are free resources from which anyone can generate value and 
improve government services and quality of life in cities and states.

But it isn’t perfect. Data published as open data often lacks common
quality standards. Details of its creation, point of origin, age, and
internal usage are mostly absent from open data catalogs. And even when
that information is provided, every city or state has its own methods for
identifying it. Without common descriptors of how data was derived, where
it came from, and to what degree the publishing authority itself trusts
it, a lot of open data is assumed to be authoritative when it isn’t.
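
To make the idea of common descriptors concrete, here is a minimal Python
sketch of what such a record might look like. The class name, field names,
and sample values are hypothetical illustrations, not part of any W3C
vocabulary; the point is only that a shared set of fields lets anyone check
which provenance details a publisher left out.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetDescriptor:
    """Hypothetical provenance record for one published data set."""
    title: str
    origin: Optional[str] = None           # where the data came from
    derivation: Optional[str] = None       # how the data was derived
    created: Optional[str] = None          # ISO 8601 creation date
    publisher_trust: Optional[str] = None  # publisher's own confidence level


def missing_descriptors(record: DatasetDescriptor) -> List[str]:
    """Return the names of descriptor fields the publisher left empty."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Illustrative values only; not taken from a real catalog.
record = DatasetDescriptor(
    title="Street tree census",
    origin="Parks department survey",
)
print(missing_descriptors(record))
```

A catalog that applied a check like this before publication could flag data
sets whose derivation or trust level is undocumented, rather than leaving
consumers to assume the data is authoritative.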

Establishing a standard

In December 2013, the World Wide Web Consortium (W3C) launched a new
working group to develop common standards to address these issues. The 
Data on the Web Best Practices working group seeks to build open data best 
practices and vocabularies to enable cities and states publishing open 
data to describe data lineage, quality, veracity, and derivation. With
this standard, governmental data published in open data formats should be
more reliable, valuable, and comparable.

Hadley Beeman, Yaso Cordova, and I cochair this working group. We envision
a world in which states and nations can use open data published by towns 
and cities to gain enhanced understanding about their urban environments 
and civic governments. We see opportunities for regions to analyze common
open data to identify ways to cut carbon dioxide emissions,
improve traffic safety, speed disaster recovery services, and manage 
public resources such as water. Open data has the potential to transform 
how citizens interact with civic government, and we intend the Data on the 
Web Best Practices working group to provide common open standards and best 
practices that can empower that transformation.

Our goal is to deliver the Data on the Web Best Practices Recommendation
to develop the open data ecosystem, provide guidance to publishers, and
build trust in the data. The working group will build on and extend the
work completed in the Government Linked Data working group by taking a
domain- and technology-agnostic approach to cover the following aspects:
- Establishing vocabulary rules to enable data sharing, comparability, and
- Designing and managing Uniform Resource Identifiers (URIs) for persistence
- Guiding the provision of metadata
- Publishing and accessing versions of data sets
- Making controlled vocabularies accessible as URI sets
- Providing technical factors for consideration when choosing data sets for
- Offering technical factors that affect the potential use of open data for
  innovation, efficiency, and commercial exploitation
- Preserving data

Evidence of implementation will be gathered from national or
sector-specific guidelines that reference the best practices. The working 
group will also develop vocabularies (working group notes), including the
following two new vocabularies to support the data ecosystem:

Quality and Granularity Description Vocabulary: This vocabulary is
foreseen as an extension to the Data Catalog Vocabulary (DCAT) to cover
the quality of the data, how frequently it is updated, whether it accepts
user corrections, its persistence commitments, and so on. When used by
publishers, this vocabulary fosters trust in the data among developers.

Data Usage Description Vocabulary: This vocabulary describes how one or
more data sets are used. Where data is used in an application, it 
facilitates a description of what the application does and what problem it 
helps to solve. This description can improve discoverability of the 
application. Where data is used in other contexts, such as in research, it 
facilitates provision of information about which data was used and how it 
was used during the research. This information can link to and be cited 
within published papers. In these and other scenarios, the vocabulary is
intended to encourage the continued publication of the data on which the
usage depends.
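
As a rough illustration of how the two vocabularies might sit alongside
DCAT, the Python sketch below builds JSON-LD-style descriptions of a data
set and of an application that uses it. The dcat: and dct: prefixes point to
the real DCAT and Dublin Core Terms namespaces; the exq: and exu: prefixes,
every term under them, and all URIs and values are hypothetical
placeholders, since the working group has not yet defined these
vocabularies.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",   # real: Data Catalog Vocabulary
    "dct": "http://purl.org/dc/terms/",     # real: Dublin Core Terms
    "exq": "http://example.org/quality#",   # hypothetical quality vocabulary
    "exu": "http://example.org/usage#",     # hypothetical usage vocabulary
}


def describe_dataset(dataset_uri, title, update_frequency, accepts_corrections):
    """Describe a data set with DCAT terms plus hypothetical quality terms."""
    return {
        "@context": CONTEXT,
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Kept as a plain string for brevity in this sketch.
        "dct:accrualPeriodicity": update_frequency,
        "exq:acceptsUserCorrections": accepts_corrections,
    }


def describe_usage(app_uri, dataset_uri, purpose):
    """Describe an application and the data set it depends on."""
    return {
        "@context": CONTEXT,
        "@id": app_uri,
        "exu:usesDataset": {"@id": dataset_uri},
        "exu:purpose": purpose,
    }


# Illustrative URIs and values only.
dataset = describe_dataset(
    "http://example.org/dataset/bus-stops", "Bus stops", "daily", True
)
usage = describe_usage(
    "http://example.org/app/next-bus",
    "http://example.org/dataset/bus-stops",
    "Shows riders the nearest bus stop",
)
print(json.dumps([dataset, usage], indent=2))
```

Linking the usage record back to the data set's URI is what would let a
publisher see which applications depend on a data set before deciding
whether to keep publishing it.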

Leveling the playing field

The Data on the Web Best Practices working group’s charter and all the
proceedings of the working group are public—and of course, published in 
open data. Working group participation is open to W3C members, but we will 
host a series of public workshops around the world in 2014 to encourage 
open data community participation and input. We want this working group to 
include all the diverse voices and points of view from the global open 
data community on every continent.

Supported by common standards, open data can generate tremendous economic
and social value for citizens and governments. Transparency is also the
best tool to fight growing state surveillance and Internet balkanization.
We want to live in a world in which information is freely available, and 
through taxation we already pay governments to collect our data. When a 
government publishes our data in open formats, it levels the playing field 
and provides citizens with the opportunity to empower themselves with
information to determine their own needs and purposes.

Whether you are a data governance professional, a big data scientist, an
application developer, or just an interested citizen and public advocate, 
open data is an important movement and the world needs your skill and 
attention. If you are a member of W3C, please consider joining the Data on 
the Web Best Practices working group. If not, stay tuned for public 
workshop announcements, and consider participating to add your ideas and 
experiences to the creation of open data standards. And please share any 
thoughts or questions in the comments.

Best Regards,


Motto: "Do First, Think, Do it Again"
Received on Tuesday, 4 February 2014 12:53:21 UTC
