Data on the Web scope issue

Dear all,

 

In last week's meeting we had some discussion about the scope of the
working group. The issue that I brought up was: what do we mean by
"data"? As far as I understand, the group's scope is not limited a
priori, but I think we may want to distinguish data along several
dimensions.

 

Here is a set of dimensions we may want to look at, and maybe choose from:

 

Domains:

E.g.

*         Base registers, e.g. addresses, vehicles, buildings;

*         Business information, e.g. patent and trademark information,
public tender databases;

*         Cultural heritage information, e.g. library, museum, archive
collections;

*         Geographic information, e.g. maps, aerial photos, geology;

*         Infrastructure information, e.g. electricity grid,
telecommunications, water supply, garbage collection;

*         Legal information, e.g. supranational (e.g. EU) and national
legislation and treaties, court decisions;

*         Meteorological information, e.g. real-time weather information
and forecasts, climate data and models;

*         Political information, e.g. parliamentary proceedings, voting
records, budget data, election results;

*         Social data, e.g. various types of statistics (economic,
employment, health, population, public administration, social); 

*         Tourism information, e.g. events, festivals and guided tours;

*         Transport information, e.g. information on traffic flows,
roadworks and public transport.

 

Obligation:

E.g.

*         Data that must be provided to the public under a legal
obligation, e.g. legislation, parliamentary and local council
proceedings (dependent on specific jurisdiction);

*         Data that is a (by-)product of the public task, e.g. base
registers, crime records.

 

Usage:

E.g.

*         Data that supports democracy and transparency;

*         Data that is the basis for services to the public;

*         Data that has commercial re-use potential.

 

Quality:

E.g.

*         Authoritative, clean data, vetted and guaranteed;

*         Unverified or dirty data.

 

Size:

E.g. ranging from small CSV files of less than a megabyte to
potentially tera- or petabytes of sensor or image data.

 

Type/format:

E.g.

*         Text, e.g. legislation, public announcements, public
procurement;

*         Image, e.g. aerial photos, satellite images;

*         Video, e.g. traffic and security cameras;

*         Tabular data, e.g. statistics, spending data, sensor data
(such as traffic, weather, air quality).

 

Rate of change:

E.g.

*         Fixed data, e.g. laws and regulations, geography, results from
a particular census or election;

*         Low rate of change, e.g. road maps, info on buildings, climate
data;

*         Medium rate of change, e.g. timetables, statistics;

*         High rate of change, e.g. real-time traffic flows and airplane
location, weather data.

 

In terms of Best Practices, the last three dimensions (size, type/format
and rate of change) may require different sets of best practices:
publishing real-time traffic flow data may require different processes
and technologies than publishing the results of a census or next year's
public budget. The other dimensions may not need different best
practices, but perhaps they could serve as topics in use cases?
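
As a rough illustration only, the three dimensions that may drive
different best practices could be sketched as a small data model. All
names below (DatasetProfile, size_class, practice_key), the one-megabyte
threshold and the example sizes are hypothetical and not something the
group has agreed on.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    TABULAR = "tabular"


class RateOfChange(Enum):
    FIXED = "fixed"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumption: "small" means less than a megabyte, as in the Size dimension.
SMALL_LIMIT_BYTES = 1_000_000


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical description of a dataset along the three dimensions."""
    name: str
    size_bytes: int
    data_type: DataType
    rate_of_change: RateOfChange


def size_class(profile: DatasetProfile) -> str:
    """Classify a dataset as 'small' or 'large' (illustrative threshold)."""
    return "small" if profile.size_bytes < SMALL_LIMIT_BYTES else "large"


def practice_key(profile: DatasetProfile) -> tuple:
    """Combine the three dimensions into a key; datasets sharing a key
    could plausibly share one set of best practices."""
    return (size_class(profile), profile.data_type.value,
            profile.rate_of_change.value)


# Example sizes are invented for illustration.
traffic = DatasetProfile("real-time traffic flows", 10**12,
                         DataType.TABULAR, RateOfChange.HIGH)
census = DatasetProfile("census results", 500_000,
                        DataType.TABULAR, RateOfChange.FIXED)

print(practice_key(traffic))
print(practice_key(census))
```

Both examples are tabular data, yet they end up with different keys,
which is the point made above about traffic flows versus census results.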

 

Makx.

 

 


Makx Dekkers

makx@makxdekkers.com

+34 639 26 11 46

 

 

Received on Thursday, 13 February 2014 20:19:01 UTC