Re: Raw-data-now

Just some thoughts... a little random, so feel free to flame.

We spend a lot of time talking about "publishing raw data", when what we
are really saying is "making data available to the public that has to date
only been available internally or on a need-to-know basis".

So what happens in normal circumstances when we publish the same data
in-house? Obviously, if we simply put a dataset on a corporate intranet (as
an example), it would suffer the same woes - employees would find it
unusable because it has no context. What really happens is that we make
available a view of the data, either through an interface with supporting
explanatory notes (including XSLT transforms etc.), or through an API that
plugs into some off-the-shelf or custom program that makes it usable
in-house.
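
To make the difference between a bare dataset and a "view" concrete, here
is a rough sketch in Python. Everything in it is made up for illustration:
the record, the field names, and the code table are assumptions, not taken
from any real system.

```python
from datetime import datetime, timezone

# Hypothetical raw record: meaningless without supporting notes.
RAW_RECORD = {"date": 1264348800000, "type": 0}

# Hypothetical explanatory notes (a codebook) that an in-house view would
# normally carry alongside the data.
CODEBOOK = {
    "date": {"label": "Observation time",
             "unit": "milliseconds since 1970-01-01 UTC"},
    "type": {"label": "Sky condition",
             "codes": {0: "cloudy", 1: "clear"}},
}


def render_view(record, codebook):
    """Turn a raw record into a readable line using the codebook."""
    # The codebook says the date is in milliseconds, so convert to seconds.
    when = datetime.fromtimestamp(record["date"] / 1000, tz=timezone.utc)
    kind = codebook["type"]
    meaning = kind["codes"].get(record["type"], "unknown code")
    return (f"{codebook['date']['label']}: {when.isoformat()}; "
            f"{kind['label']}: {meaning}")


if __name__ == "__main__":
    print(RAW_RECORD)                         # what "raw" looks like
    print(render_view(RAW_RECORD, CODEBOOK))  # what a view looks like
```

The data is identical in both cases; only the second form tells a reader
what the numbers mean.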

(As an example - Google makes all of its data available to the public
using both methods - the Google site (a view of the data) and an API for
plugging the XML into other applications. If they simply published search
results as "raw usage data of everything we've found on the internet" it
would make no sense to anyone.)

I feel that at the core of it, what we are really talking about in
publishing raw data, is giving the public the same read (not write or
admin etc) access as is available to internal users.

Thus Peter's step 1 - making it available in whatever form it is in - is
fairly easily accomplished in a technical sense, barring policy/governance
obstacles: give the public access to the same systems, views and software
(viewers) as are available internally.

In another sense what I am saying is "make the distinction that EVERYONE
is an employee and somehow working for/with you" - apply the same access
rules. If everyone in your Department has read access to dataset X,
mirror it on an external site and give anonymous read access to it. If
access is via login credentials without security concerns (for tracking
etc.), make it the same externally - require registration, verification and
login. If the data is sensitive or classified for any reason, then it
should stay so - no one complains that Joe Bloggs in Policy Section A
doesn't have access to commercial-in-confidence tender data, so why should
that be an issue when it comes to Joe Public?

The key is that the "raw data" should, when published, be no less
useful/usable to the public than to the Public Sector personnel who have
exactly the same access in-house.

Applying this methodology to all data makes Step 1 a non-issue in a sense,
and lets people concentrate on the important work of how to link it all
together via ontologies/metadata etc. And you'll probably find that the
public does some of that work for you :)

Cheers

Chris

> I should add that we've been thinking about creating a new site at
> rawdatanow.com (domain kindly donated by Tom Heath) for
> individuals/organisations thinking about publishing their 'raw data' -
> to expand upon what we mean by this phrase and offer guidance and
> links to relevant resources.
>
> Thread is here:
>
>   http://lists.okfn.org/pipermail/okfn-discuss/2009-October/006940.html
>
> Jonathan
>
> On Sun, Jan 24, 2010 at 1:34 PM, Jose Manuel Alonso
> <josema.alonso@fundacionctic.org> wrote:
>> On 23/01/2010, at 15:29, Peter Krantz wrote:
>>>
>>> On Sat, Jan 23, 2010 at 12:38, Antti Poikola <antti.poikola@gmail.com>
>>> wrote:
>>>>
>>>> I believe that there should be some pragmatic way to publish raw data
>>>> at the same time while continuing to develop the information
>>>> architecture. How this should be communicated to the managers of
>>>> different public offices?
>>>>
>>>
>>> I have had some success in explaining a three step process:
>>>
>>> 1. Publish whatever you have in whatever format it currently is in.
>>> This provides data for people to start tinkering with and ask
>>> questions about.
>>> 2. While data is out there, start thinking about the context it lives
>>> in. We are looking at harmonizing the way agencies publish their
>>> vocabularies as a first step (e.g. OWL).
>>> 3. Gradually adapt your data to make it use common identifiers for
>>> common things.
>>
>> +1
>>
>> We are currently using this approach in a project.
>> I would say we are also doing a 1.5: describing metadata properly. I must
>> say that, although doing it this way is ok, usefulness of 1. in many
>> cases is quite low. Stupid example:
>> <weather date="12643488000000" type="0" />
>>
>> Yeah, today at 10am it was cloudy around here...
>>
>> -- Jose
>>
>>
>
>
>
> --
> Jonathan Gray
>
> Community Coordinator
> The Open Knowledge Foundation
> http://blog.okfn.org
>
> Twitter/Identica: jwyg
>
>

Received on Sunday, 24 January 2010 22:44:52 UTC