Re: VS: generic list of public data sources from Chris Beer on 2009-11-02 (public-egov-ig@w3.org from November 2009)

From: Chris Beer <chris-beer@grapevine.net.au>
Date: Mon, 02 Nov 2009 23:01:04 +1100
To: "Peristeras, Vassilios" <vassilios.peristeras@deri.org>
CC: Antti Poikola <antti.poikola@gmail.com>, Jonathan Gray <jonathan.gray@okfn.org>, Li Ding <dingl@cs.rpi.edu>, eGov IG <public-egov-ig@w3.org>, Rastas Taru <Taru.Rastas@mintc.fi>
Message-ID: <4AEECA00.5090509@grapevine.net.au>
Hi Vassilios

Comments within your reply

Peristeras, Vassilios wrote:
>
> Hello Chris,
>
> Very interesting ideas.
>
> Some comments.
>
> /If tags are specifically set by the public sector, there is always the possibility that the tags lose relevance to the user./
>
> We discuss top-down (tags provided by the public sector) versus 
> bottom-up (tags provided by people) tagging.
>
> The first approach keeps the fundamental idea that the producer has 
> the right to arrange, define and classify the data they produce. The 
> second challenges this idea and is aligned with Web 2.0 rhetoric, as it
> includes the clients in the picture and gives them the right to add
> their own metadata, which in a second round can be used to create
> folksonomies in order to generate classification schemas bottom-up. So
> yes, we want to include people's voice in classification systems like
> this, but subjectivity and noisy, context-biased tags (e.g. "The
> dole") are shortcomings of this bottom-up (Web 2.0) approach.
> Cross-border issues are also apparent, where "borders" are not only
> national but can also be cultural, linguistic, etc.
>
> Could we combine both approaches to take the best of each? Actually
> this seems to be what you propose. Not sure how, but it looks like a very
> interesting perspective.
>
Some more thoughts in response, because you got me thinking ;)

When I initially wrote my reply, I did indeed have in mind the idea of a
taxonomy mixed with a folksonomy. Since then, however, I've been thinking
the problem through from a traditional Information Architecture
perspective rather than from the point of view of tagging/semantic web
methods. It's the Information Architecture angle I believe is really
crucial here, because IA is so often confused with User Interface and
User Experience. Where the data lives and who "owns" it (the Information
Architecture in that sense, ie: the top-down approach) is very different
from the navigation aspect, "how you get to it". In that sense bottom-up
tagging is absolutely essential to the model as well, as there can be a
million different ways from a million different users to categorise a
single piece of data. At the end of the day, combining both methods
becomes the only way to allow the flexibility required by such a
categorisation system as well as the standardisation that is required on
the machine readable/systems level - which changes your question from
"Could we" to "How do we".

I now believe that it would be relatively easy to combine both top-down
and bottom-up approaches (with or without a static taxonomy in place) by
drawing a simple line between top-level classification and user-defined
classification. The key, I believe, is not losing sight of the basics -
why are we tagging the data? On one level we're doing it to distinguish a
dataset from other datasets ("this is dataset (a), this is (b) - you
can see from the tags they are different"), and on another to make it
possible, on a machine readable level, to mash them up ("Query: Find me
all datasets where tag (a) = x and mash them with all datasets where tag
(b) = y but only where tag (c) = z in both").
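A minimal Python sketch of the mash-up query above, assuming (hypothetically) that each dataset is an in-memory dict with a `tags` mapping and a list of `records` that share a join key. The dataset names, the tag keys `a`/`b`/`c`, the `premise_id` key and all sample values are invented for illustration; no real catalogue API is implied.

```python
from typing import Any, Dict, List

# Hypothetical dataset shape: {"name": str, "tags": {tag: value}, "records": [dict, ...]}
Dataset = Dict[str, Any]


def find(catalogue: List[Dataset], **conditions: str) -> List[Dataset]:
    """Return the datasets whose tags match every tag=value condition."""
    return [
        ds for ds in catalogue
        if all(ds["tags"].get(tag) == value for tag, value in conditions.items())
    ]


def mash(left: List[Dataset], right: List[Dataset], key: str) -> List[Dict[str, Any]]:
    """Join the records of two groups of datasets on a shared key field."""
    index: Dict[Any, List[Dict[str, Any]]] = {}
    for ds in right:
        for rec in ds["records"]:
            index.setdefault(rec[key], []).append(rec)
    return [
        {**rec, **other}
        for ds in left
        for rec in ds["records"]
        for other in index.get(rec[key], [])
    ]


# Invented sample catalogue (hypothetical names and values).
catalogue = [
    {"name": "liquor-licences", "tags": {"a": "x", "c": "z"},
     "records": [{"premise_id": 1, "licensee": "Example Pty Ltd"}]},
    {"name": "premise-locations", "tags": {"b": "y", "c": "z"},
     "records": [{"premise_id": 1, "lat": -33.87, "lon": 151.21}]},
    {"name": "old-locations", "tags": {"b": "y", "c": "not-z"},
     "records": [{"premise_id": 1, "lat": 0.0, "lon": 0.0}]},
]

# "Find all datasets where a = x and mash them with all datasets where b = y,
# but only where c = z in both."
left = find(catalogue, a="x", c="z")
right = find(catalogue, b="y", c="z")
print(mash(left, right, key="premise_id"))
# [{'premise_id': 1, 'licensee': 'Example Pty Ltd', 'lat': -33.87, 'lon': 151.21}]
```

Here the shared condition on tag `c` is simply applied to both sides before the join. Any real version would also need an agreed join key across datasets, which is where stable, publisher-controlled tags matter.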

But tagging has its core origins in human readable information - people
being able to find information. When we combine this with other ongoing
discussions on provenance (eg:
http://www.w3.org/2005/Incubator/prov/charter ) and ways to guarantee
the validity of data, we can always return to one basic fact that won't
change: the original dataset, and in a perfect world the only
reputable hosted source of the dataset, is on a server controlled by a
public sector organisation - usually the website of the
department/ministry in question or the data.gov.* equivalent. And it
will "always be there", a fact that is core to Linked Open Datasets in
e-Gov; it is discussed in the /Publishing Open Government Data/ working
draft <http://www.w3.org/TR/gov-data/> and is at the core of documents
such as /Cool URIs for the Semantic Web/ <http://www.w3.org/TR/cooluris/>
and the TBL classic /Cool URIs don't change/
<http://www.w3.org/Provider/Style/URI>. This is where the
information architecture angle really comes into play. While we can
allow the dataset itself to be publicly tagged, the
tagging/classification of the URI where the dataset lives can easily be
controlled by the host. Together, these give us a lot of power in
combining both approaches.

Let's take Antti's example of alcohol licences in a city. For argument's
sake, we'll expand it a little and add the following scope: a) liquor
licences are held by individuals/businesses (the licensee); b) they are
applied to an actual physical business (the licensed premises); and c) all
licences are administered by individual municipalities at the local
government level.

First we apply the top down approach - we (the public sector 
professionals engaged in releasing the data to the public in any form) 
tag the dataset, and the location it lives at, from our perspective. We 
might classify it as: Country (x), State (xx), Local (xxx) and 
Administrating Department/Ministry/Office (xxxx) data. We then add in a 
tag that links it to the Liquor Act of 19xx, and a couple more that 
cover the fact that it concerns "Licensing", "Controlled Drugs or
Substances" and "Business". Within the dataset itself we would probably
add extra tagging, such as geotags of the licensed premises, tags linking
to company registers, and so on. This also solves the cross-border issue,
in that at this level we're working with a fairly static set of
tags/URIs that are easily (compared to public tagging) translated or
transformed when "cross-border" aspects come into play.

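A minimal Python sketch of why a static, top-down tag set is comparatively easy to carry across borders: if official tags are stable identifiers in a controlled vocabulary, translation becomes a lookup. The tag IDs, the French labels and the fall-back-to-English rule are illustrative assumptions, not an existing government vocabulary.

```python
from typing import Dict, List

# Hypothetical controlled vocabulary: stable tag IDs, each with a label per language.
VOCABULARY: Dict[str, Dict[str, str]] = {
    "licensing": {"en": "Licensing", "fr": "Licences"},
    "controlled-substances": {"en": "Controlled Drugs or Substances",
                              "fr": "Substances réglementées"},
    "business": {"en": "Business", "fr": "Entreprises"},
}


def official_tags(tag_ids: List[str]) -> List[str]:
    """Top-down tagging: only IDs from the controlled vocabulary are accepted."""
    unknown = [t for t in tag_ids if t not in VOCABULARY]
    if unknown:
        raise ValueError(f"not in controlled vocabulary: {unknown}")
    return list(tag_ids)


def labels(tag_ids: List[str], lang: str, fallback: str = "en") -> List[str]:
    """Cross-border display: translate each tag ID by lookup, else use the fallback."""
    return [VOCABULARY[t].get(lang, VOCABULARY[t][fallback]) for t in tag_ids]


dataset_tags = official_tags(["licensing", "controlled-substances", "business"])
print(labels(dataset_tags, "fr"))
# ['Licences', 'Substances réglementées', 'Entreprises']
print(labels(dataset_tags, "de"))  # no German labels defined, so English is used
# ['Licensing', 'Controlled Drugs or Substances', 'Business']
```

A public tag such as "the dole" would be rejected by `official_tags`; in this sketch the noisy, user-supplied vocabulary has to live somewhere else.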
We then move to the bottom-up approach simultaneously. If, for
instance, one had everything set up nicely along Web 2.0 lines, such as
a "My Page" on our data.gov.* site, then each public user could add
whatever tags, at whatever level of depth, they desired. These tags could
be saved as user preferences, preferably server side, but in such a way
that they could combine to form a separate open tag cloud for the use of
all if needed. In that sense, from an actual public sector POV, it
becomes irrelevant how noisy or context-biased the tags get, as long as
you keep the two separated. You might even go the extra step of allowing
searches on data.gov.* sites to use an "official tag search" option, a
"public tag search" option, or both. A regular review of the public tags,
if held server side, will slowly develop a folksonomy that can be
integrated into the "official tag set".
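The separation described above can be sketched in Python as a single server-side store with two tag spaces. Everything here is hypothetical: the class and method names, the dataset ID, and the rule that a public tag becomes a candidate for the official set once `min_users` distinct people have used it. Promotion is left to a human reviewer, matching the "regular review" idea rather than automatic merging.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TagStore:
    # dataset -> tags set by the publishing agency (top-down)
    official: Dict[str, Set[str]] = field(default_factory=dict)
    # dataset -> user -> tags saved as that user's preferences (bottom-up)
    public: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def add_official(self, dataset: str, tag: str) -> None:
        self.official.setdefault(dataset, set()).add(tag)

    def add_public(self, dataset: str, user: str, tag: str) -> None:
        self.public.setdefault(dataset, {}).setdefault(user, set()).add(tag)

    def public_cloud(self, dataset: str) -> Counter:
        """Open tag cloud: how many distinct users applied each public tag."""
        users = self.public.get(dataset, {})
        return Counter(tag for tags in users.values() for tag in tags)

    def search(self, tag: str, mode: str = "both") -> List[str]:
        """'official tag search', 'public tag search', or both."""
        if mode not in ("official", "public", "both"):
            raise ValueError(f"unknown mode: {mode}")
        hits: Set[str] = set()
        if mode in ("official", "both"):
            hits |= {d for d, tags in self.official.items() if tag in tags}
        if mode in ("public", "both"):
            hits |= {d for d in self.public if tag in self.public_cloud(d)}
        return sorted(hits)

    def review(self, dataset: str, min_users: int) -> List[str]:
        """Public tags popular enough to propose for the official set."""
        already = self.official.get(dataset, set())
        return sorted(tag for tag, n in self.public_cloud(dataset).items()
                      if n >= min_users and tag not in already)


store = TagStore()
store.add_official("unemployment-benefits", "social security")
for user in ("ann", "bob", "cat"):
    store.add_public("unemployment-benefits", user, "the dole")
store.add_public("unemployment-benefits", "bob", "jobless payments")

print(store.search("the dole", mode="official"))            # []
print(store.search("the dole", mode="public"))              # ['unemployment-benefits']
print(store.review("unemployment-benefits", min_users=2))   # ['the dole']
```

Noisy public tags never touch `official`, so the official search stays stable; only tags that survive review would be added to it, by a deliberate `add_official` call.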

Please - feel free to pick it apart people - just throwing out the ideas :)


> /it's almost impossible to link data straight to government portfolio and business area/
>
> Unless you introduce a simple and rather straightforward guideline: 
> the data producer (or collector) is the owner. Btw, this does not 
> necessarily imply that the owner has the right to set the CRUD 
> policies over this data, especially the “Read” part. E.g. a ministry
> may produce some data, but who can access it could be decided at a
> higher level, e.g. by the cabinet or the president.
>

I agree in that sense; however, my comment was more about the
fact that publicly released datasets can already be a combination of
datasets collected by various stakeholders. eg: A publicly available
dataset called "Information on Wetlands Management" might be a
combination of data from a national Survey Department, a state
Environment Department and Land Management Agencies funded at a national
level and operated at a local government level, collated and released
through a single data.gov.* site. Who, then, is the owner?

If I read between the lines of your reply, I guess that in time, with
more and more centralised data.gov.* sites, my initial comment will be
rendered almost moot. CRUD policy will be dictated by various
information management ministries/offices, which will not discriminate
about access as long as users meet set authentication rules.
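One hypothetical way to model the wetlands case is to record every contributing producer as provenance and keep the read policy as a separate, centrally set field, reflecting both Vassilios's point that producing data does not imply setting its CRUD policy and the centralised-access scenario above. The class names, the `read_roles` field, the site name and the role strings are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Contribution:
    producer: str  # organisation that collected this part of the data
    level: str     # "national", "state" or "local"


@dataclass
class PublishedDataset:
    title: str
    publisher: str                     # the data.gov.* site that released it
    contributions: List[Contribution]  # provenance of each constituent dataset
    read_roles: Set[str]               # set centrally, not by any single producer

    def producers(self) -> List[str]:
        """Every producer keeps a claim; there is no single 'owner'."""
        return [c.producer for c in self.contributions]

    def can_read(self, user_roles: Set[str]) -> bool:
        """Access depends only on the central policy, not on who produced the data."""
        return bool(self.read_roles & user_roles)


wetlands = PublishedDataset(
    title="Information on Wetlands Management",
    publisher="data.gov.example",  # hypothetical site
    contributions=[
        Contribution("National Survey Department", "national"),
        Contribution("State Environment Department", "state"),
        Contribution("Land Management Agency", "local"),
    ],
    read_roles={"authenticated-user"},  # hypothetical role name
)

print(wetlands.producers())
print(wetlands.can_read({"authenticated-user"}))  # True
print(wetlands.can_read({"anonymous"}))           # False
```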

> /how all governments at all levels are actually structured at the top level in terms of portfolios, departments, ministries etc and see what patterns, if any, appear?/
>
> I would be surprised to see anything different to the classical 
> functional differentiation (e.g. transportation, health, education, 
> security, etc). Actually, many management theories (NPM, BPR, TQM and
> now Enterprise/Government 2.0) have for decades strongly advocated a
> more “horizontal” organization of the public sphere (and of private
> enterprises), as opposed to the vertical, hierarchical, stovepipe
> functional division that goes back to the 19th century. However, I am
> not aware of any real adoption of such radical (re-)organization at a
> large scale.
>
Agree 100% - but it'd be interesting to look at nonetheless.
>
> /I think any public sector information on the internet could benefit in this regard/
>
> Agree 100%. I am just a bit worried about the feasibility, given the
> complexity this would involve. Nevertheless, I would be very interested
> in assisting if you decide to go this way.
>

My issue is always time, rather than complexity. But I may well hold you 
to that offer :)

Cheers

Chris

> Best regards,
> Vassilios
>
> -----Original Message-----
> From: Chris Beer [mailto:chris-beer@grapevine.net.au]
> Sent: 01 November 2009 13:08
> To: Rastas Taru
> Cc: Peristeras, Vassilios; Antti Poikola; Jonathan Gray; Li Ding; eGov IG
> Subject: Re: VS: generic list of public data sources
>
> Hi all
>
> I like where this discussion is going. I agree that tags will certainly
> offer flexibility, and probably should form the kernel of a system at a
> more specific level of use; however, tagging in and of itself presents
> the problem of standards and taxonomies, especially when looking at the
> problem from a cross-border or e-Government perspective.
>
> The direction tagging is taking, as seen in the public eye, is for the
> public themselves to do the tagging, either explicitly or via search
> term mining by a hosting organisation. If tags are specifically set by
> the public sector, there is always the possibility that the tags lose
> relevance to the user. (An example from the Australian perspective would
> be Social Security Benefits paid to unemployed people. Standard public
> sector governance anywhere (I'm guessing) would tag such a dataset as
> "social security" or "social services". However, 90% of the public in
> Australia would tag this as "The dole" - a commonly accepted nickname
> for "social security benefits".) The cross-border issues I see as being
> most immediate in a tag-based system are in the
> localisation/translation aspect - the flexibility of tagging can easily
> become a nightmare in terms of defining a namespace and terms for
> translating tags on the fly.
>
> While I accept that tagging is certainly the way of the future for most
> publicly accessible data, I think that if this taxonomy goes that way,
> it should, at this point, work hand in hand with a strict taxonomy of
> some sort, even basic DC metadata, until a stable distribution of tags
> (gleaned from public tagging?) forms and a vocabulary for a namespace
> can be developed with some degree of accuracy.
>
> My own experience has shown that there is so much cross-over on
> datasets in terms of category or business "ownership" (even within a
> single department/ministry) that it's almost impossible to link data
> straight to government portfolio and business area unless it's very
> specific data. Machinery of Government changes (the creation of new
> departments/ministries, or the separation or mashup/restructure of
> existing ones) will also affect how data is categorised if we're
> following Vassilios's quick and dirty method, even when adding "life
> event" or "business episodes". Different types of political systems also
> create different inherent structures and portfolio areas (eg: countries
> where law enforcement is the purview of the defence forces).
>
> Thought: Would it be worth doing something akin to, or in concert with,
> the proposal put forward in the "Group Call Tomorrow / Best Practice
> Publishing" thread of "seeing what's out there" - ie: a quick collation
> of how all governments at all levels are actually structured at the top
> level in terms of portfolios, departments, ministries etc and see what
> patterns, if any, appear? It may then give this discussion a good point
> of reference to start hacking on to come up with some sort of system
> that works - which I personally think needs to be done, and for more
> than just datasets - I think any public sector information on the
> internet could benefit in this regard.
>
> Cheers
>
> Chris Beer
> Invited Expert
> W3C e-Gov IG
>
> Rastas Taru wrote:
>
> > Hi,
> >
> > I need to hop in to your good discussion. I'm a ministerial adviser in
> > the Ministry of Communications, co-operating with Antti on open public
> > data issues here in Finland. The taxonomy aspect is indeed important. I
> > would go with Vassilios's idea that subject-based grouping is probably
> > the most useful from the citizen's point of view (life events and
> > activities like housing, transport, public safety, work etc.) and the
> > business point of view (services necessary in the everyday business
> > lifecycle). For example, the grouping used in the Suomi.fi portal
> > (www.suomi.fi/suomifi/english/index.html), or any other kind (there are
> > many worldwide!). This way, available services could also be added and
> > developed further: I just figured out that, for example, the "open jobs"
> > online service (www.mol.fi) is basically an open API (?), but it is not
> > accessible, or developers probably don't know about it. Antti's
> > well-thought-out mind map could be arranged by life event too, I guess,
> > or perhaps the issue goes further, in that some sort of general
> > "cross-border" taxonomy could be useful from the developers' point of
> > view? Anyhow, I think the administrative way of grouping is no good, as
> > for users it shouldn't matter. With subject-based grouping, at best,
> > links to "shared services" can be found between different
> > administrations (perhaps even affecting the government's service
> > mindset)?
> >
> > Regards,
> > Taru
> >
> > Taru Rastas
> > Ministerial Adviser
> > Media and Communications Services
> > Ministry of Transport and Communications
> >
> > Tel: +358 9 160 28617
> > Mob: +358 40 7155075
> > taru.rastas@mintc.fi
> > Fax: +358 9 16028588
> >
> > Office: Eteläesplanadi 18, Helsinki
> > P.Box 31, FI-00023 Government, Finland
> >
> > -----Original Message-----
> > From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Peristeras, Vassilios
> > Sent: 29 October 2009 19:37
> > To: Antti Poikola; Jonathan Gray
> > Cc: Li Ding; eGov IG
> > Subject: RE: generic list of public data sources
> >
> > Hi Antti,
> >
> > This is an interesting discussion.
> > I see that you are not looking for data sets but for a taxonomy.
> > The quick (and dirty) way is to follow the administrative structure
> > (more or less, by ministry). A good example from the FEA is here [1].
> > But then you have the same problems we experienced with the grouping of
> > services: they can be found only if you are aware of the
> > administrative structure. Several approaches have tried to ameliorate
> > this. The most common paradigm has been "life-event" and "business
> > episode" based service groupings. Can they be used for data? I
> > wouldn't say so.
> > So the question is: Is there a better way to organize governmental
> > data than what is presented in [1]-like approaches? I don't have an
> > answer...
> > BTW, Jonathan's idea of using tags gives an interesting perspective.
> >
> > Regards,
> > Vassilios
> >
> > Taking the opportunity: the current issue of IEEE Intelligent Systems
> > is on eGovernment. You may find it interesting [2].
> >
> > [1] http://en.wikipedia.org/wiki/Business_reference_model
> > [2] http://www.computer.org/portal/web/intelligent/home
> >
> > -----Original Message-----
> > From: public-egov-ig-request@w3.org
> > [mailto:public-egov-ig-request@w3.org] On Behalf Of Antti Poikola
> > Sent: 29 October 2009 18:34
> > To: Jonathan Gray
> > Cc: Li Ding; Antti Poikola; eGov IG
> > Subject: Re: generic list of public data sources
> >
> > Thanks Li, Owen and Jonathan
> >
> > I'm well aware that there are several sites listing actual, more or
> > less open, data sources, like data.gov and CKAN.
> >
> > I am looking for a general topic list that would tell me that in my
> > country there are most probably some organizations holding data about
> > this, this and this. Of course I can compile the list by going through
> > the existing data catalogues... Owen's detailed categorization was good
> > for statistical data, but statistical data is just one branch in the
> > overall picture... what about the register of "Alcohol licences in a
> > city" or something more weird but useful?
> >
> > Just to give you an idea, I drafted out of my head a MindMap that I
> > would like to develop to cover the full picture.
> >
> > http://mind42.com/pub/mindmap?mid=b84b44a0-4636-4de9-9a00-5a4513195ce2
> >
> > All resource links are welcome.
> >
> > BR,
> >
> > -Antti
> >
> > Jonathan Gray wrote:
> >
> >> We've also got over 680 (mostly) open data packages listed on CKAN, an
> >> open source registry of open data:
> >>
> >> http://ckan.net/
> >>
> >> See, e.g.:
> >>
> >> * Linking Open Data group
> >> - http://ckan.net/group/lod
> >> * Packages as part of EU Open Data Inventory (alpha)
> >> - http://ckan.net/tag/read/eutransparency
> >> * Search for tags including 'country-[...]'
> >> - http://ckan.net/package/search?q=country-&search=Search+Packages+%C2%BB
> >>
> >> There are hopefully over 1000 UK government datasets on the way, as
> >> data.gov.uk is using CKAN. Regarding categories, we've found a
> >> flexible tag-based approach quite useful.
> >>
> >> It would be great to ensure interoperability between CKAN and other
> >> open government data catalogues - so different bits of the 'open data
> >> ecosystem' can all talk to each other! We've started talking to Peter
> >> about this a bit regarding opengov.se.
> >>
> >> Out of interest - would anyone be interested in having an online
> >> meeting about this? E.g. next Tuesday (3rd November) evening at 1800
> >> GMT?
> >>
> >> Best wishes,
>
Received on Monday, 2 November 2009 12:01:46 UTC