Re: Public Data Catalog Priorities and Demand

Thanks, Jose, for your "two cents", and thanks to the others who reacted to
my question.

To make it clear, I'm not so much looking for a standard cataloging format
as for the human-understandable big picture: a visualization or an
easy-to-grasp categorization/list of typical PSI (public sector
information) datasets, maybe a "map of PSI". This discussion develops the
questions and indicates that there is no clear answer to them yet. Some
more questions...

* How would, for example, the "national registry of lakes", "geodata of
high voltage electric network", "public job vacancies" and "directory of
restaurants holding licence to serve alcohol" relate to the universe of PSI?

* If there are, let's say, a few thousand datasets in data.gov, is there
any analysis, or even a wild guess, of how many are missing: 10 000,
50 000, 100 000, 500 000?

* Is there any analysis of what is popularly used and what is pure noise,
of no interest to developers, democracy advocates or anybody else?

I found these two analyses of data.gov:
http://blog.programmableweb.com/2009/07/20/whats-in-datagov/
http://data-gov.tw.rpi.edu/wiki/File:Data-gov-cloud-200910.png



Jose Manuel Alonso wrote:
> My guess, based on current experience, is that this is not easy to
> compile. A recently released national (Spain) report on eGov states
> that the two most important information sets for citizens at the
> regional (state) level are the organization chart and public job
> vacancies.
Any link to that?

> That said, there are many more variables that have an impact on an
> open data project. We have identified 20+ important ones: some are
> technical, some organizational, some policy-related... it's a tough
> and complicated issue.

Would you mind sharing those 20+ on a wiki page where we could discuss them?

> Just my 2 euro cents :-)
>
> -- Jose
>
>
> On 18/12/2009, at 16:10, Joe Carmel wrote:
>> I totally agree with you, Antti. I think data.gov and other government
>> websites should be looking to use a standards-based data cataloging
>> format (e.g., extending AtomXML or OPDS) that allows entries to link to
>> data files or other catalogs. Similar to sitemaps and HTML, governments
>> would publish a file at the root of their websites that provides a
>> catalog of the data files on their site. By enabling the catalog format
>> to point to other catalogs, a root catalog could point to
>> sub-department-level catalogs, allowing data catalog management
>> responsibilities to be distributed within an organization.
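A minimal sketch of the kind of catalog file described here, in Python with only the standard library's xml.etree.ElementTree. Several assumptions are not from the thread: catalogs are plain Atom feeds; a link whose type starts with application/atom+xml points to a sub-catalog (OPDS marks its navigation feeds with that media type plus profile parameters), while any other media type is treated as a data file; `build_catalog` and `split_links` are hypothetical helper names; and the example.gov URLs are made up. A real feed would also need the id, updated and author elements Atom requires; they are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: a link typed as an Atom feed points to another catalog;
# any other media type (text/csv, application/xml, ...) is a data file.
CATALOG_TYPE = "application/atom+xml"

# Serialize Atom elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", ATOM_NS)


def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace ({uri}tag form used by ElementTree)."""
    return f"{{{ATOM_NS}}}{tag}"


def build_catalog(title: str, entries: list[tuple[str, str, str]]) -> str:
    """Return a hypothetical Atom-style catalog; each entry is (title, href, media type)."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = title
    for entry_title, href, media_type in entries:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = entry_title
        ET.SubElement(entry, atom("link"), href=href, type=media_type)
    return ET.tostring(feed, encoding="unicode")


def split_links(catalog_xml: str) -> tuple[list[str], list[str]]:
    """Sort the entry links of one catalog into (data files, sub-catalogs)."""
    root = ET.fromstring(catalog_xml)
    data_files: list[str] = []
    sub_catalogs: list[str] = []
    for link in root.iterfind(f"{atom('entry')}/{atom('link')}"):
        href = link.get("href")
        if link.get("type", "").startswith(CATALOG_TYPE):
            sub_catalogs.append(href)
        else:
            data_files.append(href)
    return data_files, sub_catalogs


# Hypothetical root catalog: one dataset plus a pointer to a department catalog.
root_xml = build_catalog("example.gov data catalog", [
    ("Public job vacancies", "https://example.gov/data/vacancies.csv", "text/csv"),
    ("Transport department", "https://example.gov/transport/catalog.xml", CATALOG_TYPE),
])
print(split_links(root_xml))
```

Keeping the "is this a sub-catalog?" decision in the link's media type means a department can maintain its own catalog file, and the root catalog only needs a single entry pointing at it.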
>>
>> At present, governments use HTML in a variety of ways for data
>> cataloging. This looser approach has made it difficult to get one's arms
>> around all of the data being published at a given site (e.g.,
>> http://www.atlantis-press.com/php/download_paper.php?id=1763). IMO, if a
>> standard data catalog format were used, it would presumably be XML,
>> which would let individual catalogs "look" different from one site to
>> another (using CSS or XSL) while the underlying data structures stay the
>> same, allowing for machine readability.
>>
>> By providing access to remote data storage, the Internet has been used
>> to publish data and documents. Standard file names (index.htm, main.htm)
>> are used as HTML entry points for websites. The default HTML file then
>> uses hypertext links to provide access to subsequent files. In the same
>> way HTML provides links to any file, I believe that standardized
>> catalog files pointing to sub-catalogs and data files could enable a
>> more searchable and usable web of data.
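The "index.htm for data" idea can be sketched as a crawler that starts from a site's root catalog and follows sub-catalog links until it has listed every data file. This is a hypothetical illustration, not an existing tool: it assumes catalogs are Atom feeds in which links typed application/atom+xml point to sub-catalogs, it uses an in-memory dict of made-up example.gov catalogs in place of HTTP requests, and `crawl` and `feed` are illustrative names. It also tracks visited catalogs, so two catalogs that point at each other cannot cause an endless loop.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: links typed as Atom feeds point to sub-catalogs; all others are data files.
CATALOG_TYPE = "application/atom+xml"


def crawl(root_url: str, fetch: Callable[[str], str]) -> list[str]:
    """Walk the catalog tree breadth-first from root_url; return all data-file URLs.

    fetch(url) must return the catalog XML for url. Catalogs already queued are
    skipped, so catalogs that link back to each other cannot loop forever.
    """
    queue = deque([root_url])
    seen = {root_url}
    data_files: list[str] = []
    while queue:
        catalog = ET.fromstring(fetch(queue.popleft()))
        for link in catalog.iterfind(f"{ATOM}entry/{ATOM}link"):
            href = link.get("href")
            if not link.get("type", "").startswith(CATALOG_TYPE):
                data_files.append(href)  # a dataset: record it
            elif href not in seen:
                seen.add(href)  # a new sub-catalog: visit it later
                queue.append(href)
    return data_files


def feed(*links: tuple[str, str]) -> str:
    """Build a minimal hypothetical catalog from (href, media type) pairs."""
    entries = "".join(
        f'<entry><link href="{href}" type="{kind}"/></entry>' for href, kind in links
    )
    return f'<feed xmlns="http://www.w3.org/2005/Atom">{entries}</feed>'


# Made-up example.gov catalogs standing in for HTTP requests.
CATALOGS = {
    "https://example.gov/catalog.xml": feed(
        ("https://example.gov/data/org-chart.csv", "text/csv"),
        ("https://example.gov/transport/catalog.xml", CATALOG_TYPE),
    ),
    "https://example.gov/transport/catalog.xml": feed(
        ("https://example.gov/transport/roads.csv", "text/csv"),
        ("https://example.gov/catalog.xml", CATALOG_TYPE),  # points back to the root
    ),
}

print(crawl("https://example.gov/catalog.xml", lambda url: CATALOGS[url]))
```

Because `fetch` is passed in, the same function could be pointed at live sites by supplying a function that downloads each catalog. Breadth-first order lists a catalog's own datasets before those of the sub-catalogs it links to.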
>>
>> Joe
>>
>> -----Original Message-----
>> From: public-egov-ig-request@w3.org 
>> [mailto:public-egov-ig-request@w3.org]
>> On Behalf Of Antti Poikola
>> Sent: Friday, December 18, 2009 1:10 AM
>> To: Jonathan Gray
>> Cc: Steven Clift; public-egov-ig@w3.org; sunlightlabs@groups.google.com
>> Subject: Re: Public Data Catalog Priorities and Demand
>>
>> Hi,
>>
>> Please Jonathan, Steven and others, let us know if you find some
>> visualization, categorization or prioritization that would clarify the
>> "swamp" of public sector information sources.
>>
>> I'm looking for two things:
>>
>> 1. An easy way to get the BIG PICTURE of what kinds of public sector
>> information most probably exist (even if not open yet) in a typical
>> country or city.
>>
>> 2. Some priorities from the information re-users' point of view
>>
>> So far I have found only listings and catalogues that can be re-ordered
>> by topic (for example CKAN and data.gov), but these do not really help to
>> give the big picture. In this kind of catalogue it is easy to find a
>> specific data source if you know what you are looking for, but if you
>> just want to see what is out there and build an overview, the catalogues
>> are not so helpful.
>>
>> Best regards
>>
>> -Antti "Jogi" Poikola
>>
>>
>> Jonathan Gray wrote:
>>> Just to let you know, we're currently working on this with CKAN.net.
>>> Also very interested in thinking about how we can track how different
>>> datasets are reused.
>>>
>>> Jonathan
>>>
>>> On Mon, Nov 23, 2009 at 4:20 PM, Steven Clift <clift@e-democracy.org> wrote:
>>>
>>>> Has anyone explored which government data is in highest "demand" on
>>>> the emerging public data reuse sites? How does interest vary across
>>>> different re-user audiences (e.g., business, media, open gov advocates,
>>>> independent coders)?
>>>>
>>>> Also, has anyone started a comparison chart of what different
>>>> governments are providing? It would be interesting to quickly see what
>>>> different national or local governments are providing now and over
>>>> time. This gets at "what's important" to release for easy reuse
>>>> versus what is easiest or least politically sensitive.
>>>>
>>>> Steven Clift
>>>> E-Democracy.org
>>>>
>>>> -- 
>>>> Steven Clift - http://stevenclift.com
>>>> Executive Director - http://E-Democracy.Org
>>>> Follow me - http://twitter.com/democracy
>>>>

Received on Sunday, 20 December 2009 22:00:48 UTC