RE: Universal distributed open government data catalog? from Joe Carmel on 2010-02-03 (public-egov-ig@w3.org from February 2010)

From: Joe Carmel <joe.carmel@comcast.net>
Date: Wed, 3 Feb 2010 09:26:55 -0500
To: "'Ed Summers'" <ehs@pobox.com>, "'eGov IG'" <public-egov-ig@w3.org>
Message-ID: <000c01caa4dc$eff80650$cfe812f0$@carmel@comcast.net>
Ed, 

I wholeheartedly agree with your Atom recommendation.  IMO (and probably in
the opinion of 99% of us), Atom is the Internet's most widely adopted XML
vocabulary that is, essentially, a container for catalog entries.

I've been working with Owen Ambur and others using the StratML files
available through http://xml.gov/stratml/drybridge/urls.xml.  As a
repository of web resources (or just a collection of 500+ small related
files), StratML provides a good opportunity to demonstrate data cataloging
options. 

To this end, we've been building an XForm for StratML.  As part of the back
end, creators and users are given the opportunity to catalog a StratML
file--that is, to create or update its Atom catalog entries.  To see it
work, use a browser other than IE and enter a URL such as
http://xml.gov/stratml/drybridge/USDAstratplan.xml at
http://www.xmldatasets.net/XF2/stratmlxform3.xml.  Then choose CatXForm at
the bottom of the form.  [I'm having problems with XForms, IE, and multiple
namespaces.]

The catalog XForm allows creators/users to enter additional metadata and
then to publish the StratML file entry to the Atom catalogs.  The main Atom
catalog is at http://www.xmldatasets.net/stratml/catalog.xml.

When a file is cataloged, the master and subcategory Atom files are
automatically updated, so visitors to the Atom feeds can quickly see which
files were most recently updated across the entire collection.
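
As a rough illustration of that create-or-update step, here is a minimal
Python sketch.  It is not the code running on the site.  The function names
(new_feed, catalog_file), the entry title "USDA Strategic Plan", and the
subject value are assumptions made for the example; only the Atom and Dublin
Core namespaces and the two URLs come from the standards and from this
message.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", ATOM)
ET.register_namespace("dc", DC)


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM}}}{tag}"


def new_feed(feed_id, title):
    """Build an empty Atom feed (hypothetical helper for this sketch)."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = "1970-01-01T00:00:00Z"
    return feed


def catalog_file(feed, url, title, subject, now=None):
    """Create or update the entry for `url`, then bump the feed timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Reuse the existing entry if this file was cataloged before.
    entry = next(
        (e for e in feed.findall(atom("entry")) if e.findtext(atom("id")) == url),
        None,
    )
    if entry is None:
        entry = ET.SubElement(feed, atom("entry"))
    else:
        for child in list(entry):
            entry.remove(child)

    ET.SubElement(entry, atom("id")).text = url
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = stamp
    ET.SubElement(entry, atom("link"), {"rel": "alternate", "href": url})
    ET.SubElement(entry, f"{{{DC}}}subject").text = subject

    # The feed-level timestamp shows when anything in the catalog last changed.
    feed.find(atom("updated")).text = stamp
    return entry


if __name__ == "__main__":
    master = new_feed("http://www.xmldatasets.net/stratml/catalog.xml",
                      "StratML catalog")
    catalog_file(master,
                 "http://xml.gov/stratml/drybridge/USDAstratplan.xml",
                 "USDA Strategic Plan",   # assumed title
                 "Agriculture")           # assumed dc:subject value
    print(ET.tostring(master, encoding="unicode"))
```

Calling catalog_file once for the master feed and once for each subcategory
feed would keep them in step, which is the behavior described above.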

I invite others to join us in this effort, or to create alternative catalogs
for this repository.  Having several models and examples to compare and
contrast might help get the conversation going in terms of pros and cons.

Because this "site" is operating with alpha and beta scripts and code,
things will break and get fixed unexpectedly.  I'm currently working on
incorporating the iframe presented at
http://www.labnol.org/internet/tools/using-wikipedia-api-demo-source-code-example/3076/
in order to demonstrate using Wikipedia as a controlled vocabulary for
dc:subject, and possibly to build a Wikipedia-based browsing capability that
exposes similar Strategic Plans.

One challenge with Atom is that IE's stylesheet for Atom provides greater
capabilities than those of other browsers.  It would be great if a stylesheet with
IE's capabilities were made available in order to display Dublin Core
information and provide enhanced inline filtering.  Does anyone know
if/where this XSLT can be found?

If others accept the challenge to create Atom catalogs, help is also needed
to define which features would make them useful.  What are the demonstrable
benefits of cataloging data and documents in this manner for end users?

In addition: What features of Atom files work best with search engines and
how should Atom files be set up to provide maximum useful information to
search engines?  Can a network of interrelated Atom catalogs provide an
effective electronic card catalog for the web and has a demonstration of
this already been done?  Are there metadata elements that are (or should be)
included that are specific to governments (e.g., namespace of gov or
govmeta) such as jurisdiction, more detailed organizational identity
(department/agency/branch/unit), etc.?  Are there opportunities here to move
Atom data to RDF and the semantic web with reusable and modifiable tools?
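
To make the last two questions concrete, here is a minimal Python sketch of
what an entry with government-specific elements might look like and how it
could be turned into RDF (N-Triples).  Everything specific in it is an
assumption: the "gov" namespace URI and its jurisdiction/department
elements are hypothetical, the entry title and values are made up, and
mapping atom:title to dc:title is just one possible choice.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"
GOV = "http://example.org/ns/govmeta#"  # hypothetical namespace, not a real vocabulary

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:gov="http://example.org/ns/govmeta#">
  <id>http://xml.gov/stratml/drybridge/USDAstratplan.xml</id>
  <title>USDA Strategic Plan</title>
  <updated>2010-02-03T14:00:00Z</updated>
  <dc:subject>Agriculture</dc:subject>
  <gov:jurisdiction>US Federal</gov:jurisdiction>
  <gov:department>Department of Agriculture</gov:department>
</entry>"""


def literal(text):
    """Quote a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def entry_to_ntriples(entry_xml):
    """Turn one Atom entry into N-Triples lines, using atom:id as the subject."""
    entry = ET.fromstring(entry_xml)
    subject = entry.findtext(f"{{{ATOM}}}id")
    pairs = []

    # Assumed mapping: the Atom title becomes a dc:title statement.
    title = entry.findtext(f"{{{ATOM}}}title")
    if title:
        pairs.append((DC + "title", title))

    # Dublin Core and (hypothetical) gov elements map to predicates directly.
    for child in entry:
        if not child.tag.startswith("{"):
            continue
        namespace, _, local = child.tag[1:].partition("}")
        if namespace in (DC, GOV) and child.text:
            pairs.append((namespace + local, child.text.strip()))

    return [f"<{subject}> <{pred}> {literal(obj)} ." for pred, obj in pairs]


if __name__ == "__main__":
    for line in entry_to_ntriples(ENTRY):
        print(line)
```

A conversion this small suggests the Atom-to-RDF step could be done with
reusable, easily modified tools, though a real vocabulary for the
government-specific elements would still need to be agreed on.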

I apologize for posing so many questions at once.  I'm hoping the questions
and the demo will spur and challenge our collective creativity.

Best regards,

Joe


-----Original Message-----
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org]
On Behalf Of Ed Summers
Sent: Tuesday, February 02, 2010 1:00 PM
To: eGov IG
Subject: Re: Universal distributed open government data catalog?

On Tue, Feb 2, 2010 at 8:02 AM, Antti Poikola <antti.poikola@gmail.com>
wrote:
> During this year 2010 most probably 3 new catalogs will open
>
> 1. National Official
> 2. Regional (Helsinki capital city region)
> 3. Independent developer community catalog opengov.fi
>
> How can we make sure that the catalogs work together between three of
> them and together with the rest of the catalogs:
> http://www.diigo.com/list/apoikola/publis-sector-data-catalogues

Thanks for your post Antti -- I guess this is a goal many of us share.

My personal opinion is that a key ingredient to making this happen is
to publish dataset availability and metadata using a syndicated feed
(Atom and/or RSS). This would allow datasets to be aggregated at the
national level from the regional levels, and would also empower an
egov developer community (around the world) to do the same. This idea
is not my own; it's work that Erik Wilde, Eric Kansa and Raymond Yee
did at UC Berkeley analyzing the recovery.gov effort here in the
US [1].
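
A minimal Python sketch of that kind of aggregation follows.  It assumes
each regional catalog is already available as an Atom document string and
that all atom:updated values use the same UTC "Z" form, so they can be
compared as text.  The function name aggregate and the sample feeds are
hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)


def aggregate(feed_documents, feed_id, title):
    """Merge the entries of several Atom feeds into one, newest first."""
    newest = {}
    for document in feed_documents:
        root = ET.fromstring(document)
        for entry in root.findall(f"{{{ATOM}}}entry"):
            entry_id = entry.findtext(f"{{{ATOM}}}id")
            updated = entry.findtext(f"{{{ATOM}}}updated", default="")
            # When two feeds list the same dataset, keep the fresher entry.
            if entry_id not in newest or updated > newest[entry_id][0]:
                newest[entry_id] = (updated, entry)

    ordered = sorted(newest.values(), key=lambda pair: pair[0], reverse=True)

    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    if ordered:
        # The merged feed is as fresh as its most recent entry.
        ET.SubElement(feed, f"{{{ATOM}}}updated").text = ordered[0][0]
    for _, entry in ordered:
        feed.append(entry)
    return feed


# Two made-up regional feeds that share one dataset.
REGION_A = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:dataset:1</id><title>Budget</title>
    <updated>2010-01-10T00:00:00Z</updated></entry>
  <entry><id>urn:example:dataset:2</id><title>Transit</title>
    <updated>2010-02-01T00:00:00Z</updated></entry>
</feed>"""
REGION_B = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:dataset:1</id><title>Budget (revised)</title>
    <updated>2010-02-02T00:00:00Z</updated></entry>
</feed>"""

if __name__ == "__main__":
    national = aggregate([REGION_A, REGION_B],
                         "urn:example:catalog:national", "National catalog")
    print(ET.tostring(national, encoding="unicode"))
```

The same function could be run by a national catalog over regional feeds or
by an independent developer over any set of published feeds, which is the
point of using syndication here.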

I really think there is an opportunity here for the w3c egov working
group to publish a Note or some other document that defines best
practices for publishing dataset availability without dipping into the
details of what formats to use for the datasets themselves.

My hunch is that this lightweight web-centric approach would work
quite well with rdf/xml, ebXML, pdf, and other data formats. Feed
syndication is a technology that is widely deployed across the web,
and would allow government data to participate in an already thriving
ecosystem of machine readable data.

//Ed

[1] http://escholarship.org/uc/item/0fv601z8
Received on Wednesday, 3 February 2010 14:27:32 UTC