RE: Defining "Open" Data (was RE: no F2F3 in 2009 -- Re: Agenda, eGov IG Call, 11 Nov 2009)

From: Anne L. Washington <washingtona@acm.org>
Date: Mon, 16 Nov 2009 16:02:20 -0500 (EST)
To: Joe Carmel <joe.carmel@comcast.net>
cc: 'Daniel Dietrich' <daniel@so36.net>, 'Brian Gryth' <briangryth@gmail.com>, 'Jonathan Gray' <jonathan.gray@okfn.org>, 'eGovIG IG' <public-egov-ig@w3.org>, "'Emmanouil Batsis (Manos)'" <manos@abiss.gr>, 'Todd Vincent' <todd.vincent@xmllegal.org>, 'Niklas Lindström' <lindstream@gmail.com>, "'prof. dr. Tom M. van Engers'" <vanengers@uva.nl>, peter.krantz@gmail.com, 'david osimo' <david.osimo@gmail.com>, 'Jose Manuel Alonso' <josema.alonso@fundacionctic.org>, washingtona@acm.org
Message-ID: <alpine.DEB.1.00.0911161537510.7762@anneasus>
Lots of great ideas... from specific examples of new scales to solid
definitions of open data.

I agree with Todd Vincent. We can build on existing definitions. The vision 
is out there and has been done by many committees and organizations. 
However, I take Josh's cautionary tale to heart. There is no need to run 
around pointing fingers and making enemies. The public sector doesn't need 
accusations. It needs advice.

I am arguing for a scale / scorecard of openness. There is already a 
precedent for this type of hierarchy in evaluating businesses. Software 
capabilities are evaluated this way.  It would be a familiar tool for 
anyone in management. From the suggestions in the last few days, it 
seems we are already moving towards several possibilities.

Let's say we create a 5-point scale.

For each point on the scale I'd suggest we start with some generic overall 
descriptions. Someone can see if they are doing ABC, they are only at 2 on 
the scale, while if they are doing XYZ, they are at 5. For simplicity's 
sake, so we can get something out the door, we could just list several 
technologies under each scale.

Later we could, possibly, dig into the technology by applying that scale 
to specific technologies like pdfs (as Joe Carmel has done), to data 
downloads (as the original message did), to searching data, to rights 
management, to xml, to html pages... whatever. It would be easy to ask for 
and build use cases underneath each one.


and btw
data.gov.uk, with Tim Berners-Lee at the helm, has been announced 
to open in early December 2009.
http://news.bbc.co.uk/2/hi/technology/8311627.stm


Anne L. Washington
Standards work - W3C egov - washingtona@acm.org
Academic work - George Washington University
http://home.gwu.edu/~annew/


On Fri, 13 Nov 2009, Joe Carmel wrote:

> I also agree that Brian Gryth's "Access, Rights and Formats / Medium"
> breakdown is a good way to clarify and define best practices in this space
> and Todd has identified a great set of parameters and criteria to more fully
> explore and define the meaning behind open data.  I think one of the W3C
> eGov goals was highlighted by Anne Washington's idea that a more detailed
> description and hierarchy would be very useful.
>
> Piggy-backing on Anne's idea, I drafted some ideas as a hierarchy based on
> my experience parsing PDF files to provide sub-document access capability at
> www.legislink.org.
>
> http://legislink.wikispaces.com/message/view/home/14870950#toc1 (included
> below)
>
> This listing is based on a single criterion and different criteria would lead
> to different conclusions.  For example, if access performance or ADA
> compliance were used as criteria, different conclusions would likely be
> drawn.
>
> In my breakdown, there are three subcategories (Best, Middle-of-the-Road,
> and Minimal Practices).  Since different file formats offer different
> capabilities (e.g., anchors, ids, named-dests) that determine re-usability,
> listing formats alone (e.g., HTML, XML, PDF) is not sufficient to categorize
> a practice as best, good, or minimal.  For example, if a file is in XML, it
> still might NOT have metadata, ids, or even internal structure (e.g., a text
> file surrounded by one set of tags making it equivalent to an ASCII text
> file).  The use of file format practices makes a format more or less
> reusable and "open".  Alternatively, worst practices might be considered
> "open" in the sense that the data is at a minimum available on the web.
> Publishing scanned historic material is certainly better than not providing
> the material at all (e.g.,
> http://clerk.house.gov/member_info/electionInfo/index.html).  If I
> understand Anne correctly, I think she's trying to get at this sort of
> hierarchical list and explanation.
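
The point that a format name alone does not settle the category can be sketched in a few lines of Python. This is only an illustration of the hierarchy listed at the end of the quoted message; the `Publication` fields, the `practice_tier` function and the tier labels are hypothetical names chosen here, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical description of one published file."""
    fmt: str                         # e.g. "xml", "html", "pdf", "txt", "tiff"
    has_targets: bool = False        # ids, anchors, or named destinations present
    image_only: bool = False         # scanned pages with no text layer
    consistent_layout: bool = False  # plain text with predictable formatting


def practice_tier(p: Publication) -> str:
    """Map a file to "best", "middle" or "worst" using sub-document access as the only criterion."""
    fmt = p.fmt.lower()
    # OCR or human readers needed, or not viewable in all browsers.
    if fmt in ("tiff", "proprietary") or (fmt == "pdf" and p.image_only):
        return "worst"
    # Text-based PDFs allow page access by default, with or without named destinations.
    if fmt == "pdf":
        return "best"
    # The same format lands in different tiers depending on ids/anchors.
    if fmt in ("xml", "html"):
        return "best" if p.has_targets else "middle"
    if fmt == "txt" and p.consistent_layout:
        return "middle"
    return "unclassified"  # cases the list does not cover
```

For example, `practice_tier(Publication("xml"))` gives "middle" while `practice_tier(Publication("xml", has_targets=True))` gives "best", which is the XML-with-ids versus XML-without-ids distinction.
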
>
> It seems to me that building such hierarchical use case lists would be a
> good task for the IG or one of the sub-groups we started discussing at the
> end of our last call.
>
>
> On a related topic, for those who haven't seen it,
> http://www.gcn.com/Articles/2009/10/30/Berners-Lee-Semantic-Web.aspx# is an
> interesting recent article covering an interview with Sir Tim Berners-Lee at
> the International Semantic Web Conference (Tim Berners-Lee: Machine-readable
> Web still a ways off).
>
> 	"He said that the use of RDF should not require building new systems,
> 	or changing the way site administers work, reminiscing about how many
> 	of the original Web sites were linked back to legacy mainframe systems.
> 	Instead, scripts can be written in Python, Perl or other languages that
> 	can convert data in spreadsheets or relational databases into RDF for
> 	the end-users. 'You will want to leave the social processes in place,
> 	leave the technical systems in place,' he said."
>
> This statement is practical and sounds very much like a call to identify
> which format conditions are more or less convertible to RDF (e.g.,
> http://djpowell.net/blog/entries/Atom-RDF.html,
> http://blogs.sun.com/bblfish/entry/excell_and_rdf).  Maybe this sort of work
> is already being done as a separate effort at W3C or elsewhere?  If a Python
> or Perl script could be written to convert formats to RDF, it would be
> possible to build such conversions as on-the-fly web services or even as a
> spider.  Given a probable future that includes such services, formats that
> are currently considered less "open" will likely be viewed differently under
> at least some circumstances.  I think all of this probably deserves some
> attention and would go a long way toward helping governments and the public
> understand the implications of government electronic publication practices
> for open data.
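
As a rough idea of what such a conversion script might look like, here is a minimal Python sketch that turns spreadsheet-style CSV text into N-Triples using only the standard library. The `http://example.org/` namespace, the `record/` and `field/` URI patterns and the function names are assumptions made up for illustration; a real conversion would map columns to an agreed vocabulary and would probably use an RDF library.

```python
import csv
import io
from typing import List
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text: str, id_column: str) -> List[str]:
    """Emit one triple per non-empty cell, using id_column to name the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}record/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id itself, empty cells, and overflow cells with no header.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}field/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    sample = "id,agency,budget\n42,Parks Dept,1000\n"
    print("\n".join(csv_to_ntriples(sample, "id")))
```

The existing spreadsheet stays where it is and the script only reads it, which is the "leave the technical systems in place" idea from the quoted interview.
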
>
> Thanks much,
>
> Joe
>
> BEST PRACTICES: Direct sub-document access, no file modification needed
>
> 1. XML with ids
> Direct access to every level is possible (author determined)
>
> 2. well-formed HTML with anchors
> Direct access to every level is possible (author determined)
>
> 3. PDF with named destinations
> Direct access to every level is possible (pages are automatic, others are
> author determined)
> From
> http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/pdf_creation_apis_and_specs/pdfmarkReference.pdf#page=47
> "Named destinations may be appended to URLs, following a '#' character, as
> in http://www.adobe.com/test.pdf#nameddest=name. The Acrobat viewer displays
> the part of the PDF file specified in the named destination"
>
> 4. PDF (non-image based)
> Page access by default
>
>
> MIDDLE OF THE ROAD: Direct sub-document access only possible with file
> modification
> 1. TXT files with consistent formatting
> Direct access to consistently formatted levels with file modification
>
> 2. XML without ids
> Direct access to consistently formatted levels with file modification
> If browsers supported XPointer within the URL entry, XML and well-formed
> HTML would be in the "best" category
> (see http://www.w3schools.com/xlink/xpointer_example.asp#)
>
> 3. HTML without anchors
> Direct access to consistently formatted levels with file modification
>
>
> WORST PRACTICES: OCR needed or human readers only
> 1. PDF (image based only)
> 2. TIFF
> 3. Proprietary models where the document cannot be viewed in all browsers.
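
The "direct sub-document access" in the best-practices tier comes down to appending the right fragment to a URL. A small Python sketch of that idea follows, assuming the fragment forms mentioned in the list (`#nameddest=` and `#page=` for PDF, a bare id or anchor name for HTML and XML). The `subdocument_url` function is a hypothetical helper, and whether a given viewer or browser honors the fragment is outside its control.

```python
from typing import Union
from urllib.parse import urldefrag


def subdocument_url(url: str, fmt: str, target: Union[str, int]) -> str:
    """Build a link to a part of a document; any existing fragment is replaced."""
    base, _old_fragment = urldefrag(url)
    fmt = fmt.lower()
    if fmt == "pdf":
        # An integer is treated as a page number, a string as a named destination.
        if isinstance(target, int):
            return f"{base}#page={target}"
        return f"{base}#nameddest={target}"
    if fmt in ("html", "xml"):
        # HTML anchors and XML ids are addressed with a bare fragment.
        return f"{base}#{target}"
    raise ValueError(f"no direct sub-document access known for format {fmt!r}")
```

Calling `subdocument_url("http://www.adobe.com/test.pdf", "pdf", "name")` reproduces the form given in the Adobe reference quoted above. Formats in the other two tiers raise an error here because they need file modification or OCR before any such link can exist.
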
>
>
>
Received on Monday, 16 November 2009 20:53:52 GMT