Re: Defining "Open" Data (was RE: no F2F3 in 2009 -- Re: Agenda, eGov IG Call, 11 Nov 2009)

Defining "open data", although an artificial construction, may still be
helpful.   It would establish a beginning point or at least context of how
this group, Open Knowledge Foundation, or others conceptualize the
term/idea.  "Open" is a tricky term just like "transparency" or "public
record".  The fact is if a law or executive order is enacted to mandate the
release of government data as open data (or insert appropriate term) then
that term will have to be defined.  (Or enter the lawyers!)  Is a definition
critical?  I am not sure it is a show stopper.

What is more to the point are guidelines.  As Josh points out, rigid
constructs are likely to create artificial conflict, which is not the intent.
Joe's work and Anne's suggestion are good, but I take pause at calling any
guidelines "best practices" or a scale.  I'd rather leave the good-and-evil
arguments to Nietzsche and other philosophers.  If the structure of the
guidelines could be made more neutral, placing the discussion in a
non-judgmental light, it would be more beneficial.  Talking about how "open
data" is helpful or advances an agency's mission will go a long way.  Laying
out the pros and cons will also help policy makers and people like me who are
promoting these concepts within government.

Finally, I wanted to share an interesting post I ran across yesterday that
addresses some of these issues:
http://osrin.net/2009/10/open-government-data-and-the-great-expectation-gap/

On Mon, Nov 16, 2009 at 2:02 PM, Anne L. Washington <washingtona@acm.org> wrote:

> Lots of great ideas, from specific examples of new scales to solid
> definitions of open data.
>
> I agree with Todd Vincent: we can build on existing definitions. The vision is
> out there and has been articulated by many committees and organizations. However, I
> take Josh's cautionary tale to heart. There is no need to run around pointing
> fingers and making enemies. The public sector doesn't need accusations; it
> needs advice.
>
> I am arguing for a scale / scorecard of openness. There is already a
> precedent for this type of hierarchy in evaluating businesses. Software
> capabilities are evaluated this way.  It would be a familiar tool for anyone
> in management. From the suggestions in the last few days, it seems we are
> already moving towards several possibilities.
>
> Let's say we create a 5-point scale.
>
> For each point on the scale I'd suggest we start with some generic overall
> descriptions. Someone can see that if they are doing ABC, they are only at a 2 on
> the scale, while if they are doing XYZ, they are at a 5. For simplicity's
> sake, so we can get something out the door, we could just list several
> technologies under each level of the scale.
>
> Later we could, possibly, dig into the technology by applying that scale to
> specific technologies like PDFs (as Joe Carmel has done), to data downloads
> (as the original message did), to searching data, to rights management, to
> XML, to HTML pages... whatever. It would be easy to ask for and build use
> cases underneath each one.
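To make the idea concrete, here is a rough sketch of what such a scorecard
could look like in code; the level descriptions and the example technologies
listed under each level are placeholders for illustration, not proposals:

    # Hypothetical 5-point openness scorecard.  Level descriptions and the
    # example technologies under each level are placeholders only.
    OPENNESS_SCALE = {
        1: {"description": "Human-readable only; OCR needed for reuse",
            "examples": ["image-only PDF", "TIFF"]},
        2: {"description": "Text is available but carries no structure",
            "examples": ["plain TXT", "text-based PDF"]},
        3: {"description": "Structured markup without stable identifiers",
            "examples": ["HTML without anchors", "XML without ids"]},
        4: {"description": "Structured markup with sub-document identifiers",
            "examples": ["HTML with anchors", "XML with ids",
                         "PDF with named destinations"]},
        5: {"description": "Machine-readable, linkable data",
            "examples": ["RDF", "XML with ids and a published schema"]},
    }

    def score(practices):
        """Return the highest level whose example technologies match."""
        matched = [level for level, info in OPENNESS_SCALE.items()
                   if any(p in info["examples"] for p in practices)]
        return max(matched) if matched else None

    print(score(["HTML with anchors"]))   # -> 4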
>
>
> and btw
> data.gov.uk, with Tim Berners-Lee at the helm, has been announced to open
> in early December 2009.
> http://news.bbc.co.uk/2/hi/technology/8311627.stm
>
>
>
> Anne L. Washington
> Standards work - W3C egov - washingtona@acm.org
> Academic work - George Washington University
> http://home.gwu.edu/~annew/
>
>
> On Fri, 13 Nov 2009, Joe Carmel wrote:
>
>> I also agree that Brian Gryth's "Access, Rights and Formats / Medium"
>> breakdown is a good way to clarify and define best practices in this space,
>> and Todd has identified a great set of parameters and criteria to more fully
>> explore and define the meaning behind open data.  I think one of the W3C
>> eGov goals was highlighted by Anne Washington's idea that a more detailed
>> description and hierarchy would be very useful.
>>
>> Piggy-backing on Anne's idea, I drafted some ideas as a hierarchy based on
>> my experience parsing PDF files to provide sub-document access capability
>> at www.legislink.org.
>>
>> http://legislink.wikispaces.com/message/view/home/14870950#toc1 (included
>> below)
>>
>> This listing is based on a single criterion, and different criteria would
>> lead to different conclusions.  For example, if access performance or ADA
>> compliance were used as criteria, different conclusions would likely be
>> drawn.
>>
>> In my breakdown, there are three subcategories (Best, Middle-of-the-Road,
>> and Minimal Practices).  Since different file formats offer different
>> capabilities (e.g., anchors, ids, named destinations) that determine
>> re-usability, listing formats alone (e.g., HTML, XML, PDF) is not sufficient
>> to categorize a practice as best, middle-of-the-road, or minimal.  For
>> example, if a file is in XML, it still might NOT have metadata, ids, or even
>> internal structure (e.g., a text file surrounded by one set of tags, making
>> it equivalent to an ASCII text file).  The practices applied within a file
>> format make that format more or less reusable and "open".  Alternatively,
>> worst practices might still be considered "open" in the sense that the data
>> is at a minimum available on the web.  Publishing scanned historic material
>> is certainly better than not providing the material at all (e.g.,
>> http://clerk.house.gov/member_info/electionInfo/index.html).  If I
>> understand Anne correctly, I think she's trying to get at this sort of
>> hierarchical list and explanation.
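As a rough illustration of that point (the function and field names below are
placeholders, not anything from Joe's message), a few lines of Python can
check whether an XML file actually exposes internal structure and ids:

    # Illustrative only: report whether an XML file exposes internal structure
    # and id attributes that would support direct sub-document access.
    import xml.etree.ElementTree as ET

    def describe_structure(path):
        root = ET.parse(path).getroot()
        elements = list(root.iter())                       # includes the root
        ids = [el.get("id") for el in elements if el.get("id")]
        return {
            "elements": len(elements),
            "ids": len(ids),
            # a single element wrapping a blob of text is no better than TXT
            "wrapped_text_only": len(elements) == 1 and bool((root.text or "").strip()),
        }

    # e.g. describe_structure("bill.xml") on <doc>...entire text...</doc>
    # returns {"elements": 1, "ids": 0, "wrapped_text_only": True}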
>>
>> It seems to me that building such hierarchical use case lists would be a
>> good task for the IG or one of the sub-groups we started discussing at the
>> end of our last call.
>>
>>
>> On a related topic, for those who haven't seen it,
>> http://www.gcn.com/Articles/2009/10/30/Berners-Lee-Semantic-Web.aspx# is an
>> interesting recent article covering an interview with Sir Tim Berners-Lee at
>> the International Semantic Web Conference ("Tim Berners-Lee: Machine-readable
>> Web still a ways off").
>>
>>        "He said that the use of RDF should not require building new systems,
>>        or changing the way site administrators work, reminiscing about how
>>        many of the original Web sites were linked back to legacy mainframe
>>        systems.  Instead, scripts can be written in Python, Perl or other
>>        languages that can convert data in spreadsheets or relational
>>        databases into RDF for the end-users.  "You will want to leave the
>>        social processes in place, leave the technical systems in place," he
>>        said."
>>
>> This statement is practical and sounds very much like a call to identify
>> which format conditions are more or less convertible to RDF (e.g.,
>> http://djpowell.net/blog/entries/Atom-RDF.html,
>> http://blogs.sun.com/bblfish/entry/excell_and_rdf).  Maybe this sort of work
>> is already being done as a separate effort at W3C or elsewhere?  If a Python
>> or Perl script could be written to convert formats to RDF, it would be
>> possible to build such conversions as on-the-fly web services or even as a
>> spider.  Given a probable future that includes such services, formats that
>> are currently considered less "open" will likely be viewed differently under
>> at least some circumstances.  I think all of this probably deserves some
>> attention and would go a long way toward helping governments and the public
>> understand the implications of government electronic publication practices
>> for open data.
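As a rough sketch of the kind of script being described (the column names,
namespace URI, and the choice of the rdflib library are assumptions for
illustration only, not an existing effort):

    # Illustrative only: convert rows of a CSV "spreadsheet" into RDF triples.
    # The namespace URI and column names are made up; rdflib is one common choice.
    import csv
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.gov/data/")

    def csv_to_rdf(path, key_column):
        g = Graph()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                subject = EX[row[key_column].replace(" ", "_")]
                for column, value in row.items():
                    if column != key_column and value:
                        g.add((subject, EX[column.replace(" ", "_")], Literal(value)))
        return g

    # print(csv_to_rdf("election_results.csv", "district").serialize(format="turtle"))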
>>
>> Thanks much,
>>
>> Joe
>>
>> BEST PRACTICES: Direct sub-document access, no file modification needed
>>
>> 1. XML with ids
>> Direct access to every level is possible (author determined)
>>
>> 2. well-formed HTML with anchors
>> Direct access to every level is possible (author determined)
>>
>> 3. PDF with named destinations
>> Direct access to every level is possible (pages are automatic, others are
>> author determined)
>> From
>> http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/pdf_creation_apis_and_specs/pdfmarkReference.pdf#page=47:
>> "Named destinations may be appended to URLs, following a '#' character, as
>> in http://www.adobe.com/test.pdf#nameddest=name. The Acrobat viewer displays
>> the part of the PDF file specified in the named destination."
>>
>> 4. PDF (non-image based)
>> Page access by default
>>
>>
>> MIDDLE OF THE ROAD: Direct sub-document access only possible with file
>> modification
>>
>> 1. TXT files with consistent formatting
>> Direct access to consistently formatted levels with file modification
>>
>> 2. XML without ids
>> Direct access to consistently formatted levels with file modification
>> If browsers supported XPointer within the URL entry, XML and well-formed
>> HTML would be in the "best" category
>> (see http://www.w3schools.com/xlink/xpointer_example.asp#)
>>
>> 3. HTML without anchors
>> Direct access to consistently formatted levels with file modification
>>
>>
>> WORST PRACTICES: OCR needed or human readers only
>> 1. PDF (image based only)
>> 2. TIFF
>> 3. Proprietary models where the document cannot be viewed in all browsers.
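As a rough illustration of the distinctions above (the helper function and
format labels are hypothetical), here is how a deep link into a published
document would be formed for each group:

    # Hypothetical helper: build a sub-document URL for the format groups above.
    # Only the "best practices" formats can be addressed without modifying the file.
    def deep_link(base_url, fmt, target):
        if fmt == "xml-with-ids":        # XPointer; little browser support today
            return f"{base_url}#xpointer(id('{target}'))"
        if fmt == "html-with-anchors":   # standard fragment identifier
            return f"{base_url}#{target}"
        if fmt == "pdf-named-dest":      # Acrobat named destination
            return f"{base_url}#nameddest={target}"
        if fmt == "pdf-text":            # page-level access only
            return f"{base_url}#page={target}"
        raise ValueError("no direct sub-document access for this format")

    print(deep_link("http://www.adobe.com/test.pdf", "pdf-named-dest", "name"))
    # -> http://www.adobe.com/test.pdf#nameddest=name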


-- 
Brian Peltola Gryth
715 Logan street
Denver, CO 80203
303-748-5447
twitter.com/briangryth

Received on Thursday, 19 November 2009 19:46:47 UTC