Re: last call for comments from Jose M. Alonso on 2009-08-26 (public-egov-ig@w3.org from August 2009)

From: Jose M. Alonso <josema@w3.org>
Date: Wed, 26 Aug 2009 14:08:29 +0200
To: Leigh Dodds <leigh.dodds@talis.com>
Cc: Daniel Bennett <daniel@citizencontact.com>, eGovIG <public-egov-ig@w3.org>
Message-Id: <8D9D6E2D-2307-47CE-932B-C0A2644BA58D@w3.org>
Leigh, just a short note to fully support your comments.

Daniel, I'm sorry I was not able to provide any yet (too busy with  
transition) but I'm glad to see Leigh pointed out the great majority  
of what I had in mind and I believe this would enhance the document  
significantly.

Although I agree with the comments about the "Some easy steps, but only
starting points" section, I'm not sure it should be moved towards
the end. I think I'd leave the re-worked version there but with a
clearer intro section.

Please use "example.gov" for URL/URI examples.

Wrt the previous comments sent by Owen, remember this is a document that
will be released by a W3C group, so keep the international context in
mind, i.e. those could serve as examples, but may be better positioned
in the Bibliography section and referenced from the body.

Other mentions of RESTful APIs, etc., and the comments about those would
also benefit from the pointers we use in the Note, which are linked from
the OGD section -- http://www.w3.org/TR/egov-improving/#OGD -- in fact,
some of the material is already explained there.

Sorry I cannot be of more help this time :(

-- Jose



On 26/08/2009, at 11:05, Leigh Dodds wrote:
> Hi Daniel,
>
> I've gone through and reviewed the document and have included some
> comments below.
>
> * Initial Paragraph:
>
> This needs to be slightly longer to more clearly set out what the memo
> is actually about.
>
> * Section: "Some easy steps, but only starting points"
>
> I think this section needs to be heavily reworked. I don't think it
> works well as the opening section of the memo as the core message:
> make sure you publish some raw data, in a machine-processable format,
> is obscured by references to a lot of different technologies a number
> of which are either obsolete or rarely used (Gopher, XPointer).
> There's also specific technical advice that either lacks references to
> further reading or is inaccurate, e.g. which accessibility requirements
> should be abided by?
>
> I wonder whether this section might be better placed towards the end
> of the document after the "We're all learning" section. This would make
> a nice, readable progression, e.g. "we're all learning, but here are some
> simple steps that have proved successful so far".
>
> However I do think there needs to be a simple clear message right at
> the start of the document, that publishing raw data, ideally in CSV,
> XML, RDF, or XLS, is the single most important step.
>
> * Section: "Identify"
>
> A suggested revision:
>
> It should be a matter of best practice for publishing open government
> data on the web to apply the technical principles described in
> Architecture of the World Wide Web, Volume 1. The critical
> foundational principle is to identify things using a URI/URL. This
> applies to not just the documents and files that carry the data, but
> also the resources which are referenced or described in that data:
> i.e. the people, places, events, legislation, etc. Permanent, easily
> discoverable URIs form the basis for creating unique identifiers that
> scale to the web. These stable identifiers can also be used to tie
> together data from different sources, greatly simplifying data
> integration.
>
> Defining simple patterns for creating new URIs makes it easy for
> different groups and departments to create unique, global identifiers.
> For example, a URI can be created by appending an existing unique,
> non-web identifier, e.g. derived from a database key, to a common base
> URL. E.g. data about organization 12345 could be published at
> http://www.example.gov/organizations/12345, whilst data about area
> code A4567 could be published at
> http://www.example.gov/areas/A4567. Agreeing on simple
> patterns for creating new, unique, and importantly, stable identifiers
> is an important first step in putting data onto the web.
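
The URI pattern described above could be sketched in a few lines of
Python. This is only a minimal illustration, not part of the memo: the
function name mint_uri, the idea of a "collection" path segment, and the
percent-encoding of segments are assumptions; only the base URL and the
organization 12345 example come from the text.

```python
from urllib.parse import quote

# Base URL taken from the example in the text; everything else is hypothetical.
BASE_URL = "http://www.example.gov"


def mint_uri(collection: str, local_id: str, base_url: str = BASE_URL) -> str:
    """Build a URI by appending an existing non-web identifier to a common base URL."""
    if not collection or not local_id:
        raise ValueError("collection and local_id must be non-empty")
    # Percent-encode each path segment so unusual keys cannot change the path structure.
    segments = (quote(collection, safe=""), quote(str(local_id), safe=""))
    return "/".join((base_url.rstrip("/"),) + segments)


print(mint_uri("organizations", "12345"))
# http://www.example.gov/organizations/12345
```

Because the function is deterministic, the same database key always
yields the same URI, which is what makes the identifier stable as long
as the base URL and the key are preserved.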
>
> * Section "Document"
>
> Suggested revision:
>
> Without supporting documentation, e.g. to describe the contents of a
> dataset, the published data may be hard to reuse. Publishing some
> minimal documentation with a dataset, e.g. at an associated web page,
> will ensure that re-users can clearly understand what the dataset
> contains. Minimal documentation would include a title, description, a
> publication date, and perhaps some notes on the origins of the data. As
> noted later in this memo, the license for the data should also be
> clearly documented.
>
> If data is published according to either custom or industry standard
> schemas, then also include links and references to the relevant
> standards so that developers can find additional supporting
> documentation and tools.
>
> Building a browsable and/or searchable directory of data is also a
> useful way of allowing people to find the range of datasets that are
> available.
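
As a rough sketch of the minimal documentation listed above, a dataset
description could be kept as a small structured record and published
alongside the data. The class name, field names, and all sample values
below are hypothetical; the set of fields simply mirrors the list in the
text (title, description, publication date, origin notes, license,
links to schemas).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class DatasetRecord:
    """Minimal documentation for one published dataset (field names are assumptions)."""
    title: str
    description: str
    published: date
    license_url: str
    origin_notes: str = ""
    schema_urls: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        record = asdict(self)
        # Dates are not JSON-serialisable, so store them in ISO 8601 form.
        record["published"] = self.published.isoformat()
        return json.dumps(record, indent=2)


# Hypothetical sample values, for illustration only.
record = DatasetRecord(
    title="Registered schools",
    description="One row per registered school, with name and location.",
    published=date(2009, 8, 26),
    license_url="http://www.example.gov/license",
    origin_notes="Exported from the schools register.",
    schema_urls=["http://www.example.gov/schemas/schools"],
)
print(record.to_json())
```

The same record could feed both the web page that accompanies the
dataset and an entry in a browsable directory of datasets.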
>
> * Section "Link"
>
> This is the first section that refers to "linked data", listing the
> four main principles. I think a bit more context is required here:
> initially the memo talks about at least publishing raw data using CSV,
> XML, etc. It seems a leap to then jump to Linked Data. Perhaps in this
> section of the memo, which is mainly about basic best practices and
> principles, it would be enough to say that it is important to include
> links both in the supporting data and, where supported (e.g. if using
> RDF), within the data itself. If the "easy steps" section is moved to
> the end, then this could introduce Linked Data as a natural step
> beyond publishing raw data, with perhaps a recap of the principles
> pointing out how Linked Data fulfills all of them?
>
> * Section "Preserve"
>
> There's a hanging sentence in this section.
>
> Issues to consider should be:
>
> * preservation of URIs/URLs to ensure stability of linking to datasets
> and data items
> * versioning of datasets, so that people can cite and link to both new
> and past versions. Logical links, e.g. "/latest" are also worth
> considering for downloadable datasets
> * formats: XML, RDF, etc. are arguably better for preservation than
> e.g. Excel
> * supporting documentation that describes how a dataset may have
> evolved, e.g. have terms or method of collection changed?
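
The versioning point in the list above, including the logical "/latest"
link, could be sketched as follows. This is a hedged illustration under
assumptions that are not in the memo: versions are named by ISO 8601
dates (so string order matches chronological order), and the base path,
dataset name, and function name are hypothetical.

```python
from typing import Sequence

# Hypothetical base path for downloadable datasets.
BASE_URL = "http://www.example.gov/datasets"


def dataset_uri(dataset: str, requested: str, available: Sequence[str]) -> str:
    """Resolve a requested version, or the logical name 'latest', to a versioned URI."""
    if not available:
        raise LookupError(f"no versions published for {dataset!r}")
    # ISO dates sort lexicographically, so max() picks the newest version.
    version = max(available) if requested == "latest" else requested
    if version not in available:
        raise LookupError(f"unknown version {requested!r} for {dataset!r}")
    return f"{BASE_URL}/{dataset}/{version}"


versions = ["2008-01-01", "2009-06-30"]
print(dataset_uri("schools", "latest", versions))
# http://www.example.gov/datasets/schools/2009-06-30
print(dataset_uri("schools", "2008-01-01", versions))
# http://www.example.gov/datasets/schools/2008-01-01
```

Every version keeps its own URI, so past versions stay citable, while
"latest" is only an alias that moves when a new version is published.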
>
> * Section "Expose Interfaces"
>
> I think this section should stress that:
>
> * as an initial step, raw machine-readable data and interfaces are
> the crucial first goal; the community can create new and interesting
> interfaces
> * offering both human and machine-readable interfaces should be a
> best practice, enabling browsing and discovery for all audiences, but
> don't focus on creating flashy visualisations if they detract from
> delivering on the first goal
> * Using the principles of Linked Data and RDF, there's no need for a
> separate API as the website is the API.
> * A SPARQL endpoint adds greater utility to RDF datasets
> * Where an API is going to be created, e.g. to publish data as XML,
> then avoid using standards like SOAP and concentrate on using simple
> RESTful patterns -- with references to relevant resources
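
One common way to offer both human- and machine-readable views from the
same URI, in the spirit of "the website is the API", is HTTP content
negotiation on the Accept header. The memo does not prescribe this; the
sketch below is an assumption-laden illustration. The list of supported
media types and the function name are hypothetical, and wildcards such
as */* are deliberately not interpreted: they simply fall through to
the default.

```python
# Hypothetical set of representations a server might offer for one resource.
SUPPORTED = ("text/html", "application/rdf+xml", "application/xml", "text/csv")


def choose_representation(accept_header: str, default: str = "text/html") -> str:
    """Pick the supported media type with the highest q-value in an Accept header."""
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type in SUPPORTED and quality > 0:
            # Sort by quality first, then by the order the client listed the types.
            candidates.append((-quality, position, media_type))
    return min(candidates)[2] if candidates else default


print(choose_representation("application/rdf+xml;q=0.9, text/html;q=0.5"))
# application/rdf+xml
```

A browser that asks for text/html gets the browsable page, while a
client asking for RDF or CSV gets data from the same address, so no
separate API endpoint is needed for simple retrieval.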
>
> * Section "Choosing what to publish as data on the Web"
>
> I think this section should come at the head of the document, then the
> progression is: here's what you should be thinking about publishing;
> here are some principles and issues to consider; and some steps to
> achieve it.
>
> The guidance ought to be to open up any non-personal data that the
> government currently collects and maintains on behalf of its citizens,
> with an emphasis on data for legislation, national statistics, and key
> entities like registered companies, locations, administrative
> boundaries, etc. I think part of the goal is not only to unlock data
> that governments should be making easily available, but also to create
> an infrastructure that lets *others* begin to tie their data into an
> authoritative URI space managed by the government and/or its
> departments. So, e.g. having a unique identifier for every registered
> company or school is as important as having information about those
> resources.
>
> The references to schemas and documentation could probably be included
> in the "Document" section.
>
> * Section "Social Issues"
>
> Suggest this is renamed to "Licensing" and is expanded to stress the
> importance of clear licensing of data, using an open,
> non-transactional model. Ideally public domain dedications or open
> licenses like CC0, PDDL, or ODbL should be used or customized to
> achieve this.
>
> --
>
> I hope those comments are useful. I'm more than happy to help
> contribute further to editing of the document.
>
> Cheers,
>
> L.
>
> -- 
> Leigh Dodds
> Programme Manager, Talis Platform
> Talis
> leigh.dodds@talis.com
> http://www.talis.com
>
Received on Wednesday, 26 August 2009 12:09:11 UTC