Re: charter and publication wrt W3C Process from Acar, Suzanne on 2009-05-27 (public-egov-ig@w3.org from May 2009)

From: Acar, Suzanne <Suzanne.Acar@ic.fbi.gov>
Date: Wed, 27 May 2009 07:14:52 -0400
To: "'John.Sheridan@nationalarchives.gsi.gov.uk'" <John.Sheridan@nationalarchives.gsi.gov.uk>, "'Owen.Ambur@verizon.net'" <Owen.Ambur@verizon.net>, "'public-egov-ig@w3.org'" <public-egov-ig@w3.org>
Message-ID: <DD0D97590E176A45BFBFB0BDA0369EE45CF146953A@fbi-exvme-11.FBI.GOV>
Excellent.  Many thanks, John.
I'll take a look.

Cheers,
Suzanne


----- Original Message -----
From: Sheridan, John <John.Sheridan@nationalarchives.gsi.gov.uk>
To: Acar, Suzanne; Owen.Ambur@verizon.net <Owen.Ambur@verizon.net>; public-egov-ig@w3.org <public-egov-ig@w3.org>
Sent: Wed May 27 06:56:12 2009
Subject: RE: charter and publication wrt W3C Process

The most recent version of eGMS is here:
http://www.govtalk.gov.uk/schemasstandards/metadata_document.asp?docnum=1017


And UK GEMINI 2 can be found from here:
http://www.gigateway.org.uk/metadata/standards.html


As I understand it, my colleagues at DEFRA (http://defra.gov.uk) are proposing to use GEMINI 2 for describing datasets containing geographic and environmental information, as part of the UK's implementation for INSPIRE, the EU Directive which establishes shared standards between countries in Europe for the interchange of location and environmental information.

There have also been European efforts, for example to standardise metadata about legislation - again as an XML Schema. This is always difficult, given the different legal and constitutional arrangements that exist (common law, civil law, federal systems, parliamentary systems etc.). A linked data approach would help here too - for example, US and UK legislation are closer than, say UK and German legislation, in terms of possible approaches for metadata descriptions.

-----Original Message-----
From: Acar, Suzanne [mailto:Suzanne.Acar@ic.fbi.gov]
Sent: 27 May 2009 10:45
To: Sheridan, John; 'Owen.Ambur@verizon.net'; 'public-egov-ig@w3.org'
Subject: Re: charter and publication wrt W3C Process

John,
I'm inclined to agree with you and am already seeing evidence of some of your points on linked data on various projects.

I'm really intrigued about the metadata standards you mentioned for e-Gov the UK has in place.  Are they visible somewhere that I may learn more?

Many Thanks,
Suzanne

----- Original Message -----
From: public-egov-ig-request@w3.org <public-egov-ig-request@w3.org>
To: Owen Ambur <Owen.Ambur@verizon.net>; eGov IG <public-egov-ig@w3.org>
Sent: Wed May 27 05:41:48 2009
Subject: RE: charter and publication wrt W3C Process

Interesting conversation.

To my knowledge there are many existing standards for describing information resources, from the bibliographic (eg Dublin Core), through to data type specific (eg GEMINI2 for geographic information in the UK), through to government specific (eg we have something called the e-Government Metadata Standard in the UK).

What I am not convinced about is the need for *another* standard on top of those we already have. What is so special about "government information"? We have just about every type of information that everyone else has.

Instead, I see a classic interoperability problem. Different people will want to capture different types of metadata, about different types of information resource. From an e-Gov IG perspective, isn't it better to explain how this information (the metadata) could be made more interoperable? (rather than say it should all conform to the same schema?)

I wrote a think-piece on this topic for an event in Madrid about "Information Asset Registers" that sets out my position more fully.

http://www.epsiplus.net/events/thematic_meetings/information_standards/standards_meeting_3/information_asset_registers_opsi_discussion_paper


Starting from where we are, we should publish human readable descriptions of information assets on the web, in XHTML, and we can publish machine interpretable descriptions at the *same time* using RDFa. Then let RDF take the interoperability strain - it's what it was designed for. It's possible (in fact easy) to build services that aid discovery, for example by harvesting the RDF and exposing it via SPARQL. We don't all need to use the same XML Schema - nor should we, especially when we want to say different things about the information that we hold.
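To make the "publish in XHTML, harvest the RDFa" idea concrete, here is a minimal sketch in Python. It is not a conforming RDFa processor: it reads only the `about`, `property` and `content` attributes, does not resolve prefixes, and does not track nested subjects. The page fragment, the asset URL and the Dublin Core values are all invented for illustration, and the simple `values` lookup merely stands in for the kind of query a SPARQL endpoint would answer over harvested RDF.

```python
from html.parser import HTMLParser

# Hypothetical XHTML+RDFa fragment describing one information asset.
# The URL and the metadata values are invented for this example.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.gov.uk/assets/air-quality">
  <h2 property="dc:title">Air quality readings</h2>
  <p>Published by <span property="dc:publisher">Example Agency</span></p>
  <meta property="dc:date" content="2009-05-27" />
</div>
"""


class RDFaHarvester(HTMLParser):
    """Collects (subject, property, value) triples from RDFa attributes.

    Simplification: the subject is whatever `about` was seen most recently.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None
        self._pending = None  # property still waiting for element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly in @content
            self.triples.append((self._subject, prop, attrs["content"]))
        else:
            # Value is the text inside the element
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.triples.append((self._subject, self._pending, text))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


def harvest(page):
    """Return the triples found in one XHTML page."""
    parser = RDFaHarvester()
    parser.feed(page)
    parser.close()
    return parser.triples


def values(triples, prop):
    """Return every value recorded for the given property."""
    return [obj for _subj, pred, obj in triples if pred == prop]


triples = harvest(PAGE)
for triple in triples:
    print(triple)
```

The same page serves the human reader and the harvester, so no second catalogue file has to be kept in step with it. Different publishers can use different properties and the harvester still collects them all.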

I do not think we need a common XML Schema for describing government information assets; moreover, even if such a thing was desirable, I doubt such a thing is possible or achievable.

OK, so maybe I'm like the guy with a hammer (RDFa), for whom every problem looks like a nail - but thinking about how we can evolve existing catalogue descriptions of information resources, in an interoperable way (in a linked data way), seems to me to be a better strategy than a new XML Schema?


-----Original Message-----
From: public-egov-ig-request@w3.org
[mailto:public-egov-ig-request@w3.org]On Behalf Of Owen Ambur
Sent: 26 May 2009 15:54
To: 'eGov IG'
Subject: RE: charter and publication wrt W3C Process


Joe, if I understand your message correctly, I think what you are suggesting is essentially the same as what I have been encouraging the eGov IG to do, i.e., to propose a standard set of metadata for public information (preferably in an XML schema so that the metadata itself is readily referenceable, indexable, and reusable).

It seems to me that it would be especially good if the IG could propose an XML schema for the description, indexing, discovery, and referencing of standards (technical specifications) of interest to .gov agencies.

Beyond that, as per my exchange with Brand, it would be great if the IG could add value to the specification of version 3 of the Data Reference Model (DRM).

I also agree that it would be great if the eGov IG could take up the Sunlight Foundation's challenge to demonstrate how the data (metadata) provided on the Data.gov site itself can be made more usable.

BTW, in the context of this thread another term having essentially the same meaning as "stovepipe" is "authoritative source."  That is, the original, authoritative sources of the data should be made readily reusable (not a stovepipe, e.g., by making the data available in XML format) and referenceable (e.g., by posting a standard set of metadata in XML format anywhere on the Web, for referencing, indexing, and reuse on sites like Data.gov).

Owen

-----Original Message-----
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org]
On Behalf Of Joe Carmel
Sent: Tuesday, May 26, 2009 8:13 AM
To: 'Owen Ambur'; 'eGov IG'
Subject: RE: charter and publication wrt W3C Process

Thanks for your strong support Owen.  I think the Internet has operated and continues to operate more or less by example. If data.gov used an easy-to-create and easy-to-understand model, I would hope other government agencies would follow the example.  If that happened, data.gov could then just point to the catalog files at the agency websites (leveraging and re-using the appropriate "stovepipes" rather than duplicating them as data.gov is starting to do).  Now that they have something up and running and if they are not 100% committed to the current format, I think data.gov should consider asking the Internet community to reformat the catalog data into machine-readable and friendly formats that also provide human readability.  Maybe, data.gov could post candidate options and have the Internet community "vote" and comment on the options.  They could also establish a cut-off date thus letting the community help them make a decision on format choice in a relatively timely manner.  As a side note, Sunlight has announced a contest, but it's for developers to re-use the data being pointed to by data.gov:
http://blog.sunlightfoundation.com/2009/05/21/announcing-apps-for-america-2-the-datagov-challenge/

I think this could be more efficient than the establishment of a standard by the W3C.  Maybe the eGovernment IG could suggest this or another idea to data.gov.  I have suggested this to data.gov directly but I think it would certainly have more value if the W3C made the suggestion.  When reading the mission of the W3C eGov IG in the charter, this seems like it would be perfectly aligned with that mission.

"The mission of the eGovernment Interest Group, part of the eGovernment Activity, is to explore how to improve access to government through better use of the Web and achieve better government transparency using open Web standards at any government level (local, state, national and multi-national)."  http://www.w3.org/2008/02/eGov/ig-charter


I'm hoping that an approach like this could help to promote OGD principles while enabling a public dialog about best practices.  Thanks,

Joe

-----Original Message-----
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org]
On Behalf Of Owen Ambur
Sent: Saturday, May 23, 2009 12:29 PM
To: 'eGov IG'
Subject: RE: charter and publication wrt W3C Process

I strongly support Joe's line of reasoning and would reiterate that for the U.S. federal government the Federal Enterprise Architecture (FEA) Data Reference Model (DRM) was supposed to serve the function that Joe highlights.  http://xml.gov/draft/drm20060105.xsd


Like Joe, I am also very glad that the Data.gov site has been made available.  However, like all of the other so-called "one-stop portals" that have been stood up, it is yet another data stovepipe system in that, as Joe points out, the data (metadata) it provides is not readily shareable/referenceable/reusable and one must know where to look in order to find it.  While those of us who are focusing on .gov data know about it, the average citizen probably will not.

If full-blown implementation of the XML schema for the DRM is deemed to be too much to expect, it would be good, as Joe suggests, if the eGov IG could at least suggest that a smaller, more manageable set of metadata be associated with .gov datasets -- in an open, standard format that is readily shareable/referenceable/reusable not just by Data.gov but also anyone else.
(I understand the Data.gov folks started with the Dublin Core but implemented Data.gov's metadata in a stovepipe fashion.)

BTW, to a large degree, Data.gov duplicates another good site that has been available for a number of years but which also happens to be a data
stovepipe:  http://www.fedstats.gov/


One of the hallmarks of moving out of childhood is being able to understand other points of view, i.e., to put one's self in another person's shoes.  By that measure, .gov agencies are still in early childhood when it comes to citizen-centricity.  It would be good if the eGov IG could help .gov agencies worldwide achieve a marginally higher level of maturity.  (One of the longer-term objectives of the StratML standard is to enable users to submit queries in terms of their *own* goals and objectives, i.e., what they want to *do*, and retrieve exactly what they need to accomplish *their* objectives.)

Another relevant thought of which this thread reminds me is that the job of a good manager is to eliminate his or her own job, by enabling others to do their jobs without the "leader's" guidance/assistance.  In that respect, hopefully, Data.gov is merely a prototype that will help elevate understanding of the potential for a better future.

Owen

-----Original Message-----
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org]
On Behalf Of Joe Carmel
Sent: Friday, May 22, 2009 12:14 PM
To: 'Owen Ambur'; 'Daniel Bennett'; 'Jose M. Alonso'
Cc: 'Sharron Rush'; 'eGov IG'
Subject: RE: charter and publication wrt W3C Process

Owen Ambur wrote:

>I also agree that a good topic of focus for the eGov IG would be open
>government data (OGD), such as:
>
>a) how agencies can make their data more readily discoverable and
>usable, and


It seems to me that while standards exist for resource descriptions (e.g., RSS, Atom), these standards are not commonly used to identify and expose open government datasets.  These current standards are either inadequate or perceived to be inadequate...or government agencies are possibly thinking that publishing a catalog of their datasets would not be useful.

The recently published data.gov site seems like a great place to establish best practices in this area since the site's purpose is to point to open government datasets.  I certainly don't want to disparage the incredibly excellent efforts of data.gov, but the page that lists the datasets (http://www.data.gov/catalog/category/0/agency/0/filter//type#raw) is not valid per http://validator.w3.org/ nor is it well-formed per http://www.cogsci.ed.ac.uk/~richard/xml-check.html.  This means that it will be used primarily for human access.  Machine access will be limited to text-based screenscraping -- the practice I think we're hoping to discourage.
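For readers who want to see what the well-formedness test amounts to, here is a minimal sketch using Python's standard XML parser. The two markup fragments are invented and are not taken from the data.gov page. Note that this checks well-formedness only (balanced, properly nested tags); validity against a DTD or schema is a separate check that this sketch does not perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Invented fragments: the second leaves <td> and <tr> unclosed.
GOOD = "<table><tr><td>Dataset A</td></tr></table>"
BAD = "<table><tr><td>Dataset A</table>"

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```

A page that fails this test cannot be read with an ordinary XML parser, which is why a machine client falls back to scraping the text.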

Alternatively, it's possible to find open government data by using Google's advanced search capabilities (for example "filetype:xls site:usda.gov" will return Excel files), but this approach provides little or no metadata about the specific files and might even lack official status.  You only really know for certain that the file is on the site, but you can't tell if the data is test data, out-of-date data, or something real.

I think "we" need a common approach (e.g., file format) for dataset cataloging that provides basic information about each dataset on a website. Often, datasets reside in WAFs (web accessible folders) such as http://thomas.loc.gov/home/gpoxmlc111 and these often have readme files, but how does one discover the existence of these folders in the first place?

Daniel Bennett has proposed the idea of repository schemas which if I understand correctly will address some if not all of these issues.
Regardless of the format(s) used, the need obviously exists.  Even something as simple as: http://www.xml.gov/stratml/urls.xml or http://www.xmldatasets.net/data/index.xml is much better than nothing.
Here's an example of an Atom file pointing to XML datasets for Federal Government StratML files: http://www.xmldatasets.net/data/fedgovt.xml
These URLs simply provide examples of different ways to catalog datasets, but I think that to really make it work, the government should consider establishing two things: (1) a standard file location for their dataset catalog (e.g., catalog.xml or catalog.html off the root) and (2) a machine-accessible (well-formed) format that allows for extensibility by individual government organizations.
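As a rough illustration of what a machine-accessible catalog at a well-known location would allow, the sketch below reads a small Atom feed and lists the datasets it points to. The feed is invented for the example (the agency host, titles and file names are hypothetical and do not reproduce the files linked above); only the Atom namespace and element names come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical catalog, as it might be served from /catalog.xml.
CATALOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Agency dataset catalog</title>
  <id>http://agency.example.gov/catalog.xml</id>
  <updated>2009-05-22T00:00:00Z</updated>
  <entry>
    <title>Strategic plan (StratML)</title>
    <id>http://agency.example.gov/data/plan.xml</id>
    <updated>2009-05-01T00:00:00Z</updated>
    <link rel="enclosure" type="application/xml"
          href="http://agency.example.gov/data/plan.xml"/>
  </entry>
  <entry>
    <title>Monthly inspection counts</title>
    <id>http://agency.example.gov/data/inspections.csv</id>
    <updated>2009-05-15T00:00:00Z</updated>
    <link rel="enclosure" type="text/csv"
          href="http://agency.example.gov/data/inspections.csv"/>
  </entry>
</feed>"""


def list_datasets(catalog_xml):
    """Return one dict per <entry>, with title, date, location and type."""
    feed = ET.fromstring(catalog_xml)
    datasets = []
    for entry in feed.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        datasets.append({
            "title": entry.findtext(ATOM + "title"),
            "updated": entry.findtext(ATOM + "updated"),
            "href": link.get("href") if link is not None else None,
            "type": link.get("type") if link is not None else None,
        })
    return datasets


for dataset in list_datasets(CATALOG):
    print(dataset["updated"], dataset["type"], dataset["href"])
```

Because the parser looks up only the elements it knows, an agency could add elements from its own namespace to each entry without breaking this reader, which is one way to get the extensibility mentioned in point (2).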

Returning to Jose's point about the role of the W3C eGov and the charter, while the IG can't create a standard or even a recommendation, I would hope we can point out where standards need to be established and the value to be gained from their establishment and use.  Given the diversity of file formats used by governments for the representation of data (e.g., XML, CSV, XLS, PDF, HTML, DBF, etc.), I'm not sure we can gain much by adding another data-format standard to the mix, but there certainly seems to be a vacuum in terms of the cataloging of government datasets or as Owen put it: how agencies can make their data more readily discoverable and usable.

Thanks,

Joe



-----Original Message-----
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org]
On Behalf Of Daniel Bennett
Sent: Wednesday, May 20, 2009 8:43 AM
To: Jose M. Alonso
Cc: Sharron Rush; eGov IG
Subject: Re: charter and publication wrt W3C Process

I was thinking that having best practices and having use cases were the most obvious things to do. I think that the "small how-to" project of identifying and exposing OGD is actually a huge but important project that encompasses citations and indexing documents (hmmm, perhaps schematizing repositories). Citations would be a big win that could help transform access to and referencing of govt. documents.

Another not-so-small project is to allow for a posting of what various governments are using and the standards they are using or breaking.
Legislatures and executive and judicial organizations across the world use different authoring tools that often determine what is published online and how. Which tools they use, how well they follow standards or achieve accessibility, and how they index the online documents and services and make them searchable and usable are all data that we could help collect. We don't need to even comment on the data collected, just make it reference-able for conversation. And this would help governments find out what software is available, especially if the software was developed internally and could be made available. In the United States alone there are thousands of governments (federal, state, municipal) using different standards and tools with different results, but no place to post and/or search for what they are all doing.

Daniel


Jose M. Alonso wrote:
>> ...
>>>  + a set of small docs with guidance?
>>>   (could be recs or not)
>>
>> I am not sure what these "small docs" would do that would not be
>> included in BP and the rewritten Note, but am open to suggestion. Are
>> you thinking of technical documents that would be more of a how-to?
>> a series of case studies of particularly effective practices?
>
> I was thinking of small how-to like things, e.g. techniques to
> identify and expose OGD, but also identification of scenarios to do
> so. More how-to than case studies.
>
>>  The suite of ARIA documents could be a model, I suppose.
>
> Maybe... I like this how-to piece:
> http://www.w3.org/TR/wai-aria-practices/#accessiblewidget
>
>>  This one requires more consideration and could be decided after
>> being chartered, is that not so?  or do we need to state our entire
>> scope of work at the time of charter?
>
> As specific as possible is always welcome, but we can definitely leave
> some room as we did first time. More on charters:
> http://www.w3.org/2005/10/Process-20051014/groups#WGCharter
>
>
>>>  + a second version of the Note?
>>>   (no need to be a rec, as you know)
>>
>> Yes, the Note must be rewritten for coherence, narrative flow,
>> conclusions, etc.
>
> Heard several saying this. I don't have an opinion yet besides that
> this should be done if there are group members willing to take on this
> task.
>
>
>>> In summary: going normative is "stronger" but has more implications:
>>> patent policy matters, strongest coordination with other groups,
>>> more process-related stuff to deal with...
>>
>> If we are saying that we will produce normative standards and expect
>> eGov practitioners around the world to begin to claim "conformance"
>> to these standards,  that is a mighty undertaking.  Think of the
>> arduous processes around WCAG2 and HTML5.  Also, eGov is a bit less
>> easily defined because of cultural influences, history, forms of
>> government etc.  I would advise that we not commit to normative
>> output at this time, but as previously stated, happy to hear another
>> point of view.
>
> Ok, thanks. I think I'm more of a non-normative opinion so far.
>
>
>> Please let me know if this is the type of input needed and/or if I
>> have overlooked any questions.
>
> Very much so, thanks!
> If you have something more specific in mind about the content we
> should produce, please share it, too.
>
> Cheers,
> Jose.
>
>
>> Thanks,
>> Sharron
>>
>>> [1] http://www.w3.org/Consortium/Process/
>>> [2] http://www.w3.org/2005/10/Process-20051014/groups#GAGeneral
>>> [3] http://www.w3.org/2008/02/eGov/ig-charter
>>> [4] http://www.w3.org/2004/02/05-patentsummary
>>> [5] http://www.w3.org/2005/02/AboutW3CSlides/images/groupProcess.png
>>> [6] http://www.w3.org/2005/10/Process-20051014/tr#Reports
>>> [7] http://www.w3.org/Guide/Charter
>>> [8] http://www.w3.org/TR/mobile-bp/
>>>
>>> --
>>> Jose M. Alonso <josema@w3.org>    W3C/CTIC
>>> eGovernment Lead                  http://www.w3.org/2007/eGov/
>
>

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________

Please don't print this e-mail unless you really need to.

---------------------------------------------------------------------------------

National Archives Disclaimer

This email message (and attachments) may contain information that is confidential to The National Archives. If you are not the intended recipient you cannot use, distribute or copy the message or attachments. In such a case, please notify the sender by return email immediately and erase all copies of the message and attachments. Opinions, conclusions and other information in this message and attachments that do not relate to the official business of The National Archives are neither given nor endorsed by it.

------------------------------------------------------------------------------------



This email was received from the INTERNET and scanned by the Government Secure Intranet anti-virus service supplied by Cable&Wireless in partnership with MessageLabs. (CCTM Certificate Number 2007/11/0032.) In case of problems, please call your organisations IT Helpdesk.
Communications via the GSi may be automatically logged, monitored and/or recorded for legal purposes.
Received on Wednesday, 27 May 2009 11:20:09 UTC