
RE: Ed and Outreadch Opportunity

From: <Niemann.Brand@epamail.epa.gov>
Date: Mon, 1 Feb 2010 09:02:57 -0500
To: rachel.flagg@gsa.gov
Cc: Owen.Ambur@verizon.net, "'eGovIG IG'" <public-egov-ig@w3.org>, public-egov-ig-request@w3.org, niemann.brand@epa.gov
Message-ID: <OFF5B7F431.110620DD-ON852576BD.004AC3F7-852576BD.004D2D26@epamail.epa.gov>
Rachel and all, I would suggest that agencies provide their most
important documents in Wiki format along with PDFs so the public can
more easily access and comment on them. I would also suggest that
agencies integrate their OGD deliverables for the same reason - see
http://www.slideshare.net/guest8c518a8/design-suggestions-for-epas-one-wiki-in-support-of-the-epa-ogd-work-group


Brand


                                                                                                                                  
From: rachel.flagg@gsa.gov
To: Owen.Ambur@verizon.net
Cc: "'eGovIG IG'" <public-egov-ig@w3.org>, public-egov-ig-request@w3.org
Date: 01/31/2010 09:51 PM
Subject: RE: Ed and Outreadch Opportunity
                                                                                                                                  






+1 to Owen's statement in a previous post: "... let me assure you that I am
going to be one ticked off taxpayer if .gov agencies continue to insist
upon flaunting style over substance in publishing their strategic and
performance plans (including their open gov plans)."

+1 to Brian's comment below that, if there are better ways to create
PDFs, then we need to tell people.

So in the interest of transparent, participatory and collaborative
government, my question to the group is this....

If you were in charge of publishing government agency strategic/OpenGovt
plans... how would you do it?

Keep these points in mind:
 - for some agencies, old habits die hard and there will probably be a
push to publish at least some of these plans as glossy PDFs with pretty
pictures... so we need to make sure that content creators are
creating these PDFs correctly
 - the solution must be explainable in non-techie language, to help
agency web managers convince their bosses of the "right" way to do this,
so plans are accessible (in all ways) to the public

HOW can we do it better?
Is there ONE place that offers simple, step-by-step guidance for
creating machine-readable PDFs, that we can point agencies to and tell
them to follow as a model?

I think we all agree that context, style and substance are all important
- so how can we combine all those into one end product that meets all
those needs?

Government agencies are trying really hard to get this right - what
tools can you recommend to help agencies deliver?

Thanks!
-Rachel

-------------------------------
Rachel Flagg
Web Content Manager
 and Co-Chair, Federal Web Managers Council
Government Web Best Practices Team
Office of Citizen Services
U.S. General Services Administration
rachel.flagg@gsa.gov
www.webcontent.gov - Better websites. Better government.




                                                                        
                                                                        
 "Owen Ambur"                                                           
 <Owen.Ambur@verizon.net>                                               
Sent by: public-egov-ig-request@w3.org
To: "'eGovIG IG'" <public-egov-ig@w3.org>
Date: 01/29/2010 02:22 PM
Subject: RE: Ed and Outreadch Opportunity
                                                                        
                                                                        
                                                                        
                                                                        
                                                                        
                                                                        
                                                                        





Brian, with reference to my separate message and the text of your draft
cited by Dave below, I would also point out that:

a)      HTML is a presentation format and, thus, is about style rather
than substance (meaning), and
b)      RDF may be “serialized” in XML:
http://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats
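
As a rough illustration of point (b), the sketch below builds a tiny RDF description and serializes it as RDF/XML using only Python's standard library. The document URL, title, and publisher are made-up placeholders, and Dublin Core is just one vocabulary that could be used here; none of this is taken from an actual agency's metadata.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes in the serialized output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe_document(url, title, publisher):
    """Return an RDF/XML string stating a title and publisher for `url`."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": url})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DC_NS}}}publisher").text = publisher
    return ET.tostring(root, encoding="unicode")


# Hypothetical plan URL, title, and publisher, for illustration only.
rdf_xml = describe_document(
    "http://example.gov/plans/open-gov-plan",
    "Open Government Plan",
    "Example Agency",
)
print(rdf_xml)
```

The same statements could equally be written in another RDF serialization; XML is simply the one named in point (b).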



Besides XFDL, MS’s XML Paper Specification (XPS) is another XML
vocabulary dealing with style.
http://en.wikipedia.org/wiki/XML_Paper_Specification


Adobe’s Mars Project is described as “an XML-friendly representation of
PDF documents”:  http://labs.adobe.com/technologies/mars/


Owen

From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Brian Gryth
Sent: Friday, January 29, 2010 4:51 PM
To: Dave McAllister; Owen Ambur
Cc: eGovIG IG
Subject: Re: Ed and Outreadch Opportunity

Dave,

I apologize for the error and it has been corrected.

+1 to Owen's statements.  That is why I would suggest that we need to
focus on educating people on the best approach to creating PDFs.  If a
PDF can be created with the necessary raw data, metadata, or what have
you that makes the document more machine readable, then we need to tell
people.





On Fri, Jan 29, 2010 at 2:14 PM, Dave McAllister <dmcallis@adobe.com>
wrote:

Just for completeness (and since the group has heard this before).

One objection...

In this sentence, you lump a standard, PDF, with two
implementations/products.

The W3C, the Sunlight Foundation, and other open government advocates
recommend that governments should use open standards based
technologies, such as HTML, XML, or RDF, rather than proprietary
formats, such as PDF, Microsoft Word or Excel, when publishing data.

PDF is not proprietary, it is an open International standard, ISO 32000,
under TC171.

Adobe products such as Acrobat and Acrobat Reader are proprietary... And
yes, if you choose to state Acrobat here, then I’ll live with it. But I
worked really hard to move PDF from an Adobe specification to an ISO
standard.

Thanks for the insight into the letter.

davemc


On 1/29/10 1:10 PM, "Brian Gryth" <briangryth@gmail.com> wrote:
Hello all,

Thanks for the good discussion.  It has been helpful.  I have created a
Google Doc to capture my thoughts.  It is a draft letter that I plan to
send to members of the Colorado General Assembly concerning the school
finance bill I identified.  The doc is viewable at
https://docs.google.com/Doc?docid=0Aev3E7WkLorMZGhkcGhkYjlfOXpudzNkNWZ0&hl=en

 (please let me know if you would like access to edit the doc.)

As to this discussion, I think that it can best be described as the PDF+
approach.  As Joe has frequently and correctly pointed out, PDF use is
persistent and this will not change.  (Adobe has been very effective in
making their product ubiquitous.)  Replacing PDF is going to be
extremely difficult, if not impossible.  Therefore, we need to educate
the government community on the best practices for creating PDF
documents or the best approach to augment PDF publication.

Again thank you for the information and please continue the discussion
or help revise and improve the document I linked to above.

Thanks,
Brian

On Fri, Jan 29, 2010 at 8:29 AM, Joe Carmel <joe.carmel@comcast.net>
wrote:
David,

PDF is probably the most flexible human-readable electronic format we
humans have invented and provides one of the richest possible electronic
formats ever devised in terms of capabilities (text, graphics, color,
image, audio, video, forms, printability, digital signatures, metadata,
file attachments, and archiving).  With no disrespect, it seems like the
problem for many is that PDF is not readable and consumable with a text
editor.  While this is true, there are several public domain and
commercial tools that provide developers with access to PDF file
contents (even converting page contents to XML).  Given these
overwhelming benefits and the substantial use of the format on the
human-side of the web, it’s very unlikely that PDF is going away.  Even
if everyone stopped using it, there would still be over 26 million PDF
files (per Google) on the web from the .gov sites alone.  Since the PDF
format allows metadata inclusion and file attachments, I think getting
the word out about how these and other features add interoperability to
PDF should encourage practices that lead to combining human and machine
readability for all electronically published information.

HTM    30,800,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Ahtm&aq=f&aqi=&oq=
HTML   27,700,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Ahtml&aq=f&aqi=&oq=
PDF    26,100,000  http://www.google.com/search?hl=en&source=hp&q=site%3A.gov+filetype%3Apdf&aq=f&aqi=&oq=
ASP    13,100,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Aasp&aq=f&aqi=&oq=
TXT     2,980,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Atxt&aq=f&aqi=&oq=
DOC     2,310,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Adoc&aq=f&aqi=&oq=
XLS     1,880,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Axls&aq=f&aqi=&oq=
XML     1,010,000  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Axml&aq=f&aqi=&oq=
RDF         3,240  http://www.google.com/search?hl=en&q=site%3A.gov+filetype%3Ardf&aq=f&aqi=&oq=
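
For a sense of proportion, the counts above can be turned into shares of the listed total. This is only a quick sketch: it assumes the nine file types listed are the whole picture, which they are not (other formats exist on .gov sites), and the Google result counts are rough estimates to begin with.

```python
# Result counts as quoted above (Google, site:.gov, January 2010).
counts = {
    "HTM": 30_800_000,
    "HTML": 27_700_000,
    "PDF": 26_100_000,
    "ASP": 13_100_000,
    "TXT": 2_980_000,
    "DOC": 2_310_000,
    "XLS": 1_880_000,
    "XML": 1_010_000,
    "RDF": 3_240,
}

total = sum(counts.values())

# Percentage of the listed total for each file type.
shares = {ext: 100 * n / total for ext, n in counts.items()}

# Print the types from largest share to smallest.
for ext, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ext:<5}{counts[ext]:>12,}{pct:>8.2f}%")
```

On these figures PDF accounts for roughly a quarter of the listed files, while XML and RDF together come to about one percent.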

Also, see http://legislink.wikispaces.com/message/view/home/14870950 for
more tech info.

Joe



From: David Pullinger [mailto:David.Pullinger@coi.gsi.gov.uk]
Sent: Friday, January 29, 2010 9:27 AM
To: chris-beer@grapevine.net.au
Cc: 'Kevin Novak'; Joe Carmel; 'Brian Gryth'; 'eGovIG IG'
Subject: Re: Ed and Outreadch Opportunity



Chris,



Let me assure you that I'm not in favour of PDF for data or
communication; the critical words were '... those who insist on ...'.  Let
me draw a comparison.  The government is not in favour of people taking
drugs.  But we provide information to help those who do.  Our friends at
Adobe should not draw the analogy too far as I just mean that sometimes
we engage in harm reduction - in this case to get at good re-usable
data.



David











David Pullinger
david.pullinger@coi.gsi.gov.uk
Head of Digital Policy
Central Office of Information
Hercules House
7 Hercules Road
London SE1 7DU
020 7261 8513
07788 872321

Twitter #digigov and blogs:  www.coi.gov.uk/blogs/digigov <http://www.coi.gov.uk/blogs/digigov>




>>> Chris Beer <chris-beer@grapevine.net.au> 28/01/2010 12:05 >>>
Hey Brian, everyone

Wouldn't be right if I didn't pop the TF4 hat on and respond into the
conversation ;) I already sent Brian an email offering to assist, but
since we're doing this in list... :)

Personally and professionally,  I have issues with "data", if not any
government information, being published in PDF formats as well as how
PDF files are used in general, not only by Gov, but by the Private
sector as well.

IMO the only three reasons (and only if you had to) to use PDF are a) as
an archive snapshot of a document; b) for document control, that is,
when you don't want a document to be altered by users, such as in the
case of a manifestation or publication of a piece of legislation,
tenders etc. (hence why you can embed digital signatures, lock them from
editing, etc.); and c) with accessible Smart Forms, which are actually
just such a cool idea and so very useful as an assistive technology,
for both the user and the owner. That said, these all still have issues
around being in PDF.

The general usage, however, seems to be for anything and everything that
can be published. Want a printable version? Download the PDF file.

Rather than focus on the pitfalls of using PDFs in the .gov.* space
(which I'm more than happy to discuss with anyone - especially David in
light of his comments ;) ), I'll focus on the topic at hand. I've had a
look at the Fiscal Note Brian provided as well as the proposed Act and
I'm a little stunned by the leap of logic in this sense.

A careful reading of the Bill reveals that throughout, information is
required to be "posted on-line, in a downloadable format". Now if I was
a clever Web Manager in charge of implementing my local school's
requirements under this bill, I could quickly and easily meet these
requirements through a CMS enabled website/database - the act of viewing
a webpage is, by definition, downloading information. Not only that, but
I could point at my model and highlight the fact that:

a) The data supports RDF(a), XML, StratML etc in a far more useful and
usable format than a PDF version

b) I can send my schemas to other schools, or even the Department (who
might want to create a centralised model) to enable consistency of data
formatting, not just a pretty view of the data

c) I can deliver my data in a range of open standard formats, such as
binary, CSV, HTML, XML, etc using very basic, free, vendor
independent and accessible technologies

d) I can export a customisable view of this data on demand as a PDF file
if needed... (think the export as PDF function of Google Analytics
dashboard reports.) But I can also export it in a variety of other
proprietary formats on demand.

e) I can very easily track the usage and access of this data by the
public through web analytics. If I track it well enough, and aggressively
enough, I can start to analyse which parts of the data are the most
useful (for instance I might well find that visits from .edu domains
(ie: teachers) show a marked interest in salary schedule comparisons)
and I can tailor the solution from a push Web 1.0 model to an information
on demand Web 2.0 model.

f) I can allow others, including other arms of Local, State and Federal
Governments, through APIs and mashups, to mix my data with other data
to provide interesting information - like financial data mapped against
student result averages.
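
Points (c) and (d) can be sketched with nothing more than a standard library. The example below is only an illustration: the records, field names, and figures are invented, and a real district would pull them from its CMS or database. It shows the same rows being delivered as CSV, XML, and HTML from a single source.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Invented sample rows; the field names are hypothetical.
records = [
    {"district": "Example District 1", "category": "Salaries", "amount": "1250000"},
    {"district": "Example District 1", "category": "Transport", "amount": "310000"},
]
FIELDS = ["district", "category", "amount"]


def to_csv(rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Render rows as a simple XML document, one element per row."""
    root = ET.Element("expenditures")
    for row in rows:
        item = ET.SubElement(root, "expenditure")
        for field in FIELDS:
            ET.SubElement(item, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


def to_html(rows):
    """Render rows as an HTML table, escaping every cell."""
    head = "".join(f"<th>{escape(f)}</th>" for f in FIELDS)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(row[f])}</td>" for f in FIELDS) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


EXPORTERS = {"csv": to_csv, "xml": to_xml, "html": to_html}


def export(rows, fmt):
    """Render the same rows in the requested format."""
    return EXPORTERS[fmt](rows)


print(export(records, "csv"))
```

A PDF view could be added as one more entry in the same table of exporters, which is the sense of point (d): PDF becomes one output among several rather than the only copy of the data.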

A couple of other things to consider with the financial and workload
aspects in mind, is that technically (and correct me if I am wrong) each
and every PDF release of this data would be classed as a government
publication and will require not only ISBN numbers etc, but entry into
the Library of Congress, or State equivalent, catalogues as well. A
single website, being considered as an Integrated Resource, technically
would require only a single catalogue entry...

The Fiscal Note also reads "It is assumed that financial documents can
be electronically converted into a portable document format (PDF) or
image file (tiff, gif, jpg), and posted online at minimal cost, and that
software to convert documents and software to modify websites is readily
available at the district level."

Now that's an interesting assumption - and it is just that - an
assumption, considering that publishing the information as HTML etc is
effectively free.

These are only some initial thoughts, but you get the idea. Happy to
discuss.

David - would love to discuss your thoughts around the standards and
governance on PDF, but it'd probably be off topic in this thread. Drop me a
line and expand on things :)

Cheers

Chris




David Pullinger wrote:

Both,



As well as separate data files, it is perfectly possible to embed
RDF(a) into PDF files, as other markup, and so provide access to Linked
Data thereby...
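
One established way to carry RDF inside a PDF is an XMP metadata packet, which is RDF/XML wrapped in packet markers; whether that is the mechanism meant here is an assumption. The sketch below only builds such a packet with Python's standard library. The title and creator are invented, and attaching the packet to a PDF as its metadata stream would need a PDF library, which is not shown.

```python
import xml.etree.ElementTree as ET

X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

for prefix, uri in (("x", X_NS), ("rdf", RDF_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def build_xmp_packet(title, creator):
    """Return an XMP packet (RDF/XML wrapped in xpacket markers) as a string."""
    meta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": ""})

    # XMP stores dc:title as a language-alternative array.
    alt = ET.SubElement(ET.SubElement(desc, f"{{{DC_NS}}}title"),
                        f"{{{RDF_NS}}}Alt")
    ET.SubElement(alt, f"{{{RDF_NS}}}li",
                  {f"{{{XML_NS}}}lang": "x-default"}).text = title

    # dc:creator is an ordered list of names.
    seq = ET.SubElement(ET.SubElement(desc, f"{{{DC_NS}}}creator"),
                        f"{{{RDF_NS}}}Seq")
    ET.SubElement(seq, f"{{{RDF_NS}}}li").text = creator

    body = ET.tostring(meta, encoding="unicode")
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


# Hypothetical title and creator, for illustration only.
packet = build_xmp_packet("School District Budget", "Example School District")
# Show just the RDF/XML body, without the packet markers.
print(packet.split("\n")[1])
```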



We're considering whether or not to issue standards in this area so that
those who insist on releasing information in PDF files nevertheless
don't put a block on Linked Data.



David















>>> "Joe Carmel" <joe.carmel@comcast.net> 26/01/2010 18:56 >>>
Brian,
One option to consider might be XForms (and XSLTForms in particular).
Although I’m not familiar with the school district financial data, it
seems like publishing an XForm on a central website and mandating that
school districts fill it out would be easy to create, maintain, and
implement.  The output files could then be posted centrally and/or
locally.
I’m working with Owen Ambur and several others on something like this
for StratML.  Check out
http://www.xmldatasets.net/XF2/stratmlxform3.xml
It’s still being developed but it might serve as an example.  The
idea is to provide a way to create, import, update, display, and finally
catalog StratML files across the web.
Joe

From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Novak, Kevin
Sent: Tuesday, January 26, 2010 12:13 PM
To: Brian Gryth; eGovIG IG
Subject: RE: Ed and Outreadch Opportunity
Brian,
I am here to help you.
I can provide input and opinion on the piece you are developing. I
concur with your assessment of PDF. Other options in addition must be
considered.
Kevin
From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Brian Gryth
Sent: Tuesday, January 26, 2010 12:08 PM
To: eGovIG IG
Subject: Ed and Outreadch Opportunity

Good day all,

Members of the Colorado General Assembly introduced legislation recently
that would mandate school districts to publish certain financial data in
a downloadable format.  The bill is HB10-1036 and is available at
http://legislink.org/us-co?HB10-1036.  This is a good thing on the
surface.  What concerns me is the fiscal impact statement associated
with the legislation.  The concerning part of the fiscal impact
statement focuses on the information being released in PDF or in an
image format (e.g. JPEG, TIFF, GIF), but does not talk about other
formats.  The fiscal note is available at http://bit.ly/80RBiu.  As has
been discussed by this group and in other places, PDF-only publication
is not the best method of publishing government data.

Therefore, I saw this as a perfect opportunity for some education and
outreach.  I am planning on putting some summarized information together
that will discuss data publication methods to send to the bill sponsors
and other members of the Colorado legislature.  I also plan on speaking
at the Senate hearing for the bill as a concerned citizen.

I would appreciate the assistance of anyone wishing to help me out.
Please feel free to e-mail me and I will share a Google Doc I will be
using to draft the materials.

Thanks
Brian

This communication is confidential and copyright.
Anyone coming into unauthorised possession of it should disregard its
content and erase it from their records.

The original of this email was scanned for viruses by Government Secure
Intranet (GSi) virus scanning service supplied exclusively by Cable &
Wireless in partnership with MessageLabs.
On leaving the GSI this email was certified virus free.
The MessageLabs Anti Virus Service is the first managed service to
achieve the CSIA Claims Tested Mark (CCTM Certificate Number
2006/04/0007), the UK Government quality mark initiative for information
security products and services. For more information about this please
visit www.cctmark.gov.uk <http://www.cctmark.gov.uk/>




Received on Monday, 1 February 2010 14:03:34 GMT
