Re: PDF's usefulness to the semantic web

From: Jose M. Alonso <josema@w3.org>
Date: Fri, 24 Apr 2009 17:57:45 +0200
Cc: Dave McAllister <dmcallis@adobe.com>, Bobby Caudill <rcaudill@adobe.com>, Owen Ambur <Owen.Ambur@verizon.net>, eGovernment Interest Group WG <public-egov-ig@w3.org>, Christopher Testa <ctesta@ushmm.org>, Miguel Ángel Amutio <miguel.amutio@map.es>, John Sheridan <John.Sheridan@nationalarchives.gov.uk>, Daniel Bennett <daniel@citizencontact.com>
Message-Id: <EA53210D-6083-44EA-BC98-FBAA5F966C5E@w3.org>
To: <MCrompton@iispartners.com>
On 21/04/2009, at 22:19, Malcolm Crompton wrote:
> I am an amateur in this game, but there is one consideration that  
> needs to be emphasised in this extremely useful discussion:  does  
> the end user (citizens of country X) or government agency involved  
> care to this level of depth what the standards source is?
>
> Almost certainly no.
>
> So my short contribution is that the answer to this fascinating  
> debate is that it should be answered from a user perspective.  Will  
> they see pdf files & forms & other widely used formats, including  
> OOXML, in an eGov environment?  Certainly.  Should we provide  
> guidance on that to eGov service providers in this document?  My own
> view is yes, because otherwise our document won’t be as useful to the
> wider world as it could be.

As I said in my previous message, I think this is the use case I was
looking for. Thanks.


>  For example, we are in a debate in Australia right now on the use  
> of pdf smartforms in a government deployment.  One of the big issues  
> is ‘is it safe to play?’ from a citizen perspective, i.e. who is able
> to see the data I have just entered in the form?  AND the answer  
> turns out to be less than simple:  an active strategy is needed  
> before the answer can be an ‘almost yes’.  I suspect that guidance  
> on this (both the need to consider such issues & how to do so) would  
> be appreciated by our audience.

Thanks for the suggestion. Unfortunately, I think it will be difficult
to develop this sort of guidance in time to be included in this document,
but it's something we could discuss when planning years 2-3. Not sure
what others think, though.

Cheers,
Josema.


> Malcolm Crompton
>
> From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Dave McAllister
> Sent: Wednesday, April 22, 2009 2:45 AM
> To: Jose M. Alonso; Bobby Caudill
> Cc: Owen Ambur; eGovernment Interest Group WG; Christopher Testa;  
> Miguel Ángel Amutio; John Sheridan
> Subject: Re: PDF's usefulness to the semantic web
>
> Well, let me express a few opinions here.
>
> The concept of data on and to the web is a broad topic in its own
> right.  Does that “data” include audio?  Video?  How about existing
> documents that should be readily available?
>
> Data (on and to the web) seems to fall into rough groupings (in a
> non-technical way) of static (not able to be changed by the
> consumer, e.g. documents, video, audio, DigSig), dynamic (able to be
> changed, including forms) and conversational (active feedback and
> exchange, both synchronous and asynchronous).
>
> W3C does not cover the myriad of data formats that have proliferated  
> on the web since the original days of hypertext (emphasis on text).
>
> Technically, you are correct that the PDF data formats are not
> “web” standards.  However, in practice such technologies appear in a
> browser or web environment, and as such they should be considered in
> best practices. Forms (and there are dozens of form standards) are
> also separable from web standards; should we equally ignore them?
> Ask the average consumer whose browser opens a PDF file within the
> browser window whether it is a Web standard; I’d postulate that they
> neither know nor care.
>
> My issue is not to identify the specific SDOs at this point, though
> I believe that a broad mission and charter should include such
> recognition. W3C has an interesting set of standards, but what about
> IETF, OASIS, or DigSig in ETSI?  Where do those fit into this?
>
> For instance, regarding point 4: while the PDF standards may be ISO
> standards, they are controlled under AIIM.  I think we do have some
> level of connection to AIIM, so should we include them because of that?
>
> The short point is that the document should at least recognize the  
> existence and use of standards for data, even though they are not  
> W3C standards. Given the attention that the current US  
> administration may have placed on this document in implying that it  
> will be the vehicle for industry input, it behooves us to make sure  
> what we produce does not set up unfair trade practices in its own  
> right. (ref: OSTP, New Media Office)
>
> As such, the “industry input” that I have to consider for Adobe is  
> that negative references to existing standards should then be  
> equally removed.
>
> davemc
>
> On 4/21/09 1:19 AM, "Jose M. Alonso" <josema@w3.org> wrote:
> Dave, Bobby,
>
> I think we are talking about ISSUE-18 again here: which standards
> besides W3C's should be added to the document. Some comments about
> this.
>
> I'm sure we all agree on the usefulness and heavy use in government of
> several standards beyond W3C's. That said, here's my rationale for _not_
> including those you are referring to (and others).
>
> 1) This is W3C and the document is titled "... of the Web", so I
> prefer to stick to Web standards for now. For me, e.g. PDF/A and OOXML
> are not Web standards but something you can link to from the Web, as
> you can link to a ZIP file. I'm not an expert in the field and welcome
> any additional info, but I'm quite sure we could discuss for hours and
> hours what is Web and what is not, and we would probably end up with as
> many versions as group participants.
>
> 2) I would like to see Web-related use cases before adding this or any
> other technology, rather than adding them just to grow the list of
> standards referenced in the document.
>
> 3) Why add only ISO ones, and not IETF ones or others'?
> As an example, I recently learned about the study for the catalogue of
> standards usable by governments here in Spain, and heard Miguel Amutio
> speak about it, saying that 400 standards coming from dozens of bodies
> were analyzed. Not to mention other similar initiatives such as the
> U.S. TRM.
>
> 4) We don't have a liaison with ISO and I would prefer this Group not
> to make interpretations on the use of standards developed by other
> organizations without discussing with them how they fit in our work.
>
>
> Scoping the Group's work was a difficult challenge, and I don't think
> that broadening the scope now that the charter is about to expire makes
> sense.
>
> I think we should disregard this for now but discuss it when
> developing the 2nd charter: see whether we should work on a broader
> suite of standards, set up a liaison with more SDOs, etc.
>
> One more comment, you mention:
> > On reading the document several times, it seems unclear if we are
> > focused on transient data and remunging such, or on archival and
> > temporal validation of such.  I think this came through in some of
> > the discussions on social media in the last telecon.
>
> I wish we had some text in the "Long Term" section but we don't yet.
> Some former IG Members were tasked with providing use cases on the
> differences you mention. I remember we were going to get a use case on
> "Temporal Data" but unfortunately that didn't happen.
>
> Sticking to the Web standards part above, I think that section was
> intended to talk about "Web Archiving" and maybe the closest view is
> that of the draft use case John submitted a while ago -- http://www.w3.org/2007/eGov/IG/wiki/Use_Case_10_-_Persistent_URIs
>
>
> -- Jose
>
>
>
> On 09/04/2009, at 21:41, Dave McAllister wrote:
> > Actually, I think it is important that we do recognize and extend to
> > relevant standards. PDF (as in ISO 32000) is such, as are PDF/A,
> > PDF/E and PDF/X.  There are also best practices based on such, e.g.
> > PDF/Healthcare. PDF/UA is approaching such status as well.
> >
> > It is interesting to note that right now neither Mars nor XPS is a
> > formal standard, though I suspect XPS will be approved in Ecma
> > shortly (as OOXML was, as the starting point of that most painful
> > standard process).
> >
> > On reading the document several times, it seems unclear if we are
> > focused on transient data and remunging such, or on archival and
> > temporal validation of such.  I think this came through in some of
> > the discussions on social media in the last telecon.
> >
> > Reworking the world from PDF (of which there are numerous independent
> > implementations) seems counterintuitive in this best practices style.
> >
> > davemc
> >
> > On 4/9/09 11:47 AM, "Owen Ambur" <Owen.Ambur@verizon.net> wrote:
> >
> > If PDF is expressly referenced, so too should Adobe’s Mars Project
> > -- http://labs.adobe.com/technologies/mars/ -- as well as XFDL -- http://en.wikipedia.org/wiki/Extensible_Forms_Description_Language
> >  -- and XPS: http://en.wikipedia.org/wiki/XML_Paper_Specification
> >
> > Owen Ambur
> > Co-Chair Emeritus, xmlCoP <http://xml.gov/index.asp>
> > Co-Chair, AIIM StratML Committee <http://xml.gov/stratml/index.htm>
> > Member, AIIM iECM Committee <http://www.aiim.org/Standards/article.aspx?ID=29284>
> > Invited Expert, W3C eGov IG <http://www.w3.org/2007/eGov/IG/>
> > Communications/Membership Director, FIRM Board <http://firmcouncil.org/id5.html>
> > Former Project Manager, ET.gov <http://et.gov/>
> > Brief Bio <http://ambur.net/bio.htm>
> >
> >
> >
> >
> > From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Bobby Caudill
> > Sent: Wednesday, April 08, 2009 10:11 AM
> > To: public-egov-ig@w3.org
> > Subject: PDF's usefulness to the semantic web
> >
> >
> > Calling out PDF specifically here should be reconsidered.
> >
> > From a semantic web perspective, PDF is more useful than many other
> > formats, including graphics, imagery, audio and video, all of which
> > are very useful formats for government to consider when becoming
> > transparent. Given that PDF documents are machine-readable as well as
> > human-readable, technologies exist today that are capable of
> > extracting an ontology from them, making the information more useful
> > to the semantic web.
> >
> > In addition, there simply are times when a secure container is
> > required for publishing information. While typical internet
> > technologies, such as outlined above, are very good for sharing and
> > transparency, they are not necessarily always appropriate for
> > information types that require assurances of authenticity, privacy,
> > authoritativeness, etc.
> >
> > Further, there is the requirement to archive PSI (public sector
> > information). Again, considering that many government processes are
> > document-based, PDF/A (ISO 19005-1:2005) provides a standards-based
> > approach to ensuring the long-term preservation of government
> > information. PDF/A documents are machine-readable, making them
> > searchable, discoverable and available to the same ontology-extraction
> > technologies as ISO 32000 PDF. Likewise, the standards-based nature of
> > PDF/A ensures that humans will still be able to access the documents
> > in the future.
> >
> > I am concerned that this paper is limiting its focus and not taking
> > into consideration the wider view of government processes, many of
> > which depend upon more traditional document formats for legitimate
> > business reasons.
> >
> > Thank you for the consideration.
> >
> > Bobby Caudill
> >
> >
> >
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Bobby Caudill
> > Solution Architect, Global Government Solutions
> > Adobe Systems Incorporated
> > 8201 Greensboro Dr., # 1000
> > McLean, VA 22102
> > 703.883.2872 - Office
> > 703.855.9945 – Mobile
> > @BobbyCaudill – Twitter
> > Bobby Caudill – Facebook
> > www.governmentbits.com - Blog
> > rcaudill@adobe.com
> >
> >
> > --
> > Dave McAllister
> > Director, Standards and Open Source
> > 650-523-4942 (GC)
> > 408-536-3881 (Office)
> > Dwmcallister (Skype, Aim, YIM)
> > http://blogs.adobe.com/open
>
>
> -- 
> Dave McAllister
> Director, Standards and Open Source
> 650-523-4942 (GC)
> 408-536-3881 (Office)
> Dwmcallister (Skype, Aim, YIM)
> http://blogs.adobe.com/open
Received on Friday, 24 April 2009 15:58:41 GMT
