
Re: PDF's usefulness to the semantic web

From: Dave McAllister <dmcallis@adobe.com>
Date: Thu, 9 Apr 2009 12:41:06 -0700
To: Owen Ambur <Owen.Ambur@verizon.net>, "public-egov-ig@w3.org" <public-egov-ig@w3.org>
Message-ID: <C6039D62.10982%dmcallis@adobe.com>
Actually, I think it is important that we recognize and extend to relevant standards. PDF (as in ISO 32000) is one such standard, as are PDF/A, PDF/E, and PDF/X. There are also best practices based on these, e.g. PDF/Healthcare. PDF/UA is approaching standard status as well.

It is interesting to note that right now neither Mars nor XPS is a formal standard, though I suspect XPS will be approved in Ecma shortly (as was OOXML, as the starting point of that most painful standards process).

On reading the document several times, it seems unclear whether we are focused on transient data and remunging it, or on archival and temporal validation of it. I think this came through in some of the discussions on social media in the last telecon.

Reworking the world from PDF (for which there are numerous independent implementations) seems counterintuitive in this best-practices style.


On 4/9/09 11:47 AM, "Owen Ambur" <Owen.Ambur@verizon.net> wrote:

If PDF is expressly referenced, so too should Adobe's Mars Project -- http://labs.adobe.com/technologies/mars/ -- as well as XFDL -- http://en.wikipedia.org/wiki/Extensible_Forms_Description_Language -- and XPS: http://en.wikipedia.org/wiki/XML_Paper_Specification

Owen Ambur
Co-Chair Emeritus, xmlCoP <http://xml.gov/index.asp>
Co-Chair, AIIM StratML Committee <http://xml.gov/stratml/index.htm>
Member, AIIM iECM Committee <http://www.aiim.org/Standards/article.aspx?ID=29284>
Invited Expert, W3C eGov IG <http://www.w3.org/2007/eGov/IG/>
Communications/Membership Director, FIRM Board <http://firmcouncil.org/id5.html>
Former Project Manager, ET.gov <http://et.gov/>
Brief Bio <http://ambur.net/bio.htm>

From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Bobby Caudill
Sent: Wednesday, April 08, 2009 10:11 AM
To: public-egov-ig@w3.org
Subject: PDF's usefulness to the semantic web

Calling out PDF specifically here should be reconsidered.

From a semantic web perspective, PDF is more useful than many other formats, including graphics, imagery, audio and video, all of which are very useful formats for government to consider when becoming transparent. Given that documents are machine readable as well as human readable, technologies do exist today that are capable of extracting an ontology, making the information more useful to the semantic web.

In addition, there simply are times when a secure container is required for publishing information. While typical internet technologies, such as those outlined above, are very good for sharing and transparency, they are not always appropriate for information types that require assurances of authenticity, privacy, authoritativeness, etc.

Further, there is the requirement to archive PSI. Again, considering that many government processes are document based, PDF/A (ISO 19005-1:2005) provides a standards-based approach to ensuring the long-term preservation of government information. PDF/A-based documents are machine readable, making them searchable, discoverable, and available to the same ontology-extraction technologies as an ISO 32000 PDF. Likewise, the standards-based nature of PDF/A ensures that humans will be able to access the documents into the future.

I am concerned that this paper is limiting its focus and not taking into consideration the wider view of government processes, many of which depend upon more traditional document formats for legitimate business reasons.

Thank you for the consideration.

Bobby Caudill

Bobby Caudill
Solution Architect, Global Government Solutions
Adobe Systems Incorporated
8201 Greensboro Dr., # 1000
McLean, VA 22102
703.883.2872 - Office
703.855.9945 - Mobile
@BobbyCaudill - Twitter
Bobby Caudill - Facebook
www.governmentbits.com - Blog

Dave McAllister
Director, Standards and Open Source
650-523-4942 (GC)
408-536-3881 (Office)
Dwmcallister (Skype, Aim, YIM)
Received on Thursday, 9 April 2009 19:41:50 UTC
