Re: PDF's usefulness to the semantic web

From: Jose M. Alonso <josema@w3.org>
Date: Fri, 24 Apr 2009 17:54:35 +0200
Cc: Bobby Caudill <rcaudill@adobe.com>, Owen Ambur <Owen.Ambur@verizon.net>, eGovernment Interest Group WG <public-egov-ig@w3.org>, Christopher Testa <ctesta@ushmm.org>, Miguel Ángel Amutio <miguel.amutio@map.es>, John Sheridan <John.Sheridan@nationalarchives.gov.uk>
Message-Id: <F31FCAA3-236C-4DC5-B1A9-B77A647A5332@w3.org>
To: Dave McAllister <dmcallis@adobe.com>
On 21/04/2009, at 18:44, Dave McAllister wrote:
> Well, let me express a few opinions here.

Please!


> The concept of data on and to the web is a broad topic in its own  
> right.  Does that “data” include audio?  Video?  How about existing  
> documents that should be readily available?

Documents, data and information are different concepts in my view.


> Data (on and to the web) seems to fall into rough groupings (in a
> non-technical way): static (not accessible to be changed by the
> consumer, e.g. documents, video, audio, DigSig), dynamic (able to be
> changed, including forms) and conversational (active feedback and
> exchange, both sync and async).
>
> W3C does not cover the myriad of data formats that have proliferated  
> on the web since the original days of hypertext (emphasis on text).
>
> Technically, you are correct in that the data formats of PDF are not
> “web” standards.  However, the use case for such technologies in
> implementation is that they appear in a browser or web environment,
> and as such they should be considered in best practices. Forms (and
> there are dozens of form standards) are also separable from web
> standards; should we equally ignore them?   Ask the average consumer
> whose browser opens a PDF file within the browser confines whether it
> is a Web standard; I’d postulate that they neither know nor care.

This is the use case I was looking for. Thanks.


> My issue is not to identify the specific SDOs at this point, though  
> I believe that a widespread mission and charter should include such  
> recognition. W3C has an interesting set of standards, but how about  
> IETF, OASIS, DigSig in ETSI?   Where do those fit into this?

Right. See the charter, it's there. Unfortunately, this Group has been
moving slowly. It took the core participants several months to build
awareness about our activity here and there's still a long way to go.
We already have a liaison with OASIS, and I hope we can set up more in
the future.


> For instance, referencing point 4, while the PDF specifications may be
> ISO standards, they are controlled under AIIM.  I think we do have some
> level of connection to AIIM, so should we include them because of that?

We do, and no. In my previous mail, the connection between the points in
my rationale was intended as an AND, not as an OR.


> The short point is that the document should at least recognize the  
> existence and use of standards for data, even though they are not  
> W3C standards. Given the attention that the current US  
> administration may have placed on this document in implying that it  
> will be the vehicle for industry input, it behooves us to make sure  
> what we produce does not set up unfair trade practices in its own  
> right. (ref: OSTP, New Media Office)
>
> As such, the “industry input” that I have to consider for Adobe is  
> that negative references to existing standards should then be  
> equally removed.

+1

-- Jose


>
>
> davemc
>
> On 4/21/09 1:19 AM, "Jose M. Alonso" <josema@w3.org> wrote:
>
> Dave, Bobby,
>
> I think we are talking about ISSUE-18 again here: which standards
> besides W3C's should be added to the document. Some comments about
> this.
>
> I'm sure we all agree on the usefulness and heavy use in government of
> several standards beyond W3C's. That said, here's my rationale _not_ to
> include those you are referring to (and others).
>
> 1) This is W3C and the document is titled "... of the Web", and I
> prefer to stick to Web standards for now. For me, e.g. PDF/A and OOXML
> are not Web standards but something you can link to from the Web, as
> you can link to a ZIP file. I'm not an expert in the field and welcome
> any additional info, but I'm quite sure we could discuss for hours and
> hours what is Web and what is not, and we would probably have as many
> versions as group participants.
>
> 2) I would like to see Web-related use cases before adding this or any
> other technology, rather than adding them just to grow the list of
> standards referenced in the document.
>
> 3) Why add only ISO ones, why not IETF ones or others'?
> As an example, I recently learned about the study for the catalogue of
> standards usable by governments here in Spain, and heard Miguel Amutio
> speak about it, saying that 400 standards coming from dozens of bodies
> were analyzed. Not to mention other similar initiatives such as the
> U.S. TRM.
>
> 4) We don't have a liaison with ISO and I would prefer this Group not
> to make interpretations on the use of standards developed by other
> organizations without discussing with them how they fit in our work.
>
>
> Scoping the Group's work was a difficult challenge and I don't think
> that broadening the scope now that the charter is about to expire makes
> sense.
>
> I think we should disregard this for now but discuss it when
> developing the 2nd charter, see if we should work on a broader suite
> of standards, set up liaisons with more SDOs, etc.
>
> One more comment, you mention:
> > On reading the document several times, it seems unclear whether we
> > are focused on transient data and remunging such, or on archival and
> > temporal validation of such.  I think this came through in some of
> > the discussions on social media in the last telecon.
>
> I wish we had some text in the "Long Term" section but we don't yet.
> There were former IG Members tasked to provide use cases on the
> differences you mention. I remember we were going to get a use case on
> "Temporal Data" but unfortunately that didn't happen.
>
> Sticking to the Web standards part above, I think that section was
> intended to talk about "Web Archiving" and maybe the closest view is
> that of the draft use case John submitted a while ago -- http://www.w3.org/2007/eGov/IG/wiki/Use_Case_10_-_Persistent_URIs
>
>
> -- Jose
>
>
>
> On 09/04/2009, at 21:41, Dave McAllister wrote:
> > Actually, I think it is important that we do recognize and extend to
> > relevant standards. PDF (as in ISO 32000) is such, as are PDF/A,
> > PDF/E and PDF/X.  There are also best practices based on such, e.g.
> > PDF/Healthcare. PDF/UA is approaching such status as well.
> >
> > It is interesting to note that right now neither Mars nor XPS is a
> > formal standard, though I suspect XPS will be approved in Ecma
> > shortly (as was OOXML, as the starting point of that most painful
> > standards process).
> >
> > On reading the document several times, it seems unclear whether we
> > are focused on transient data and remunging such, or on archival and
> > temporal validation of such.  I think this came through in some of
> > the discussions on social media in the last telecon.
> >
> > Reworking the world from PDF (for which there are numerous independent
> > implementations) seems counterintuitive in this best-practices style.
> >
> > davemc
> >
> > On 4/9/09 11:47 AM, "Owen Ambur" <Owen.Ambur@verizon.net> wrote:
> >
> > If PDF is expressly referenced, so too should Adobe’s Mars Project
> > -- http://labs.adobe.com/technologies/mars/ -- as well as XFDL -- http://en.wikipedia.org/wiki/Extensible_Forms_Description_Language
> >  -- and XPS: http://en.wikipedia.org/wiki/XML_Paper_Specification
> >
> > Owen Ambur
> > Co-Chair Emeritus, xmlCoP <http://xml.gov/index.asp>
> > Co-Chair, AIIM StratML Committee <http://xml.gov/stratml/index.htm>
> > Member, AIIM iECM Committee <http://www.aiim.org/Standards/article.aspx?ID=29284>
> > Invited Expert, W3C eGov IG <http://www.w3.org/2007/eGov/IG/>
> > Communications/Membership Director, FIRM Board <http://firmcouncil.org/id5.html>
> > Former Project Manager, ET.gov <http://et.gov/>
> > Brief Bio <http://ambur.net/bio.htm>
> >
> >
> >
> >
> > From: public-egov-ig-request@w3.org [mailto:public-egov-ig-request@w3.org] On Behalf Of Bobby Caudill
> > Sent: Wednesday, April 08, 2009 10:11 AM
> > To: public-egov-ig@w3.org
> > Subject: PDF's usefulness to the semantic web
> >
> >
> > Calling out PDF specifically here should be reconsidered.
> >
> > From a semantic web perspective, PDF is more useful than many other
> > formats, including graphics, imagery, audio and video, all of which
> > are very useful formats for government to consider when becoming
> > transparent. Given that documents are machine readable as well as
> > human readable, technologies do exist today that are capable of
> > extracting an ontology, making the information more useful to the
> > semantic web.
> >
> > In addition, there simply are times when a secure container is
> > required for publishing information. While typical internet
> > technologies, such as those outlined above, are very good for sharing
> > and transparency, they are not always appropriate for information
> > types that require assurances of authenticity, privacy,
> > authoritativeness, etc.
> >
> > Further, there is the requirement to archive PSI. Again, considering
> > that many government processes are document based, PDF/A
> > (ISO 19005-1:2005) provides a standards-based approach to ensuring
> > the long-term preservation of government information. PDF/A-based
> > documents are machine readable, making them searchable, discoverable
> > and available to the same ontology-extraction technologies as an
> > ISO 32000 PDF. Likewise, the standards-based nature of PDF/A ensures
> > that humans will be able to access the documents in the future.
> >
> > I am concerned that this paper is limiting its focus and not taking
> > into consideration the wider view of government processes, many of
> > which depend upon more traditional document formats for legitimate
> > business reasons.
> >
> > Thank you for the consideration.
> >
> > Bobby Caudill
> >
> >
> >
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Bobby Caudill
> > Solution Architect, Global Government Solutions
> > Adobe Systems Incorporated
> > 8201 Greensboro Dr., # 1000
> > McLean, VA 22102
> > 703.883.2872 - Office
> > 703.855.9945 – Mobile
> > @BobbyCaudill – Twitter
> > Bobby Caudill – Facebook
> > www.governmentbits.com - Blog
> > rcaudill@adobe.com
> >
> >
> > --
> > Dave McAllister
> > Director, Standards and Open Source
> > 650-523-4942 (GC)
> > 408-536-3881 (Office)
> > Dwmcallister (Skype, Aim, YIM)
> > http://blogs.adobe.com/open
>
>
>
> -- 
> Dave McAllister
> Director, Standards and Open Source
> 650-523-4942 (GC)
> 408-536-3881 (Office)
> Dwmcallister (Skype, Aim, YIM)
> http://blogs.adobe.com/open
Received on Friday, 24 April 2009 15:55:25 GMT