Re: [dpub-arch] ideas for PWP use cases related to archival services from Leonard Rosenthol on 2016-05-05 (public-digipub-ig@w3.org from May 2016)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Thu, 5 May 2016 12:17:07 +0000
To: Ivan Herman <ivan@w3.org>, Tim Cole <t-cole3@illinois.edu>
CC: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <418DA257-360A-4133-8194-5A9834A582DA@adobe.com>
There are two common approaches to the “archiving active content” problem:

1 – Remove it during the ingestion/archiving process (possibly squirreling it away somewhere, so it’s there for reference).
2 – Require a specialized “archival viewer” that does not execute the active content.

For example, PDF/A currently takes both approaches, as it considers some types of content more important to archive than others.  It removes JavaScript but keeps form fields (and annotations), so a special viewer is required to ensure that those things aren’t interacted with.  However, there are those in the archival industry who don’t like the idea that the archival version is not the original (since stuff was removed). One of the key discussion items right now in our work on PDF/A.Next (we haven’t named/numbered it yet) is how to address this going forward.
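
As a rough illustration of approach 1, and only a sketch under assumptions: the snippet below treats a publication resource as a single HTML string, drops <script> elements and inline on* event attributes, and sets the removed pieces aside for reference. The names ActiveContentStripper and strip_active_content are made up for this example. It does not try to catch other kinds of active content (javascript: URLs, plugins, and so on), and it is not how PDF/A tooling works.

```python
from html import escape
from html.parser import HTMLParser


class ActiveContentStripper(HTMLParser):
    """Hypothetical helper: removes scripts and on* handlers, keeping them aside."""

    def __init__(self):
        # convert_charrefs=False so entities and <style> text pass through unchanged
        super().__init__(convert_charrefs=False)
        self.kept = []      # pieces of the cleaned document
        self.removed = []   # removed active content, kept for reference
        self._in_script = False

    def _render(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):  # crude test for onclick, onload, ...
                self.removed.append({"src": None, "body": value or ""})
                continue
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.removed.append({"src": dict(attrs).get("src"), "body": ""})
            return
        self.kept.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag == "script":
            self.removed.append({"src": dict(attrs).get("src"), "body": ""})
            return
        self.kept.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
            return
        self.kept.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_script:
            self.removed[-1]["body"] += data  # script text goes to the side store
        else:
            self.kept.append(data)

    def handle_entityref(self, name):
        self.kept.append(f"&{name};")

    def handle_charref(self, name):
        self.kept.append(f"&#{name};")

    def handle_comment(self, data):
        self.kept.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.kept.append(f"<!{decl}>")


def strip_active_content(html_text):
    """Return (cleaned_html, removed_items) for one HTML resource."""
    parser = ActiveContentStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.kept), parser.removed
```

Returning the removed items alongside the cleaned text is what makes the "squirrel it away" variant possible: the archive can store both, and record that the cleaned version is a derivative rather than the original.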

Leonard

From: Ivan Herman <ivan@w3.org>
Date: Thursday, May 5, 2016 at 4:55 AM
To: Tim Cole <t-cole3@illinois.edu>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: [dpub-arch] ideas for PWP use cases related to archival services
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Thu, 5 May 2016 08:55:40 +0000

These are really interesting!


On 4 May 2016, at 00:13, Timothy Cole <t-cole3@illinois.edu> wrote:

The last few Archival Task Force calls have generated a few ideas for PWP use cases related to archival services. Several of these ideas are listed below. Please keep in mind that these are only preliminary (i.e., not fully baked – in various stages of development), and in some cases what we have now can only be thought of as a placeholder. In some instances the use case ideas listed below may overlap; in other instances the initial idea may conflate multiple use cases.

Please respond to this email with suggestions for additional archival-related use case ideas, as well as with feedback on the ideas listed here that will help get us going on Thursday.


I wonder what the archival requirements positions are vis-à-vis the use of JavaScript (or any other 'active' components). Is there a way to specify what level of dependency is acceptable (e.g., if a small piece of jQuery is used to make the interface a bit sexier, it may not be considered important)?

This actually ties into the concept of 'essential content' that we referred to in the PWP document; how would that content be handled when archiving? Do we need a finer granularity of classification for the resources?

Ivan


We will spend most of Thursday's Archival Task Force call refining and making these ideas more granular as needed, and developing additional use case ideas.  Some of the following have already been added preliminarily to our Archival Use Cases page, http://w3c.github.io/dpub-pwp-arch/Archival-UCR.html. Here's an initial list of archival-related PWP ideas to get us going Thursday (more ideas welcome):

·         Initial Capture of a PWP by an Archiving Service: An archival service wants to harvest (spider) and save a PWP, and expects to find in the manifest the enumeration of what it will need to capture to make sure it has all the pieces of the PWP that need to be archived, even if these pieces reside on separate servers. (What does this mean for the design of the PWP manifest?)

·         A new Version of a PWP Component is Published, requiring partial re-harvesting: An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because a new version of a component of the PWP has been published. (This may in fact be multiple use cases, see below.)

·         A Revision of a PWP (or PWP Component?) is Published, requiring re-harvesting: An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because it or one of its components has been revised, e.g., a spelling error corrected.

·         A Retraction Notice of a PWP or PWP Component is Issued: An archival service needs to harvest the retraction notice and replace / update / add to the Archival Information Package for the PWP as originally harvested to reflect the Retraction Notice issuance.

·         A PWP or PWP Component is Taken Down: An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because it or one of its components has been taken down by the publisher.

·         Determining when format migration of a PWP is required: An archival service needs to validate that a previously harvested PWP and all of its components are still viable in order to determine when format migration is required.

·         Adding metadata to a PWP to support archiving: A service wishes to augment the metadata of a PWP being harvested for archiving with additional metadata deemed essential for long-term archiving.

·         Migrating metadata format: An archiving service needs to migrate the metadata associated with a PWP to a scheme that will better ensure the metadata (as distinct from the content of the PWP in this case) can be read and understood in the future.
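
To make the capture, new-version, revision and take-down ideas above a little more concrete, here is a minimal sketch. It rests on a big assumption: that a PWP manifest can be reduced to a mapping from resource URL to some content digest. No such manifest format has been agreed, and the function name plan_reharvest and the three action labels are invented for this example.

```python
def plan_reharvest(archived, current):
    """Compare two manifest snapshots and decide what the archive should do.

    Both arguments are dicts mapping resource URL -> content digest
    (a hypothetical shape; the real PWP manifest is still undefined).
    """
    return {
        # listed in the current manifest but never harvested
        "capture": sorted(u for u in current if u not in archived),
        # harvested before, but the digest differs (new version or revision)
        "recapture": sorted(
            u for u in current if u in archived and current[u] != archived[u]
        ),
        # harvested before, but no longer listed by the publisher
        "mark_taken_down": sorted(u for u in archived if u not in current),
    }
```

For instance, calling plan_reharvest with an archived snapshot of two resources and a current manifest that changes one, drops the other, and adds a third on a different server would put one URL in each of the three lists. Note that a digest change alone cannot tell a new version from a minor revision, nor can a missing entry tell a take-down from a retraction; the manifest would have to say so explicitly.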

Thanks,

Tim Cole
University of Illinois at UC


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 5 May 2016 14:28:15 UTC