RE: SIMILE Call, 16 May 2003, 1:00pm

At Mick's request, I'm going to add an extra agenda item after we discuss
call times. John is going to describe a visit to HP Labs by the people at
CNRI responsible for the Handle System.

Also, Mick has been working on the history system use case. I'm in the
process of updating the research drivers document to reflect this, but here
is the use case:

4.5 History System 

4.5.1 Sponsor 
Mick Bass 

4.5.2 Scenario / Context 
The current DSpace [6] codebase includes a "History" system. The history
system is invoked whenever events of archival interest occur within the
system (for example, when a community is created, an item's instance
metadata is edited, or the members of a collection are modified). The
history system produces RDF intended to model:

- "snapshots" of the primary data objects within DSpace (e.g. communities,
  collections, items) at various points in time; and
- the situations and temporal events that relate these snapshots.

The graphs produced by the history system are serialized as RDF/XML and
stored in the file system, with simple indices to the serializations
maintained in the RDBMS. The history system is currently loosely based on
the ABC ontology from the Harmony project [16].
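The storage arrangement just described (RDF/XML files on disk, with a simple index in the RDBMS) might look roughly like the following. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the DSpace RDBMS; the table name `history_index`, its columns, the function names and the file paths are all hypothetical and are not taken from the DSpace codebase.

```python
import sqlite3


def make_index():
    # Hypothetical index table: one row per serialized snapshot.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history_index ("
        " object_id TEXT NOT NULL,"    # identifier of the archived object
        " recorded_at TEXT NOT NULL,"  # ISO 8601 timestamp, sorts as text
        " rdf_path TEXT NOT NULL)"     # location of the RDF/XML file
    )
    return conn


def record_snapshot(conn, object_id, recorded_at, rdf_path):
    # Only the pointer is stored here; the RDF/XML itself lives on disk.
    conn.execute(
        "INSERT INTO history_index VALUES (?, ?, ?)",
        (object_id, recorded_at, rdf_path),
    )


def snapshots_for(conn, object_id):
    # File paths of all snapshots of one object, oldest first.
    rows = conn.execute(
        "SELECT rdf_path FROM history_index"
        " WHERE object_id = ? ORDER BY recorded_at",
        (object_id,),
    )
    return [path for (path,) in rows]


conn = make_index()
record_snapshot(conn, "collection-C", "2003-05-01T10:00:00Z", "history/c-0002.rdf")
record_snapshot(conn, "collection-C", "2003-04-01T09:00:00Z", "history/c-0001.rdf")
record_snapshot(conn, "item-I", "2003-04-15T12:00:00Z", "history/i-0001.rdf")
print(snapshots_for(conn, "collection-C"))  # ['history/c-0001.rdf', 'history/c-0002.rdf']
```

Keeping only pointers in the database leaves the serializations readable without the RDBMS, at the cost of having to parse files to answer anything beyond "which snapshots exist for this object".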

Constituents want libraries to act as long-term stewards of assets, and
simultaneously demand support for community-specific metadata. In some cases
communities will contract with the library to maintain instances of their
community-specific schema(s). In other cases the community will provide
instances directly, and in still others an agent may provide and/or maintain
the instances. To be trusted, the steward must be able to retain and provide
non-repudiable audit trails of how content and metadata (both library
"core" metadata and community-specific metadata) have been altered,
migrated, transformed, or augmented over time ("proof of transformation
path", or "data provenance"). The steward may need to undo and/or redo
part of the transformation path based on errors discovered, improved tools,
etc. Queries on transformation-path information across the corpus can be
expected as part of optimally maintaining and preserving the corpus.

The history system includes two important functional capabilities:

- it models (and enables queries over) the temporal history of changes and
  significant events within the archive. That is, it represents the archive as
  a series of situations, separated by events, with the archive in a
  particular state in each situation;
- it flexibly models the state of the archived objects themselves in any
  given situation.
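The two capabilities above can be illustrated with a small in-memory model. This is a minimal sketch, not the DSpace implementation and not the ABC vocabulary: it assumes the archive state can be represented as a dictionary of object properties, and the names `Situation`, `Event`, `History`, `apply` and `state_at` are hypothetical. Each event closes one situation and opens the next, so the state on a given date is found by counting the events on or before that date.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Situation:
    # The archive's state in this situation: object id -> properties.
    state: dict


@dataclass
class Event:
    when: date
    description: str


@dataclass
class History:
    # situations[0] is the state before any recorded event;
    # events[i] separates situations[i] from situations[i + 1].
    situations: list = field(default_factory=lambda: [Situation({})])
    events: list = field(default_factory=list)

    def apply(self, when, description, changes):
        """Record an event and the new situation it produces."""
        if self.events and when < self.events[-1].when:
            raise ValueError("events must be recorded in time order")
        # Copy the previous state so earlier situations stay unchanged.
        new_state = {k: dict(v) for k, v in self.situations[-1].state.items()}
        for obj_id, props in changes.items():
            new_state.setdefault(obj_id, {}).update(props)
        self.events.append(Event(when, description))
        self.situations.append(Situation(new_state))

    def state_at(self, when):
        """Return the archive state in the situation current on `when`."""
        # Events dated on or before `when` have already taken effect.
        n = bisect_right([e.when for e in self.events], when)
        return self.situations[n].state


h = History()
h.apply(date(2003, 1, 10), "collection C created", {"C": {"name": "C"}})
h.apply(date(2003, 3, 1), "collection C renamed", {"C": {"name": "C (renamed)"}})
print(h.state_at(date(2003, 2, 1))["C"]["name"])  # C
print(h.state_at(date(2003, 3, 1))["C"]["name"])  # C (renamed)
```

Copying the whole state for every situation keeps the sketch short; a real store would more likely share unchanged parts between snapshots.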

Some questions:

- What schemas are required of all objects in the system for stewardship?
  - Something like Harmony/ABC for modelling temporal events?
  - Some sort of "base ontology" for any object for which stewardship /
    preservation commitments are made? Subclasses for various kinds of
    preserved things (e.g. Collection, Community, Item, WebSiteItem,
    CommunitySchema)?
- When a community-specific schema is used to accompany an item with
  community-specific instance metadata, how does the history system model
  this in the state of the item?
- How can questions be answered about how the policies associated with
  managing a collection and/or validating the items within it have changed
  over time?
- How can such journaling information assist in identifying items that
  require attention, rework, or transformation?

4.5.3 Actors 

- "information owner", within a community or collection
- preservation steward
- collection policy manager

4.5.4 Use Cases 

- interacting with communities regarding community-specific schemas and
  instances
- OAIS- and preservation-oriented use of the history data
- a service offering non-repudiable proofs of the transformation path / data
  provenance of an object

4.5.4.1 Modelling Community-Specific Schemas 
A community representative approaches the library and proposes a new
collection C of DSpace items, which the library establishes.

Sample Query: When was collection C established?
Sample Query: Which collections were established in 2004?

The community begins to submit new items to this collection.

Eventually the community develops and begins to consistently use schema S to
describe their works. Someone from the community registers S with the
library, and some members begin to include instances of S with their
submissions to C. 
The history system models the registration of S, and the use of S with items
submitted to C that are accompanied by an instance of S. 

Sample Query: When was schema S registered?
Sample Query: Who registered S?
Sample Query: For some item I which includes an instance of S, return a
graph representing the instance of S...
Sample Query: For some item I which includes an instance of S, return a
graph modelling the series of changes to the instance, with reference to
each state of the instance.

At some point the community decides to require that each item submitted to
collection C include an instance of S. 
Sample Query: How many items were submitted to C in the last year? How many
of these items included instances of S? 

Sample Query: When did collection C begin requiring instances of S upon
submission? 

Sample Query: Return a graph modelling the series of changes to collection
C's submission policies... 

Sample Query: Which items in C were submitted before S was registered?
Sample Query: Which items in C were submitted after S was registered but
before the collection required instances of S, and have S metadata?
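Once the history system can report each item's submission date, the date S was registered, and the date C began requiring S, the last two queries reduce to date comparisons. A minimal sketch; the `Submission` record, the function names, the item identifiers and all dates are made-up illustration data, not output of the history system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Submission:
    item_id: str
    submitted: date
    has_s_metadata: bool


# Hypothetical dates, as the history system might record them.
S_REGISTERED = date(2003, 3, 1)
S_REQUIRED = date(2003, 9, 1)

items_in_c = [
    Submission("I1", date(2003, 1, 15), False),
    Submission("I2", date(2003, 4, 2), True),
    Submission("I3", date(2003, 5, 20), False),
    Submission("I4", date(2003, 10, 5), True),
]


def submitted_before_registration(items, registered):
    return [s.item_id for s in items if s.submitted < registered]


def voluntary_s_adopters(items, registered, required):
    # Submitted after S was registered but before S became mandatory,
    # and carrying S metadata anyway. A submission dated exactly on the
    # registration day counts as "after" (an arbitrary boundary choice).
    return [
        s.item_id
        for s in items
        if registered <= s.submitted < required and s.has_s_metadata
    ]


print(submitted_before_registration(items_in_c, S_REGISTERED))  # ['I1']
print(voluntary_s_adopters(items_in_c, S_REGISTERED, S_REQUIRED))  # ['I2']
```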

A new tool T is introduced that greatly increases the quality of the S
instances produced, using technology to extract information from a work and
suggest appropriate values to the user. The community begins to use this tool
on future submissions to C.
Sample Query: When did C begin using T in preparing its items?
Sample Query: Which items in C have S metadata that was not prepared using T?
Which of these were submitted after C began requiring instances of S?

4.5.4.2 OAIS & preservation-oriented use of the history data 

The Library and a community establish collection C. As part of this process,
the Library and the community agree on a Submission Information Package
(SIP) that submitters may use to submit items to the collection. The
agreement is recorded in a Submission Agreement between the Library and the
community. The Submission Agreement describes validation rules for the SIP
(e.g. required object structures, content formats, schema instances and
controlled vocabularies), as well as the ingest tools that the Library will
support (e.g. a web UI, a legacy system adapter).

At a later time the community and the Library agree upon an alternate SIP,
which contains similar information but is structured slightly differently
(because, for example, the SIPs are produced in large quantities by an
existing system rather than individually by humans).
Sample Query: What Submission Agreements are in place for this collection?
What Submission Agreements were in place on February 28, 2003?
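The second query asks which agreements were in force on a past date. If each Submission Agreement were recorded with an effective date and an optional end date (an assumed representation; the use case does not specify one), the query becomes an interval test. A minimal sketch; `SubmissionAgreement`, `agreements_on` and the agreement names "SA-1" and "SA-2" are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SubmissionAgreement:
    name: str
    collection: str
    effective: date
    ended: Optional[date] = None  # None means the agreement is still in force

    def in_force_on(self, day):
        # Half-open interval: in force from `effective` up to, not including, `ended`.
        return self.effective <= day and (self.ended is None or day < self.ended)


def agreements_on(agreements, collection, day):
    return [
        a.name
        for a in agreements
        if a.collection == collection and a.in_force_on(day)
    ]


agreements = [
    SubmissionAgreement("SA-1", "C", date(2002, 6, 1), date(2003, 6, 30)),
    SubmissionAgreement("SA-2", "C", date(2003, 3, 15)),
]

print(agreements_on(agreements, "C", date(2003, 2, 28)))  # ['SA-1']
print(agreements_on(agreements, "C", date(2003, 4, 1)))   # ['SA-1', 'SA-2']
```

Under this representation, nullifying an agreement (as in the last step of this scenario) would mean setting its end date rather than deleting the record, so historical queries like the one above keep working.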

The library initially agrees that items within the collection will be viewed
using the standard DSpace "item view".

At some point the community develops a specialized "community viewer" for
items in its collections. The community negotiates with the library for
items in its collection to be made available in a Dissemination
Information Package (DIP) D that can be easily consumed by this viewer.
Sample Query: What Dissemination Agreements are in place for this
collection? 

Sample Query: What Dissemination Agreements require production of D? 

Sample Query: What viewing services are known to be able to consume D? 

The community arranges for the Library to host a server-side viewer V that
accepts the community-specific DIP.
Sample Query: What Dissemination Agreements govern our hosting of V?
Sample Query: When did we begin disseminating using D? When did we begin
hosting V?
Sample Query: What viewers should be made available for item I?
Sample Query: What viewers were available for I on February 28, 2003?

The community requests that the library-hosted viewer also be made available
for items in another collection with similar information content. 

The library and the community agree to cease submissions using the first
SIP, and nullify the corresponding submission agreement. 

4.5.4.3 Non-repudiable proofs of transformation path / data provenance
To be completed. 

4.5.5 Issues, Mechanisms 

4.5.6 Scale 

> Friday 16 May 2003
> 1p eastern
> 
> Homework: please review History System Descriptive Note - see below. 
> 
> Toll Free Access Number:
>     866-639-4752
> Toll Access Number:
>     574-935-6705
> Participant PIN:
>     2536617
> 
> Date proposals for SIMILE plenaries
>   2 full days
> 	Week of July 22,23,24	(Cambridge, but offsite)
> 	Week of October 21,22,23
> 
> Time proposal for SIMILE weekly call
> 	12:00 noon eastern Fridays
> 
> History System Descriptive Note
> http://web.mit.edu/simile/www/resources/history-harmony/descriptive-note.pdf
>      barriers/issues precluding "complete" status for this document
> 
> Update on History System use case in the Research Drivers document
> 
> As Time Available:
> - review open issues with Research Drivers Document
> http://web.mit.edu/simile/www/documents/researchDrivers/rd.html
> http://web.mit.edu/simile/www/documents/researchDrivers/rd_issues.html
> 
> - review open project level issues
> http://web.mit.edu/simile/www/resources/projectIssues/pi.html
> 
> =============================================
> Dr Mark H. Butler
> Research Scientist                HP Labs Bristol
> mark-h_butler@hp.com
> Internet: http://www-uk.hpl.hp.com/people/marbut/
> 

Received on Friday, 16 May 2003 08:28:14 UTC