DSpace RDF History Description

>RDF History Data
>         - I would like to set up an offline small-group
>           to review the modelling of this information,
>           identify any suggestions to improve, and
>           brainstorm good ways to view/browse/traverse
>           the info.

The following descriptions of RDF History within DSpace are relevant 
pre-reading.

- Mick

=====

 From 
http://www.dspace.org/live/implementation/design_documents/architecture.pdf

2.3.10 History
The goals of the history subsystem are to capture a time-based
record of significant changes in DSpace, in a manner suitable
for later refactoring or repurposing, and to provide a corpus of
data suitable for research by HP Labs and other interested
parties. Note that the history data is not expected to provide
current information about the archive; it simply records what has
happened in the past.
Currently, the History subsystem is explicitly invoked when
significant events occur (e.g., DSpace accepts an item into the
archive). The History subsystem then creates RDF data
describing the current state of the object. The RDF data is
modelled using Harmony/ABC, an ontology for describing
temporal-based data, and stored in the filesystem. Some simple
indices for unwinding the data are available.

 From 
http://www.dspace.org/live/implementation/design_documents/functionality.pdf

2.4.6 Store Object History, Serializations
DSpace offers history functionality to provide an audit trail of the
administration of the archive, to provide data supporting root-cause
analysis, and to support human-moderated rollback
capabilities.
Whenever selected events of archival interest occur within the
DSpace system, the system creates and stores a snapshot of the
objects involved, and the relationships among them.
The information in DSpace history snapshots is recorded using
open standards. The generated history snapshots are graph-oriented,
and are usable outside the DSpace system by emerging,
standard, semi-structured data manipulation toolkits. Specifically,
the history snapshots adopt the “ABC” data model from the
Harmony project
<http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html>,
and implement this data model using RDF; see
<http://www.w3.org/RDF/>.
Taken together, the history snapshots provide a time-based record
of significant changes to the DSpace corpus. The history data
does not provide current information about the archive; it
provides a record of significant changes that occurred in the past.
History snapshots are read-only. Once created, they are never
changed. Further, the intent from the outset is that they never be
deleted.
DSpace offers a basic lookup capability for history snapshots that
provides an interested administrator with a list of all history
snapshots that pertain to a specified item.
DSpace does not currently offer query functionality that interprets
the contents of history snapshots. Doing so is a goal of the
DSpace research agenda.
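
[Illustration, not part of the quoted design document.] The basic
lookup described above, "all history snapshots that pertain to a
specified item", can be pictured as a simple index from item id to
snapshot ids. The following is a minimal sketch under that assumption
only; the class name, method names, and id formats are hypothetical and
are not taken from the DSpace code.

```python
from collections import defaultdict


class SnapshotIndex:
    """Hypothetical in-memory index: item id -> ids of snapshots that mention it."""

    def __init__(self):
        self._by_item = defaultdict(list)

    def record(self, snapshot_id, item_ids):
        """Register a newly written snapshot under every item it pertains to."""
        for item_id in item_ids:
            self._by_item[item_id].append(snapshot_id)

    def snapshots_for(self, item_id):
        """Return snapshot ids for the item, in the order they were recorded."""
        # Return a copy so callers cannot alter the index; snapshots are read-only.
        return list(self._by_item.get(item_id, []))


index = SnapshotIndex()
index.record("snapshot-1", ["item:123"])
index.record("snapshot-2", ["item:123", "item:456"])
print(index.snapshots_for("item:123"))
```

Note that the index only records which snapshots exist for an item; it
does not interpret their contents, which matches the limitation stated
above.
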
2.4.6.1 History Events
The following events within DSpace create history snapshots:
· Communities
- create / modify / delete Community
- add collection to community
· Collections
- create / modify / delete Collection
- add item to collection
· Items
- create / modify / delete Item
- assign handle to item
- modify item contents
(bitstreams, metadata fields, etc.)
· Users
- create / modify / delete User
· Submission Process
- submission approval process completed
2.4.6.2 Example
An item is submitted to a collection via bulk upload. The Item is
eventually added to the collection. At this time, DSpace creates a
history snapshot that records information about the submittal.
The history snapshot includes the following new resources (all
with unique ids):
· an event. This event will be annotated with the time
that the addition occurred, and be used to relate the
addition to the resulting state of the archive (see next
bullet).
· a state. This resource provides a way to refer to the
state of some subset of the archive. Events within the
archive cause some prior state to transition to a
subsequent state. Examining the relationships
between states and events can allow administrators to
understand specifically what has occurred within the
archive, and how it got to be the way that it currently
is.
· an action. This models the addition itself. Other
actions might also be modeled, if they happen
atomically with the event. Actions are typically
performed by agents or tools. These tools may have
specific versions. The action was initiated by a
particular DSpace user. All of these assertions can
then be modeled as relationships with the action
resource.
For example, the system might include the following
relationships in the snapshot:
event --atTime--> time
event --hasOutput--> state
Item --inState--> state
state --contains--> Item
action --creates--> Item
event --hasAction--> action
action --usesTool--> DSpace Upload
action --hasAgent--> User
The system further includes serializations that capture the state of
archival objects participating in these relationships (in this case,
the Item, the User, and the DSpace Upload).
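
[Illustration, not part of the quoted design document.] As a starting
point for the modelling review, the eight relationships in the example
can be written down as plain (subject, predicate, object) triples. This
is a rough sketch only: the predicate names are copied from the example,
but the id scheme, the function names, and the sample item, user, and
tool identifiers are assumptions made for illustration, and no RDF
library or ABC namespace is used.

```python
import uuid
from datetime import datetime, timezone


def new_id(kind):
    """Mint a unique id for a new resource (hypothetical id scheme)."""
    return f"{kind}:{uuid.uuid4()}"


def snapshot_for_add(item, user, tool, when):
    """Build the triples for an 'item added to collection' snapshot."""
    # The event, state, and action are new resources with unique ids.
    event, state, action = new_id("event"), new_id("state"), new_id("action")
    return {
        (event, "atTime", when.isoformat()),
        (event, "hasOutput", state),
        (item, "inState", state),
        (state, "contains", item),
        (action, "creates", item),
        (event, "hasAction", action),
        (action, "usesTool", tool),
        (action, "hasAgent", user),
    }


def objects(triples, subject, predicate):
    """Return, sorted, every object linked from subject by predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Sample identifiers below are invented for the illustration.
triples = snapshot_for_add(
    "item:123",
    "user:alice",
    "tool:DSpace Upload",
    datetime(2002, 8, 23, tzinfo=timezone.utc),
)
for triple in sorted(triples):
    print(triple)
```

Traversing from the event to its output state and then to the items
that state contains is the kind of browsing the small group could
evaluate against the real RDF data.
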



=============================================
Mick Bass, Sloan MOT 2000

R&D Project Manager, Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307

617.253.6617 office    617.452.3000 fax
617.899.3938 mobile    617.627.9694 residence
bass@alum.mit.edu      mick_bass@hp.com
=============================================

Received on Friday, 23 August 2002 08:53:58 UTC