Web Choreography Use Case Overview
by
Bruce R. Barkstrom, Paula L. Sidell |
Atmospheric Sciences Data Center |
NASA Langley Research Center |
Hampton, VA, USA 23681-2199
and |
Donald M. Sawyer |
National Space Science Data Center
NASA Goddard Space Flight Center |
Greenbelt, MD, USA |
June 4, 2003 5:00 am EDT |
Web choreography lies at the intersection of many disciplines that must work together to achieve a smooth and uninterrupted flow of data and services among many cooperating partners. These disciplines include:
This document is intended to provide a collection of use cases that can be used for several purposes:
As we describe below, the use cases that we provide are based on practical experience with a collection of large, distributed data centers and on an international standard describing the operation of such data centers. This basis is useful for several reasons:
The next sections of this document briefly describe the OAIS Reference Model [2002] and the NASA EOSDIS data centers whose operational experience we draw on in developing test cases and shaping some of the use cases.
The use cases we consider below are based on the ISO Standard for a Reference Model for an Open Archival Information System (OAIS) [CCSDS, 2001], which has become ISO 14721. This standard was developed by the Consultative Committee for Space Data Systems, an international body concerned with interoperability of governmental space assets. By using this standard, we ensure that there will be a very low probability of patent complications. Furthermore, this standard has caught the attention of the digital library community and is widely regarded as important there. An important aspect of the OAIS is that it seems likely to provide significant guidance to the architecture suggested by the U.S. Library of Congress "National Digital Information Infrastructure and Preservation Program" [NDIIPP, 2003] and to related international projects.
It is useful to quote from the Standard itself in understanding the nature and function of an OAIS: ``An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of such responsibilities as defined in the standard, and this allows an OAIS archive to be distinguished from other uses of the term `archive'. The term `Open' in OAIS is used to imply that the standard, as well as future related Recommendations and standards, are developed in open forums. It does not imply that access to the archive is unrestricted.
The information being maintained in an OAIS has been deemed to need Long Term Preservation, even if the OAIS itself is not permanent. Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely. In this reference model there is a particular focus on digital information, both as the primary forms of information held and as supporting information for both digitally and physically archived materials. Therefore, the model accommodates information that is inherently non-digital (e.g., a physical sample), but the modeling and preservation of such information is not addressed in detail.''
This reference model:
The Model is expressed in UML and breaks the functions of an archive down into six large classes, to which we add one additional class:
The OAIS Reference Model can be made much more useful by ensuring that the use cases for web choreography are based on concrete experience. Specifically, we propose to describe the operation of NASA's EOSDIS Distributed Active Archive Centers (DAACs) in terms of the OAIS Reference Model and to use that description for use case instances or scenarios. The experience of these operational data centers can help ensure a realistic recommendation - and not just one based on academic interests. These centers have to deal with security, data production, data distribution, user access, reliability, and cost of operation on a daily basis. For example, the NASA Langley Atmospheric Sciences Data Center (ASDC) currently has about 500 TB of data in its store. It is adding about 20 TB per month to that store, and had more than 3,000 data-ordering users in the last year.
The EOSDIS DAACs hold data from forty instrument teams on fifteen satellites that have been launched during the period from 1997 to 2003, as well as similar data going back into the 1970s. The collection will grow even further in future years, as more instrument teams and satellites are added to the mix. Reber and Todirita [2003] provide a useful on-line summary <http://eosdatainfo.gsfc.nasa.gov/eosdata/>. Table 1 identifies these data centers and the specific kinds of data they hold.
TABLE 1. EOSDIS Data Centers.
Briefly, data from Low-Earth Orbiting satellites is sent to a data collection site at White Sands, New Mexico, either by telemetry relays through the Tracking and Data Relay Satellite System (TDRSS) or by shipments from ground stations. Typically, a ground station receives the telemetry data from an entire orbit within the ten minutes or so that the satellite is above its horizon, so a ninety-minute orbit's worth of data must be transmitted in a window of opportunity that is only about ten percent of the full orbit. Data rates for this window are therefore rather high. After the White Sands site ensures that the telemetry packets from the satellite have been sorted into time-sequenced order, the collection of packets is shipped to a site at Goddard Space Flight Center in Greenbelt, MD, and then redistributed to the EOSDIS Data Centers or to the instrument science teams in other locations.
Various data centers process the data through software provided by the science teams that develop the instruments and the algorithms for processing the data. In some cases, the processing is quite simple - multiply the data value from a satellite measurement by a constant for calibration and append information allowing the location of the measurement on the Earth to be derived by the user. In others, the science teams do very extensive processing that involves heavy computational loads. At ASDC, for example, one team has provided about one million lines of source code and another about two-thirds of a million lines. We will comment on production paradigms that teams use later. In most cases, the data production at the EOSDIS DAACs falls into moderate-rate, discrete batch production. This means that the DAACs or science team sites engaged in data production are running about 1,000 jobs per day, on average. Some sites run 100 per day or fewer; one runs 10,000 or more, and might be more accurately described as using an assembly line approach to production. The production facilities eventually send the data to the DAACs.
The EOSDIS data centers contain a fair amount of data. ASDC has about 500 TB; other data centers may have as much as 4 PB right now. At the end of mission life, they will probably have between ten and twenty PB in total. In addition to binary data, the DAACs also contain metadata and documentation. In this, they resemble most other enterprises on the WWW. Users are expected to search through the metadata, using database queries, and to use the documentation to find and understand what they can get out of the system. Each of the DAACs has its own interface and ordering tools. They also support a federated search system that can query all of the data centers to produce candidate files for users to order.
After they have ingested the data products, these data centers make the data available to almost anyone who wants it - usually at no cost or just recovering the marginal cost of distribution. Because the EOSDIS data centers are open to the Internet, they are exposed to the same environment as other open web sites. On a practical level, the lives of data center staff are consumed with data management and user services. The statistics collected as part of the EOSDIS system operations suggest that this system is serving data orders to more than 100,000 users per year, with some reports suggesting that about 2,000,000 distinct web users visit the data centers in a year.
The EOSDIS system design started in about 1990 - and was not strongly impacted by the sudden emergence of the web. There is so much data that this system has not been able to use databases for the bulk of it. The current system's design is based on a paradigm of file-based search and order, with many of the files as large as 500 MB. By search and order, we mean that users can search for and order one or more files. These data centers have experimented with subsetting and supersetting, which would allow them to customize their data to particular user communities. Four of the larger data centers have a system developed by a large aerospace contractor. This system has about 1.7 million lines of code and uses about twenty to twenty-five Commercial Off-The-Shelf (COTS) products. All of the data centers have other systems they've developed to handle the specific needs of their data sets and user communities. Based on cost considerations, most of the files in these systems are stored in robotic tape silos, although the U.S. National Research Council recently released a report [NRC, 2003] recommending that the centers move to much more disk-based storage.
Figure 1 identifies the major elements in the OAIS Context, including the relationship with the producer and supplier. The Production Facility identified in this figure, as well as the Supplier and Production Management, are not from the OAIS Reference Model, but are added for the sake of completeness. In the material that follows, we quote extensively from Chapter 2 of the OAIS Reference Model.
Figure 1. OAIS-Producer Context Diagram
Outside the OAIS are Producers, Consumers, and Management.
In addition to the OAIS, we add a Production Facility, within which (data) products are created. The entities for this part of the environment include:
Management provides the OAIS with its charter and scope. The charter may be developed by the archive, but it is important that Management formally endorse archive activities. The scope determines the breadth of both the Producer and Consumer groups served by the archive. Some examples of typical interactions between the OAIS and Management include:
The first contact between the OAIS and the Producer is a request that the OAIS preserve the data products created by the Producer. This contact may be initiated by the OAIS, the Producer or Management. The Producer establishes a Submission Agreement with the OAIS, which identifies the SIPs to be submitted and may span any length of time for this submission. Some Submission Agreements will reflect a mandatory requirement to provide information to the OAIS, while others will reflect a voluntary offering of information. Even in the case where no formal Submission Agreement exists, such as a World Wide Web (WWW) site, a virtual Submission Agreement may exist specifying the file formats and the general subject matter the site will accept.
Within the Submission Agreement, one or more Data Submission Sessions are specified. There may be significant time gaps between the Data Submission Sessions. A Data Submission Session will contain one or more SIPs and may be a delivered set of media or a single telecommunications session. The Data Submission Session content is based on a data model negotiated between the OAIS and the Producer in the Submission Agreement. This data model identifies the logical components of the SIP (e.g., the Content Information, PDI, Packaging Information, and Descriptive Information) that are to be provided and how (and whether) they are represented in each Data Submission Session. All data deliveries within a Submission Agreement are recognized as belonging to that Submission Agreement and will generally have a consistent data model, which is specified in the Submission Agreement. For example, a Data Submission Session may consist of a set of Content Information corresponding to a set of observations, which are carried by a set of files on a CD-ROM. The Preservation Description Information is split between two other files. All of these files need Representation Information which must be provided in some way. The CD-ROM and its directory/file structure are the Packaging Information, which provides encapsulation and identification of the Content Information and PDI in the Data Submission Session. The Submission Agreement indicates how the Representation Information for each file is to be provided, how the CD-ROM is to be recognized, how the Packaging Information will be used to identify and encapsulate the SIP Content Information and PDI, and how frequently Data Submission Sessions (e.g., one per month for two years) will occur. It also gives other needed information such as access restrictions to the data.
Each SIP in a Data Submission Session is expected to meet minimum OAIS requirements for completeness. However, in some cases multiple SIPs may need to be received before an acceptable AIP can be formed and fully ingested within the OAIS. In other cases, a single SIP may contain data to be included in many AIPs. A Submission Agreement also includes, or references, the procedures and protocols by which an OAIS will either verify the arrival and completeness of a Data Submission Session with the Producer or question the Producer on the contents of the Data Submission Session.
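To make this data model more concrete, the sketch below (a minimal Python fragment with purely illustrative class and field names that are not drawn from the standard) shows one way the logical components of a SIP and a Data Submission Session from the CD-ROM example could be represented:

    from dataclasses import dataclass, field
    from typing import List

    # Minimal, hypothetical representation of the SIP logical components named
    # above: Content Information, Preservation Description Information (PDI),
    # Packaging Information, and Descriptive Information.

    @dataclass
    class ContentInformation:
        files: List[str]              # e.g., the observation files on the CD-ROM
        representation_info: str      # how to interpret the files (format, semantics)

    @dataclass
    class PreservationDescriptionInformation:
        provenance_file: str          # in the example, the PDI is split
        fixity_file: str              # between two other files

    @dataclass
    class SubmissionInformationPackage:
        content: ContentInformation
        pdi: PreservationDescriptionInformation
        packaging_info: str           # e.g., the CD-ROM directory/file structure
        descriptive_info: str         # metadata used by Finding Aids

    @dataclass
    class DataSubmissionSession:
        submission_agreement_id: str  # every delivery belongs to an agreement
        sips: List[SubmissionInformationPackage] = field(default_factory=list)

In an actual Submission Agreement the Representation Information and Packaging Information would be specified in detail; the sketch records them only as opaque strings.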
Figure 2a shows the most common Producer-OAIS interaction, in which the Producer provides a Submission Information Package to the OAIS, which converts the SIP to an Archival Information Package.
Figure 2. OAIS-Producer Context Diagram
There are many types of interactions between the Consumer and the OAIS. These interactions include questions to a help desk, requests for literature, catalog searches, orders and order status requests. Figure 2b illustrates the generic data access process, in which a consumer is interested in information, not in ordering a file. The consumer queries the archive, which responds with a Results Set.
The ordering process is of special interest to the OAIS Reference Model, since it deals with the flow of archive holdings between the OAIS and the Consumer. The Consumer establishes an Order Agreement with the OAIS for information. This information may currently exist in the archive or be expected to be ingested in the future. The Order Agreement may span any length of time, and under it one or more Data Dissemination Sessions may take place. A Data Dissemination Session may involve the transfer of a set of media or a single telecommunications session. The Order Agreement identifies one or more AIPs of interest, how those AIPs are to be transformed and mapped into Dissemination Information Packages (DIPs) and how those DIPs will be packaged in a Data Dissemination Session. The Order Agreement will also specify other needed information such as delivery information (e.g., name or mailing address), and any pricing agreements as applicable.
Ordering is a more formal process than querying. Figure 2c illustrates the generic ordering process. In this case, the consumer submits an Order, which allows the archive to convert an AIP into a DIP. The DIP is what is sent to the user.
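A minimal sketch of these two interaction patterns, using hypothetical function names rather than anything defined in the Reference Model, might look like this:

    # Hypothetical sketch of the generic access (query -> Result Set) and
    # ordering (Order -> AIP converted to DIP) interactions described above.

    def query_archive(archive, criteria):
        """Figure 2b: a Consumer query returns a Result Set of descriptive records."""
        return [desc for desc in archive["descriptive_info"] if criteria(desc)]

    def fill_order(archive, order):
        """Figure 2c: an Order identifies AIPs; the archive converts each to a DIP."""
        dips = []
        for aip_id in order["aip_ids"]:
            aip = archive["aips"][aip_id]
            # How an AIP is transformed into a DIP is governed by the Order
            # Agreement; here the "transformation" is just a repackaging.
            dips.append({"dip_of": aip_id, "content": aip["content"]})
        return dips

    archive = {"descriptive_info": [{"id": "AIP-1", "subject": "radiation budget"}],
               "aips": {"AIP-1": {"content": "granule bytes ..."}}}
    hits = query_archive(archive, lambda d: "radiation" in d["subject"])
    dips = fill_order(archive, {"aip_ids": [d["id"] for d in hits]})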
There are two common order types initiated by Consumers: the Event Based Order and the Adhoc Order.
In the case of an Adhoc Order, the Consumer establishes an Order Agreement with the OAIS for information available from the archive. If the Consumer does not know a priori what specific holdings of the OAIS are of interest, the Consumer will establish a Search Session with the OAIS. During this Search Session the Consumer will use the OAIS Finding Aids that operate on Descriptive Information, or in some cases on the AIPs themselves, to identify and investigate potential holdings of interest. This may be accomplished by the submission of queries and the return of result sets to the Consumer. This searching process tends to be iterative, with a Consumer first identifying broad criteria and then refining these criteria based on previous search results. Once the Consumer identifies the OAIS AIPs of interest, the Consumer may provide an Order Agreement that documents the identifiers of the AIPs the Consumer wishes to acquire, and how the DIPs will be acquired from the OAIS. If the AIPs are available, an Adhoc Order will be placed. However if the AIPs desired are not yet available, an Event Based Order may be placed.
In the case of an Event Based Order, the Consumer establishes an Order Agreement with the OAIS for information expected to be received on the basis of some triggering event. This event may be periodic, such as a monthly distribution of any AIPs ingested by the OAIS from a specific Producer, or it may be a unique event such as the ingestion of a specific AIP. The Order Agreement will also specify other needed information such as the trigger event for new Data Dissemination Sessions and the criteria for selecting the OAIS holdings to be included in each new Data Dissemination Session.
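The triggering behaviour of an Event Based Order can be sketched as a standing subscription whose criteria are checked whenever the triggering event occurs - here, the ingestion of a new AIP. The names below are illustrative only:

    # Hypothetical sketch: an Event Based Order as a standing subscription
    # whose criteria are evaluated each time an AIP is ingested; matching
    # holdings start a new Data Dissemination Session.

    event_based_orders = [
        {"consumer": "consumer-42",
         "criteria": lambda aip: aip["producer"] == "instrument-team-A"},
    ]

    def start_dissemination_session(consumer, aips):
        # A real OAIS would package DIPs and deliver them per the Order
        # Agreement; here we only record the intent.
        print(f"Dissemination session for {consumer}: {[a['id'] for a in aips]}")

    def on_aip_ingested(aip):
        for order in event_based_orders:
            if order["criteria"](aip):
                start_dissemination_session(order["consumer"], [aip])

    on_aip_ingested({"id": "AIP-2003-0042", "producer": "instrument-team-A"})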
The Order Agreement does not have to be a formal document. In general an OAIS will have a general pricing policy and maintain an information base of the electronic and physical mailing addresses of its users. In this case, the process of developing an Order Agreement may be no more than the completion of a World Wide Web form to specify the AIPs of interest.
The interaction between the Production Facility and Production Management mirrors the interaction between the OAIS and OAIS Management. To put it another way, OAIS Management and Producer Management are separate instances of a more abstract entity, which we can simply label ``Management''.
In many ways, supplier interactions with the Production Facility mirror those between the OAIS and the Consumer, with the Production Facility now serving in the role of the Consumer and the Suppliers serving in the role of the OAIS. However, the Supplier also resembles the Producer, in that the relationship is usually formalized by contract.
Thus, the first contact between the Production Facility and a Supplier is a request that the Supplier provide certain data products to the Production Facility. This contact may be initiated by the Production Facility, the Supplier, or by Production Management. The Supplier establishes a Submission Agreement with the Production Facility, which identifies the SIPs to be submitted and may span any length of time for this submission. Some Submission Agreements will reflect a mandatory requirement to provide information to the Production Facility, while others will reflect a voluntary offering of information. Even in the case where no formal Submission Agreement exists, such as a World Wide Web (WWW) site, a virtual Submission Agreement may exist specifying the file formats and the general subject matter the site will accept.
The Submission Agreement between the Supplier and the Production Facility is essentially identical in character to the one between a Producer and an OAIS. This is helpful in considering use cases for the model because we do not have to reinvent a separate kind of agreement.
The preceding sections provide a broad overview of the concepts and activities of an OAIS and a Production Facility. The core of this work is a collection of thirty to forty use cases. These are generic descriptions of activities and interactions that occupy the resources of the two kinds of facilities we have been describing. Because the number of use cases is fairly large, we believe it is useful to organize them roughly according to how rapidly the activities need to operate.
In doing so, we have been strongly guided by the approach taken by S. B. Gershwin [1999, particularly Chapter 10 of this work]. As he suggests in the introductory material for this chapter [op. cit., p. 359], ``Most manufacturing systems are large and complex. It is natural, therefore, to divide the control or management into a hierarchy consisting of a number of different levels. Each level is characterized by the length of the planning horizon and the kind of data required for the decision-making process. Higher levels of the hierarchy typically have long horizons and use highly aggregated data, while lower levels have shorter horizons and use more detailed information. The nature of uncertainties at each level of control also varies.''
We can put this into a more concrete instantiation. For purposes of setting up the use cases, we may think of the top level planning horizon (for Preservation Planning, for example) as ten years. At this level we expect the management of either the OAIS or the Production Facility to try to make sensible plans that have annual subdivisions, with reviews on an annual basis. At the lowest practical level we consider, where CPUs are running individual jobs, the planning horizon may be as short as a few hours, with events occurring every few seconds. Clearly, the information needed to make decisions at this level is much more detailed than it is at the highest level. At the same time, it does not make sense to try to schedule ten years into the future with a precision of seconds. The range from 1 millisecond to ten years is about 3 × 10^11. It would require an extraordinary amount of storage - not to mention computation time - to try to keep track of all events over this dynamic range. Breaking down activities into a temporal hierarchy seems to be a sensible design philosophy. Accordingly, we arrange the use cases in more or less an inverse frequency ordering (lowest frequency first).
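The arithmetic behind this dynamic range is easy to check, as the short calculation below shows:

    # Ten-year planning horizon expressed in milliseconds, compared with an
    # event resolution of about 1 millisecond at the lowest level.
    ms_per_year = 365.25 * 24 * 3600 * 1000
    dynamic_range = 10 * ms_per_year / 1.0
    print(f"{dynamic_range:.1e}")     # about 3.2e+11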
There are several key concepts we will use in our use case modeling: processes, roles, and resources. Gershwin [op. cit., p. 363] suggests that ``A resource is any part of the production system that is not consumed or transformed during the production process. Machines - both material transformation and inspection machines, workers, pallets, and sometimes tools - if we ignore wear or breakage - can be modeled as resources. Workpieces and processing chemicals cannot.
For the purposes of this ... [writing], we define event as a change in the discrete part of the state or a discontinuous change in a rate or parameter. ...
An activity [or process] is a pair of events associated with a resource. The first event corresponds to the start of the activity, and the second is the end of the activity. Only one activity can appear at a resource at any time.''
We cover the relationship between processes (or activities) in more detail elsewhere in this material. The critical point is that the duration of a process is usually not deterministic; each process is better described by a probability distribution for its duration. Because we want to be able to use our use case instances (or scenarios) to provide realistic cost and schedule estimates, we need to make sure that our business process description can accommodate this kind of probabilistic description.
In the subsections that follow immediately, we lay out a schedule-based structure for the use cases. We begin with a high-level breakdown of activity phases that are appropriate for a generic description of the Production Facility. Then, we provide a breakdown of use cases associated with the consumer and rogue user populations. The production use cases and the consumer ones form the driving forces on the activities of both the OAIS and the Production Facility. At the end, we consider the activities that arise out of the other functions these organizations must undertake.
We expect a producer to have six phases (one more than identified in the Producer-Archive Interface Methodology Abstract Standard [2002]):
Figure 3. Large Scale Notional Production Activity Schedule
Figure 3 gives a notional schedule for production activity with this breakdown. In the subsections that follow, we expand the activities in each of the phases identified in this figure.
Figure 4. Notional Proposal Opportunity Seeking Schedule
Figure 5. Notional Preliminary Phase Schedule
Figure 6. Notional Formal Definition Phase Schedule
Figure 7. Notional Transfer Phase Schedule
Figure 8. Notional Product Validation Phase Schedule
Figure 9. Notional Production Phase Schedule
If we have a stochastic duration, we can also deal quantitatively with reliability and Quality of Service (QOS) calculations. As a practical note, we expect each process to have a cumulative distribution function (CDF) for its duration that can be determined empirically. Given this information, it is possible to compute the duration of a group of services that have been assembled into a service composition. Technically, this means that we view the core of web choreography as assembling Directed Acyclic Graphs (DAGs) that describe the precedence relations between the services. Such a graph also provides a quantitative basis for assembling a schedule - with a distribution of uncertainty in the duration of the total service. In other words, if we attach a CDF to the duration of each process, we can, in principle, calculate the probability of providing the service within a specified time interval. This is exactly the meaning we would attach to Quality of Service - ``we agree to provide this service within 2 hours 95% of the time''.
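By way of illustration, the sketch below (a hypothetical Python fragment, not part of any standard) represents a small service composition as a DAG, draws each process duration from an empirical sample that stands in for its CDF, and estimates by Monte Carlo simulation the probability of completing the composed service within a deadline:

    import random

    # Precedence relations: each process lists the processes that must finish first.
    dag = {
        "ingest":   [],
        "validate": ["ingest"],
        "process":  ["validate"],
        "package":  ["process"],
        "deliver":  ["package"],
    }

    # Empirical duration samples (minutes) standing in for each process CDF.
    duration_samples = {
        "ingest":   [5, 7, 6, 9, 8],
        "validate": [2, 2, 3, 4, 2],
        "process":  [40, 55, 60, 45, 90],
        "package":  [3, 4, 3, 5, 4],
        "deliver":  [10, 12, 15, 11, 30],
    }

    def simulate_once():
        """One realization: finish time of the whole composition (minutes)."""
        finish = {}
        remaining = dict(dag)
        while remaining:
            for proc, preds in list(remaining.items()):
                if all(p in finish for p in preds):
                    start = max((finish[p] for p in preds), default=0.0)
                    finish[proc] = start + random.choice(duration_samples[proc])
                    del remaining[proc]
        return max(finish.values())

    def probability_within(deadline_minutes, trials=10000):
        hits = sum(simulate_once() <= deadline_minutes for _ in range(trials))
        return hits / trials

    print(f"P(complete within 2 hours) ~= {probability_within(120):.2f}")

A production implementation would replace the raw samples with fitted or empirical distribution functions, but the structure of the calculation - propagate durations through the precedence graph, then read off the probability of meeting the deadline - is the same.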
To the extent possible, we encourage the use cases to show transaction protocols that encourage reliability. In other words, the use case instances we present - together with the attached synchronization diagrams - are intended to suggest patterns of interaction, transactions, that can substantially improve reliability. This means that the use cases should encourage the use of ``BEGIN - COMMIT OR ROLLBACK'' protocols, the keeping of transaction journals that are periodically used for automated auditing of activity, and systematic auditing of all activities in the system. This approach also provides a way of substantially reducing the time required to recover from an exception or security breach. Implicitly, we seek to encourage rapid diagnosis and fix, rather than a priori monitoring, particularly if personnel are involved.
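As one possible illustration of such a pattern, the sketch below (hypothetical names and journal format) brackets every action between BEGIN and COMMIT-or-ROLLBACK journal entries and provides a trivial audit that flags transactions left open:

    import json, time, uuid

    JOURNAL = "transaction_journal.log"

    def journal(entry):
        # Append a time-stamped journal record; the journal is the audit trail.
        entry["timestamp"] = time.time()
        with open(JOURNAL, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def run_transaction(description, action):
        txn_id = str(uuid.uuid4())
        journal({"txn": txn_id, "event": "BEGIN", "what": description})
        try:
            action()
            journal({"txn": txn_id, "event": "COMMIT"})
            return True
        except Exception as exc:
            journal({"txn": txn_id, "event": "ROLLBACK", "reason": str(exc)})
            return False

    def audit():
        """Return transaction ids that began but never committed or rolled back."""
        begun, closed = set(), set()
        with open(JOURNAL) as f:
            for line in f:
                entry = json.loads(line)
                (begun if entry["event"] == "BEGIN" else closed).add(entry["txn"])
        return sorted(begun - closed)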
We can also see that a CDF can include the probability of an exception occurring. From the standpoint of a DAG, an exception prunes the original graph of completed processes and those that cannot be completed. The exception also adds additional processes that must be grafted onto the revised graph. We then expect that the revised graph will provide the information to calculate a new duration with the exception handled. The probability of raising an exception gives us a convenient way of quantifying the reliability of the system. The revised graph gives us a way of systematically thinking about exception handling policies. Basically, the graph that describes the way the exception is handled gives us a systematic description of how the system will handle problems.
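Continuing in the same illustrative vein, the sketch below shows one way an exception at a single process could prune the precedence graph and graft recovery processes onto the revised graph before the completion estimate is recomputed:

    # Hypothetical sketch: on an exception at one process, drop that process and
    # everything downstream of it, then graft recovery processes onto the graph.

    dag = {
        "ingest":   [],
        "validate": ["ingest"],
        "process":  ["validate"],
        "package":  ["process"],
        "deliver":  ["package"],
    }

    def revise_dag_on_exception(dag, failed, recovery):
        dropped = {failed}
        changed = True
        while changed:                      # prune processes that can no longer complete
            changed = False
            for proc, preds in dag.items():
                if proc not in dropped and any(p in dropped for p in preds):
                    dropped.add(proc)
                    changed = True
        revised = {p: preds for p, preds in dag.items() if p not in dropped}
        revised.update(recovery)            # graft the recovery processes
        return revised

    revised = revise_dag_on_exception(
        dag, failed="process",
        recovery={"reprocess": ["validate"],
                  "repackage": ["reprocess"],
                  "redeliver": ["repackage"]})
    print(revised)

The same Monte Carlo calculation sketched earlier can then be rerun on the revised graph to obtain the new duration distribution with the exception handled.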
From this perspective, we can use the use cases to assist in security analysis. Several of the use cases deal with attacks by various categories of rogue users. We can use a quantitative model of user activities to estimate the frequency of attacks. Furthermore, the approach we are taking to system design encourages us to lay out potential vulnerabilities and to develop means of reducing the risk of successful attack.
The material we are describing is very complex. It involves both the interaction of computer services and organizations of people. Hopefully, we can make the use cases clear enough that they can help avoid major difficulties in more detailed design stages of system development. At the same time, it is difficult to gain experience with the impact of design choices. While modern approaches to system development do allow some flexibility to redesign the system, it would be helpful to be able to simulate the system - and to adjust the design on the basis of the simulation results.
Accordingly, we will try to provide a way of converting the use case instances to a form where the operation of the system can be simulated. We will also find that this approach (particularly when applied with Gershwin's suggestion of using a hierarchical control philosophy) lends itself to ensuring that the statistics collected by a management information system will be useful in controlling the system. This kind of test capability is also important for quantitative evaluation of system reliability.