CERT




Survivability Toolset
for
Object Service Architectures

David Langworthy
David Wells
Object Services and Consulting
{del,wells}@objs.com

This research is sponsored by the Defense Advanced Research Projects Agency and managed by Rome Laboratory under contract F30602-96-C-0330. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, Rome Laboratory, or the United States Government.

The military of the future will increasingly rely upon "information superiority" to dominate the battlespace. Achieving information superiority will require software applications that are far larger, far more complex, and far more distributed than comparable applications in existence today. The size and complexity of such systems make them highly vulnerable to the loss or degradation of hosts, networks, or processes due to physical and information warfare attacks, hardware and infrastructure failures, and software errors. Since loss or degradation is inevitable, it is essential that such systems behave well when it occurs. A system that can repair itself or degrade gracefully to preserve as much critical functionality as possible in the face of attacks and failures is called a survivable system.

We are developing software mechanisms to ensure the survivability of such systems that go well beyond the traditional approaches of fault tolerance and replicated services. Those techniques, while valuable, are in themselves insufficient to respond to the full range of problems that can face a system since they create "islands of availability" but do nothing to address system-wide concerns. The following two examples illustrate the kinds of issues addressed by our survivability work.

Mission planning for a sortie in a regional conflict with multiple coalition partners requires many resources, among them a map server. Assume the local map server becomes unavailable and that the backup map server is at a remote location and reachable only over slow communication lines. A coalition map server with good performance characteristics is available, but its data is considered to be of lower quality and its labels are in a foreign language. Under many circumstances, it would be desirable to use the coalition map server, but existing systems cannot switch an active connection and are limited to exact substitutes for a service. A survivable system needs to be able to switch between compatible services in an established connection and to substitute acceptable alternatives.
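The map server scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Survivability Service itself: the `MapService` fields, the latency and quality thresholds, and the `Connection.rebind` method are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapService:
    """A candidate service; every field here is an illustrative assumption."""
    name: str
    available: bool
    latency_ms: float  # expected response time
    quality: float     # 0.0 (unusable) to 1.0 (exact substitute)


def choose_service(candidates, max_latency_ms, min_quality) -> Optional[MapService]:
    """Return the best available service that meets both thresholds.

    Exact substitutes are not required: any service whose quality is at
    least min_quality counts as an acceptable alternative.
    """
    acceptable = [
        s for s in candidates
        if s.available and s.latency_ms <= max_latency_ms and s.quality >= min_quality
    ]
    # Prefer higher quality, then lower latency.
    return max(acceptable, key=lambda s: (s.quality, -s.latency_ms), default=None)


class Connection:
    """A client-side handle that can be rebound without being torn down."""

    def __init__(self, service):
        self.service = service

    def rebind(self, candidates, max_latency_ms=500.0, min_quality=0.5):
        """Switch to an acceptable service if one exists; report success."""
        replacement = choose_service(candidates, max_latency_ms, min_quality)
        if replacement is not None:
            self.service = replacement
        return replacement is not None


local = MapService("local", available=False, latency_ms=20.0, quality=1.0)
backup = MapService("remote-backup", available=True, latency_ms=4000.0, quality=1.0)
coalition = MapService("coalition", available=True, latency_ms=40.0, quality=0.6)

conn = Connection(local)
conn.rebind([local, backup, coalition])
print(conn.service.name)  # prints "coalition"
```

With these assumed numbers the remote backup is rejected for latency and the coalition server is accepted despite its lower quality, which is the behavior the scenario asks for.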

The ability to substitute services is only one aspect of survivability. Consider an information warfare attack focused on NT machines. As the NT machines begin to fail, essential processing must be moved over to UNIX machines. This in turn requires terminating or delaying nonessential processing on those machines. However, there are many different threats, each with its own optimal response, and more than one threat may materialize at the same time. Addressing this in an ad hoc manner is not possible. A survivable system must be able to adapt dynamically to the threats in its environment and reallocate essential processing to the most robust resources.
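The reallocation step in the NT example can be illustrated with a deliberately small sketch. It handles a single threat and simply terminates nonessential tasks, so it does not address the multi-threat adaptation the paragraph calls for; the `Task`, `Host`, and `evacuate` names and the abstract load units are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    essential: bool
    load: int  # abstract units of capacity (assumed)


@dataclass
class Host:
    name: str
    platform: str
    capacity: int
    tasks: list = field(default_factory=list)

    def free(self):
        return self.capacity - sum(t.load for t in self.tasks)


def reclaimable(host):
    """Capacity available if every nonessential task on the host were shed."""
    return host.free() + sum(t.load for t in host.tasks if not t.essential)


def evacuate(hosts, threatened_platform):
    """Move essential tasks off threatened hosts, shedding nonessential work.

    Returns (moved, shed, stranded) lists of task names.
    """
    safe = [h for h in hosts if h.platform != threatened_platform]
    moved, shed, stranded = [], [], []
    for src in hosts:
        if src.platform != threatened_platform:
            continue
        for task in [t for t in src.tasks if t.essential]:
            # Pick the safe host with the most spare capacity that can fit the task.
            ranked = sorted(safe, key=lambda h: h.free(), reverse=True)
            dst = next((h for h in ranked if reclaimable(h) >= task.load), None)
            if dst is None:
                stranded.append(task.name)
                continue
            # Shed only as many nonessential tasks as needed to make room.
            for victim in [t for t in dst.tasks if not t.essential]:
                if dst.free() >= task.load:
                    break
                dst.tasks.remove(victim)
                shed.append(victim.name)
            src.tasks.remove(task)
            dst.tasks.append(task)
            moved.append(task.name)
    return moved, shed, stranded


nt = Host("nt1", "NT", 10, [Task("targeting", True, 4), Task("logging", False, 2)])
ux = Host("ux1", "UNIX", 6, [Task("reports", False, 3), Task("tracking", True, 2)])

moved, shed, stranded = evacuate([nt, ux], "NT")
print(moved, shed, stranded)  # prints ['targeting'] ['reports'] []
```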

Approach

This project is developing software mechanisms to make military and commercial software applications based on the popular Object Services Architecture (OSA) model (e.g., OMG's CORBA) far more survivable than is currently possible, while maintaining the flexibility and ease of construction that characterize OSA-based applications.

The keys to making systems survivable are:

We have developed a comprehensive approach to satisfying these requirements consisting of: a survivable object abstraction in which survivable services and applications can be developed; a collection of models describing capabilities, needs, and threats; and the architecture of a Survivability Service that manages the survival of systems constructed in accordance with the survivable object abstraction. Our goal is to demonstrate the feasibility of this approach by building and demonstrating a prototype Survivability Service consisting of all of the above capabilities except monitoring, which we assume will be developed elsewhere. To maximize the utility of the Survivability Service, we are leveraging related work such as fault tolerance techniques, OMG CORBA & Object Services, failure detectors, and various system models. We plan to propose the Survivability Service specification to the Object Management Group and to make the prototype available as a reference implementation.

Recent Accomplishments

We completed the definition of the survivable object abstraction that we began last year. The survivable object abstraction is used by developers to build survivable systems and by the Survivability Service to maintain such systems. The abstraction separates an application's functionality, logical connectivity, and survivability requirements. It consists of two major parts: a composition model for OSAs that defines legitimate (static) configurations, and an evolution model for OSAs that defines strategies for migrating from one legitimate configuration to another, along with the conditions that must be met in order to apply a particular transform. The completed survivable object abstraction makes it possible to define and build systems that can be reliably reconfigured by the Survivability Service. It also makes it possible to transparently tailor an application's configuration and resource use so that it can operate under a variety of "survivability realms" defined by the various environments into which the application might be deployed. This allows applications to operate in configurations that were unanticipated at application development time, and should substantially increase the lifetime and utility of such applications.
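The split between a composition model and an evolution model can be pictured with a toy sketch. It is not the project's abstraction: the representation of a configuration as (service, host) pairs, the legitimacy rule, and the names `is_legitimate`, `Transform`, `migrate`, and `evolve` are assumptions chosen only to show legitimate configurations and guarded transforms working together.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

# Assumed representation: a configuration is a set of (service, host) placements.
Config = FrozenSet[Tuple[str, str]]


def is_legitimate(config: Config, required: set, failed_hosts: set) -> bool:
    """Composition rule (assumed): each required service is placed exactly
    once, and no placement uses a failed host."""
    services = [s for s, _ in config]
    return (
        set(services) == required
        and len(services) == len(set(services))
        and all(h not in failed_hosts for _, h in config)
    )


@dataclass(frozen=True)
class Transform:
    """An evolution step guarded by a condition that must hold before it applies."""
    name: str
    precondition: Callable[[Config, set], bool]
    apply: Callable[[Config], Config]


def migrate(service: str, src: str, dst: str) -> Transform:
    def pre(config, failed_hosts):
        return (service, src) in config and dst not in failed_hosts

    def run(config):
        return (config - {(service, src)}) | {(service, dst)}

    return Transform(f"migrate {service} {src}->{dst}", pre, run)


def evolve(config, transforms, required, failed_hosts):
    """Apply the first transform whose precondition holds and whose result
    is a legitimate configuration; otherwise leave the configuration alone."""
    for t in transforms:
        if t.precondition(config, failed_hosts):
            candidate = t.apply(config)
            if is_legitimate(candidate, required, failed_hosts):
                return candidate, t.name
    return config, None


required = {"maps", "planner"}
start = frozenset({("maps", "hostA"), ("planner", "hostB")})
failed = {"hostA"}
transforms = [migrate("planner", "hostB", "hostC"), migrate("maps", "hostA", "hostC")]

new_config, applied = evolve(start, transforms, required, failed)
print(applied)  # prints "migrate maps hostA->hostC"
```

The first transform is applicable but is skipped because its result still places a service on the failed host; only a transform that lands in a legitimate configuration is accepted.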

We completed a second iteration of our Survivability Service architecture based on implementation experience. The architecture is compatible with existing research directions (e.g., monitors and replication) and with industry trends (particularly OMG CORBA, which is at the heart of many DOD programs). A major change from last year's architecture is that the internal modularization of the Survivability Service has been improved to allow for alternative resource allocation strategies and metrics. By ensuring that the architecture is compatible with other efforts and is open, we make it possible to grow the Survivability Service in a number of directions.

We implemented several key components of the Survivability Service prototype as part of our effort to both build a useable survivability capability and demonstrate the soundness of our approach. We now have:

We also anticipate being able to control the use of replication to make individual services more robust, but this depends on the stability and availability of an externally provided replication subsystem.

We investigated the relationship between quality of service and survivability and developed metrics that integrate the two concepts. This is important, since QoS work attempts to guarantee a particular level of service delivery assuming resources are not lost, while survivability attempts to cope with loss and degradation. Both are needed for systems to function properly, but if the respective allocation strategies are pursuing different goals using different metrics as objective functions, it is unlikely that a unified solution will emerge. Our metrics partially solve that problem.
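The tension between objective functions can be seen in a small numeric illustration. The metrics below are not the ones developed in this project, which are not defined in this summary; the three allocations, their numbers, and the "expected service" product are hypothetical and serve only to show how separate QoS and survivability objectives can disagree while a combined one selects a compromise.

```python
# Each hypothetical allocation: (name, service level if nothing fails,
# probability that its resources survive). All values are made up.
allocations = [
    ("fast-but-fragile", 1.0, 0.5),
    ("slow-but-robust", 0.6, 0.95),
    ("balanced", 0.9, 0.8),
]


def qos_only(a):
    """Objective that ignores loss: service level assuming no failures."""
    return a[1]


def survivability_only(a):
    """Objective that ignores service level: probability of surviving."""
    return a[2]


def expected_service(a):
    """One assumed way to combine the two: service level weighted by survival."""
    return a[1] * a[2]


best_qos = max(allocations, key=qos_only)[0]
best_surv = max(allocations, key=survivability_only)[0]
best_combined = max(allocations, key=expected_service)[0]

print(best_qos)       # prints "fast-but-fragile"
print(best_surv)      # prints "slow-but-robust"
print(best_combined)  # prints "balanced"
```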



