[37]
Addressing Survivability in the Composable Replaceable Security Services Infrastructure
Roshan Thomas and Rich Feiertag
TIS Labs at Network Associates
8000 Westpark Drive, Suite 600
McLean, VA 22102-3105
{rthomas, feiertag}@tis.com
Background
The Composable Replaceable Security Services (CRSS) project is a DARPA-funded effort currently underway at Trusted Information Systems (TIS) Laboratories (see related information at http://www.tis.com/research/architecture/arc_composable.html). The goal of the project is to develop a prototype security infrastructure that can support the next generation of survivable distributed systems. These systems will include components that are autonomous, self-contained units of portable, transportable, and mobile software. For secure operation, interactions between these components must be controlled and regulated by distributed security mechanisms. The security mechanisms must have these same properties, so that security enforcement does not constrain the flexibility of operation of these software components, the same flexibility that is necessary for the system to achieve its goals of survivability and adaptability. These security services will form an infrastructure that serves as security middleware, used by applications in a manner independent of the underlying OS and networking technologies.
To enable survivability, CRSS security services embody various properties such as independence, multiplicity, fault-tolerance, variability, and composability. First, each CRSS service can have multiple implementations that coexist at the same time. For instance, the cryptographic service may have implementations for RSA encryption as well as for DES encryption. Second, services are designed to be composable. As systems evolve, composability allows administrators to update individual services as needed without affecting the performance of the other services. Composability also allows more complex services to be constructed from a few simple basic services, and possibly permits these more complex services to be constructed in different ways using different basic services. Finally, each service is designed to be part of a distributed environment. Every service consists of two components: the service framework and the collection of service providers that implement the desired service. The service framework accepts requests from applications or service providers and directs them to appropriate service providers. For example, when an application requests encryption from the cryptographic service, the service framework accepts the request and may forward it to the RSA encryption service provider.
Both the service framework and the collection of service providers run in a distributed environment. The service framework can fulfill a service request by invoking a service provider that resides on the local host or a remote host. Similarly, the framework itself is implemented in a redundant distributed manner allowing, for example, the framework on a host to continue operating even if its local copy of the database of service providers is corrupted.
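The split between a service framework and its service providers can be illustrated with a minimal Python sketch. It is not the CRSS interface: the class names, the `preferred` parameter, the "local before remote" ordering, and the placeholder handlers (which only tag the data and perform no real encryption) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceProvider:
    """One implementation of a service (hypothetical model, not the CRSS API)."""
    name: str                          # e.g. "rsa" or "des"
    host: str                          # "local" or the name of a remote host
    handler: Callable[[bytes], bytes]  # stand-in for the real service logic
    available: bool = True


@dataclass
class ServiceFramework:
    """Accepts requests for one service type and routes them to a provider."""
    service_type: str
    providers: List[ServiceProvider] = field(default_factory=list)

    def register(self, provider: ServiceProvider) -> None:
        self.providers.append(provider)

    def request(self, payload: bytes, preferred: str = "") -> bytes:
        # Assumed policy: try the preferred implementation first, then local
        # providers, then remote ones; skip providers marked unavailable.
        candidates = sorted(
            (p for p in self.providers if p.available),
            key=lambda p: (p.name != preferred, p.host != "local"),
        )
        if not candidates:
            raise RuntimeError(f"no provider available for {self.service_type}")
        return candidates[0].handler(payload)


# Placeholder handlers: they only label the payload, they do not encrypt it.
crypto = ServiceFramework("cryptographic")
crypto.register(ServiceProvider("rsa", "local", lambda data: b"rsa:" + data))
crypto.register(ServiceProvider("des", "hostB", lambda data: b"des:" + data))
result = crypto.request(b"msg", preferred="rsa")
```

Because the application only talks to the framework, a provider can be marked unavailable or replaced and later requests are routed to another implementation without any change on the application side.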
Survivability in CRSS

Figure 1. Reference architecture for a survivability service.
We started investigating survivability for the CRSS project by first surveying existing approaches and architectures. This in turn led us to formulate a reference architecture for a survivability service. This architecture represents, to a great extent, a synthesis and unification of various existing concepts and approaches to survivability. Figure 1 shows the reference architecture. It proposes generic components that we believe will be present and necessary in any system (including CRSS) supporting a full-fledged survivability service. Clearly, these components have to be instantiated for specific environments, and their complexity will vary from one instantiation to another.
The architecture contains various processing modules (components) and these in turn use various pieces of information (data sets). We now briefly describe them.
- Survivability Manager (SM). This module is the heart of the survivability service subsystem. Its functions include the following:
  - Overall coordination and management of the survivability service.
  - Continuous monitoring of the system survivability health.
  - Survivability analysis and interpretation.
  - Interfacing with modules outside the survivability system such as those responsible for system administration.
  - Sending of notifications, alerts and alarms.
The Survivability Manager (SM) interfaces with the Service Monitor, Configuration Manager, Recovery Manager and Resource Manager (to be described shortly). The SM also uses the following data sets (files):
- Survivability Objectives. This data set contains information on the overall survivability objectives that the survivability service is to strive for. It identifies essential and nonessential services along with acceptable operating parameters. We may also want to express specific objectives to be accomplished under specific situations such as various failure and attack scenarios.
- Survivability Records. This data set contains historical and statistical information relating to the survivability performance of various CRSS service providers. Minimally, this will include an incident record for every failure or survivability-relevant event, statistical totals and averages for various failures and where possible, future projections of survivability performance.
- Failure Model. This data set contains detailed information on recognizing and analyzing failures. We anticipate it will contain at least the following: (i) a failure dictionary that contains detailed definitions of various failure modes (types), categorized by service type; and (ii) failure propagation models that enable one to model the impact of each failure on other services and how it may cause cascading failures. Various quantitative and probabilistic methods may be integrated into these models.
- System Sensors. These sensors are responsible for observing and reporting low-level events that relate to the failure and recovery of service providers. Strictly speaking, these sensors are external to the survivability architecture. However, they provide vital inputs to the survivability service, and for completeness and clarity we have included them in an abstract form in our architecture.
- Service Monitor. This component accepts low-level signals from the various system sensors. Once a signal is received, the Service Monitor attempts to identify and classify it. It then communicates to the Survivability Manager a high-level description of the incident that caused the signal. This may include information such as the origin of the signal, the time of the signal, and its possible cause.
- Configuration Manager. This component is responsible for maintaining an overall system configuration that can be used to meet the survivability objectives. Configurations are mappings of services to survivability objectives and system resources. The Configuration Manager maintains two categories of configuration information: (i) the current configuration of the system, and (ii) alternate configurations that are to be used when the current configuration is no longer feasible. A CRSS system may have several alternate configurations standing by as backups. These alternate configurations may be prioritized by how well they meet survivability objectives. Rather than using precomputed alternate configurations, a system may also dynamically compute a new alternate configuration based on certain rules and parameters.
- Resource Manager. The Resource Manager maintains extensive information on available resources in the operating environment. This information is constantly used to fine-tune the survivability configuration and, when possible, improve the survivability posture of the system. For every resource, we maintain its properties, as well as information on its availability, reliability (past, present, and projected), and utilization.
- Recovery Manager. This component maintains information required for recovery after the failure of a service provider. Recovery involves two separate actions: cleaning up after a failed provider, and restoring and updating a replacement provider. At the global level, recovery information may include actions to be taken if a group of related failures occurs. These groups of failures may be characterized in more abstract terms, and the recovery action may span several service providers.
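The Configuration Manager's fallback from the current configuration to prioritized alternates can be sketched as follows. This is an illustrative model only: the `Configuration` and `ConfigurationManager` classes, the integer `priority` field, and the feasibility test (every required resource is currently available) are assumptions, not part of the CRSS design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Configuration:
    """A mapping of services to the resources (here, hosts) they run on."""
    name: str
    placement: Dict[str, str]  # service name -> resource name
    priority: int              # assumed: lower value = better fit to objectives

    def is_feasible(self, available: Set[str]) -> bool:
        # Assumed rule: feasible only if every required resource is available.
        return all(res in available for res in self.placement.values())


class ConfigurationManager:
    def __init__(self, current: Configuration, alternates: List[Configuration]):
        self.current = current
        # Keep alternates ordered by how well they meet the objectives.
        self.alternates = sorted(alternates, key=lambda c: c.priority)

    def reconfigure(self, available: Set[str]) -> Optional[Configuration]:
        """Keep the current configuration if feasible, else fail over."""
        if self.current.is_feasible(available):
            return self.current
        for candidate in self.alternates:
            if candidate.is_feasible(available):
                self.current = candidate
                return candidate
        # No precomputed alternate works; a new one would have to be computed.
        return None
```

In this sketch the set of available resources would come from the Resource Manager, and a `None` result corresponds to the case where the system must compute a new configuration dynamically rather than rely on a precomputed one.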
One of the first challenges we are addressing is the issue of capturing and modeling survivability objectives. The information survivability characteristics of a system are made up of several quality attributes such as reliability, fault-tolerance, security, integrity, availability, and safety. As such, there exists no single or universal measure to express or assess the survivability capabilities of a system. Our perspective is therefore that survivability specifications need to be expressed in terms of the subsets of the total functionality the system can deliver under various conditions or scenarios. These conditions will account for various failure scenarios. It is also important to realize that the various quality attributes may have to be expressed as a combination of quantitative (numeric) and qualitative measures. Once we understand how to specify survivability requirements, we hope to use various fault and failure propagation models to model the impact that failures will have on active services. We will also look at ideas for building tools to support survivability modeling and analysis.
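As a rough illustration of how a failure propagation model could be checked against a survivability objective, the sketch below treats propagation as deterministic reachability in a dependency graph and treats the objective as a set of essential services that must survive a given failure scenario. Both simplifications, along with the function names and service names, are assumptions; the probabilistic and qualitative measures discussed above are not modeled.

```python
from collections import deque
from typing import Dict, List, Set


def affected_services(dependents: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return the failed service plus every service its failure cascades to.

    `dependents[s]` lists the services assumed to stop working when `s` fails.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for nxt in dependents.get(service, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def meets_objective(all_services: Set[str], essential: Set[str],
                    dependents: Dict[str, List[str]], failed: str) -> bool:
    """True if every essential service survives the given failure scenario."""
    surviving = all_services - affected_services(dependents, failed)
    return essential <= surviving


# Hypothetical dependency graph: authentication relies on crypto, and
# authorization relies on authentication; audit is independent.
dependents = {"crypto": ["authentication"], "authentication": ["authorization"]}
services = {"crypto", "authentication", "authorization", "audit"}
audit_survives = meets_objective(services, {"audit"}, dependents, "crypto")
authz_survives = meets_objective(services, {"authorization"}, dependents, "crypto")
```

Here a crypto failure leaves the audit service running but takes down authorization, so an objective that lists authorization as essential would not be met in that scenario.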