Survivability requirements can vary substantially depending on system scope, criticality, and the consequences of failure and interruption of service. Categories of requirements definitions for survivable systems include function, use, development, operation, and evolution. In this section, we present what survivability requirements are, how these requirements can be expressed, and their impact on system survivability.
The new paradigm for system requirements definition and design is characterized by distributed services, distributed logic, distributed code (including executable content), distributed hardware, a shared communications and routing infrastructure, diminished trust, and a lack of unified administrative control. Assuring the survivability of mission-critical systems developed under this new paradigm is a formidable high-stakes effort for software engineering research. This effort requires that traditional computer security measures be augmented by new and comprehensive system survivability strategies.
2.1 Expressing Survivability Requirements
The definition and analysis of survivability requirements is a critical first step in achieving system survivability [Linger 97a]. Figure 2 depicts an iterative model for defining these requirements. Survivability must address not only requirements for software functionality, but also requirements for software use, development, operation, and evolution. Thus, five types of requirements definitions are relevant to survivable systems in the model. These requirements are discussed in detail in the following subsections.

Figure 2: Requirements Definition for Survivable Systems
System/Survivability Requirements: The term system requirements refers to traditional user functions that a system must provide. For example, a network management system must provide functions to enable users to monitor network operations, adjust performance parameters, etc. System requirements also include non-functional aspects of a system, such as timing, performance, and reliability. The term survivability requirements refers to the capabilities of a system to deliver essential services in the presence of intrusions and compromises and to recover full services.
Figure 3 depicts the integration of survivability requirements with system requirements at node and network levels.

Figure 3: Integrating Survivability Requirements with System Requirements
Survivability requires that system requirements be organized into essential services and non-essential services. Essential services must be maintained even during successful intrusions; non-essential services are recovered after intrusions have been handled. Essential services may be stratified into any number of levels, each embodying fewer and more vital services as the severity and duration of intrusion increases. Thus, definitions of requirements for essential services must be augmented with appropriate survivability requirements.
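The stratification of essential services can be sketched as data plus a selection rule. The following Python fragment is only an illustration under stated assumptions: the service names, the integer severity scale, and the helpers validate_levels and active_level are hypothetical and do not come from the model in Figure 3.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    min_severity: int         # lowest intrusion severity at which this level applies
    services: FrozenSet[str]  # services that must keep running at this level


# Hypothetical levels for a network management system (assumed, not from the report).
LEVELS = (
    ServiceLevel("full", 0, frozenset({"monitor", "tune", "report", "archive"})),
    ServiceLevel("essential", 1, frozenset({"monitor", "tune"})),
    ServiceLevel("core", 3, frozenset({"monitor"})),
)


def validate_levels(levels: Sequence[ServiceLevel]) -> None:
    """Check that each successive level applies at a higher severity
    and keeps strictly fewer services than the level before it."""
    for prev, cur in zip(levels, levels[1:]):
        if cur.min_severity <= prev.min_severity:
            raise ValueError(f"{cur.name}: severity threshold must increase")
        if not cur.services < prev.services:
            raise ValueError(f"{cur.name}: must be a strict subset of {prev.name}")


def active_level(severity: int, levels: Sequence[ServiceLevel] = LEVELS) -> ServiceLevel:
    """Return the most restrictive level whose threshold the severity has reached."""
    chosen = levels[0]
    for level in levels:
        if severity >= level.min_severity:
            chosen = level
    return chosen
```

Under these assumptions, active_level(2) selects the "essential" level, so only the monitor and tune services would be required to continue; a real requirements definition would also have to state how severity is assessed.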
As shown in Figure 2, survivable systems may also include legacy and acquired COTS components that were not developed with survivability as an explicit objective. Such components may provide both essential and non-essential services and may require functional requirements for isolation and control through wrappers and filters to permit their safe use in a survivable system environment.
Figure 3 shows that survivability itself imposes new types of requirements on systems. These new requirements include the resistance to, recognition of and recovery from intrusions and compromises, and adaptation and evolution to diminish the effectiveness of future intrusion attempts. These survivability requirements are supported by a variety of existing and emerging survivability strategies, as noted in Figure 2 and discussed in more detail below.
Finally, Figure 3 depicts emergent behavior requirements at the network level. These requirements are characterized as emergent because they are not associated with particular nodes, but rather emerge from the collective behavior of node services in communicating across the network. These requirements deal with the survivability of overall network capabilities (e.g., capabilities to route messages between critical sets of nodes regardless of how intrusions may damage or compromise network topology).
We envision survivable systems that are capable of adapting their behavior, function, and resource allocation in response to intrusions. For example, when necessary, functions and resources devoted to non-essential services could be reallocated to the delivery of essential services and to intrusion resistance, recognition, and recovery. Requirements for such systems must also specify how the system should adapt and reconfigure itself in response to intrusions.
Systems can exhibit large variations in survivability requirements. Small local networks may require few or no essential services and recovery times measured in hours. Conversely, large-scale networks of networks may require a core set of essential services, automated intrusion detection, and recovery times measured in minutes. Embedded command and control systems may require essential services to be maintained in real time and recovery times measured in milliseconds.
The attainment and maintenance of survivability consume resources in system development, operation, and evolution. The resources allocated to a system’s survivability should be based on the costs and risks to an organization associated with the loss of essential services.
Use/Intrusion Requirements: Survivable-system testing must demonstrate the correct performance of essential and non-essential system services as well as the survivability of essential services under intrusion. Because system performance in testing (and operation) depends totally on the system’s use, an effective approach to survivable-system testing is based on system-use scenarios derived from system-use models [Mills 92] [Trammell 95].
System-use models are developed from use requirements that specify use environments and scenarios of system use. Use requirements for essential and non-essential services must be defined in parallel with system and survivability requirements. Furthermore, intruders and legitimate users must be considered equally. Intrusion requirements that specify intrusion-use environments and scenarios of intrusion use must be defined as well. In this approach, intrusion use and legitimate use of system services are modeled together.
Figure 4 depicts the relationship between legitimate and intrusion use. Intruders may engage in scenarios beyond legitimate scenarios, but may also employ legitimate use for purposes of intrusion if they gain the necessary privileges.

Figure 4: The Relationship Between Legitimate and Intrusion Usage
Development Requirements: Survivability places stringent requirements on system development and testing practices. Inadequate functionality and software errors can have a devastating effect on system survivability and provide opportunities for intruder exploitation. Sound engineering practices are required to create survivable software.
The following five principles (four technical and one organizational) are example requirements for survivable-system development and testing practices:
- Precisely specify the system’s required functions in all possible circumstances of system use.
- Verify the correctness of system implementations against the system’s functional specifications.
- Specify the use of system functions in all possible circumstances of system use, including intruder use.
- Test and certify the system based on function use and statistical methods.
- Establish permanent readiness teams for system monitoring, adaptation, and evolution.
Sound engineering practices are required to deal with legacy and COTS software components as well.
Operations Requirements: Survivability places demands on requirements for system operation and administration. These requirements include defining and communicating survivability policies, monitoring system use, responding to intrusions, and evolving system functions as needed to ensure survivability as usage environments and intrusion patterns change over time.
Evolution Requirements: System evolution responds to user requirements for new functions. However, this evolution is also necessary to respond to increasing intruder knowledge of system behavior and structure. In particular, survivability requires that system capabilities evolve more rapidly than intruder knowledge. This rapid evolution prevents intruders from accumulating information about otherwise invariant system behavior that they need to achieve successful penetration and exploitation.
2.1.1 Requirements Definition for Essential Services
The preceding discussion distinguishes between essential and non-essential services. Each system requirement must be examined to determine whether it corresponds to an essential service. The set of essential services must form a viable subsystem for users that is complete and coherent. If multiple levels of essential services are required, each set of services provided at each level must also be examined for completeness and coherence. In addition, requirements must be defined for making the transition to and from essential-service levels.
When distinguishing between essential and non-essential services, all of the usual requirements-definition processes and methods can be applied. Elicitation techniques such as those embodied in Software Requirements Engineering can help to identify essential services [Ebert 97]. Tradeoff and cost/benefit analysis can help to determine the sets of services that sufficiently address business survivability risks and vulnerabilities. Provisions for tracing survivability requirements through design, code, and test must be established. As previously mentioned, simulation of intrusion through intruder-use scenarios is included in the testing process.
2.1.2 Requirements Definition for Survivability Services
After specifying requirements for essential and non-essential services, a set of requirements for survivability services must be defined. These services can be organized into four general categories: resistance, recognition, recovery, and adaptation and evolution. These survivability services must operate in an intruder environment that can be characterized by three distinct phases of intrusion: penetration, exploration, and exploitation.
Penetration Phase. In this phase, an intruder attempts to gain access to a system through various attack scenarios. These scenarios range from random inputs by hobbyist hackers to well-planned attacks by professional intruders. These attempts are designed to capitalize on known system vulnerabilities.
Exploration Phase. In this phase, the system has been penetrated and the intruder is exploring internal system organization and capabilities. By exploring, the intruder learns how to exploit the access to achieve intrusion objectives.
Exploitation Phase. In this phase, the intruder has gained access to desired system facilities and is performing operations designed to compromise system capabilities.
Penetration, exploration, and exploitation create a spiral of increasing intruder authority and a widening circle of compromise. For example, penetration at the user level is typically a means to find root-level vulnerabilities. User-level authorization is then employed to exploit those vulnerabilities to achieve root-level penetration. Finally, compromise of the weakest host in a networked system allows that host to be used as a stepping-stone to compromise other more protected hosts.
Requirements definitions for resistance, recognition, recovery, and adaptation and evolution services help select survivability strategies to deal with these phases of intrusion. Some strategies, such as firewalls, are the product of extensive research and development and currently are used extensively in bounded networks. New survivability strategies are emerging to respond to the unique challenges of unbounded networks.
Resistance Service Requirements. Resistance is the capability of a system to deter attacks. Resistance is thus important in the penetration and exploration phases of an attack, before actual exploitation. Current strategies for resistance include the use of firewalls, authentication, and encryption. Diversification is a resistance strategy that will likely become more important for unbounded networks.
Requirements for diversification must define planned variation in survivable system function, structure, organization, and the means for achieving it. Diversification is intended to create a moving target and render ineffective the accumulation of system knowledge as an intrusion strategy. Diversification also eliminates intrusion opportunities associated with multiple nodes that execute identical software and typically exhibit identical vulnerabilities. Such systems offer tempting economies of scale to intruders, since when one node has been penetrated, all nodes can be penetrated. Requirements for diversification can include variation in programs, retained data, and network routing and communication. For example, systematic means can be defined to randomize software programs while preserving functionality [Linger 97b].
Recognition Service Requirements. Recognition is the capability of a system to recognize attacks or the probing that precedes attacks. Reacting or adapting during an intrusion is central to the capacity of a system to survive an attack that cannot be completely repelled. To react or adapt, the system must first recognize it is being attacked. In fact, recognition is essential in all three phases of attack.
Current strategies for attack recognition include both state-of-the-art intrusion detection and mundane but effective techniques such as logging applications and systems, administrative systems, frequent auditing, and follow-up investigations of reports generated by ordinary error detection. Advanced intrusion-detection techniques are generally of two types: anomaly detection and pattern recognition. Anomaly detection is based on models of normal user behavior. These models are often established through statistical analysis of system-use patterns. Deviations from normal system-use patterns are flagged as suspicious. Pattern recognition is based upon models of intruder behavior. User activity that matches a known pattern of intruder behavior raises an alarm.
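The two detection styles described above can be contrasted in a short Python sketch. It is a simplified illustration, not a description of any particular intrusion-detection product: the three-standard-deviation threshold, the event names, and the signatures in SIGNATURES are all assumptions made for the example.

```python
from statistics import mean, stdev


def build_profile(samples):
    """Model normal use as the mean and standard deviation of past observations."""
    return mean(samples), stdev(samples)


def is_anomalous(value, profile, threshold=3.0):
    """Anomaly detection: flag values far from the normal-use profile.
    The threshold of 3 standard deviations is an assumed default."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Pattern recognition: hypothetical signatures of known intruder behavior.
SIGNATURES = {
    "password-guessing": ("login_fail", "login_fail", "login_fail"),
    "privilege-probe": ("login_ok", "read_passwd", "su_fail"),
}


def match_signatures(events, signatures=SIGNATURES):
    """Return the names of signatures that occur as a contiguous run in events."""
    hits = []
    for name, pattern in signatures.items():
        n = len(pattern)
        if any(tuple(events[i:i + n]) == pattern
               for i in range(len(events) - n + 1)):
            hits.append(name)
    return hits
```

For example, a profile built from daily login counts such as [10, 12, 11, 9, 13, 10, 11] would not flag a count of 11 but would flag a count of 40. The sketch also shows the tradeoff between the two types: the anomaly check needs no prior knowledge of attacks, whereas the signature check recognizes only behavior that has already been catalogued.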
Requirements for future survivable networks will likely employ additional strategies such as self-awareness, trust maintenance, and black-box reporting. Self-awareness is the process of establishing a high-level semantic model of the computations that a component or system is executing or has been asked to execute. A system or component that understands what it is being asked can refuse requests that would be dangerous, compromise a security policy, or adversely impact the delivery of minimum essential services.
Trust maintenance is achieved by a system through periodic queries among its components (e.g., among the nodes in a network) to continually test and validate trust relationships. Detection of signs of intrusion would trigger an immediate test of trust relationships.
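One way such a trust query might be realized is a keyed challenge-response exchange. The Python sketch below is an assumption made for illustration: the report does not prescribe a mechanism, and the Node class, the pre-shared keys, and the verify_trust helper are hypothetical. Only the standard hmac, hashlib, and secrets modules are used.

```python
import hashlib
import hmac
import secrets


class Node:
    """A network component holding a secret key (hypothetical model)."""

    def __init__(self, name: str, key: bytes):
        self.name = name
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        # Prove possession of the key without revealing it.
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


def verify_trust(peers, shared_keys):
    """Challenge every peer with a fresh random value and return the names
    of peers whose response does not match the expected keyed digest."""
    untrusted = []
    for peer in peers:
        challenge = secrets.token_bytes(16)
        expected = hmac.new(shared_keys[peer.name], challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(peer.respond(challenge), expected):
            untrusted.append(peer.name)
    return untrusted
```

In this sketch, verify_trust would be called on a regular schedule and again immediately whenever a recognition mechanism reports signs of intrusion. A key check of this kind establishes only that a peer still holds its key, so it is at best one input to a broader trust assessment.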
Black-box reporting is a dump of system information that can be retrieved from a crashed system or component for analysis to determine the cause of the crash (e.g., design error or specific intrusion type). This analysis can help to prevent other components from suffering the same fate.
A survivable-system design must include explicit requirements for recognition of attack. These requirements ensure the use of one or more of the preceding strategies through the specification of architectural features, automated tools, and manual processes. Since intruder techniques are constantly advancing, recognition requirements should be frequently reviewed and continuously improved.
Recovery Service Requirements. Recovery is a system’s ability to restore services after an intrusion has occurred. Recovery also contributes to a system’s ability to maintain essential services during intrusion.
Requirements for recoverability are what most clearly distinguish survivable systems from systems that are merely secure. Traditional computer security leads to the design of systems that rely almost entirely on hardening (i.e., resistance) for protection. Once security is breached, damage may follow with little to stand in the way. The ability of a system to react during an active intrusion is central to its capacity to survive an attack that cannot be completely repelled. Recovery is thus crucial during the exploration and exploitation phases of intrusion.
Recovery strategies in use today include replication of critical information and services, use of fault-tolerant designs, and incorporation of backup systems for hardware and software. These backup systems include master copies of critical software in isolation from the network. Some systems, such as large-scale transaction processing systems, employ elaborate, fine-grained transaction roll-back processes to maintain the consistency and integrity of state data.
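The roll-back idea can be shown at very small scale. The following Python sketch assumes state held in an in-memory dictionary and a hypothetical transaction helper; production transaction processing systems use logging and far finer-grained mechanisms than the whole-state snapshot used here.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(state: dict):
    """Apply changes to state; restore the prior snapshot if anything fails."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        # Roll back in place so every holder of the dict sees the restored data.
        state.clear()
        state.update(snapshot)
        raise


# Usage: a failed update leaves the state exactly as it was.
accounts = {"a": 100, "b": 50}
try:
    with transaction(accounts) as s:
        s["a"] -= 30
        raise RuntimeError("integrity check failed")  # simulated fault or intrusion
except RuntimeError:
    pass
```

After the simulated failure, accounts again holds its original values, which is the consistency property the roll-back process is meant to preserve.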
Adaptation and Evolution Service Requirements. Adaptation and evolution are critical to maintaining resistance to ever-increasing intruder knowledge of how to exploit otherwise unchanging system functions. Dynamic adaptation permanently improves a system’s ability to resist, recognize, and recover from intrusion attempts. For example, an adaptation requirement may be an infrastructure that enables the system to inoculate itself against newly-discovered security vulnerabilities by automatically distributing and applying security fixes to all network elements. Another adaptation requirement may be that intrusion detection rule sets are updated regularly in response to reports of known intruder activity from authoritative sources of security information, such as the CERT Coordination Center.
Adaptation requirements ensure that such capabilities are an integral part of a system’s design. As in the cases of resistance, recognition, and recovery requirements, the constant evolution of intruder techniques requires that adaptation requirements be frequently reviewed and continuously improved.