CERT
Back to [29]   [30]    Forwards to [30]



Complexity, Open Standards and Survivability

Lui Sha
Department of Computer Science
University of Illinois
Tom Longstaff
Software Engineering Institute
Carnegie-Mellon University

July 28, 1998

1.0 Introduction

     Information system survivability can be modeled as the computational complexity levels faced by administrators and intruders in a race to gain control of the system. This complexity view of information system survivability (assurance) provides us with a way to quantitatively measure a system's survivability and to analyze the effectiveness of protective, monitoring, and reactive measures.

     Conceptually, the intrinsic complexity to gain the control of a system or a subsystem is defined as the irreducible computational complexity faced by one with the full knowledge of the system. These are the minimal steps that one has to take before the control is granted. Relative complexity is the ratio of the complexity faced by intruders vs. the complexity faced by administers. Relative complexity depends upon secret information known to administrators but supposedly unknown to intruders, for example, a password or an encryption key.

     The fundamental and stable properties of an information system should be protected by using high intrinsic complexity. This is the only way to protect against inside-attackers who account for a significant percentage of serious intrusions. A great example of a critical information system protected by the principle of intrinsic complexity is the US Constitution. The numbers of steps that have to be taken to make any change to the constitution is enormous. There is no short cut to change the Constitution, no matter who you are and what you know. An Internet example is the change of Internet protocols through the mechanism of the IETF. The process of change is such that no single company or government can manipulate the protocols to its own benefit secretively.

     Intrinsic complexity protection is robust but inflexible. For system properties that we want to be able to change quickly, we must rely on relative complexity. Relative complexity relies on the administrator of a system having some knowledge, product, or physical characteristic that must be discovered or replicated by the adversary in order to have an equal ability to manage change. Traditionally, these secrets have been developed to short-circuit the ability to manage complexity for the specific purpose of defeating an adversary.

     In information systems, passwords give the owner of the password the ability to produce one-way hash values with very few steps (low complexity). For an adversary without the password, the hash can only be generated with a large amount of complexity (as in an exhaustive search of the key space). In general, a system's relative complexity degrades quickly over time if it remains static. Since most traditional security mechanisms rely on relative complexity to prevent the adversary from breaching security, it can be said that security also degrades quickly over time if the protection mechanism remains static over time.

2.0 Standardization and Survivability

     As Paul Strassman mused in the 94 keynote speech at the Software Technology Conference, "US computer networks might have been resilient to sabotage [to date], because the networks are chaotic." The many diverse, incompatible and rapidly evolving systems accidentally created a high degree of intrinsic complexity that resists large scale intrusion. Of course, they also created great difficulty for system development and maintenance.

     This is rapidly changing. There has been a great movement towards the use of open standard based COTS components. The introduction of standardized components reduces diversity and thus the (accidental) intrinsic complexity. It lowers the cost of system development and administration, but it also reduces the barrier to intrusions. The danger of "break into one and break into all" is real and growing.

     Nevertheless, we cannot and should not turn the clock back to the era of chaos. We need ways to improve information system assurance by the appropriate use of intrinsic and relative complexities. First of all, a fundamental design decision in high assurance system is to identify the important and stable system properties that should be immutable to attacks from inside and outside. For those properties, we want to make sure that the path to change is guarded by a high degree of intrinsic complexity that does not rely on secrets that can be compromised by a few individuals.

     A high degree of intrinsic complexity can be generated by the use of diversity in system construction and/or the use of distributed consensus protocols. Diversity is less susceptible to design and implementation flaws than consensus protocols at the cost of maintenance complexity. In the final analysis, a system that is easily compromised by our adversaries is not worthwhile to maintain. Carefully engineered diversity must remain an important design requirement in critical national infrastructures, especially in C4ISR systems.

     We also need to maintain a high degree of relative complexity. There is nothing that reduces the complexity of breaking into a system faster than the discovery of security flaws in design and implementation. With an inevitable reduction of the degree of diversity in our system due to standardization, the search space for flaws is greatly reduced and the scope of damage due to the discovery of a flaw greatly increases. It is difficult to eliminate all the software flaws, security related or otherwise. One effective way to maintain a high degree of relative complexity is to disrupt the intruders' learning process. Consider the following example.

     Example: a door is protected by 10 toy combination locks. Each lock has only 10 numbers and it opens when it is set to the right number. For example, lock one will be opened if the number is set to 7, and lock two will be opened if the number is set to 3. There is a team of 10 intruders working together in parallel, trying to open the door. Each intruder can test 1 number of a lock per second.

     What is the probability the door is opened in the first trial? Since each intruder has a probability of 0.1 to succeed, the probability of opening the door at the first trial is (0.1)10. Opening this door at the first trial is harder than wining a lottery. Unfortunately, this system can be quickly defeated by a learning process. Each lock has only 10 numbers to choose from. Since each intruder can test 1 number in a second and there are 10 intruders working in parallel, in 5 seconds the probability of opening any lock is 0.5 and the probability of opening the door increases to (0.5)10, still a small number. However, at the end of 10 seconds, all the 10 numbers in each lock is tested and the door will be opened with probability 1.

     As is, this is a very poor security system, because the search problem can be decomposed into learning each lock's key digit independently. The standard remedy in the trade of lock-smiths is to defeat the "divide and conquer" strategy by adding a interlocking mechanism. The interlock will not open the door and gives no information on the effect of individual dial settings, unless all ten dial-settings are correct. This forces intruders to search in a 1010 space rather than in a 10x10 space. Interlock is a special case of inter-dependency. Unstructured inter-dependency is a "curse" for users because it creates complexity for development and maintenance. This is how a software system becomes unmanageable after a series of ad hoc changes. On the other hand, we can "cast" a powerful "curse" on intruders by creating a structured and preferably hidden inter-dependency between subsystems, especially between distributed components.

     Another technique to maximize the complexity faced by intruders is to "erase" (or equivalently confuse) what they have learned during the search process. For example, we can shuffle the locks in example above:

     Step 1: The 10 locks are put in a randomized lock position.

     Step 2: after 1 second, randomly replace the 10 locks in use from a sufficiently large bag of the toy combination locks.

     Since the locks are shuffled randomly and they all look alike, what has been learned in the previous trial is now useless. Thus, each intruder always has a probability of 0.1 to succeed at any trial. The probability of all 10 locks being opened at any trial is now stuck at p = (0.1)10 = 10-10. The average time for intruders to succeed in opening all the 10 locks is (1/p) seconds. This translates to 1010 seconds ~ 317 years. From 10 seconds to 317 years is a dramatic improvement.

     Since the administrator knows the numbers opening each lock and where each lock is assigned to, his/her complexity of opening the door remains low. In short, we can keep a high degree of relative complexity with simple devices, provided that we can find a way to defeat the intruders' learning process.

     A characteristic behavior of intruders is to search for enabling information that they are not authorized to know. This simple example shows the powerful effect of learning as well as the effect of interrupting the learning process and thus keeping the complexity faced by intruders high. Technologies for dynamic component binding allow us to shuffle different vendors' products for the same open standards without interrupting the system operation. Like shuffling the locks, this negates the intruders' attempt to discover and use holes in any specific design, implementation and configuration. From the users' viewpoint, the infrastructure is stable and easy to use. From the adversaries' viewpoint, the infrastructure is constantly mutating in a seemingly random way, automatically replacing components compromised by them, rendering their knowledge gained by probing obsolete. Two forms of mutation can be explored: mutation of critical parts of the infrastructure itself and the mutation of sentinels to the infrastructure components.

3.0 Summary and Conclusion

     With the definitions of intrinsic and relative complexity, the survivability of a system can be measured in terms of the gap between administrator and adversaries' ability to handle complexity needed to gain control of information systems. This model infers that

1) standardization lowers the cost of system development and administration, but it also reduces the barrier to intrusions. The danger of "break into one and break into all" is real and growing.
2) as technology progresses and time passes by, the relative complexity of a static system will decrease and makes our system more vulnerable.
     To counter the threat of intrusions from outside and inside, the fundamental and stable properties of an information system should be protected by a high degree of intrinsic complexity. In addition, we need to keep a high degree of relative complexity for properties with short time constants. To keep the relative complexity high, it is important to create a large search space for intruders. One effective way is to create structured inter-dependency between components. Furthermore, we can keep the relative complexity high by disrupting the intruders' learning process. One way is to use randomized dynamic binding of system components.

Acknowledgment: the authors wish to thank Howard Lipson for his comments and encouragement and thank Mark Firth for his suggestion for the improvement of the random shuffling algorithm in the example.



Back to the Table of Contents
Back to [29]   [30]    Forwards to [31]