| ![]() ![]() |
Systematic Generation of Stochastic Diversity as an Intrusion Barrier in Survivable Systems SoftwareRichard C. LingerSoftware Engineering Institute Carnegie Mellon University Email: rlinger@sei.cmu.edu
Contents:
Abstract
1. Survivable systemsPervasive societal dependence on large-scale network systems in virtually every area of business, government, and defense magnifies the consequences of intrusions and failures. As a result, the survivability and security of such systems is receiving increasing attention. Survivable systems are characterized by the ability to provide essential services even in the presence of intrusions and compromises, and to recover full services in a timely manner [1]. Survivability requires effective means to resist, recognize, and recover from intrusions, and to adapt system behavior to reduce the effectiveness of future intrusion attempts [6]. One method to improve resistance to intrusion is diversification of system software, to increase the cost and difficulty of identifying vulnerabilities. This paper describes a systematic technique for achieving software diversification.
2. The idea of stochastic diversificationNetwork intrusion strategies rely on exploitation of information about networks and their nodes. Such information may be acquired systematically, through extensive usage and analysis, or accidentally, through ad hoc intrusion attempts. The effectiveness of the information acquisition process depends on the perceived complexity of the artifacts analyzed. The usefulness of the information depends on the stability and predictability of the system being attacked. Today's intrusion strategy will work tomorrow only if the system continues to exhibit identical properties. The time value of intrusion information thus depends to a great extent on an absence of variation in system properties.The capability to acquire and apply information for defining intrusion strategies can be severely limited by reducing predictability and increasing variability and perceived complexity in systems, while preserving correct function and acceptable performance. The process of stochastic diversification provides systematic and dependable methods for achieving increased resistance to intrusion through random variation and obscuration of survivable system properties. Stochastic diversification can increase the effort to acquire information for intrusions to prohibitive levels, and help make the information itself of limited value.
3. Identifying software vulnerabilities through code analysisVulnerabilities in survivable systems software that can be exploited for attack and intrusion are often revealed through code reading and analysis. Such analysis depends on discovery of true software behavior, as revealed through control logic, data structures, operations, and comments. The understanding sought by intruders is similar to that required by maintenance programmers seeking knowledge sufficient to make reliable changes. In fact, software developers create systematic and understandable code and document its semantics for the very purpose of simplifying maintenance activities. Of course, such information may be readily available to intruders as well.However, program understanding by intruders can be made extremely difficult, if not impossible, and hardly worth the cost and effort. By transforming a program into a sea of logical complexity that effectively obscures its behavior and design while maintaining correct function and performance, a significant barrier to intrusion can be created. Such transformations can be carried out in a systematic process based on a reinterpretation of the Structure Theorem [3]. In Section 4, we first describe the theorem and illustrate its original intended application for program structuring to reduce complexity and increase intellectual control in software maintenance. In Section 5, we reverse the process to increase complexity through stochastic diversification.
4. The structure theorem and the program structuring processThe constructive proof of the Structure Theorem defines a process for transforming confusing and arbitrary program logic into function-equivalent structured form for improved understandability and maintenance. The resulting programs are expressed as nested and sequenced single-entry, single-exit control structures drawn from a basis set composed of sequence, ifthen, ifthenelse, case, whiledo, dountil, etc.Such programs can be read, understood, and documented in a systematic manner. The structuring process is independent of program subject matter, and is the basis for structuring engines that automatically carry out the required transformations. As an illustration, consider the miniature program depicted in graphic form in Figure 1 and text form in Figure 2 (scalar data items are integers, table is an array of integers), intended to represent how a program might look after being patched and repatched in maintenance activities. Imagine the difficulties of understanding such arbitrary control flow in a larger program. Key steps in application of the constructive proof of the theorem are illustrated below. Each step is a mechanical process, with no creativity or invention required, and is applied in the same manner to any program being structured. This stepwise process has been implemented in the development of commercial automated structuring products aimed at improving conditions of software maintenance [4]. In particular, COBOL structuring systems have been successfully applied in a variety of environments. For full details of the structuring process, see [3]. Figure 1. Arbitrary control flow in a miniature program
The constructive proof of the theorem specifies the following steps in the program structuring process to create a function-equivalent structured version:
Figure 2. The miniature program in text form
proc
Step 1. Node Numbering: Number the function and predicate nodes of the original program in any order. Number the exit line 0. This step is shown in Figure 1. Step 2. Sequence and Ifthenelse Construction: Introduce a new variable into the state space, arbitrarily named L for label. For each numbered function node in the original program, construct a new sequence composed of the function node as the firstpart, and an assignment to L of the number of the next node to be visited as the secondpart. For each predicate node, construct a new ifthenelse, with thenpart and elsepart containing an assignment to L of the number of the next node to be visited on the respective branches. The required constructions are illustrated in Figure 3 for two of the eight nodes in the program (for the predicate node, up is true, down is false). Figure 3. Example sequence and ifthenelse constructions
Figure 4. The label-structured version of the program
Step 3. Loop Construction: Construct an initialized whiledo loop on L as shown in Figure 4, with dopart a case statement on values of L. Each casepart of the case statement is a sequence or ifthenelse as constructed in Step 2. This program is function-equivalent to the original program, and is constructed solely of single-entry, single exit sequences, ifthenelses, cases, and whiledos. That is, it is a structured program, although not very understandable at this point. Step 4. Casepart Substitution and Simplification: For caseparts that are singly referenced, that is, those caseparts for which only one assignment of their label value exists, substitute the caseparts for their corresponding label assignments. For example, casepart 6 is referenced only in casepart 5, and can be substituted for that reference. (Note that multiply-referenced caseparts can also be substituted. They are often good candidates for program modularization through subroutine calls, etc.) Figure 5 depicts several steps in the casepart substitution process, with several more steps possible. Note that the L := 0 setting is never substituted, as it represents the exit from the program. Figure 5. The casepart substitution process
The substitution process is continued until no candidates remain. At this point, it may also be possible to eliminate the initialized loop on L. Such is the case in this example, to arrive at the final version depicted in Figure 6 in graphic and Figure 7 in text form. This version has a cyclomatic complexity of one. Figure 6. The final structured version in graphic form
Figure 7. The final structured version in text form
proc This version is more understandable, and is quickly recognized as an initialized whiledo with dopart composed of a sequence, with secondpart a pair of nested ifthenelses. Each single-entry, single-exit control structure can now be read, understood, and documented to maintain intellectual control over the program [7], [2]. The logic is revealed to be a binary search for key in table. The value of this structuring process can be difficult to appreciate in this small illustration, but it is not difficult to appreciate when applied to 25 or 50 KLOC programs with arbitrary control flow beyond human understanding.
5. Reversing the structure theorem proof for the stochastic diversification processWe now describe reversal of the program structuring process for purposes of stochastic diversification. First, we define the structured, well-documented version of a program as the clearcode version. The proof of the Structure Theorem can be applied in reverse to systematically add arbitrary complexity to clearcode programs, to severely limit understandability in searching for vulnerabilities.In illustration, consider the structured program of Figure 6 as a clearcode program. Recall that the casepart substitution process applied to the label-structured program of Figure 4 resulted in the clearcode program of Figure 6, a function-equivalent structured version with reduced complexity. This structuring transformation is many-to-one. Function-equivalent permutations on the order of caseparts in a label-structured program are transformed into the same structured program. Working in reverse, every clearcode program can be expressed in label-structured form. The caseparts can then be stochastically permuted to create diverse representations of the label-structured version. This process is one-to-many, and any number of different yet function-equivalent versions can be generated. Imagine now a code implementation of a label-structured program with control flow expressed in labels and go to logic. The number of each casepart becomes its label, the assignments to L become go to statements that transfer control to the proper label, and the loop on L is no longer necessary. (Developer controls on the extent and location of branching logic introduced can manage performance overhead.) Such an implementation preserves function-equivalence, yet adds substantial complexity by replacing the nested and sequenced single-entry, single-exit control structures in the clearcode program with arbitrary branching logic. Every casepart permutation results in a new version. Such programs, while difficult to understand, can still be structured to reveal their clearcode representations. However, the diversification process can be made effectively irreversible as follows. We define greycode as spurious data definitions and initializations, control structures, and operations within which clearcode programs can be embedded, to increase perceived complexity and to vary load module size and organization. Greycode can be stochastically generated, and can itself exhibit substantial internal complexity in control and data flow. It is unstructured, and contains arbitrary branching logic and labels. Greycode can be configured to appear logically live to compilers, and thus require code generation, while remaining operationally dead so as not to impact clearcode performance. The organization of label-structured programs, as illustrated in Figure 4, suggests candidate greycode insertion points before and after clearcode function and predicate nodes. The structuring process defined by the Structure Theorem cannot distinguish between clearcode and logically live greycode. Thus, attempts to structure mixed clearcode and greycode would propagate both into the structured version. The result would be a structured sea of complexity, of little value for human understanding. The addition of greycode to clearcode effectively prevents structuring from revealing clearcode logic. In addition, names of data items and labels in mixed clearcode and greycode can be systematically replaced with stochastically generated names devoid of semantic content, to further increase complexity with no loss of performance. Based on the program transformations described above, the stochastic diversification process is comprised of the following steps carried out under developer control. Step 1. Label Structuring: The clearcode version of the program is converted to label-structured form. Step 2. Casepart Permutation: Caseparts in the label-structured program are stochastically permuted, and the program is implemented using labels and branching logic. Figure 8 depicts implementation of the label-structured program of Figure 4 using branching logic. Figure 8. Branching logic implementation of the label-structured program
proc Step 3. Greycode Insertion: Stochastically generated greycode is inserted at points stochastically selected from the candidate insertion points in the label-structured program. Figure 9 depicts the program of Figure 8 following insertion of a small amount of greycode. Note that the greycode insertion process can even be considered for real-time and time sensitive applications, as the execution properties of the software are largely unaffected by the transformations (greycode is never executed). Figure 9. Greycode insertion in the label-structured program
proc
Step 4. Stochastic Renaming: Data and label names in the program are systematically renamed using stochastically generated strings devoid of semantic information to complete the diversification process. Figure 10 shows the program of Figure 9 following stochastic renaming. Even in this miniature example, the level of complexity is such that the logic is virtually impossible for an intruder to comprehend. Figure 10. Completed stochastic diversification of the clearcode program
proc
6. Additional diversification through component replacementThe process described above can be extended as follows. The case-structured versions of programs produced by application of the Structure Theorem reveal candidate components for replacement by alternate implementations as a means to introduce additional diversification. Each case in the casepart construction is a single-entry, single-exit component that can be replaced with a different implementation based on alternative control flow and data structures and storage. For example, if, say, four caseparts in a program were each implemented in two ways, the alternatives could be included one, two, three, and four at a time in the program to create a substantial number of variants. Such alternative implementations could be, but need not be, function-equivalent. Alternative functionality could be introduced in a controlled manner to further increase diversification, provided users (people or other programs) are prepared to adapt to the changes.
7. Operational application of stochastic diversificationFigure 11 summarizes the stochastic diversification process in terms of a component architecture for an automated diversification engine. An implementation of the diversification process can undergo correctness verification [5] and statistical testing to certify fitness for use [8], [9]. Application of the diversification process can then proceed with high confidence. The value of the process lies in the systematic and repeatable transformations it applies, which nevertheless introduce unique and unpredictable diversity while preserving required function and acceptable performance.Figure 11. Component architecture for an automated diversification engine
Stochastic diversification can be an important tool in management and operation of survivable systems. Software subject to intrusion can undergo the diversification process, and be swapped on a stochastic schedule with new diversified versions, to create a complex and moving target for intruder analysis. At the same time, authorized system personnel have access to clearcode versions for maintenance and evolution.
8. References[1] Ellison, R.J.; Fisher, D.; Linger, R.C.; Lipson, H.F; Longstaff,. T.; and Mead, N. R.Survivable Network Systems: An Emerging Discipline. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, Technical Report CMU/SEI-97-032.[2] Hausler, P.A.; Pleszkoch, M. G.; Linger, R. C.; and Hevner, A.R. "Using Function Abstraction to Understand Program Behavior," IEEE Software, Jan. 1990, p. 55-63. [3] Linger, R.C.; Mills, H.D.; and Witt, B.I. Structured Programming: Theory and Practice. Reading, Ma: Addison-Wesley Publishing Company, 1979. [4] Linger, R.C. "Software Maintenance as an Engineering Discipline." Proceedings of Conference on Software Maintenance, IEEE Computer Society Press, 1988. [5] Linger, R.C. and Trammell, C.J., Cleanroom Software Engineering Reference Model, Software Engineering Institute, Carnegie Mellon University, Technical Report CMU/SEI-96-TR-022. [6] Linger, R.C.; Lipson, H.F.; and Mead, N.R. "Requirements Definition for Survivable Network Systems." Proceedings of International Conference on Requirements Engineering, Colorado Springs, CO, IEEE Computer Society Press, Los Alamitos, CA. Available at http://www.cert.org/research , 1998. [7] Mills, H.D.; Linger R.C.; and Hevner, A.R. Principles of Information Systems Analysis and Design. New York: Academic Press, 1986. [8] Mills, H.D. "Certifying the Correctness of Software." Proceedings of 25th Hawaii International Conference on System Sciences. IEEE Computer Society Press, January 1992. [9] Whittaker, J.A.. and Thomason, M.G. "A Markov Chain Model for Statistical Software Testing." Cleanroom Software Engineering: A Reader. Oxford, England: Blackwell Publishers, 1996.
For More InformationYour business mission depends on critical network systems. It is prudent risk management to analyze and improve their survivability.
For information, contact Return to top of the page
Copyright 2001 by Carnegie Mellon University Disclaimers and copyright information Last updated November 12, 2002 |














