CERT
 
Research Staff Biographies CMU Heinz School CMU School of Computer Science CERT Statistics US-CERT CyLab
 

Systematic Generation of Stochastic Diversity as an Intrusion Barrier in Survivable Systems Software

Richard C. Linger
Software Engineering Institute
Carnegie Mellon University
Email: rlinger@sei.cmu.edu

Contents:
Abstract
Survivable systems
The idea of stochastic diversification
Identifying software vulnerabilities through code analysis
The structure theorem and the program structuring process
Reversing the structure theorem proof for the stochastic diversification process
Additional diversification through component replacement
Operational application of stochastic diversification
References
For More Information


Abstract

Survivable systems software must exhibit high resistance to intrusion. A process of stochastic diversification can help increase resistance to intrusion through random obscuration of survivable system properties. Intruders often rely on analysis of source code to identify and exploit vulnerabilities in software. The ability of intruders to understand and analyze code can be dramatically reduced through a process of stochastic unstructuring to increase software complexity as an intrusion barrier while preserving function and performance. The constructive proof of the Structure Theorem was originally applied as a systematic process for transforming complex, unstructured programs into function-equivalent structured form for improved understandability and maintenance. This process can be reversed, to systematically introduce stochastic diversity by transforming structured programs into function-equivalent unstructured programs of arbitrary complexity that are virtually impossible to understand.


1. Survivable systems

Pervasive societal dependence on large-scale network systems in virtually every area of business, government, and defense magnifies the consequences of intrusions and failures. As a result, the survivability and security of such systems is receiving increasing attention. Survivable systems are characterized by the ability to provide essential services even in the presence of intrusions and compromises, and to recover full services in a timely manner [1]. Survivability requires effective means to resist, recognize, and recover from intrusions, and to adapt system behavior to reduce the effectiveness of future intrusion attempts [6]. One method to improve resistance to intrusion is diversification of system software, to increase the cost and difficulty of identifying vulnerabilities. This paper describes a systematic technique for achieving software diversification.


2. The idea of stochastic diversification

Network intrusion strategies rely on exploitation of information about networks and their nodes. Such information may be acquired systematically, through extensive usage and analysis, or accidentally, through ad hoc intrusion attempts. The effectiveness of the information acquisition process depends on the perceived complexity of the artifacts analyzed. The usefulness of the information depends on the stability and predictability of the system being attacked. Today's intrusion strategy will work tomorrow only if the system continues to exhibit identical properties. The time value of intrusion information thus depends to a great extent on an absence of variation in system properties.

The capability to acquire and apply information for defining intrusion strategies can be severely limited by reducing predictability and increasing variability and perceived complexity in systems, while preserving correct function and acceptable performance. The process of stochastic diversification provides systematic and dependable methods for achieving increased resistance to intrusion through random variation and obscuration of survivable system properties. Stochastic diversification can increase the effort to acquire information for intrusions to prohibitive levels, and help make the information itself of limited value.


3. Identifying software vulnerabilities through code analysis

Vulnerabilities in survivable systems software that can be exploited for attack and intrusion are often revealed through code reading and analysis. Such analysis depends on discovery of true software behavior, as revealed through control logic, data structures, operations, and comments. The understanding sought by intruders is similar to that required by maintenance programmers seeking knowledge sufficient to make reliable changes. In fact, software developers create systematic and understandable code and document its semantics for the very purpose of simplifying maintenance activities. Of course, such information may be readily available to intruders as well.

However, program understanding by intruders can be made extremely difficult, if not impossible, and hardly worth the cost and effort. By transforming a program into a sea of logical complexity that effectively obscures its behavior and design while maintaining correct function and performance, a significant barrier to intrusion can be created. Such transformations can be carried out in a systematic process based on a reinterpretation of the Structure Theorem [3]. In Section 4, we first describe the theorem and illustrate its original intended application for program structuring to reduce complexity and increase intellectual control in software maintenance. In Section 5, we reverse the process to increase complexity through stochastic diversification.


4. The structure theorem and the program structuring process

The constructive proof of the Structure Theorem defines a process for transforming confusing and arbitrary program logic into function-equivalent structured form for improved understandability and maintenance. The resulting programs are expressed as nested and sequenced single-entry, single-exit control structures drawn from a basis set composed of sequence, ifthen, ifthenelse, case, whiledo, dountil, etc.

Such programs can be read, understood, and documented in a systematic manner. The structuring process is independent of program subject matter, and is the basis for structuring engines that automatically carry out the required transformations.

As an illustration, consider the miniature program depicted in graphic form in Figure 1 and text form in Figure 2 (scalar data items are integers, table is an array of integers), intended to represent how a program might look after being patched and repatched in maintenance activities. Imagine the difficulties of understanding such arbitrary control flow in a larger program.

Key steps in application of the constructive proof of the theorem are illustrated below. Each step is a mechanical process, with no creativity or invention required, and is applied in the same manner to any program being structured.

This stepwise process has been implemented in the development of commercial automated structuring products aimed at improving conditions of software maintenance [4]. In particular, COBOL structuring systems have been successfully applied in a variety of environments. For full details of the structuring process, see [3].

Figure 1. Arbitrary control flow in a miniature program

The constructive proof of the theorem specifies the following steps in the program structuring process to create a function-equivalent structured version:


Figure 2.

The miniature program in text form

    proc
    I := 0
    lo := 1
    hi := n
    go to a
b:  hi := j - 1
    go to a
c:  lo := j + 1
a: if lo <= hi & I = 0
    then go to d
    else go to e
d: if table(j) = key
    then I := j
    go to a
    else
    if table(j) < key
    then go to c
    else go to b
    endif
e: endproc


Step 1. Node Numbering:

Number the function and predicate nodes of the original program in any order. Number the exit line 0. This step is shown in Figure 1.

Step 2. Sequence and Ifthenelse Construction:

Introduce a new variable into the state space, arbitrarily named L for label. For each numbered function node in the original program, construct a new sequence composed of the function node as the firstpart, and an assignment to L of the number of the next node to be visited as the secondpart. For each predicate node, construct a new ifthenelse, with thenpart and elsepart containing an assignment to L of the number of the next node to be visited on the respective branches. The required constructions are illustrated in Figure 3 for two of the eight nodes in the program (for the predicate node, up is true, down is false).

Figure 3. Example sequence and ifthenelse constructions

Figure 4. The label-structured version of the program

Step 3. Loop Construction:

Construct an initialized whiledo loop on L as shown in Figure 4, with dopart a case statement on values of L. Each casepart of the case statement is a sequence or ifthenelse as constructed in Step 2. This program is function-equivalent to the original program, and is constructed solely of single-entry, single exit sequences, ifthenelses, cases, and whiledos. That is, it is a structured program, although not very understandable at this point.

Step 4. Casepart Substitution and Simplification:

For caseparts that are singly referenced, that is, those caseparts for which only one assignment of their label value exists, substitute the caseparts for their corresponding label assignments. For example, casepart 6 is referenced only in casepart 5, and can be substituted for that reference. (Note that multiply-referenced caseparts can also be substituted. They are often good candidates for program modularization through subroutine calls, etc.) Figure 5 depicts several steps in the casepart substitution process, with several more steps possible. Note that the L := 0 setting is never substituted, as it represents the exit from the program.

Figure 5. The casepart substitution process

The substitution process is continued until no candidates remain. At this point, it may also be possible to eliminate the initialized loop on L. Such is the case in this example, to arrive at the final version depicted in Figure 6 in graphic and Figure 7 in text form. This version has a cyclomatic complexity of one.

Figure 6. The final structured version in graphic form

Figure 7. The final structured version in text form

proc
   i := 0
   lo := 1
   hi := n
   while (lo <= hi) & (i = 0)
   do
     j := (lo + hi)/2
     if table(j) = key
     then
       i := j
    else
     if table(j) < key
     then
      lo := j + 1
     else
      lo := j - 1
     endif
    endif
  enddo
endproc

This version is more understandable, and is quickly recognized as an initialized whiledo with dopart composed of a sequence, with secondpart a pair of nested ifthenelses. Each single-entry, single-exit control structure can now be read, understood, and documented to maintain intellectual control over the program [7], [2]. The logic is revealed to be a binary search for key in table. The value of this structuring process can be difficult to appreciate in this small illustration, but it is not difficult to appreciate when applied to 25 or 50 KLOC programs with arbitrary control flow beyond human understanding.


5. Reversing the structure theorem proof for the stochastic diversification process

We now describe reversal of the program structuring process for purposes of stochastic diversification. First, we define the structured, well-documented version of a program as the clearcode version. The proof of the Structure Theorem can be applied in reverse to systematically add arbitrary complexity to clearcode programs, to severely limit understandability in searching for vulnerabilities.

In illustration, consider the structured program of Figure 6 as a clearcode program. Recall that the casepart substitution process applied to the label-structured program of Figure 4 resulted in the clearcode program of Figure 6, a function-equivalent structured version with reduced complexity. This structuring transformation is many-to-one. Function-equivalent permutations on the order of caseparts in a label-structured program are transformed into the same structured program.

Working in reverse, every clearcode program can be expressed in label-structured form. The caseparts can then be stochastically permuted to create diverse representations of the label-structured version. This process is one-to-many, and any number of different yet function-equivalent versions can be generated. Imagine now a code implementation of a label-structured program with control flow expressed in labels and go to logic. The number of each casepart becomes its label, the assignments to L become go to statements that transfer control to the proper label, and the loop on L is no longer necessary. (Developer controls on the extent and location of branching logic introduced can manage performance overhead.) Such an implementation preserves function-equivalence, yet adds substantial complexity by replacing the nested and sequenced single-entry, single-exit control structures in the clearcode program with arbitrary branching logic. Every casepart permutation results in a new version. Such programs, while difficult to understand, can still be structured to reveal their clearcode representations. However, the diversification process can be made effectively irreversible as follows.

We define greycode as spurious data definitions and initializations, control structures, and operations within which clearcode programs can be embedded, to increase perceived complexity and to vary load module size and organization. Greycode can be stochastically generated, and can itself exhibit substantial internal complexity in control and data flow. It is unstructured, and contains arbitrary branching logic and labels. Greycode can be configured to appear logically live to compilers, and thus require code generation, while remaining operationally dead so as not to impact clearcode performance. The organization of label-structured programs, as illustrated in Figure 4, suggests candidate greycode insertion points before and after clearcode function and predicate nodes.

The structuring process defined by the Structure Theorem cannot distinguish between clearcode and logically live greycode. Thus, attempts to structure mixed clearcode and greycode would propagate both into the structured version. The result would be a structured sea of complexity, of little value for human understanding. The addition of greycode to clearcode effectively prevents structuring from revealing clearcode logic.

In addition, names of data items and labels in mixed clearcode and greycode can be systematically replaced with stochastically generated names devoid of semantic content, to further increase complexity with no loss of performance.

Based on the program transformations described above, the stochastic diversification process is comprised of the following steps carried out under developer control.

Step 1. Label Structuring:

The clearcode version of the program is converted to label-structured form.

Step 2. Casepart Permutation:

Caseparts in the label-structured program are stochastically permuted, and the program is implemented using labels and branching logic. Figure 8 depicts implementation of the label-structured program of Figure 4 using branching logic.

Figure 8. Branching logic implementation of the label-structured program

   proc
     go to L1
L4: if lo <= hi & I = 0
     then go to L5
     else go to L6
     endif
L1: i := 0
     lo := 1
     hi := n
     go to L4
L7: i := j
     go to L4
L2: hi := j - 1
     go to L4
L5: j := (lo + hi)/2
     go to L6
L8: if table(j) < key
     then go to L3
     else go to L2
     endif
L6: if table(j) = key
     then go to L7
     else go to L8
     endif
L3: lo := j + 1
     go to L4
   endproc

Step 3. Greycode Insertion:

Stochastically generated greycode is inserted at points stochastically selected from the candidate insertion points in the label-structured program. Figure 9 depicts the program of Figure 8 following insertion of a small amount of greycode.

Note that the greycode insertion process can even be considered for real-time and time sensitive applications, as the execution properties of the software are largely unaffected by the transformations (greycode is never executed).

Figure 9. Greycode insertion in the label-structured program

proc
      go to L1
L4:  if a > 0
      then
      a := (b + c)*d
      endif
      if lo <= hi & I = 0
      then
      b := a -e
      go to L5
      else go to L6
      endif
L1:  i := 0
      lo := 1
      hi := n
      go to L4
L7:  i := j
      go to L4
L2:  hi := j - 1
      if a < b
      then
      c := a + b - e
      endif
      go to L4
L5:  j := (lo + hi)/2
      go to L6
L8:  if table(j) < key
      then go to L3
      else
      if a < e
      then
      a := d
      endif
      go to L2
      endif
L6:  if table(j) = key
      then go to L7
      else
      e := a + b - c
      go to L8
      endif
L3:  lo := j + 1
      b := b + 182
      go to L4
endproc

Step 4. Stochastic Renaming:

Data and label names in the program are systematically renamed using stochastically generated strings devoid of semantic information to complete the diversification process.

Figure 10 shows the program of Figure 9 following stochastic renaming. Even in this miniature example, the level of complexity is such that the logic is virtually impossible for an intruder to comprehend.

Figure 10. Completed stochastic diversification of the clearcode program

  proc
        go to RJ5D
G5U9: if drt2w > 0
        then
        drt2w := (eb59f + lk4yu)*hpm7d
        endif
        if bk143 <= mjhpd & blk1a = 0
        then
        eb5f9 := drt2w - d73az
        go to SM23
        else go to K43L
        endif
RJ5D: blk1a := 0
        bk143 := 1
        mjhpd := q46q2
        go to G5U9
LG2Y: blk1a := tyw4c
        go to G5U9
T71H: mjhpd := tyw4c - 1
        if drt2w < eb5f9
        then
        lk4yu := drt2w + eb5f9 - d73az
        endif
        go to G5U9
SM23: tyw4c := (bk143 + mjhpd)/2
        go to K43L
W298: if cw2k1(tyw4c) < k49gg
        then go to CPLQ
        else
        if drt2w < d73az
        then
        drt2w := hpm7d
        endif
        go to T71H
        endif
K43L: if cw2k1(tyw4c) = k49gg
        then go to lg2y
        else
        d73az := drt2w + eb5f9 - lk4yu
        go to W298
        endif
CPLQ: bk143 := tyw4c + 1
        eb5f9 := eb5f9 + 182
        go to G5U9
  endproc


6. Additional diversification through component replacement

The process described above can be extended as follows. The case-structured versions of programs produced by application of the Structure Theorem reveal candidate components for replacement by alternate implementations as a means to introduce additional diversification. Each case in the casepart construction is a single-entry, single-exit component that can be replaced with a different implementation based on alternative control flow and data structures and storage. For example, if, say, four caseparts in a program were each implemented in two ways, the alternatives could be included one, two, three, and four at a time in the program to create a substantial number of variants. Such alternative implementations could be, but need not be, function-equivalent. Alternative functionality could be introduced in a controlled manner to further increase diversification, provided users (people or other programs) are prepared to adapt to the changes.


7. Operational application of stochastic diversification

Figure 11 summarizes the stochastic diversification process in terms of a component architecture for an automated diversification engine. An implementation of the diversification process can undergo correctness verification [5] and statistical testing to certify fitness for use [8], [9]. Application of the diversification process can then proceed with high confidence. The value of the process lies in the systematic and repeatable transformations it applies, which nevertheless introduce unique and unpredictable diversity while preserving required function and acceptable performance.

Figure 11. Component architecture for an automated diversification engine

Stochastic diversification can be an important tool in management and operation of survivable systems. Software subject to intrusion can undergo the diversification process, and be swapped on a stochastic schedule with new diversified versions, to create a complex and moving target for intruder analysis. At the same time, authorized system personnel have access to clearcode versions for maintenance and evolution.


8. References

[1] Ellison, R.J.; Fisher, D.; Linger, R.C.; Lipson, H.F; Longstaff,. T.; and Mead, N. R.Survivable Network Systems: An Emerging Discipline. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, Technical Report CMU/SEI-97-032.

[2] Hausler, P.A.; Pleszkoch, M. G.; Linger, R. C.; and Hevner, A.R. "Using Function Abstraction to Understand Program Behavior," IEEE Software, Jan. 1990, p. 55-63.

[3] Linger, R.C.; Mills, H.D.; and Witt, B.I. Structured Programming: Theory and Practice. Reading, Ma: Addison-Wesley Publishing Company, 1979.

[4] Linger, R.C. "Software Maintenance as an Engineering Discipline." Proceedings of Conference on Software Maintenance, IEEE Computer Society Press, 1988.

[5] Linger, R.C. and Trammell, C.J., Cleanroom Software Engineering Reference Model, Software Engineering Institute, Carnegie Mellon University, Technical Report CMU/SEI-96-TR-022.

[6] Linger, R.C.; Lipson, H.F.; and Mead, N.R. "Requirements Definition for Survivable Network Systems." Proceedings of International Conference on Requirements Engineering, Colorado Springs, CO, IEEE Computer Society Press, Los Alamitos, CA. Available at http://www.cert.org/research , 1998.

[7] Mills, H.D.; Linger R.C.; and Hevner, A.R. Principles of Information Systems Analysis and Design. New York: Academic Press, 1986.

[8] Mills, H.D. "Certifying the Correctness of Software." Proceedings of 25th Hawaii International Conference on System Sciences. IEEE Computer Society Press, January 1992.

[9] Whittaker, J.A.. and Thomason, M.G. "A Markov Chain Model for Statistical Software Testing." Cleanroom Software Engineering: A Reader. Oxford, England: Blackwell Publishers, 1996.


For More Information

Your business mission depends on critical network systems. It is prudent risk management to analyze and improve their survivability.

For information, contact
CERT Coordination Center
Software Engineering Institute
Carnegie Mellon University
Office: 412 / 268-7783
FAX: 412 / 268-6989
CERT hotline +1 412-268-7090
E-mail: survivable-systems@cert.org
World Wide Web: http://www.cert.org

Return to top of the page


Copyright 2001 by Carnegie Mellon University

Disclaimers and copyright information

Last updated November 12, 2002