CERT
 
 

STAR*Lab Function Extraction for Software Assurance

Engineering Automation for Computing Software Behavior

Principal Investigator: Richard C. Linger

Problem Addressed
STAR*Lab recognizes both the importance of software assurance to national defense and economic security and the difficulty of achieving it. Software assurance depends on knowing and verifying the complete behavior of software. Unfortunately, nothing less will do, because behavior that is not known can contain errors, vulnerabilities, and malicious content. It is a sobering fact that current software engineering provides no practical means for developers to determine the full behavior of software, and no testing effort, no matter how elaborate, can exercise more than a small fraction of possible behavior. Complex software systems are difficult to understand because of their immense numbers of execution paths, any of which can contain errors and security exposures. Faced with innumerable execution possibilities, developers and analysts often achieve no more than a general understanding of system behavior. This technology gap is at the heart of many issues in software and security engineering. Simply put, systems experience errors and vulnerabilities in large measure because their developers have no practical means to determine what the systems do in all possible uses.

Research Approach
While software assurance has been limited by engineering capabilities in the past, it may be less so in the future. Function-theoretic foundations of software illuminate a challenging but feasible strategy for developing automated tools to calculate the behavior of software and present it to users in understandable form. STAR*Lab is conducting research and development in the emerging technology of function extraction (FX). The objective of FX is to move from an uncertain understanding of program behavior derived in a human time scale of days to a precise understanding automatically computed in a machine time scale of seconds. This technology applies function-theoretic mathematical foundations to automate calculation of the functional behavior of software to the maximum extent possible. These foundations define the transformation of code structures into procedure-free functional form and are the basis for the function extraction process [1,2]. While theoretical results impose some constraints on behavior calculation (for example, for certain forms of loops), STAR*Lab development of engineering solutions suggests that nearly all software behavior will be amenable to calculation. And any level of behavior calculation can help improve human capabilities for understanding and analysis.
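As a rough illustration of what "procedure-free functional form" means for the simplest control structure, a sequence, the sketch below composes a series of assignments into a single set of concurrent assignments over the initial values. It is not the FX system or its algorithm: the function names (`apply_assignment`, `extract_sequence`), the restriction to linear integer expressions, and the three-statement example program are all assumptions chosen to keep the example small.

```python
# Minimal sketch (assumed design, not the FX implementation): compute the net
# effect of a sequence of assignments as concurrent assignments over the
# initial values. Expressions are restricted to linear forms with integer
# coefficients so that composition and simplification stay trivial.

CONST = "1"  # hypothetical key used for a constant term


def apply_assignment(state, target, terms):
    """Apply `target := sum(coeff * var)` to a symbolic state.

    `state` maps each variable to a linear form over INITIAL values,
    written as {initial_variable: coefficient}. `terms` is written over
    CURRENT values, so each current value is replaced by its linear form.
    """
    result = {}
    for var, coeff in terms.items():
        source = {CONST: 1} if var == CONST else state[var]
        for init_var, c in source.items():
            result[init_var] = result.get(init_var, 0) + coeff * c
    new_state = dict(state)
    # Drop zero coefficients so equal behaviors compare equal.
    new_state[target] = {k: c for k, c in result.items() if c != 0}
    return new_state


def extract_sequence(variables, program):
    """Compose the assignments in `program`, in order, into one function."""
    state = {v: {v: 1} for v in variables}  # every variable starts as itself
    for target, terms in program:
        state = apply_assignment(state, target, terms)
    return state


# Hypothetical three-statement program:
#   x := x + y;  y := x - y;  x := x - y
program = [
    ("x", {"x": 1, "y": 1}),
    ("y", {"x": 1, "y": -1}),
    ("x", {"x": 1, "y": -1}),
]
behavior = extract_sequence(["x", "y"], program)
print(behavior)
```

Read as concurrent assignments, the result says that the three statements together exchange the initial values of `x` and `y`, a fact that is not obvious from reading the statements one at a time. Alternation and iteration structures require more machinery than this sketch provides.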
To explore the impact of FX technology, STAR*Lab developed a proof-of-concept function extractor prototype that calculates the behavior of programs expressed in a small subset of the Java programming language. In a controlled experiment, the group using the FX prototype reduced the time required to derive the functional behavior of example programs by several orders of magnitude, and was about four times better at providing correct answers to comprehension and verification questions in a fourth of the time, compared to the control group [3].
Function extraction technology can be applied to any programming language and has the potential to impact many aspects of the software engineering life cycle. To better understand this impact, STAR*Lab conducted an SEI-sponsored study with a major corporation to determine how FX could improve engineering operations in activities ranging from software specification and design to implementation and testing [4]. This study produced guidance for FX evolution from experienced software developers, including the following recommendations:

  • Development of FX automation for assembly language should be a priority.
  • FX automation should be developed for correctness verification of software.
  • FX automation should be developed for high-level languages, starting with Java.
  • Research on FX automation for specification and architecture should be initiated.

Expected Benefits
The function extraction system currently under development targets programs written in or compiled into Intel assembly language. The system is expected to help security analysts determine intruder strategies by providing precise information on the structure and function of malicious code [5]. In terms of broader application, opportunities exist to make progress on the problems of malicious code detection, computational security analysis, correctness verification, legacy system understanding, creation of assured software repositories, and automated component composition. The basis for all of this is the realization that programs are mathematical artifacts subject to mathematical analysis. Human fallibility may still exist in interpreting the analytical results, but there can be little doubt that routine availability of calculated behavior would help reduce errors, vulnerabilities, and malicious code in software and make software development more manageable and predictable.

2007 Accomplishments
In its current state of development, the FX system demonstrates (a) transformation of spaghetti-logic assembly language programs into understandable structured form and (b) automated computation of behavior for sequence, alternation, and iteration structures. For example, Figure 1 demonstrates behavior computation for a miniature program. The program text on the left of the figure is the result of the FX system transforming the original spaghetti-logic version into structured form; jump instructions are retained in the program text as comments for traceability but have no effect. The program has an initialized nested loop structure. The computed behavior on the right of the figure shows that the net behavioral effect of the program is always (that is, the condition is true) to carry out the following concurrent assignments:
  • set register EAX to 0,
  • set register ECX to the product of the initial values of EAX and EBX plus the initial value of ECX, and
  • if initial EAX is 0, leave EDX unchanged; otherwise, set EDX to 0
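The three assignments above can be checked against a concrete program by brute force. The sketch below is an illustration only: the program in `run_program` is a hypothetical nested loop chosen because it is consistent with the behavior listed above, not a transcription of the program in Figure 1, and the 32-bit wraparound of register arithmetic is likewise an assumption. The names `run_program`, `computed_behavior`, and `agree` are invented for this example.

```python
# Minimal sketch: compare step-by-step execution of a hypothetical nested-loop
# program with the closed-form behavior described in the text.
# Assumptions: registers hold unsigned 32-bit values and arithmetic wraps.

MASK = 0xFFFFFFFF


def run_program(eax, ebx, ecx, edx):
    """Execute the hypothetical program one step at a time."""
    while eax != 0:
        edx = ebx                      # initialize the inner loop counter
        while edx != 0:
            ecx = (ecx + 1) & MASK
            edx = (edx - 1) & MASK
        eax = (eax - 1) & MASK
    return eax, ebx, ecx, edx


def computed_behavior(eax, ebx, ecx, edx):
    """Net effect as concurrent assignments over the initial values."""
    new_eax = 0
    new_ecx = (eax * ebx + ecx) & MASK
    new_edx = edx if eax == 0 else 0
    return new_eax, ebx, new_ecx, new_edx


def agree(limit=6):
    """Check both descriptions on a small grid of initial register values."""
    for eax in range(limit):
        for ebx in range(limit):
            for ecx in (0, 5, MASK):
                for edx in (0, 7):
                    args = (eax, ebx, ecx, edx)
                    if run_program(*args) != computed_behavior(*args):
                        return False
    return True


print(agree())
```

A check of this kind only samples a few initial states, which is the limitation of testing described under Problem Addressed; the point of behavior computation is to derive the closed form for all initial states rather than to confirm it on some of them.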
In terms of malware analysis, the current system can demonstrate examples of behavior computation for (a) viruses with repeatedly obfuscated control logic, (b) viruses hidden in large programs, and (c) repeatedly obfuscated virus unpackers, including use of computed behavior to unpack the virus payload.

Figure 1. FX System Computation of the Behavior of a Miniature Looping Program

2008 Plans
FX system development is planned to continue in 2008. Sponsors are welcome to participate in completing the system and moving the technology forward. STAR*Lab is also ready to apply FX to additional languages and phases of the software engineering life cycle.

References

[1] Prowell, S., Trammell, C., Linger, R., & Poore, J. Cleanroom Software Engineering: Technology and Practice. Reading, MA: Addison-Wesley, 1999.

[2] Mills, H. & Linger, R. “Cleanroom Software Engineering.” Encyclopedia of Software Engineering, 2nd ed. Edited by J. Marciniak. New York, NY: John Wiley & Sons, 2002.

[3] Collins, R., Walton, G., Hevner, A., & Linger, R. The CERT Function Extraction Experiment: Quantifying FX Impact on Software Comprehension and Verification (CMU/SEI-2005-TN-047). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2005. http://www.sei.cmu.edu/publications/documents/05.reports/05tn047.html

[4] Hevner, A., Linger, R., Collins, R., Pleszkoch, M., Prowell, S., & Walton, G. The Impact of Function Extraction Technology on Next-Generation Software Engineering (CMU/SEI-2005-TR-015). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2005. http://www.sei.cmu.edu/publications/documents/05.reports/05tr015.html

[5] Pleszkoch, M. & Linger, R. “Improving Network System Security with Function Extraction Technology for Automated Calculation of Program Behavior.” Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37). Waikoloa, HI, Jan. 5-8, 2004. Los Alamitos, CA: IEEE Computer Society Press, 2004.



Last updated May 7, 2007.