STAR*Lab Function Extraction for Software Assurance
Engineering Automation for Computing Software Behavior
Principal Investigator: Richard C. Linger
Problem Addressed
STAR*Lab recognizes both the importance of software assurance to national defense and
economic security and the difficulty of achieving it. Software assurance depends on
knowing and verifying the complete behavior of software. Unfortunately, nothing less will
do, because behavior that is not known can contain errors, vulnerabilities, and malicious
content. It is a sobering fact that current software engineering provides no practical
means for developers to determine the full behavior of software, and no testing effort,
no matter how elaborate, can exercise more than a small fraction of possible behavior.
Complex software systems are difficult to understand because of their immense numbers of
execution paths, any of which can contain errors and security exposures. Faced with
innumerable execution possibilities, developers and analysts often achieve no more than a
general understanding of system behavior. This technology gap is at the heart of many
issues in software and security engineering. Simply put, systems experience errors and
vulnerabilities in large measure because their developers have no practical means to
determine what they do in all possible uses.
Research Approach
While software assurance has been limited by engineering capabilities in the past, it
may be less so in the future. Function-theoretic foundations of software illuminate a
challenging but feasible strategy for developing automated tools to calculate the
behavior of software and present it to users in understandable form. STAR*Lab is
conducting research and development in the emerging technology of function extraction
(FX). The objective of FX is to move from an uncertain understanding of program behavior
derived in a human time scale of days to a precise understanding automatically computed
in a machine time scale of seconds. This technology applies function-theoretic
mathematical foundations to automate calculation of the functional behavior of software
to the maximum extent possible. These foundations define the transformation of code
structures into procedure-free functional form and are the basis for the function
extraction process [1,2]. While theoretical results impose some constraints on behavior
calculation (for example, for certain forms of loops), STAR*Lab development of
engineering solutions suggests that nearly all software behavior will be amenable to
calculation. Moreover, even partial behavior calculation can help improve human
capabilities for understanding and analysis.
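As a minimal sketch of what "procedure-free functional form" means for a sequence structure, the Python below composes three sequential assignments and compares the result with a single concurrent assignment. The variable names, the helper functions, and the choice of a swap-by-arithmetic example are illustrative assumptions; they are not taken from the FX system, which performs this kind of composition symbolically rather than by sampling inputs.

```python
from itertools import product

# Hypothetical three-statement sequence, one function per assignment.
# Each step maps a program state (a dict of variable values) to a new state.
def step1(s):
    return {**s, "x": s["x"] + s["y"]}   # x := x + y

def step2(s):
    return {**s, "y": s["x"] - s["y"]}   # y := x - y

def step3(s):
    return {**s, "x": s["x"] - s["y"]}   # x := x - y

def run_sequence(state):
    """Execute the statements one after another (the procedural view)."""
    for step in (step1, step2, step3):
        state = step(state)
    return state

def net_behavior(state):
    """Claimed net effect as one concurrent assignment: x, y := y, x."""
    return {"x": state["y"], "y": state["x"]}

def agree_on(values):
    """Spot-check that both views agree for every pair drawn from values."""
    return all(
        run_sequence({"x": x, "y": y}) == net_behavior({"x": x, "y": y})
        for x, y in product(values, repeat=2)
    )

if __name__ == "__main__":
    print(run_sequence({"x": 3, "y": 5}))
    print(agree_on(range(-5, 6)))
```

The intermediate values of x and y disappear from the functional form; only the mapping from initial state to final state remains, which is what a reader needs in order to know what the code does.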
To explore the impact of FX technology, STAR*Lab developed a proof-of-concept function
extractor prototype that calculates the behavior of programs expressed in a small subset
of the Java programming language. In a controlled experiment, the group using the FX
prototype reduced the time required to derive the functional behavior of example programs
by several orders of magnitude, and was about four times better at providing correct
answers to comprehension and verification questions in a fourth of the time, compared to
the control group [3].
Function extraction technology can be applied to any programming language and has the
potential to impact many aspects of the software engineering life cycle. To better
understand this impact, STAR*Lab conducted an SEI-sponsored study with a major
corporation to determine how FX could improve engineering operations in activities
ranging from software specification and design to implementation and testing [4]. This
study produced guidance for FX evolution from experienced software developers, including
the following recommendations:
- Development of FX automation for assembly language should be a priority.
- FX automation should be developed for correctness verification of software.
- FX automation should be developed for high-level languages, starting with Java.
- Research on FX automation for specification and architecture should be initiated.
Expected Benefits
The function extraction system currently under development targets programs written in
or compiled into Intel assembly language. The system is expected to help security
analysts to determine intruder strategies by providing precise information on the
structure and function of malicious code [5]. In terms of broader application,
opportunities exist to make progress on the problems of malicious code detection,
computational security analysis, correctness verification, legacy system understanding,
creation of assured software repositories, and automated component composition. The basis
for all of this is the realization that programs are mathematical artifacts subject to
mathematical analysis. Human fallibility may still exist in interpreting the analytical
results, but there can be little doubt that routine availability of calculated behavior
would help reduce errors, vulnerabilities, and malicious code in software and make
software development more manageable and predictable.
2007 Accomplishments
In its current state of development, the FX system demonstrates (a) transformation of
spaghetti-logic assembly language programs into understandable structured form and (b)
automated computation of behavior for sequence, alternation, and iteration structures.
For example, Figure 1 demonstrates behavior computation for the miniature program shown
on the left (depicting the result of the FX system transforming the original
spaghetti-logic version into structured form; jump instructions are retained in the
program text as comments for traceability but have no effect). The program has an
initialized nested loop structure. The computed behavior on the right shows that the net
behavioral effect of the program is to always (condition is true) carry out the following
concurrent assignments:
- set register EAX to 0,
- set register ECX to the product of the initial values of EAX and EBX plus the initial
  value of ECX, and
- if the initial value of EAX is 0, leave EDX unchanged; otherwise, set EDX to 0.
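The concurrent assignments listed above can be checked against a concrete looping program. The sketch below is a hypothetical Python stand-in for a nested-loop program with that net effect; it is not the program from Figure 1, whose instruction listing is not reproduced here. It assumes small non-negative register values and ignores 32-bit wraparound, and the function names are illustrative only.

```python
def run_program(eax, ebx, ecx, edx):
    """Hypothetical nested loop: add EBX to ECX, one unit at a time, EAX times."""
    while eax != 0:
        edx = ebx              # inner loop counter starts at EBX
        while edx != 0:
            ecx += 1
            edx -= 1
        eax -= 1
    return eax, ebx, ecx, edx

def computed_behavior(eax, ebx, ecx, edx):
    """Net effect as one concurrent assignment over the initial register values."""
    return (
        0,                           # EAX := 0
        ebx,                         # EBX unchanged (true of this stand-in program)
        eax * ebx + ecx,             # ECX := EAX * EBX + ECX
        edx if eax == 0 else 0,      # EDX unchanged when EAX is 0, else 0
    )

def agree_on(limit):
    """Spot-check that the loop and the closed form agree on small inputs."""
    rng = range(limit)
    return all(
        run_program(a, b, c, d) == computed_behavior(a, b, c, d)
        for a in rng for b in rng for c in rng for d in rng
    )

if __name__ == "__main__":
    print(run_program(2, 3, 1, 7))
    print(computed_behavior(2, 3, 1, 7))
    print(agree_on(5))
```

Sampling inputs this way only builds confidence on the cases tried; the point of behavior computation is to derive the closed form from the code itself so that it holds for all inputs.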
In terms of malware analysis, the current system can demonstrate examples of behavior
computation for (a) viruses with repeatedly obfuscated control logic, (b) viruses hidden
in large programs, and (c) repeatedly obfuscated virus unpackers, including use of
computed behavior to unpack the virus payload.
Figure 1. FX System Computation of the Behavior of a Miniature Looping Program
2008 Plans
FX system development is planned to continue in 2008. Sponsors are welcome to
participate in completing the system and moving the technology forward. STAR*Lab is also
ready to apply FX to additional languages and phases of the software engineering life
cycle.
References
[1] Prowell, S., Trammell, C., Linger, R., & Poore, J. Cleanroom Software Engineering: Technology and Practice. Reading, MA: Addison Wesley, 1999.
[2] Mills, H. & Linger, R. “Cleanroom Software Engineering.” Encyclopedia of Software Engineering, 2nd ed. Edited by J. Marciniak. New York, NY: John Wiley & Sons, 2002.
[3] Collins, R., Walton, G., Hevner, A., & Linger, R. The CERT Function Extraction Experiment: Quantifying FX Impact on Software Comprehension and Verification (CMU/SEI-2005-TN-047). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2005. http://www.sei.cmu.edu/library/abstracts/reports/05tn047.cfm
[4] Hevner, A., Linger, R., Collins, R., Pleszkoch, M., Prowell, S., & Walton, G. The Impact of Function Extraction Technology on Next-Generation Software Engineering (CMU/SEI-2005-TR-015). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2005. http://www.sei.cmu.edu/library/abstracts/reports/05tr015.cfm
[5] Pleszkoch, M. & Linger, R. “Improving Network System Security with Function Extraction Technology for Automated Calculation of Program Behavior.” Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37). Waikoloa, HI, Jan. 5-8, 2004. Los Alamitos, CA: IEEE Computer Society Press, 2004.
Last updated May 7, 2007.