Information Survivability Workshop '98
David Alderson
Department of Engineering-Economic Systems and Operations Research
Stanford University
alderd@stanford.edu
The issues faced in addressing this situation are summarized by the following basic research questions:
- How does one detect and protect against vulnerabilities in large, open, complex, interdependent information systems?
- How does one design and implement systems that survive accidents, failures, and attacks to system components?
The goal is to have systems that continue to fulfill their mission objective at all times. There are a number of strategies that might be employed to achieve this result. One such strategy is to track down all potential weaknesses independently within each system and then "harden" them against any possible failure. This is the traditional approach to computer security. However, there are a number of reasons why system hardening alone is not sufficient for creating systems that survive.
- It may not be realistic to expect that all vulnerabilities can be discovered. The critical infrastructure systems are "dynamic" in the sense that they are under constant development, improvement, and change. These changes potentially introduce new weaknesses and vulnerabilities. Also, attackers intent on compromising a system are continuously finding new means for doing so, creating a type of "arms race" with information security professionals. In many ways, this process of finding and plugging these holes can be like chasing a moving target. We must assume that accidents will happen, failures will occur, and attacks will be successful.
- The critical infrastructure systems often depend on one another and cannot reasonably be treated as closed systems. A system is "closed" if it can be analyzed in isolation, independent of other systems. The system hardening approach relies on the assumption that it is appropriate to treat the system under consideration as a closed system. But the critical infrastructures are really a "system of systems" whose members are highly dependent on one another. This interdependence also calls into question the ability to separate the individual systems into subsystems that can be independently analyzed.
- The independent treatment of critical infrastructures effectively ignores any common vulnerabilities across systems caused by similar internal properties. The critical infrastructures under consideration represent a diverse and complicated set of technologies. Yet their behavior and potential vulnerabilities appear related. This similarity is certainly caused in part by a common dependence on the same information networks. It is also likely to be caused by interdependencies among the systems themselves. However, it is possible that there is another cause for this commonality: something about the internal complex structure of these systems that causes them to behave in similar ways and exhibit similar weaknesses.
For these reasons, system hardening by itself is not likely to ensure that systems will be survivable. However, the process of identifying and repairing individual weaknesses will still be an important part of securing these systems. If the traditional approach of computer security hardening is not sufficient to ensure that systems will survive, then the question remains "How do we proceed?"
As researchers and scientists, it is our job to ask "What is it that makes these systems vulnerable?" If the answer is obscured in the technical details of a single system component, then hardening that component may be appropriate, and the job is best left to experts in these individual technical areas. Fresh approaches to thinking about system vulnerability may be useful for discovering these "critical failure points" but the strategy for alleviating these types of vulnerabilities is largely known - simply find and then plug the holes in the system.
However, if, as suggested, something else is at work, namely that the internal structure of these systems causes complex behavior and this complex behavior induces vulnerabilities, then another approach is required. Efforts should be spent on understanding how the organization of system components and their interactions contribute to the overall vulnerability or robustness of the system as a whole. Our ability to design and implement systems that are survivable may critically depend on our ability to understand this relationship.
There has been considerable debate among researchers and scientists about the role that modeling and simulation should play in understanding the critical infrastructure problem.[3] However, it is widely recognized that there is a need to develop test facilities that permit real-time simulation. The appropriate question is then, "What is it that should be simulated?" Again, detailed knowledge about the functionality of individual system components already resides with the domain experts. Comprehensive simulation and analysis tools for understanding the individual behavior of these components already exist. But it is the interaction among components that remains largely a mystery. Efforts should be concentrated on trying to understand the behavior and interaction among system components, not on the technical details of these systems.
In order to proceed with understanding this behavior, we need to characterize these systems. Consider the following system characteristics:
- The overall behavior of the system is governed by individual components (agents) that make local, rule-based decisions using only local, imperfect information. There is no central coordination mechanism.
- These components are interconnected and there are large numbers of interactions between these components.
- The systems are open (they have no fixed boundary conditions) or have rapidly changing boundary conditions.
Although the critical infrastructures comprise a diverse set of components delivering a diverse set of services, they all fit the description above. Furthermore, we make the following observations about these systems:
- Most of the weaknesses of these systems remain unknown.
- These systems have demonstrated a capacity for cascading failures.
- There exists the possibility for emergent behavior among the agents.
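The characteristics and observations above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions, not a model of any real infrastructure: the ring topology, the uniform loads, the capacity values, and the rule that a failed component splits its load equally among its surviving neighbors are all hypothetical choices made for this example. Each component applies only a local rule (fail when load exceeds capacity), and there is no central coordinator.

```python
from collections import deque


def ring(n):
    """Hypothetical topology: n components, each linked to its two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def cascade(neighbors, load, capacity, initial_failure):
    """Return the set of components that have failed once the cascade stops.

    Assumed local rule: a failed component hands its load, in equal shares,
    to whichever of its neighbors are still working; a neighbor fails as
    soon as its load exceeds its capacity.
    """
    load = dict(load)  # work on a copy so the caller's data is unchanged
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        node = queue.popleft()
        alive = [n for n in neighbors[node] if n not in failed]
        if not alive:
            continue  # no working neighbor left; the load is simply shed
        share = load[node] / len(alive)
        for n in alive:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                queue.append(n)
    return failed


if __name__ == "__main__":
    net = ring(10)
    loads = {i: 1.0 for i in net}
    for cap in (1.6, 1.4):
        caps = {i: cap for i in net}
        print(cap, len(cascade(net, loads, caps, initial_failure=0)))
```

Under these assumptions, a capacity of 1.6 contains the initial failure to a single component, while a capacity of 1.4 lets the same failure propagate to all ten. A small change in a local parameter produces a large change in system-wide behavior, which is the kind of interaction effect that component-level analysis alone would not reveal.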
While the general study of these types of systems is an enormous field of research, it is possible to narrow the research problem with the original research questions:
- How does one detect and protect against vulnerabilities in these systems?
- How does one design and implement systems that survive accidents, failures, and attacks to system components?
As a first step to addressing these broader issues, one might consider the following specific questions:
- What does it mean for these systems to be vulnerable? In the language of dynamical systems, a system is vulnerable when it is unstable, or sensitive to small disturbances. Small changes to the system result in large deviations in overall system behavior. In other words, small inputs yield large outputs. Efforts should be concentrated on understanding how the interaction of system components contributes to this behavior.
- In the context of these systems, what is robust behavior? In contrast to a vulnerable system, a system is robust when it is stable, or insensitive to small disturbances. Large changes to the system are required before any deviation in overall system behavior occurs. In other words, large inputs yield small or no outputs. A good working definition of robustness is perhaps stability relative to a performance threshold.
- Are there patterns of behavior among system components that are self-reinforcing? Self-reinforcing patterns of behavior are one way in which component interaction might contribute to robust behavior.
- Is it possible to induce self-reinforcing behavior in a system that is unstable? If patterns can be identified, it might be possible to recognize systems that are inherently unstable and interact with them to induce stable, self-reinforcing patterns. If so, then system administrators have a powerful new tool for system stabilization.
- How does increasing the quantity or frequency of system interactions affect the overall system behavior? If the interaction of components is a major contributor to overall system performance, then changing the nature of this interaction is likely to have a profound effect on this behavior. Current efforts to connect almost everything to the Internet and then automate this interaction are perhaps cause for concern, as the effects of this heightened interaction remain largely unknown.
- How does one think about the design tradeoffs when designing robust systems? Traditionally, systems have not been engineered to be robust in responding to successful attacks. Design efforts have largely focused on the prevention of failures, or the use of redundancy to mitigate failures. It is likely that engineers are going to need new ways of thinking about survivability and the design process.
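The dynamical-systems reading of vulnerability and robustness in the questions above can be made concrete with the simplest possible case. The sketch below is an assumed illustration only: the one-dimensional linear update `x[t+1] = a * x[t]`, the function names, and the threshold values are hypothetical and stand in for far more complicated real systems. It measures how far a disturbance pushes the system from its equilibrium at zero and compares that deviation with a performance threshold.

```python
def deviation_after(a, disturbance, steps):
    """Distance from the equilibrium at 0 after `steps` updates of
    x[t+1] = a * x[t], starting from an initial disturbance."""
    x = disturbance
    for _ in range(steps):
        x = a * x
    return abs(x)


def is_robust(a, disturbance, steps, threshold):
    """Working definition from the text: stability relative to a
    performance threshold (the deviation stays at or below it)."""
    return deviation_after(a, disturbance, steps) <= threshold


if __name__ == "__main__":
    # Stable system (|a| < 1): even a large disturbance decays.
    print(is_robust(a=0.5, disturbance=10.0, steps=10, threshold=0.1))
    # Unstable system (|a| > 1): even a small disturbance grows.
    print(is_robust(a=1.5, disturbance=0.01, steps=20, threshold=0.1))
```

In this toy setting the factor `a` plays the role of the interaction strength: when its magnitude is below one, large inputs yield small outputs, and when it is above one, small inputs yield large outputs.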
The ISW'98 Conference provides a unique opportunity to gain the support of domain experts from the individual infrastructures in these research initiatives. Efforts should be concentrated on understanding the principal system components within each infrastructure as well as the rules that govern their interaction. Specific similarities and differences among infrastructures should be examined. It is important that testing environments be developed with the specific purpose of providing understanding and insight about the survivability of real-world systems. A healthy exchange between real-world experts, practitioners, and researchers is essential to this mutual understanding and development.
Research Background
In an initial research effort, I studied the email system at Stanford University in order to understand how reliably the system delivers service in the presence of failures of individual system components. Performance measures from the user perspective and the administrator perspective were compared and found to differ greatly under various scenarios. In a separate study, I worked with a team to examine the effect of deregulation on the electric transmission infrastructure. Specifically, we considered whether the additional deregulation of the transmission infrastructure would bring additional benefits. Tradeoffs between system reliability, efficiency, and cost effectiveness were examined. My current research efforts involve the development of an agent-based test environment for understanding the behavior of datacom networks.
Endnotes:
- [1] On August 10, 1996, faults along individual power transmission lines in Oregon led to blackouts in 11 U.S. states and two Canadian provinces.
- [2] On May 27, 1998, the failure of a single commercial satellite caused nation-wide outages of more than 40 million pagers, as well as causing other disturbances with credit card transactions and radio transmissions.
- [3] This debate about the role of modeling and simulation persisted throughout a series of workshops on protecting and assuring critical national infrastructures that was co-sponsored by the Center for International Security and Arms Control (CISAC) at Stanford University and Lawrence Livermore National Labs. References to similar debates in other forums were also made during these workshops.
References:
- D. Alderson. Reliability of the Leland Email System. Unpublished report. May 1998.
- D. Alderson, D. Elliott, G. Grove, T. Holliday, S. Lukasik, S. Goodman. Workshop on Protecting and Assuring Critical National Infrastructure: Next Steps, February 26-27, 1998. Center for International Security and Arms Control, Stanford University. June 1998.
- D. Alderson, P. Kuzminski, S. Lamping, R. White. Powerful Stuff: Reliability and Deregulation of America's Electricity Infrastructure. Unpublished report. June 1998.





