Position Paper
Information Survivability Workshop '98

David Alderson
Department of Engineering-Economic Systems and Operations Research
Stanford University
alderd@stanford.edu

Recently, the federal government and the broader defense community have paid increasing attention to the protection of national critical infrastructures. Specifically, the race to automate and interconnect telecommunications, energy, transportation, and other national systems through a common information infrastructure has generated concern that new interdependencies could lead to widespread failure. Incidents such as the Western Power Outage[1] and the PanAmSat satellite outage[2] have demonstrated our dependence on these systems, and they have also provided a bleak view of the potential consequences of such failure.

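The issues faced in addressing this situation are summarized by the following basic research questions:

  1. How does one detect and protect against vulnerabilities in these systems?
  2. How does one design and implement systems that survive accidents, failures, and attacks to system components?
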
The goal is to have systems that continue to fulfill their mission objective at all times. There are a number of strategies that might be employed to achieve this result. One such strategy is to track down all potential weaknesses independently within each system and then "harden" them against any possible failure. This is the traditional approach to computer security. However, there are a number of reasons why system hardening alone is not sufficient for creating systems that survive.

  1. It may not be realistic to expect that all vulnerabilities can be discovered. The critical infrastructure systems are "dynamic" in the sense that they are under constant development, improvement, and change. These changes potentially introduce new weaknesses and vulnerabilities. Also, attackers intent on compromising a system are continuously finding new means for doing so, creating a type of "arms race" with information security professionals. In many ways, this process of finding and plugging these holes can be like chasing a moving target. We must assume that accidents will happen, failures will occur, and attacks will be successful.

  2. The critical infrastructure systems often depend on one another and cannot reasonably be treated as closed systems. A system is "closed" if it can be analyzed in isolation, independent of other systems. The system hardening approach relies on the assumption that it is appropriate to treat the system under consideration as a closed system. But the critical infrastructures are really a "system of systems" whose members are highly dependent on one another. This interdependence also calls into question the ability to separate the individual systems into subsystems that can be independently analyzed.

  3. The independent treatment of critical infrastructures effectively ignores any common vulnerabilities across systems caused by similar internal properties. The critical infrastructures under consideration represent a diverse and complicated set of technologies. Yet their behavior and potential vulnerabilities appear related. This similarity is certainly caused in part by a common dependence on the same information networks. It is also likely to be caused by interdependencies among the systems. However, it is possible that there is another cause for this commonality - something about the internal complex structure of these systems that causes them to behave in similar ways and exhibit similar weaknesses.
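
The second point can be illustrated with a small sketch. The component names and the dependency map below are hypothetical and highly simplified, and the failure rule (a component fails as soon as any one of its dependencies fails) is an assumption chosen for clarity, not a model of any real infrastructure. The sketch shows only that a failure analyzed "in isolation" understates its consequences once dependencies are followed.

```python
from collections import deque


def cascade(depends_on, initial_failures):
    """Return the set of components that end up failed.

    depends_on maps each component to the components it needs in order
    to operate. Assumption: a component fails as soon as any one of its
    dependencies has failed.
    """
    # Invert the map: for each component, which components rely on it?
    dependents = {}
    for component, needs in depends_on.items():
        for needed in needs:
            dependents.setdefault(needed, set()).add(component)

    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        down = queue.popleft()
        for component in dependents.get(down, ()):
            if component not in failed:
                failed.add(component)
                queue.append(component)
    return failed


# Hypothetical dependencies, for illustration only.
DEPENDS_ON = {
    "power": set(),
    "telecom": {"power"},
    "banking": {"telecom"},
    "rail": {"power", "telecom"},
    "water": set(),
}

print(sorted(cascade(DEPENDS_ON, ["telecom"])))  # ['banking', 'rail', 'telecom']
print(sorted(cascade(DEPENDS_ON, ["water"])))    # ['water']
```

In this toy map, only "water" behaves like a closed system; a fault in "telecom" cannot be assessed without also looking at the systems that rely on it.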

For these reasons, system hardening by itself is not likely to ensure that systems will be survivable. However, the process of identifying and repairing individual weaknesses will still be an important part of securing these systems. If the traditional approach of computer security hardening is not sufficient to ensure that systems will survive, then the question remains "How do we proceed?"

As researchers and scientists, it is our job to ask "What is it that makes these systems vulnerable?" If the answer is obscured in the technical details of a single system component, then hardening that component may be appropriate, and the job is best left to experts in these individual technical areas. Fresh approaches to thinking about system vulnerability may be useful for discovering these "critical failure points" but the strategy for alleviating these types of vulnerabilities is largely known - simply find and then plug the holes in the system.

However, if, as suggested, something else is at work (something about the internal structure of these systems that causes complex behavior), and it is this complex behavior that induces vulnerabilities, then another approach is required. Efforts should be spent on understanding how the organization of system components and their interactions contribute to the overall vulnerability or robustness of the system as a whole. Our ability to design and implement systems that are survivable may critically depend on our ability to understand this relationship.
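
A textbook reliability calculation gives a minimal illustration of how organization alone can change robustness. The function names and the numbers are illustrative assumptions, and the calculation assumes that components fail independently, which is exactly the assumption that interdependent infrastructures may violate. It is offered as a sketch of the idea, not as a model of any real system.

```python
def series_survival(p, n):
    """Probability the system works when all n components must work."""
    return p ** n


def redundant_survival(p, n):
    """Probability the system works when any one of n components suffices."""
    return 1 - (1 - p) ** n


# Identical components (each works with probability 0.9), arranged two ways.
p, n = 0.9, 3
print(round(series_survival(p, n), 3))     # 0.729
print(round(redundant_survival(p, n), 3))  # 0.999
```

The components are the same in both cases; only their arrangement differs, yet the chance that the system as a whole survives moves from roughly 73% to roughly 99.9%.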

There has been considerable debate among researchers and scientists as to the role that modeling and simulation should play in understanding the critical infrastructure problem.[3] However, it is widely recognized that there is a need to develop test facilities that permit real-time simulation. The appropriate question is then, "What is it that should be simulated?" Again, detailed knowledge about the functionality of individual system components already resides with the domain experts. Comprehensive simulation and analysis tools for understanding the individual behavior of these components already exist. But the interaction among components remains largely a mystery. Efforts should be concentrated on understanding the behavior of, and interaction among, system components, not on the technical details of these systems.

In order to proceed with understanding this behavior, we need to characterize these systems. Consider the following system characteristics:


Although the critical infrastructures are composed of a diverse set of components delivering a diverse set of services, they all fit the description above. Furthermore, we make the following observations about these systems:

While the general study of these types of systems is an enormous field of research, it is possible to narrow the research problem with the original research questions:

  1. How does one detect and protect against vulnerabilities in these systems?
  2. How does one design and implement systems that survive accidents, failures, and attacks to system components?

As a first step to addressing these broader issues, one might consider the following specific questions:

The ISW'98 Conference provides a unique opportunity to gain the support of domain experts from the individual infrastructures in these research initiatives. Efforts should be concentrated on understanding the principal system components within each infrastructure as well as the rules that govern their interaction. Specific similarities and differences among infrastructures should be examined. It is important that testing environments be developed with the specific purpose of providing understanding and insight about the survivability of real-world systems. A healthy exchange among real-world experts, practitioners, and researchers is essential to this mutual understanding and development.

Research Background
In an initial research effort, I studied the email system at Stanford University in order to understand how reliably the system delivers service when individual system components fail. Performance measures from the user and administrator perspectives were compared and found to differ greatly under various scenarios. In a separate study, I worked with a team to examine the effect of deregulation on the electric transmission infrastructure. Specifically, we considered whether further deregulation of the transmission infrastructure would bring additional benefits. Tradeoffs among system reliability, efficiency, and cost effectiveness were examined. My current research efforts involve the development of an agent-based test environment for understanding the behavior of datacom networks.

Endnotes:

[1] On August 10, 1996, faults along individual power transmission lines in Oregon led to blackouts in 11 U.S. states and two Canadian provinces.

[2] On May 19, 1998, the failure of a single commercial satellite caused nationwide outages affecting more than 40 million pagers and disrupted credit card transactions and radio transmissions.

[3] This debate about the role of modeling and simulation persisted throughout a series of workshops on protecting and assuring critical national infrastructures, co-sponsored by the Center for International Security and Arms Control (CISAC) at Stanford University and Lawrence Livermore National Laboratory. References to similar debates in other forums were also made during these workshops.
