[16]
"Issues and Insights Regarding Survivable Inter-Network Design and Retrofit"
Information Survivability Workshop '98 (ISW 98): 28-30 October 1998 (Orlando, FL)
Stephen F. Fiore - principal consultant: Priority Technology, State College, PA 16803
email: priority@vicon.net
Abstract
This white paper assumes that the network system architect/designer has been asked by the client either to "make our baseline inter-network (a combination of corporate intranet/LANs with WAN/Internet gateways) survivable" or to "develop from scratch a new survivable inter-network architecture". The purpose of this brief paper is to identify the issues facing the network architect given this task and to initiate insight into possible logical and methodical processes for designing or retrofitting inter-networked systems with survivable features.
Introduction
Survivable systems are said to embody two essential characteristics (ref[1,2]):
- preserve mission essential services given failure or attack
- function in unbounded and dynamic network environments (i.e., LANs/WANs/web-based intranets with gateways to other LANs/WANs/intranets or the Internet).
The following requirement areas must be included in a comprehensive approach:
- Overall system availability
- Fault tolerance
- (Network) reconstitution and restoration
- Security & integrity
- Information Warfare (IW) mitigation.
For retrofit of existing systems, we interpret "practical" as employing "sufficient" enhancement for supporting survivability requirements without critically impacting legacy system performance for "essential" services.
Current state of practice employs non-methodical approaches which typically address each requirement area independently. For example, we often select techniques and tools to address a specific security or integrity vulnerability based on what is the "latest" or "available" for a specific machine, O/S and/or application environment (e.g., WinTel PC, NT 4.0, Office '97, NS Communicator). Such "ad-hoc" approaches often produce poor outcomes: the survivability improvements are negligible, non-scalable, non-extensible, too costly and/or damaging to overall system performance. An objective of the desired comprehensive design process is to guard against these negative outcomes, which do not support logical migration plans.
To achieve our goal, a comprehensive and integrated design process should consider the inter-relationships between the above requirement areas in a logical, methodical and, if possible, quantitative manner.
Requirements Analysis Areas
First, we need to understand how the aforementioned requirement areas inter-relate to determine specific survivability techniques and mechanisms (or "tools" ref.[3]).
Overall System Availability
A system availability specification is a function of both system reliability (in terms of Mean Time Between "Failures" - MTBF) and system maintainability (in terms of Mean Time To "Repair" - MTTR). "Failures" for a survivable system should include stalled or hung processes, critical thread connection delays, denial of service IW attacks, etc. "Repair" for a survivable system should include reconstitution and restoration via resource reassignments and fail-over switching. Given an end-to-end system/service availability specification and inter-network topology (both physical and logical), we can identify single thread/point vulnerabilities. Once these are identified, we need to be able to assign individual MTBF and MTTR values to conduct critical path analysis and identify critical elements (processes, resources, objects, physical processor nodes (servers), storage, I/O, communication gateways, power supplies, etc.).
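As a minimal sketch of the analysis described above, the following computes steady-state availability from MTBF and MTTR using the standard relation A = MTBF / (MTBF + MTTR), multiplies the element availabilities along a single thread, and flags the weakest element. The element names and hour values are hypothetical, and the series model assumes that element failures are independent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One element on an end-to-end service thread (illustrative values only)."""
    name: str
    mtbf_hours: float  # mean time between "failures"
    mttr_hours: float  # mean time to "repair", including fail-over time

    @property
    def availability(self) -> float:
        # Steady-state availability: fraction of time the element is up.
        return self.mtbf_hours / (self.mtbf_hours + self.mttr_hours)


def series_availability(elements):
    """Availability of a single thread, where every element must be up."""
    result = 1.0
    for element in elements:
        result *= element.availability
    return result


def weakest_element(elements):
    """Element with the lowest availability: a first candidate for redundancy."""
    return min(elements, key=lambda element: element.availability)


# Hypothetical single-thread path from a client LAN to the Internet.
thread = [
    Element("file server", mtbf_hours=2000.0, mttr_hours=4.0),
    Element("LAN switch", mtbf_hours=50000.0, mttr_hours=2.0),
    Element("WAN gateway", mtbf_hours=1000.0, mttr_hours=8.0),
]

print(f"end-to-end availability: {series_availability(thread):.5f}")
print(f"weakest element: {weakest_element(thread).name}")
```

Comparing the end-to-end figure against the availability specification shows whether the single thread is adequate; the weakest element indicates where redundancy is likely to pay off first.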
Typically, the analysis proceeds to derived requirements regarding the number of redundant physical elements needed, including hot-standby nodes, shadow storage (RAID drives), parallel I/O and communication ports (e.g., serial (232, 488), bus (Ethernet, Fast Ethernet)) and diverse communication gateway medium options (e.g., telephony (modem, ISDN); wireless (cellular, MMDS); fiber (SONET, ATM); satellite (VSAT, DBS, MSS, FSS)).
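The number of redundant elements can be estimated with the usual parallel-redundancy relation A_parallel = 1 - (1 - A)^n. The sketch below finds the smallest n that meets a target availability. It assumes identical elements, independent failures and ideal (instantaneous, always successful) fail-over, none of which holds exactly in practice; the function names and the `max_n` limit are illustrative.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n identical redundant elements, any one of which suffices."""
    return 1.0 - (1.0 - a) ** n


def redundancy_needed(a: float, target: float, max_n: int = 16) -> int:
    """Smallest number of redundant elements whose combined availability meets target."""
    if not (0.0 < a < 1.0 and 0.0 < target < 1.0):
        raise ValueError("availabilities must lie strictly between 0 and 1")
    for n in range(1, max_n + 1):
        if parallel_availability(a, n) >= target:
            return n
    raise ValueError(f"target not reachable with {max_n} elements")


# Hypothetical gateway with 99% availability and a 99.9% target for that hop.
print(redundancy_needed(0.99, 0.999))
```

Because real fail-over takes time and can itself fail, the result is better read as a lower bound on the redundancy required.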
Fault Tolerance
Assuming a fault tolerant O/S (e.g., the future NT Clustering service, ref[4]) or application (e.g., SAP R/3 business development middleware, Oracle fail-safe database) is included in the architecture, the derivation of system availability requirements must be continued to determine the number and configuration of redundant cluster groups, virtual resources and nodes, O/S services, processes and protocols and/or application services, processes, protocols and objects.
Reconstitution and Restoration
Reconstitution and restoration represent the response part of survivable systems. Design attributes of candidate fault tolerant O/S and applications need to be considered along with system performance requirements related to response timeliness, latencies, file/stream throughputs and network loading efficiencies to determine the appropriateness of candidate fail-over processes and mechanisms. Requirements are thus further derived in two steps: first, adaptive services and processes are identified for bandwidth management, media type management, meta-networking, etc.; then these selected services and processes are allocated to specific mechanisms such as server node renaming, IP address redirectors, resource redirectors, database redirectors, etc.
Security & Integrity
This area is typically treated as a layered afterthought, which often results in a multitude of complex issues regarding inter-operability and compatibility. Many requirements appear completely new (through added functionality, e.g., NAT/proxy addressing) or demand enhanced or strengthened attributes in terms of protection (e.g., increased key lengths). Functional sub-areas include:
- user/client station identity & authentication
- information content verification and filtering
- traffic data encryption/decryption
- monitor and control data encryption/decryption
- network address verification, translation and/or filtering
- layered protocol encapsulation (e.g., using TCP wrappers, ESP of IPsec).
In addition, industry-developed nonstandard layered protocols and "plug-ins" often limit the ease with which protection levels can be increased using incremental or phased implementation approaches. In the area of authentication and user identification alone, many technology options exist which are not easily transitionable.
Examples include combining secure transfer protocols such as Secure Sockets Layer (SSL) and secure Hyper Text Transfer Protocol (HTTP-S) for user/password negotiation and log-on with hard tokens such as special PCMCIA cards or smart cards w/ PKC, or adding biometrics such as finger-print recognition or retinal scan methods for authentication with hard identity verification.
Information Warfare (IW) Mitigation
Due to the ambiguous nature of IW intrusion recognition and attack detection, this area includes both intentional attacks (such as IP spoofing, TCP SYN flooding, smurfing, Trojan horse and virus attacks; ref. [5]) and other non-intentional anomalous (i.e., "out of the ordinary") events. Current state of practice is, for the most part, void of any generally applicable quantitative measures of effectiveness (MOE). For the time being, assessment of potential threats and red team testing are the only ways to reasonably scope appropriate attributes for event detection, analysis and countermeasures.
IW mitigation design is the most subjective area of survivability assessment. But it must be performed in some manner to determine the number and type of anomalous event detectors, port scanners, virus scanners, usage profilers, audit trail analyzers and interactive boundary controllers (e.g., firewalls, routers with IP packet filtering/address translation). As in the security/integrity area, commercial products for IW mitigation are new and immature, and they often lack a clear definition of their functional utility. In the absence of generally applicable MOE, they almost always lack quantitative measures which can be used for comparison and evaluation.
Design Process Considerations
Initial inroads into selecting survivability tools for new software design have been made by the SEI (ref.[3]). However, a need exists to develop an integrated survivability design process which results in an efficient, practical design providing "necessary and sufficient" survivability (encompassing all five previously identified areas) while supporting "acceptable" performance in terms of processing response timeliness, latencies and throughput for one or more levels of survivability.
Desirable Characteristics
A comprehensive survivability design process must consider:
- "essential" (or fallback) services which are necessary and sufficient for continuing operations during crisis conditions
- dissection of full-up normal services (from the legacy system) to determine which to keep and which to remove in a multi-level adaptive service scheme with precedence
- trading off decreasing levels of performance for increasing degrees of survivability
- developing system use cases, software processes and objects within the context of the service suite, performance levels and survivability levels of each mode
- evaluating performance impact to overall service relative to specific use cases
- developing streamlined process transaction models unique to each use case.
The following benefits are expected from such a comprehensive design process:
- allocation of service suites and commensurate performance levels which permit graceful degradation with increasing levels of severity of events (failures & attacks)
- achieving a high degree of reuse of legacy processes and managed objects
- achieving simplicity and efficiency in specified interfaces and APIs with per use case specification of transactions.
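The multi-level adaptive service scheme with precedence and the resulting graceful degradation can be illustrated with a small sketch. Everything in it is hypothetical: the service names, precedence values, relative loads and the mapping from event severity to remaining capacity are placeholders for figures that a real design process would derive from use cases and performance analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    precedence: int  # 1 = most essential; higher numbers are shed first
    load: float      # relative resource demand (arbitrary units)


def select_suite(services, capacity):
    """Keep services in precedence order until the available capacity runs out."""
    kept = []
    remaining = capacity
    for service in sorted(services, key=lambda s: s.precedence):
        if service.load > remaining:
            break  # strict precedence: nothing less essential is kept either
        kept.append(service.name)
        remaining -= service.load
    return kept


# Hypothetical full-up service suite taken from a legacy system.
services = [
    Service("transaction processing", precedence=1, load=20.0),
    Service("e-mail", precedence=2, load=30.0),
    Service("file sharing", precedence=3, load=25.0),
    Service("web browsing", precedence=4, load=25.0),
]

# Assumed capacity remaining at each severity level (0 = normal operation).
capacity_by_severity = {0: 100.0, 1: 60.0, 2: 25.0}

for severity, capacity in capacity_by_severity.items():
    print(severity, select_suite(services, capacity))
```

With these placeholder figures the suite shrinks from all four services at severity 0 to the single most essential service at severity 2, which is the graceful degradation behavior the process is meant to produce.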
Summary
A comprehensive design process should provide an integrated and efficient way to architect a new inter-networked system, or retrofit an existing one, with survivability features. Through the process, the designer should develop a good feel for the severity of the (failure-related) crisis and (IW-related) threat conditions and for how these factors impact the design of multiple levels of services/performance corresponding to different levels of survivability. The process should also allow the designer to gauge the appropriateness of survivability mechanisms and to consider practical aspects regarding cost and interoperability with other services and processes selected for the system.
References
- [1] "Survivable Network Systems: An Emerging Discipline", CMU/SEI-97-TR-013, R.J. Ellison, D.A. Fisher, R.C. Linger, H.F. Lipson, T. Longstaff, N.R. Mead; Carnegie Mellon University (CMU) Software Engineering Institute (SEI), November 1997.
- [2] "Requirements Definition for Survivable Network Systems", R.C. Linger, N.R. Mead and H.F. Lipson,
- [3] "An Approach for Selecting and Specifying Tools for Information Survivability", CMU/SEI-97-TR-009, R. Firth, B. Fraser, S. Konda, D. Simmel, July 1998.
- [4] "Windows NT Clustering Service", R. Gamache, R. Short, M. Massa, Microsoft Corp., IEEE Computer, October 1998.
- [5] "Trends in Computer Attacks", Elias Levy, USENIX Login Security Issue, May 1998.







