|
The CERT® Coordination Center (CERT®/CC) was responsible for incident response on the Internet during the period of this research. As such, it was important, as part of this research, to understand the extent and characteristics of the rapidly growing and changing Internet. These are described in this chapter. This chapter also explains why the CERT®/CC records were analyzed at the site level, which is the level where the CERT®/CC could expect to be working with the site administrator or other authority with responsibility for the computers and networks at that site. In addition, the growth of the Internet is quantified for comparison to the trends in Internet incidents described in later chapters. The growth in the Internet has not been uniform across the top-level domains. While the number of hosts is growing in all of these domains, the growth in the commercial domains (.com, .net) is more rapid than the growth in those domains associated with education and government (.edu, .gov, .org, .mil).

2.1. Description and Origins of the Internet

An internetwork, or internet, is a network of networks which has established methods of communication. The Internet is the "world's largest collection of networks that reaches universities, government labs, commercial enterprises, and military installations in many countries [Hug95:348]." Although the Internet connects large networks, such as those belonging to large communications companies, the Internet consists primarily of local area networks (LANs) [GaS96:456]. The principal method of communication on the Internet is the TCP/IP protocol suite (Transmission Control Protocol/Internet Protocol). The Internet, however, is increasingly becoming an environment with multiple protocols [Cer93:80].

The Internet is rapidly growing and evolving, which makes it difficult to define. Lynch and Rose describe it this way:

"The Internet community spans every continent across the globe. The Internet is so large that its size can only be estimated, and it is evolving so quickly that its rate of growth can only be guessed. It is so diverse that it uses hundreds of different technologies, and is so decentralized that its administrators don't even know each other. The Internet is an electronic infrastructure that enables intense communications between colleagues, competitors, and disciplines. Despite these extremes, the Internet community is bound together by a framework of computer communications networking protocols and infrastructure [LyR93:xiii]."

The basis for the Internet was an experiment begun in 1968 by the Defense Department's Information Processing Techniques Office (ARPA/IPTO) to connect computers over a network in order to ensure command and control communications in the event of a nuclear war. The original network was known as the ARPAnet, and the project quickly became a "straight research project without a specific application [Lyn93:5]." In the 1980s, the number of local area networks increased significantly and this stimulated rapid growth of interconnections to the ARPAnet and other networks. These networks and interconnections are known today as the Internet [Til96:168].

2.2. Internet Hosts and Domains

Computers that communicate across the Internet are known as host computers, or simply hosts [GaS96:455]. A host's connection to the Internet can be continuous or part-time, and it can be through dialup or direct connections [Lot96:defs.html]. Each host computer is identified by both a unique 32-bit IP address (Internet Protocol address) and a unique domain name. Each of these has two parts: one that specifies the host computer, and a second that specifies the location (either physical or organizational) of the host computer [ABH96:7].

2.2.1. IP addresses - IP addresses are generally written as four decimal numbers vvv, www, xxx, and yyy, each between 0 and 255, and each representing an 8-bit octet of the address.
The numbers are grouped together, separated by "dots" (periods), in the form vvv.www.xxx.yyy, with the most significant (leftmost) digits representing the physical network and the least significant (rightmost) digits representing the individual host. An example is 192.2.200.34.

There are two predominant methods currently used to divide the 32 bits of an IP address into the host and network portions [GaS96:456]. The original addressing scheme used the first octet to identify the network and the other three octets to identify the host. This limited the Internet to 256 networks. With the rapid growth in the number of LANs, this addressing scheme was abandoned in favor of an addressing scheme with three primary classes. This remains the most widely used addressing scheme [Cer93:91-92]. In this "classical" addressing scheme, the division between the network bits and the host bits is as shown in Table 2.1.
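As an illustration of the three-class scheme, the following Python sketch classifies a dotted-decimal address by the leading bits of its first octet and separates the network and host portions. It is a minimal sketch, not part of the original analysis: the function name `split_classful` is hypothetical, the address 10.1.2.3 is an arbitrary example, and the class boundaries used (class A: leading bit 0, one network octet; class B: leading bits 10, two network octets; class C: leading bits 110, three network octets) are the standard ones, which may be presented differently in Table 2.1.

```python
def split_classful(address: str) -> tuple[str, str, str]:
    """Return (class, network portion, host portion) of a dotted-decimal address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid dotted-decimal address: {address!r}")

    first = octets[0]
    if first < 128:      # leading bit 0
        cls, network_octets = "A", 1
    elif first < 192:    # leading bits 10
        cls, network_octets = "B", 2
    elif first < 224:    # leading bits 110
        cls, network_octets = "C", 3
    else:
        raise ValueError("addresses above 223.x.x.x are outside the three primary classes")

    network = ".".join(str(o) for o in octets[:network_octets])
    host = ".".join(str(o) for o in octets[network_octets:])
    return cls, network, host


# The example address from the text, plus an arbitrary class A address.
for example in ("192.2.200.34", "10.1.2.3"):
    print(example, split_classful(example))
```

Under these assumptions, 192.2.200.34 falls in class C, so its first three octets name the network and the last octet names the host.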
A newer Internet addressing scheme, the Classless Inter-Domain Routing (CIDR) method, has recently come into use. Using CIDR, the most significant k bits of each address specify the network, and the remaining (32 - k) bits specify the host. The size of k is unrestricted [GaS96:458].

2.2.2. Domain Names - Each host computer's domain name is a group of labels (words or letters) separated by dots. Domain names are assigned because users find it easier to work with symbolic names rather than IP addresses [Cer93:95]. Similar to IP addresses, domain names are divided into a host portion and a location portion. The leftmost label or group of labels identifies the host [Sob95:150], and the rest usually refer to the location. An example is howard.epp.cmu.edu, which is a fully qualified domain name because it has complete host and domain portions.

2.2.3. Domains - The network portions of IP addresses and of domain names each identify a partition of host computers. Both of these partitions are sometimes referred to as the domain of the host. This domain distinction was originally intended to separate the protocols in the Internet into two parts: an interdomain protocol between domains, and an intradomain protocol within domains [Per93:161]. This separation of protocols is not a universal distinction, which is part of the reason there is no generally accepted definition of domain. For some, the domain is the entire network portion of an IP address or domain name. For others, the domain refers only to the highest partition of the Internet into educational (.edu), commercial (.com), military (.mil), etc., networks. These are sometimes called the top-level domain names. Perlman states, however, that "none of these definitions are particularly intelligible or accurate [Per93:180]." Perlman suggests instead using a definition based more on functionality: a domain is a partition of networks "that is administrated by a single administrative plan [Per93:180]."
A typical university or company illustrates the confusion between the terms IP address, domain name, domain, host, and network. An example is my computer at CMU, which was assigned an IP address of "128.2.19.200" and a domain name of "howard.epp.cmu.edu." As is usually the case, there is a direct correspondence between the host portions of both the IP address ("200") and domain name ("howard"). There is usually not, however, a one-to-one correspondence between the network portions of the IP address and domain. In this example, "128.2" indicates a large (class B) network at CMU and the "19" indicates a subnetwork within this large network. This is the most common IP address arrangement [Sob95:150]. In the domain name, "cmu.edu" indicates the host is on the CMU network, and "epp" indicates the host is administered by the EPP Department. This does not mean, though, that "128.2" = "cmu.edu" or "19" = "epp". While the "128.2" network is the largest network partition at CMU, "cmu.edu" identifies hosts on both this network and on 15 other networks. The "128.2.19" subnetwork contains most of the EPP department's computers, but "epp" computers are located on other subnetworks, and at least one other department has computers on the "128.2.19" subnetwork. In addition, CMU uses portions of each domain name to identify the functional location of the host computers. For example, the "128.2.19" subnetwork has computers that are identified as being in campus "clusters" for student use (such as "pc.cc.cmu.edu" or "mac.cc.cmu.edu" computers), in campus-wide functional networks (such as the "andrew.cmu.edu" UNIX network), or part of the campus "backbone" network ("net.cmu.edu"). CMU IP addresses and domain names also illustrate three other sources of confusion. First, many hosts on the Internet have multiple connections, and therefore one host can have multiple IP addresses, often on different networks. 
Second, different domain names can be assigned to the same host computer, and even the same IP address. Finally, a single domain name can refer to more than one IP address [GaS96:459; Lot96:notes.html].

IP addresses and domain names are related to each other through lookup lists. At the local level, the /etc/hosts file on UNIX systems associates IP addresses and domain names for routing within networks. Specifically, this file lists the IP addresses, domain names, and aliases for the computers authorized to be within a network. The Domain Name System (DNS), which consists of name servers on thousands of computers throughout all levels of the Internet, provides a hierarchically organized distributed database relating IP addresses and domain names for routing on the Internet.

As shown in Table 2.2, the /etc/hosts file at CMU on September 7, 1996, listed a total of 19,888 IP addresses distributed among 16 large networks and 206 subnetworks. The actual number of computers at CMU is less than the 19,888 IP addresses because many of the computers have multiple IP addresses, and not all the computers are connected to the network at any one time.
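A tally of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and is not the procedure used to produce Table 2.2: the function name `summarize_hosts` is hypothetical, the sample entries are invented, and it assumes that a "network" is identified by the first two octets of an address and a "subnetwork" by the first three. It relies only on the usual /etc/hosts layout of an IP address followed by a name and optional aliases, with `#` starting a comment.

```python
def summarize_hosts(text: str) -> dict[str, int]:
    """Count distinct addresses, two-octet networks, and three-octet subnetworks."""
    addresses, networks, subnetworks = set(), set(), set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip = line.split()[0]                   # first field is the IP address
        octets = ip.split(".")
        if len(octets) != 4:
            continue                           # skip anything not dotted-decimal
        addresses.add(ip)
        networks.add(".".join(octets[:2]))
        subnetworks.add(".".join(octets[:3]))
    return {
        "addresses": len(addresses),
        "networks": len(networks),
        "subnetworks": len(subnetworks),
    }


# Invented entries in /etc/hosts format (not taken from the CMU file).
SAMPLE = """\
# address      name                  aliases
128.2.19.200   howard.epp.cmu.edu    howard
128.2.19.201   hostb.epp.cmu.edu
128.2.58.10    hostc.andrew.cmu.edu
192.17.5.1     hostd.example.org
"""

print(summarize_hosts(SAMPLE))
```

Because the sets hold addresses rather than machines, a host with several addresses is counted once per address, which is why such a tally overstates the number of computers.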
The data in Table 2.2 put this discussion in perspective by illustrating a fundamental distinction between IP addresses and domain names: the location portion of an IP address corresponds in general to the physical location of a host computer, while the location portion of a domain name corresponds to its organizational location. An example is the CMU campus-wide network of UNIX computers known as the Andrew System. These host computers can be found all over the CMU campus. The IP addresses of these computers reflect the subnetwork that they are physically connected to. As such, the Andrew System hosts near the EPP Department have subnetwork IP addresses of either 128.2.19 or 128.2.58. If they are connected to a different subnetwork in a different location, then their IP addresses have a different subnetwork number. In other words, the IP addresses of the Andrew hosts are based on their physical location. With respect to their domain names, however, every one of the Andrew hosts has a domain name of the form host.andrew.cmu.edu. This reflects their organizational location within the Andrew System and not their physical location.

Another interesting example is the entry for the IP addresses beginning with 192.17 in Table 2.2. These hosts are physically located at CMU, but are functionally part of two other organizations: The Evolution Group (evo.org) located elsewhere in Pittsburgh, and the University of Illinois at Urbana-Champaign, Illinois (uiuc.edu). Shown also are two connections to the commercial part of the Internet: tartan.com and netbill.com. The network, subnetwork and host pattern described above is typical of large Internet sites.

2.3. Domain Name System (DNS) Terminology

Returning to the question of what a domain is, Sobell defines domain to be a "name associated with an organization, or part of an organization, to help identify systems uniquely [Sob95:772]." This relates to the location portion of a domain name and not to an IP address.
This is consistent with the Domain Name System (DNS), which treats every part of a domain name other than the name of the host itself as the domain. In other words, the location portion of the domain name is defined to be the domain. Using the previous example, in the domain name howard.epp.cmu.edu, the domain is epp.cmu.edu. In a domain name like pc6.mac.cc.andrew.cmu.edu, the domain is mac.cc.andrew.cmu.edu.
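The split between host and domain described here can be written down directly. The following Python sketch is a minimal illustration under one simplifying assumption, namely that the host is always the single leftmost label (the text notes elsewhere that a group of labels can sometimes identify the host). The function name `split_domain_name` is hypothetical.

```python
def split_domain_name(name: str) -> tuple[str, str]:
    """Split a fully qualified domain name into (host, domain)."""
    # Assumption: the host is the single leftmost label; the rest is the domain.
    host, _, domain = name.strip().rstrip(".").partition(".")
    if not host or not domain:
        raise ValueError(f"expected a host label followed by a domain: {name!r}")
    return host, domain


# The two examples used in the text.
for name in ("howard.epp.cmu.edu", "pc6.mac.cc.andrew.cmu.edu"):
    host, domain = split_domain_name(name)
    print(f"{name}: host={host}, domain={domain}")
```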
The DNS database is arranged in a hierarchy, which is a name-space tree such as the one shown in Figure 2.1. Each node in the tree is identified with a label, and the domain name at each node is the ordered list of the label for that node, plus the label for every node on the path back to the top or root node of the tree (separated by dots) [Moc93:478]. For example, host6 in Figure 2.1 is in the mac.cc.cmu.edu domain, which makes its fully qualified domain name host6.mac.cc.cmu.edu. Mockapetris defines a domain to be the subtree that is included under a domain name. For example, the cmu.edu domain is all the hosts located in the subtree under the cmu node as shown in Figure 2.1. Therefore, each node in the tree corresponds to a domain name (the path back to the root of the tree), and a domain (the subtree under the node). The concept in structuring the tree is that any portion of the tree "should parallel the administrative organization using it [Moc93:478]."

The DNS terms host, domain, and domain name will be used for domain names in this research. The term domain will not refer to IP addresses. Instead, the terminology for IP addresses will be network, subnetwork, and host. For example, as stated earlier, my computer at CMU (with the IP address of 128.2.19.200) is on the "128.2" network and the "128.2.19" subnetwork, and has the host number "200." This same computer (with the fully qualified domain name howard.epp.cmu.edu) is the host howard in the epp.cmu.edu domain.

Each of the nodes in the hierarchy of the DNS tree is also referred to as being at a specific level of the tree, with the domains at the highest level in the tree referred to as top-level domains. As of July, 1996, the DNS had 183 top-level domains. Of these top-level domains, one had a four-letter label (nato), and seven had three-letter labels: commercial (com), educational (edu), network (net), military (mil), government (gov), organization (org), and international (int).
With the exception of int, these three-letter, top-level domains contained hosts primarily located in the United States. The remaining 175 top-level domain labels were the International Organization for Standardization (ISO) two-letter country codes [Lot96:dist-bynum.html].

A point to be emphasized is that domain names do not necessarily indicate the physical location of the host (unlike IP addresses). Lottor gives the following caution regarding domain names and the location of the host:

"Note, there is not necessarily any correlation between a host's domain name and where it is actually located. A host with a .NL domain name [the Netherlands] could easily be located in the U.S. or any other country. In addition, hosts under domains EDU/ORG/NET/COM/INT could be located anywhere. There is no way to determine where a host is without asking its administrator [Lot96:notes.html]."

The level of the tree where particular organizations are placed also varies, and therefore the level does not indicate the size of the organization. As an example, assume there is a commercial company called Widgets that has a host computer called pc1. If this company is located in the United States, its domain name might be pc1.widgets.com, and if it were located in Canada, it might be pc1.widgets.ca, both at the second level of the DNS tree. If the company were in the United Kingdom, a similar commercial domain name would be pc1.widgets.co.uk, one level further down in the tree. The host could be even further down in the tree, such as pc1.dept5.widgets.denver.co.us, which would indicate that the host might be located in Widgets' Department 5 in Denver, Colorado. This illustrates that the level of a domain name does not necessarily indicate the size of the domain.

2.4. Site Names

During the preliminary analysis of the CERT®/CC records, an attempt was made to conduct the analysis at the level of individual host computers.
It was felt that, had this been possible, it would have provided the most detailed and useful information for analysis. This proved infeasible for several reasons.

First, information on individual hosts was incomplete. The records for many incidents did not provide any information on individual hosts. When records had host information, it could generally not be determined if the list of hosts was complete. Attempts to estimate the data at the host level were also not successful.

Second, even if the data were available at the host level, analysis at this level would have been very difficult. Take, for example, CMU. As was previously discussed, CMU had nearly 20,000 IP addresses in 1996. This number alone illustrates that keeping track of incidents at the host level would be several orders of magnitude more difficult than keeping track of incidents at a higher level, such as the "CMU" organizational level.

Finally, CERT®/CC personnel did not track incidents at the host level. They instead recorded information at an organizational level that matched their interactions with the organization involved in the incident. If a host computer at CMU were involved in an incident, then an incident record was opened in the CERT®/CC files for CMU and not for the individual host, nor for the specific organization where the host was located (such as "EPP").

The organizational level used to track incidents was generally referred to in the CERT®/CC records as a site. This is the level at which the CERT®/CC records were analyzed. More specifically, a site name was defined to be the domain name for the organization involved in the incident, and site referred to the domain under that site name. For sites in the United States and Canada, site names were generally at the second level of the DNS tree. Examples would be cmu.edu or widgets.com. In other countries, the site name was at the third or a lower level of the DNS tree, such as widgets.co.uk.
A site was also the organizational level where the CERT®/CC could expect to be working with the site administrator or other authority with responsibility for the computers and networks at that site.

Some organizations, such as larger universities and companies, were large enough to be physically divided into more than one location, with separate administration. This separation could not be determined from CERT®/CC records, because these different locations generally had the same site name. Therefore, different locations with the same site name were treated as one site.

For some incidents, site names were not listed for all of the sites involved (around 6% of sites). These were typically not reporting sites, but other sites known to be involved. In these incidents, IP addresses of the other sites were often available instead. As discussed earlier, IP addresses do not have a direct correlation with domain names, and therefore they may have only a limited relationship to site names. However, for many organizations, there is a level of agreement between the network portion of the IP address and the site name. For example, IP addresses beginning with "128.2" were generally part of the "cmu.edu" domain. As such, it was assumed that the first two octets of an IP address corresponded to a site name when the actual site name was not available.

2.5. The Internet Domain Survey

Lottor has estimated the growth in the number of hosts and domains on the Internet from 1981 through the period of this research. Between 1981 and 1986, this estimate was taken from the host table maintained at the Internet's Network Information Center (SRI-NIC) [Lot92:1]. After 1986, estimates were made using the ZONE (Zealot of Name Edification) program. The ZONE program gathered information by "walking" through the DNS tree as it recorded domain names and IP addresses, creating a table of hosts.
The ZONE program repeated this process until the program had cycled through the entire list of domains without receiving any new information [Lot92:2-3]. The groupings in the DNS prevent hosts that have multiple domain names or IP addresses from being counted more than once. The number of domains is determined by including all domains referenced by a record in the DNS [Lot92:4].

This process is assumed to underestimate the number of hosts, primarily because not all hosts on the Internet are registered in a domain server. On the other hand, errors and duplicates (under different names) in the DNS cause the results of ZONE to be higher. Lottor considers the former effect (the underestimate) to be the larger of the two. Manual scanning of the data indicates that the additional entries are insignificant compared to the missing entries. ZONE data can thus be viewed as the minimum number of Internet hosts, and not the actual figures [Lot92:3]. Lottor's evaluation of the accuracy of the ZONE program and its ability to estimate the number of hosts and domains is as follows:

"We consider the numbers presented in the domain survey to be fairly good estimates of the minimum size of the Internet. We cannot tell if there are hosts or domains we could not locate. In summary, it is not possible to determine the exact size of the Internet, [or] where hosts are located [Lot96:notes.html]."

At the time of this research, the Internet Domain Survey was produced by Network Wizards. The data was available on the Internet at http://www.nw.com/ [Lot96:report.html]. Statistics prior to 1992 were found in Request for Comments (RFC) 1296, published by SRI International, and also available at the same Network Wizards Web site [Lot92].

2.6. Estimated Growth of the Internet

As of July, 1996, the Internet connected together a minimum of approximately 13 million host computers [Lot96:report.html].
The Internet's current growth rate, shown in Figure 2.2, results in its size doubling every 12 to 15 months [Lot96:notes.html].
If this current trend continues, the Internet would have around 200 million host computers at the turn of the century (January, 2001), as shown in Figure 2.3. A common method of estimating the number of people that use the Internet host computers is to multiply the number of hosts by a factor of 10 [Mer95:history.hosts]. This seems to be a high estimate, particularly considering the reduced percentage of Internet hosts that are found at educational institutions (discussed later in this section). This is because students would tend to share host computers more than other classes of users, such as users at commercial sites or in private homes. In any case, the number of users would certainly be greater than one user per host computer, and therefore, it is possible that between 200 million and 2 billion people will be using the Internet by the turn of the century.
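The arithmetic behind this projection can be reproduced with a short Python sketch. It assumes pure exponential growth from 13 million hosts in July 1996 at a constant doubling time of 12 or 15 months; the function names are hypothetical and the calculation is a rough check, not the method used to draw Figure 2.3.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def project_hosts(initial: float, months: float, doubling_months: float) -> float:
    """Exponential projection: the count doubles every doubling_months."""
    return initial * 2 ** (months / doubling_months)


INITIAL_HOSTS = 13e6                       # July 1996 estimate quoted in the text
months = months_between(date(1996, 7, 1), date(2001, 1, 1))

for doubling in (12, 15):                  # doubling-time range quoted in the text
    hosts = project_hosts(INITIAL_HOSTS, months, doubling)
    print(f"doubling every {doubling} months: about {hosts / 1e6:.0f} million hosts")

# User estimates: between 1 and 10 users per host, applied to 200 million hosts.
print(f"users: {200e6 * 1 / 1e6:.0f} million to {200e6 * 10 / 1e9:.0f} billion")
```

With 54 months between the two dates, the two doubling times give roughly 158 million and 294 million hosts, which brackets the figure of around 200 million. Applying one to ten users per host to 200 million hosts gives the range of 200 million to 2 billion users.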
The growth in the Internet has not been uniform across the top-level domains. For example, most of the three-letter, top-level domains contain hosts predominantly in the United States. Figure 2.4 shows the growth of these domains. While the number of hosts is growing in all of these domains, the growth in the commercial domains (.com, .net) appears more rapid than the growth in those domains associated with education and government (.edu, .gov, .org, .mil).
Figure 2.5 shows the natural logarithm of the same data in Figure 2.4. Table 2.3 shows the estimates for the slope of these lines obtained from linear regression, and the percentage by which each slope exceeds the slope for the .edu domain. The growth in the .com domain was about 30% greater than in the .edu domain, but the most significant growth was in the .net domain, which was 144% greater.
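The reasoning behind this comparison is that a quantity growing exponentially, N(t) = N0 · exp(r · t), has a natural logarithm that is linear in time with slope r, so the regression slope of ln N against t estimates the growth rate. The Python sketch below illustrates the calculation with an ordinary least-squares slope. The two series are synthetic, generated from assumed rates of 0.5 and 0.6 per year purely for illustration; they are not Domain Survey data, and the function name `log_slope` is hypothetical.

```python
import math


def log_slope(times: list[float], counts: list[float]) -> float:
    """Least-squares slope of ln(count) regressed on time."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


# Synthetic series (NOT survey data): semiannual samples over two years.
years = [0.0, 0.5, 1.0, 1.5, 2.0]
baseline = [1000 * math.exp(0.5 * t) for t in years]   # assumed rate 0.5 per year
faster = [800 * math.exp(0.6 * t) for t in years]      # assumed rate 0.6 per year

slope_base = log_slope(years, baseline)
slope_fast = log_slope(years, faster)
percent_greater = (slope_fast / slope_base - 1) * 100

print(f"baseline slope: {slope_base:.3f}")
print(f"faster slope:   {slope_fast:.3f}")
print(f"faster slope is {percent_greater:.0f}% greater than the baseline")
```

Note that the starting sizes of the two series do not affect the slopes; only the growth rates do, which is why the comparison in Table 2.3 can be made between domains of very different sizes.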
These trends can also be seen in the entire Internet. Figure 2.6 shows the size of all of the top-level domains as a percentage of the entire Internet. The domains with predominantly U.S. government hosts (.gov, .org, .mil) have declined as a percentage of the total Internet from about 13% in 1991, to 9% in 1996. The trend is even more pronounced in the U.S. educational institutions which have declined as a percentage of the total Internet from about 36% in 1991, to 16% in 1996. Growth has been experienced in the top-level domains that contain primarily North American commercial hosts (.com, .net, .us, .ca) which have grown from approximately 29% to 42%, and in the other domains located outside of North America, which have grown from 22% to 33%. These last two domain groups now represent 75% of the Internet.
As discussed previously, CERT®/CC incidents were analyzed at the site level. The Domain Survey estimates both the number of hosts and the number of domains. The site level is between the top-level domains and the lowest-level domains in the DNS system, both of which were estimated by the Domain Survey. The trends in both Internet hosts and Internet domains as estimated by the Domain Survey will be compared to the trends in incidents at the site level in later chapters. As such, it is appropriate to examine the trends in Internet domains.
Figure 2.7 shows the growth in the number of Internet domains in the DNS. As of July, 1996, there were estimated to be 488,000 of these domains. The average growth rate in domains was 36% per year, but in the first half of 1995, it was 69% per year, and during both the second half of 1995 and the first half of 1996, the growth rate was 100% per year. The trend in domains looks similar to the trend in hosts (Figure 2.2), but there are significant differences. The number of hosts per DNS domain has declined in the last three years as shown in Figure 2.8. Perhaps this trend reflects the increased growth in the .com and .net Internet domains. A new commercial site is likely to have fewer hosts than either an established commercial site or an educational site. This may also reflect an increase in domain names that was not accompanied by an increase in IP addresses (recall that an IP address may have more than one domain name). For example, several organizations may share a host computer and its access to the Internet, while appearing to be separate sites, and also appearing in DNS servers as separate domains.
One final trend of interest is the change in the World Wide Web, an Internet service that has grown rapidly in the last few years. The Web has its origins in research by Berners-Lee at the European Physics Laboratory (CERN) beginning in 1989. He created client-server software for conveniently publishing and retrieving formatted documents on the Internet. The client portion of this software is commonly called a Web browser. Documents are published at sites with Web server software and are retrieved using one of these Web browsers [Til96:140]. A Web site is not the same as an Internet site. An Internet site was defined previously to be a network of computers under the administrative control of an organization. A Web site is instead a set of files on a host computer that can be linked to over the Internet using a Web browser. There may be numerous Web sites on a single network or on the same host computer.
The growth in the World Wide Web was estimated by Matthew Gray of the Massachusetts Institute of Technology as shown in Table 2.4 and Figure 2.9 [Gra96]. The World Wide Web grew significantly faster than the Internet, although that trend had been slowing. In the second half of 1993, the Web was doubling in less than three months. The 1995 growth rate resulted in doubling in under 6 months, which was more than twice the growth rate of the Internet [Gra96].
2.7. Summary of Internet Characteristics

The Internet is the world's largest network of networks. It consists primarily of local area networks that communicate with each other using the TCP/IP protocol suite. Computers that communicate across the Internet are known as host computers, or simply hosts. Each host computer is identified by both a unique 32-bit IP address (generally written as four decimal numbers vvv, www, xxx, and yyy, each between 0 and 255) and a unique domain name (a group of labels separated by dots). IP addresses and domain names are both divided into a portion identifying the host, and a portion identifying a partition of host computers. For IP addresses, this partition is known as a network. For domain names, it is known as the domain. The Domain Name System (DNS) provides an Internet service that relates domain names to IP addresses.

The DNS terms host, domain, and domain name will be used for domain names in this research. The terminology for IP addresses will be network, subnetwork, and host. As of July, 1996, the DNS had 183 top-level domains. Of these top-level domains, one had a four-letter label (nato), and seven had three-letter labels: commercial (com), educational (edu), network (net), military (mil), government (gov), organization (org), and international (int). With the exception of int, these three-letter, top-level domains contained hosts primarily located in the United States. The remaining 175 top-level domain labels were the International Organization for Standardization (ISO) two-letter country codes.

The CERT®/CC records were analyzed at the site level, which is the level where the CERT®/CC could expect to be working with the site administrator or other authority with responsibility for the computers and networks at that site.
The analysis of the CERT®/CC records was not conducted at the level of host computers for three reasons: information on individual hosts was incomplete, an analysis at this level would have been very difficult, and CERT®/CC personnel did not track incidents at the host level.

Lottor has estimated the growth in the number of hosts and domains on the Internet since 1981. Since 1986, estimates were made using the ZONE (Zealot of Name Edification) program. As of July, 1996, the Internet connected together a minimum of approximately 13 million host computers. The Internet's current growth rate results in its size doubling every 12 to 15 months. If this current trend continues, the Internet would have around 200 million host computers at the turn of the century (January, 2001).

The growth in the Internet has not been uniform across the top-level domains. For example, most of the three-letter, top-level domains contain hosts predominantly in the United States. Figure 2.4 shows the growth of these domains. While the number of hosts is growing in all of these domains, the growth in the commercial domains (.com, .net) appears more rapid than the growth in those domains associated with education and government (.edu, .gov, .org, .mil). These trends can also be seen in the entire Internet. The various Internet growth rates are summarized in Table 2.5.
[2]
![]() |














