This presentation will be about profiling Voice over Internet Protocol (VoIP) traffic in flow. I have been working on a VoIP training class for our customers and wanted to share that knowledge with the greater flow community. The examples given will be heavily obfuscated traffic from various sources but will be based on real data. The following key technical points will be covered: even though there are many different protocols being deployed, you can see in flow that they essentially function in the same way; and a comparison of how the SIP, SCCP, and Skype protocols appear in flows with good sensor placement. The talk is broken down into the following parts:
- Communication between the user agent (phone) and the user server (server database)
- Signaling to find the distant end
- Conversation between end points
- Comparison between exfiltration voice flow and human conversation voice flow.
- What data exfiltration using a continuous sound codec over a VoIP channel looks like in flow. The exfiltration codec transforms a .txt file into a .wav file, allowing it to be replayed through VoIP channels. This particular home-grown codec provides a continuous audio stream. In the flow records you can see the continuous one-way stream of data from one end point to the other.
- The human element: what voice communication looks like in flow. This will explore the fact that most human conversations are two-way. Much like HTTP traffic, where you can see human interactions with the web server, within VoIP traffic you can see a conversation happen in a limited capacity, depending on the time stamps.
Distributed Summary Statistics with Bro, Vlad Grigorescu
When analyzing network traffic, a number of questions have historically been too difficult to answer, not just in real time but also in post-hoc analysis. As the amount of traffic has increased, so too has the complexity of queries such as "Which of my machines have talked to the largest number of unique external IPs?", "Which HTTP POST requests had the largest increase in volume from yesterday?", or "What users had the highest number of failed DNS queries?" While the bottleneck for these types of queries has traditionally been the high memory cost, a new class of probabilistic algorithms can summarize billions of elements in a couple of kilobytes of RAM. The fact that these algorithms are also mergeable allows for distributing and scaling these calculations to the 100-gigabit level and beyond. Designed specifically to summarize huge datasets, Bro summary statistics offer a fresh view of network activity. With this new framework, the answers to these types of questions require only a few lines of Bro scripting, while making use of Bro's advanced protocol analysis to summarize anything from layer 2 through 7.
This talk is an example-heavy look at the new framework designed to teach attendees how to write such scripts. No previous Bro experience is necessary. Users are encouraged to bring questions about their networks that they've been dying to know but have never before been able to calculate.
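To make the abstract's central claim concrete, here is a minimal sketch (in Python, not Bro script, and not the framework's actual implementation) of the kind of mergeable probabilistic counter it describes: a HyperLogLog-style sketch that estimates the number of distinct elements in fixed memory and can be merged losslessly across sensors. The class name and the omission of the small-range correction are illustrative simplifications.

```python
import hashlib

class HyperLogLog:
    """Minimal HyperLogLog sketch: estimates distinct-element counts in
    fixed memory (2**p one-byte registers) and merges losslessly, the two
    properties that let such summaries distribute across collection points."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash; top p bits choose a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leading-zero count + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max: merging two sketches equals sketching the union.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        # Raw HLL estimator (no small-range correction, for brevity).
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = 1.0 / sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m * z
```

With p=10 the sketch uses about a kilobyte of register state yet estimates cardinalities in the millions to within a few percent, which is what makes questions like "unique external IPs per host" cheap to answer at scale.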
10:00-10:30 Morning Break, Demo Room
Technical Session: Big Analysis
10:30 – 12:00
Data Fusion at Scale, Markus De Shon
The network flow data analysis community has long recognized that robust network defense requires fusion with other comprehensive data sources such as DNS, log data, threat data, system-level metadata and (ideally) full packet capture. In such a complex enterprise, it helps to have a clear conceptual framework for the analytic goals, to guide the design and reveal missing capabilities. I here present such a framework based on Multisensor Data Fusion, with specific consequences for data collection and analysis.
Streaming Analysis: An Alternate Analysis Paradigm, John McHugh, John Zachary, Andy Freeland, Douglas Creager, RedJack LLC
Since the development of the SiLK tools early in this century, NetFlow data has been a mainstay of network security analysis in DoD, DHS, and some commercial settings. As network traffic volumes and rates have increased, complete capture and retention at the packet level has become impossible, and even NetFlow is straining the ability of archiving and analysis facilities. One of the problems is that the vast majority of the traffic captured is of little or no interest from a forensic or analytic standpoint. Streaming analysis offers an alternative to capture-and-archive approaches. Streaming analytics perform analysis on the fly, preferably close to the origin of the data, where volumes may be more tractable. The analytics can be broadly based or narrowly targeted. They lend themselves to distributed computations taking advantage of inexpensive multi-core processors and cheap memory. Streaming computations must operate in a single pass over the data and can retain only limited state. This requires rethinking some computational processes in order to avoid requirements for multiple passes and excessive state retention.
In this talk we will present the results of prototype implementations of two streaming analytics, initially implemented in an IBM InfoSphere Streams processing environment and currently being reimplemented in RedJack's Fathom streaming environment. One produces a statistical characterization of a packet stream, computing partial statistics (counts, sums, sums of squares) for some 260,000 variables (ports, protocols, subnets, etc.) at one-minute intervals. This can be used to create models to identify outbreaks of abnormal activity, which could in turn generate alerts or control selective capture. The other is an implementation of the TRW scan detection algorithm, extended to create and maintain service-specific oracles that allow the detection of horizontal, vertical, and mixed TCP scans, ICMP scans, and non-protocol-specific host scans, including those based on UDP. This could be used to block or blacklist scanners at the border of the monitored network, and partial results can be combined to detect distributed or very slow scans.
What Does "Big Data" Even Mean?, Joshua Goldfarb
Enterprises today do a reasonably good job instrumenting their networks for data and log collection -- so much so that they find themselves bombarded by incredible volumes of data sourced from a diverse ecosystem. Talk about "Big Data" is just about everywhere these days, but what does "Big Data" even mean, and how can we exploit it to meet our needs? This talk discusses hands-on, proven techniques for tackling the "Big Data" challenge and also touches on optimizing workflow, integrating intelligence, and producing high-fidelity, actionable alerting.
12:00 – 1:00: Lunch, Colonial Ballroom
Technical Session: Measurements and Metrics
1:00 – 2:30
Analysis of Some Time-Series Metrics for Network Monitoring, Soumyo Moitra
In this presentation we present a method and metrics to enhance network Situational Awareness (SA). Since SA is primarily concerned with monitoring trends and changes in traffic patterns, the analysis of traffic over time, i.e., time series data, is a key element. Therefore, metrics related to time series analysis play an important role in SA. In particular, correlations over time, or the autocorrelation function, need to be analyzed.
However, these correlations, the autocorrelation function, and other time series metrics need to be interpreted with respect to the time window and time scale being considered. The presentation will discuss the autocorrelation function under different time scales and the inferences we can make from it. The inferences from changes in the autocorrelation with changes in time scale can shed light on the presence of short-term or long-term dependencies in traffic patterns. This issue has been identified as important for anomaly/intrusion detection in the literature.
We report on the findings from an analysis of flow data that investigates this issue. We first construct an initial time series of traffic volumes over a given time window. Then we estimate the autocorrelation function for this series. Next we vary the time scale and estimate the corresponding autocorrelation functions for these new time series. Finally we compare these autocorrelation functions in relation to their time scales and develop a metric to quantify the differences. This metric can be tracked over time that is over successive time windows. This approach could detect attacks and intrusions that do not perturb the network traffic in other discernible ways and thus may not be identified early enough by other detection techniques to enable effective mitigation. This method also allows us to distinguish between short-term and long-term dependencies within the traffic patterns. This distinction is important for selecting the appropriate techniques for further analyzing network traffic. The analyses are illustrated with publicly available data.
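The procedure the abstract walks through (build a volume series, estimate its autocorrelation, re-aggregate at a coarser time scale, and compare) can be sketched in a few lines. This is a generic illustration, not the authors' code; the function names are hypothetical.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def rescale(x, factor):
    """Aggregate a volume series to a coarser time scale by summing
    consecutive bins of `factor` points (trailing remainder dropped)."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // factor) * factor
    return x[:n].reshape(-1, factor).sum(axis=1)

# Comparing autocorr(volumes, k) against autocorr(rescale(volumes, 5), k)
# over successive windows is one simple way to quantify how dependence
# structure shifts with time scale.
```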
PCR - A Flow Metric for the Producer/Consumer Relationship, John Gerth
The classic fields reporting packets and bytes in flow records have long been used by network engineers and analysts to gauge both the health and operational characteristics of networks. For example it is well-known that the packet counts in TCP connections are typically balanced due to the end-points needing to send frequent acknowledgments even though the application level data transfer may be asymmetric. One aspect of communication that is interesting not only for traffic engineering but also for security is the producer/consumer relationship.
That is, what roles do the participants in a communication play: data producer, data consumer, or both? In typical client-server associations it is tempting to assign the role of producer to the server and consumer to the client, but this view is too simplistic. For example, a file server frequently oscillates between the two roles depending on whether files are being read or written; or, from a security perspective, the transformation of a node from consumer to producer may be the only wire-line evidence that it has been compromised for data exfiltration. In the study below we look at these relationships in more detail and slice them in different ways.
In order to get to the semantic relationships we define a new flow metric for the producer/consumer relationship based on the application data bytes exchanged.
PCR metric definition: The metric is based on the bytes transferred between two end points, not including the bytes required for transport purposes. The Argus system conveniently provides these values in the optional 'sappbytes' and 'dappbytes' fields, and we will use these for this study.
Our producer/consumer metric is a value from -1.00 to 1.00, defined as

pcr = (sappbytes - dappbytes) / (sappbytes + dappbytes)
Thus pcr = 1 identifies the src as a pure producer; pcr = -1 identifies the dst that way; while pcr = 0 indicates a perfectly balanced exchange. This normalized, dimensionless ratio helps compare entities which may be operating similarly but at very different volumes. Likewise, it can be computed not only for a single flow but also over all the flows of a host, subnet, or port: any categorical flow variable. When aggregated this way, the metric is a measure of export/import activity. In this presentation we look at uses of the PCR across a range of variables in traffic drawn from a large production network.
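The definition above translates directly into code. This is a straightforward rendering of the published formula; the aggregation helper and its flow-tuple input format are illustrative assumptions, and the zero-payload convention (return 0.0) is a choice the abstract does not specify.

```python
def pcr(sappbytes, dappbytes):
    """Producer/Consumer Ratio for one flow: +1.0 means the src is a pure
    producer, -1.0 a pure consumer, 0.0 a perfectly balanced exchange."""
    total = sappbytes + dappbytes
    if total == 0:
        return 0.0  # no application payload either way; convention, not spec
    return (sappbytes - dappbytes) / total

def pcr_aggregate(flows):
    """PCR over a category (host, subnet, port, ...): sum the application
    payload in each direction across all flows, then take the ratio.
    `flows` is an iterable of (sappbytes, dappbytes) pairs."""
    s = sum(f[0] for f in flows)
    d = sum(f[1] for f in flows)
    return pcr(s, d)
```

Summing payloads before taking the ratio (rather than averaging per-flow PCR values) is what makes the aggregate a measure of net export/import activity, since large flows weigh in proportion to their volume.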
Analyzing Flow Using Encounter Complexes, Leigh Metcalf
Collecting flow for any length of time can lead to an overwhelming number of records, especially if one is attempting to find anomalies. The goal of this paper is a method that reduces the number of records requiring examination and highlights anomalous flows. We do this by using encounter traces to form an encounter complex and examining the result.
2:30 – 4:30: Demo and Poster Session, Demo Room
Connect with FloCon