Applying Machine Learning To Advance Cyber Security Analytics – Part 2


The second trend is the lack of qualified, experienced individuals to successfully defend vital infrastructure and systems. The defensive game is complex and never-ending, and one slip-up by a security team can be enough to open the door to a security incident. In addition, demand for excellent security professionals is projected to keep growing, compounding the current dearth of talent.

Given these two points, machine learning techniques are a great fit to improve the security posture of an organization. And in fact, there are probably machine learning approaches implemented at some level in your organization. But what we should see over the next couple of years is a vast improvement in current state-of-the-art machine learning in cyber security, and an increase in the number of areas where machine learning techniques are prevalent.

As an example of the impact that improved machine learning will have on cyber security, let’s consider an analyst responsible for an incident response case. In this example, a network has been penetrated and malware has been placed on various machines in the network in order to exfiltrate sensitive information. The analyst is charged with multiple tasks: discover exactly what has been stolen, determine how it was stolen, and repair the system to prevent the same or similar attacks from happening again.

Without the help of any form of machine learning system, the analyst would have a difficult time resolving these issues in a short timeframe. For example, to determine what has been stolen, the analyst might review file access logs or network traffic, looking for access to sensitive files or large amounts of data flowing out of the network. To determine how the attacker gained a persistent foothold in the network, malware analysis of the disk may be needed to track down known malware samples using signatures developed by other human analysts. Or perhaps an analysis of the running system, looking for unusual processes or other anomalous behaviours, would be conducted as part of the incident response.

With a machine learning approach, many of these tasks can be automated, and even deployed in real time to catch these activities before any damage is done. For example, a well-trained machine learning model will be able to identify unusual traffic on the network and shut down these connections as they occur. A well-trained model would also be able to identify new samples of malware that can evade human-generated signatures, and perhaps quarantine these samples before they can even execute. In addition, a machine learning model trained on the standard operating procedure of a given endpoint may be able to identify when the endpoint itself is engaging in odd behaviour, perhaps at the request of a malicious insider attempting to steal or destroy sensitive information.
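To make the idea of flagging unusual traffic concrete, here is a minimal sketch in Python. It is not a trained model and not any particular product's method: it assumes a simple robust outlier score (median and median absolute deviation) over per-connection outbound byte counts. The function name `flag_unusual_flows`, the threshold of 3.5, and the sample byte counts are all made up for illustration.

```python
from statistics import median


def flag_unusual_flows(outbound_bytes, threshold=3.5):
    """Return indices of connections whose outbound volume is a robust outlier.

    Illustrative only: a production system would learn a baseline per host
    and use many more features than raw byte counts.
    """
    med = median(outbound_bytes)
    # Median absolute deviation: a spread estimate that a few huge values barely move.
    mad = median([abs(b - med) for b in outbound_bytes])
    if mad == 0:
        return []  # no spread in the baseline, so nothing can be scored

    flagged = []
    for i, b in enumerate(outbound_bytes):
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        score = 0.6745 * (b - med) / mad
        if score > threshold:  # only unusually LARGE outbound volumes are flagged
            flagged.append(i)
    return flagged


# Hypothetical outbound byte counts; the last connection mimics bulk exfiltration.
flows = [1200, 980, 1500, 1100, 1300, 1250, 900, 1400, 250000]
suspicious = flag_unusual_flows(flows)
```

In this toy data only the final connection is flagged. A real deployment would act on such a flag by alerting or closing the connection, which is the step the paragraph above describes.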

Currently, the large majority of machine learning approaches in cyber security are used as a type of “warning” system. They often require a human in the loop to make the final decision. This requirement is usually the result of machine learning models that are not sufficiently accurate, to the point where a typical human analyst is more accurate. As a result, the analyst has the final decision because of their lower false positive and false negative rates.
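The comparison between a model and an analyst comes down to two numbers: how often benign activity is flagged (false positive rate) and how often real threats are missed (false negative rate). The sketch below shows one way to compute both from labelled verdicts. The helper `error_rates` and every label in the example are invented for illustration; they are not measurements of any real model or analyst.

```python
def error_rates(truth, verdicts):
    """Return (false_positive_rate, false_negative_rate) for boolean verdicts.

    truth[i] is True when sample i is actually malicious;
    verdicts[i] is True when the reviewer (model or human) flagged it.
    """
    fp = sum(1 for t, v in zip(truth, verdicts) if not t and v)
    fn = sum(1 for t, v in zip(truth, verdicts) if t and not v)
    negatives = sum(1 for t in truth if not t)
    positives = len(truth) - negatives
    # Guard against division by zero when one class is absent.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Toy, made-up labels for eight samples.
truth = [True, True, False, False, False, True, False, False]
model_verdicts = [True, False, True, False, False, True, True, False]
analyst_verdicts = [True, True, False, False, False, True, True, False]

model_fpr, model_fnr = error_rates(truth, model_verdicts)
analyst_fpr, analyst_fnr = error_rates(truth, analyst_verdicts)
```

With these invented labels the analyst has the lower rate on both measures, which is the situation in which keeping a human as the final decision maker makes sense.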

But what we are starting to see, and expect to become increasingly common, are machine learning systems that are in fact more accurate than their human counterparts. This is happening not only because of improvements in machine learning, but also because of the difficulty in growing the pool of human cyber security analysts. As an example, consider a SOC, where operations often run around the clock. It may not be possible to have an exceptional security analyst on hand at all times to analyze potential malware threats. In some cases, a junior analyst will be tasked with making threat decisions. Being junior, they are expected to have a higher error rate when assessing threats. In this case, it might be better to trust a machine learning solution that is proven to be as effective as an exceptional analyst.

In the cyber security industry at the moment, the answer to whether one should trust machine learning over human analysis is often ‘no’. To some extent, a shift in the way we think about technology and its capabilities needs to occur before we fully trust the next wave of machine learning systems. Perhaps this is more a matter of trust than of capability. It’s easy to cultivate a relationship based on respect and trust with your peers in the cyber security industry. But developing the same trust in a black-box machine learning model will take time, and attitudes will only shift after repeated successful results from these systems.

The next few years will be interesting in the cyber security landscape. The massive amounts of data that can be generated, along with the problems of conducting large-scale analysis to find the proverbial needle in the haystack, are the perfect combination for extensive and successful machine learning architectures.