In a field as digitally focused as cybersecurity, you’d think that the use of innovative technologies such as machine learning would be widespread. As yet, however, that has not proved to be the case.

“If you look at what the security companies are doing, machine learning is embryonic. Nobody is doing it yet and nobody is doing it right,” says Misha Govshteyn, founder and senior vice-president of products and marketing at security-as-a-service cloud provider Alert Logic.

Despite the clear and significant potential for machine assistance in the detection of cybersecurity threats and trends, this has so far failed to materialise, not because of a lack of interest, but because the inherently siloed structure of the industry is preventing it from becoming a reality.

“You have to have all these elements in one place in order to actually get some sort of value out of it,” says Govshteyn.

The state of automated cybersecurity

In a world where self-driving cars are hitting the headlines, it may seem baffling that automation is not seeing widespread use in cybersecurity, but, Govshteyn argues, that’s exactly the situation we find ourselves in.

“We have self-driving cars but we don't have self-monitored security operation centres. Security operation centres are kind of behind the latest and greatest in car technology, right, the question is why,” he says.

“I think Tesla is a good example of this: Tesla is not giving a machine learning engine to their customers to program and to tune. It's a supervised machine learning model: they're not asking their customers to go write that, it’s their job to make sure it works, but that's not the way it works in the security industry.”

Instead, he says, the security industry provides businesses with individual tools that they cobble together themselves. The result is a patchwork of different products to which it would be very difficult to apply machine learning analytics, even if a company were willing and able to put in the time and effort to implement it.

The data problem: What machine learning needs to work for cybersecurity

However, it is not just an issue of software integration. Machine learning also needs access to large bodies of data, such as those that Google had access to when it built its analytics-based approach to search and revolutionised the way we browse the web.

“Yahoo was originally a list curated by people, and then Google came in and Google changed the game because it was algorithmic, right? What blows me away is when we're talking about security, the industry is still very much like Yahoo; it’s a bunch of people making imperfect decisions,” Govshteyn says.

“What would it take for us to build that level of analytical approach to security? We would need a lot more data available to us about how breaches happened, what attacks are happening at this moment and so on and so forth,” he says.

“Those exchanges of data, they've got to get much more powerful. Right now they're not useful.”

And herein lies the biggest barrier to cybersecurity’s automation. Companies each hold their own data about incidents and very few opt to share it with the world. As a result, many incidents go unnoticed, and researchers are left to build their own databases of attacks that will ultimately be too small and too incomplete to warrant applying machine learning.

The failed attempts to bring machine learning to cybersecurity

While such databases are too small to produce meaningful results, this hasn’t stopped companies from trying.

“Having been watching and participating in the machine learning evolution for a while, there was a time where people thought 'hey, I'm going to get all this data, I'm going to throw it into a Hadoop database and then I'm going to stick machines on it and find out what's going on',” he says, referring to an open-source software framework for big data developed by Apache. “That has not borne fruit.”

“They're doing it within the confines of a single company,” adds Marc Willebeek-Lemair, chief strategy officer of Alert Logic. “When we do it, we do it with 4,000 customers, we get visibility into 4,000 clients. Even that's a small sample, but when you think about a large company doing machine learning with a sample size of one – good luck!”
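The sample-size point can be made concrete with a rough sketch. It assumes, purely for illustration, that a given attack pattern has hit 0.1% of organisations and that organisations are affected independently of one another; that prevalence figure and the function names are hypothetical and do not come from Alert Logic. Under those assumptions, the code compares how likely the pattern is to appear at all in one company's data versus a pool of 4,000, and how many examples each would be expected to contain.

```python
def chance_of_seeing_attack(prevalence: float, organisations: int) -> float:
    """Probability that at least one of `organisations` independent
    organisations has experienced an attack with the given prevalence."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if organisations < 0:
        raise ValueError("organisations must be non-negative")
    # P(at least one) = 1 - P(none), and P(none) = (1 - p) ** n
    return 1.0 - (1.0 - prevalence) ** organisations


def expected_examples(prevalence: float, organisations: int) -> float:
    """Expected number of affected organisations in the pool."""
    return prevalence * organisations


if __name__ == "__main__":
    # Hypothetical prevalence: 0.1% of organisations hit by this pattern.
    assumed_prevalence = 0.001
    for pool_size in (1, 4000):
        seen = chance_of_seeing_attack(assumed_prevalence, pool_size)
        examples = expected_examples(assumed_prevalence, pool_size)
        print(f"{pool_size:>5} organisations: "
              f"chance of any example {seen:.1%}, "
              f"expected examples {examples:.3f}")
```

With these made-up numbers, a single company would almost never have an example of the pattern to learn from, while a pool of 4,000 would very probably contain one, yet would still be expected to hold only a handful. That is in line with Willebeek-Lemair's remark that even 4,000 clients is a small sample.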

Towards a machine learning future

However, if the datasets can be more widely integrated – a process that will require greater industry collaboration and perhaps even legislation – the potential of machine learning is huge for cybersecurity.

“One of the reasons I'm such a believer in [an integrated data model] is it more easily enables these two other factors. The first one is analytics: you've got to have the data to perform those analytics. The machine learning is just beginning. We've got some very nice results already, we see that happening, that is going to change the way we defend,” says Willebeek-Lemair.

“The second one is automation. Sure you've got that need for human experts, but there's a lot of activities that human experts perform today that they don't need to perform, that could be automated. And we've been investing also in that area, we make sure that when it’s time for a human to put their eyeballs on it, a lot of work has been done to minimise the energy and the effort that we need to now put into that.

“So we see those two things happening to facilitate the defensive side, I think we're going to see some pretty dramatic improvements over the next five or ten years as well.”

PR nightmares: Ten of the worst corporate data breaches

LinkedIn, 2012

Hackers sold name and password info for more than 117 million accounts

Target, 2013

The personal and financial information of 110 million customers was exposed

JP Morgan, 2014

One of JP Morgan Chase’s servers was compromised, resulting in fraud schemes yielding up to $100m

Home Depot, 2014

Hackers stole email and credit card data from more than 50 million customers

Sony, 2014

Emails and sensitive documents were leaked in an attack thought to have been carried out by North Korea in retaliation for Sony’s production of a film mocking the country’s leader Kim Jong Un

Hilton Hotels, 2015

Dozens of Hilton and Starwood hotels had their payment systems compromised and hackers managed to steal customer credit card data

TalkTalk, 2015

The personal data of 156,959 customers, including names, addresses, dates of birth and phone numbers, were stolen

Tesco, 2016

Hackers made off with around $3.2m from more than 9,000 Tesco Bank accounts

Swift, 2016

Weaknesses in the Swift payment system resulted in $81m being stolen from the Bangladesh Central Bank’s account at the New York Federal Reserve

Chipotle, 2017

Phishing was used to steal the credit card information of millions of Chipotle customers, in what is thought to be part of a wider restaurant customer scam orchestrated by an Eastern European criminal gang