

Classifying Phishing URLs Using Recurrent Neural Networks


Easy Solutions data scientists, including the author of this article, will present extensive research on the use of deep neural networks to detect phishing sites. They will share their expertise at the prestigious APWG eCrime 2017 Symposium on Electronic Fraud Research today in Scottsdale, Arizona. The article below details some of the findings our experts will discuss at the symposium.


Fake News and Digital Trust: How to Take Back Control of the Web from Cybercriminals


Last year was an unprecedented time for cyber security and fraud, with a record number of exploited vulnerabilities and high-profile breaches.


Debunking Machine Learning Misconceptions


Machine learning has never been more prevalent and accessible than it is today. It has managed to influence and boost several industries and markets. Retailers use it for product recommendations, email providers use it to filter spam, social networks use it for face recognition and sentiment analysis, and the list goes on and on. The cyber security industry is heavily invested in including machine learning in its arsenal against malicious actors. It is very difficult to find a vendor that is not claiming to use machine learning somehow.

In the past, a lot of human effort was invested in developing strong domain knowledge and translating it into signatures, rules, black/white lists or static correlation patterns that could be shipped in a usable product. Today, we are strengthening our protection layers by adding predictive capabilities based on powerful algorithms that can extract knowledge from apparently unrelated or obscure datasets and identify relations that span time, places and actions.

Machine learning is poised to conquer those challenges where human analysis capabilities and static systems are limited. It is now providing us with unprecedented capabilities, allowing us to make sense of huge volumes of unstructured data coming from sources as diverse as user interaction, transactional data, network activity, phishing history and endpoint detection systems.

Unfortunately, thanks to its own success in facing hard challenges and its enormous potential, machine learning is accompanied by massive hype that positions it as a magic box able to effortlessly produce definitive results. This hype has led to some very high expectations about performance, and often to disappointment for misguided consumers. In fact, machine learning, when applied to cybersecurity, is typically surrounded by many misconceptions.

The points below highlight some of the misconceptions users should be aware of before embracing machine learning in order to avoid failures:

Machine learning does not have the ability to create knowledge; it extracts knowledge. Only when fed with sufficient quality data is machine learning able to reach its true potential and outperform traditional approaches. Accuracy and size of data are critical for successful application of machine learning.

If your company has decided to invest in machine learning, it is important to develop data consciousness across the whole organization, and specifically in all those areas involved in the detection and mitigation of incidents. All the data that may be relevant when dealing with security or fraud incidents must be tracked and labeled meticulously. Both anomalous and normal events are relevant.

It is additive technology, not foundational technology. While marketers believe that machine learning outperforms all existing systems, it’s important to be cautious and set reasonable expectations. Do not throw away old playbooks and replace them with a shiny new machine-learning algorithm. Successful defensive strategies have never relied on a single layer of protection, and that is not going to change any time soon. Be sure to incorporate machine learning into a robust, multi-layer protection strategy. Remember, it offers the best chance of catching attacks that elude static preventive defenses, so it is usually a good complement for those static systems that companies have fine-tuned through years of expertise.

Performance assessments produced by data scientists tend to be elusive, so be sure to understand their essence. Get used to terms such as false positive rate, true positive rate, precision and F-score. Those terms are very important when adjusting the model to fit specific needs.
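As a rough illustration of the terms above, the sketch below derives each metric from the four outcome counts of a binary detector. The function name and the example counts are hypothetical and chosen only to make the definitions concrete; they are not taken from any real system.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from outcome counts.

    tp: attacks correctly flagged       fp: legitimate events wrongly flagged
    tn: legitimate events left alone    fn: attacks that were missed
    """
    # True positive rate (recall): share of real attacks that were caught.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of legitimate events that raised an alert.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Precision: share of alerts that were real attacks.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-score (F1): harmonic mean of precision and true positive rate.
    denom = precision + tpr
    f_score = 2 * precision * tpr / denom if denom else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f_score": f_score}


# Hypothetical counts: 1,000 events, 100 of them attacks.
metrics = detection_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the false positive rate is measured against legitimate events, while precision is measured against alerts, so a detector can have a low false positive rate and still produce mostly false alerts when attacks are rare.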

One of the main sources of disappointment when using machine learning comes from false positive and alert rates. Predictive capabilities always come at some cost. It is frustrating to implement an algorithm when its benchmarks state that it provides outstanding performance only to find that it is exhausting your operational capacity.

Evaluating a machine-learning model by throwing a couple of non-representative examples at it is unfair and deceptive. Good machine-learning models are evaluated by performing well-designed statistical tests on significant data samples. This means that performance is evaluated by running the algorithm several times on a large data set that is a good representation of the real-life problem. When planning to evaluate a machine-learning model, be sure to ask your vendor how they evaluated the model, and run a valid process with your own data.
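One simple way to run the algorithm several times is repeated random holdout: shuffle the data, train on one part, test on the rest, and report the spread of the results rather than a single number. The sketch below uses only the standard library. The synthetic data, the one-feature threshold "model" and all function names are assumptions made for illustration, not the evaluation procedure of any particular vendor.

```python
import random
import statistics


def fit_threshold(train):
    """Toy model (assumption): flag a sample when its score exceeds the
    midpoint between the mean scores of the two classes in the training set."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def repeated_holdout(samples, fit, n_repeats=20, test_fraction=0.3, seed=0):
    """Train and test on n_repeats random splits; summarize FPR and TPR."""
    rng = random.Random(seed)
    fprs, tprs = [], []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        predict = fit(train)
        tp = fp = tn = fn = 0
        for x, y in test:
            p = predict(x)
            if y == 1 and p == 1:
                tp += 1
            elif y == 1:
                fn += 1
            elif p == 1:
                fp += 1
            else:
                tn += 1
        if fp + tn:
            fprs.append(fp / (fp + tn))
        if tp + fn:
            tprs.append(tp / (tp + fn))
    return {
        "fpr_mean": statistics.mean(fprs), "fpr_std": statistics.stdev(fprs),
        "tpr_mean": statistics.mean(tprs), "tpr_std": statistics.stdev(tprs),
        "runs": n_repeats,
    }


# Synthetic data (assumption): 900 legitimate and 100 malicious samples,
# each described by a single numeric score.
data_rng = random.Random(42)
samples = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(900)]
samples += [(data_rng.gauss(3.0, 1.0), 1) for _ in range(100)]

summary = repeated_holdout(samples, fit_threshold)
print(summary)
```

A large standard deviation across runs is a hint that the data set is too small or not representative enough to support the headline figure.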

Organizations should translate vendor claims to the scale of their own operation. If a vendor states that the algorithm has only a 2-percent false positive rate, map that figure to your daily volume. If you feed such an algorithm 1 million events each day, nearly all of them legitimate, understand that about 20,000 of its daily alerts may be false positives.
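The arithmetic above can be sketched in a few lines. The 1 million events and the 2-percent false positive rate come from the example; the attack prevalence of 0.1 percent and the 90-percent true positive rate in the second call are hypothetical values added only to show how few of the resulting alerts may be real.

```python
def daily_alert_breakdown(events_per_day, false_positive_rate,
                          attack_prevalence=0.0, true_positive_rate=1.0):
    """Estimate daily alert volumes for a detector.

    attack_prevalence: fraction of events that are real attacks (assumption).
    true_positive_rate: fraction of real attacks the detector flags (assumption).
    """
    attacks = events_per_day * attack_prevalence
    legitimate = events_per_day - attacks
    # The false positive rate applies to legitimate events only.
    false_alerts = legitimate * false_positive_rate
    true_alerts = attacks * true_positive_rate
    total_alerts = false_alerts + true_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return {
        "false_alerts": false_alerts,
        "true_alerts": true_alerts,
        "precision": precision,
    }


# Case from the text: treat every event as legitimate.
worst_case = daily_alert_breakdown(1_000_000, 0.02)
print(round(worst_case["false_alerts"]))  # 20000

# Hypothetical refinement: 0.1% of events are attacks, 90% of them detected.
realistic = daily_alert_breakdown(1_000_000, 0.02,
                                  attack_prevalence=0.001,
                                  true_positive_rate=0.9)
print(round(realistic["false_alerts"]), round(realistic["true_alerts"]))
print(round(realistic["precision"], 3))
```

Under these assumed numbers, fewer than one alert in twenty would be a real attack, which is the kind of operational load worth estimating before deployment.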

Machine-learning output is not always easy to explain. The cybersecurity industry is accustomed to rules, blacklisting, fingerprints and indicators of compromise, so explaining why an alert went off is easy and natural. In contrast, machine-learning models are able to identify patterns in large data sets, extrapolate answers and make predictions based on non-trivial compositions. This makes it nearly impossible to get a feeling for their inner workings. Tremendous effort is being invested in discovering ways to explain the output of machine-learning models; however, even state-of-the-art approaches offer only educated estimates rather than exact explanations, and those estimates should not always be taken literally.

The takeaway

If applied correctly, machine learning can dramatically augment an organization’s ability to fight off sophisticated cyber-attacks while deriving more value out of security data and threat intelligence. But be ready to evolve quickly. Our adversaries are skilled minds, and each day they become more adept at understanding how machine learning works. They will be ready to retaliate by trying to circumvent the most advanced defenses. Machine learning improves over time if organizations let it evolve by leveraging updated data. Be sure the operational setup always tracks both successful and failed predictions of the model, so that it can be retrained and evolve quickly.


An Answer to the Failing Blacklist System


This post is part of our continuing series exploring and integrating new probabilistic tools for fraud prevention. Read the first and second installments.

The most common way to mitigate phishing attacks is by warning end users that they have navigated to a phishing website, for which the browser must ...


Fraud Detection That Accounts for Misclassification Using Cost-Sensitive Logistic Regression


This article is a continuation of a previous post entitled “Evaluating a Fraud Detection Using Cost-Sensitive Predictive Analytics.”


Evaluating a Fraud Detection Using Cost-Sensitive Predictive Analytics


A credit card fraud detection algorithm identifies those transactions with a high probability of being fraudulent, based on historical fraud patterns.