From Real-Time Learning to Reinforcement Learning with Asynchronous Feedback


Online, or real-time, transactional fraud detection systems have recently created quite the buzz in the info security industry. They are an appealing concept: Because we know that fraud patterns change over time, the ability to use machine-learning algorithms to automatically learn new patterns instantly allows us to have a stronger defense system. We often find completely new types of fraud, and training a system to detect attacks using only historical, out-of-date information can cause it to miss newer patterns. Real-time systems allow us to adapt more easily to such changes.


Going back in time a bit, we can see in this figure how fraudulent transactions exhibit variable behavior: In June 2012, the number of illegitimate transactions suddenly doubled in relation to the total number of transactions carried out.

[Figure: fraudulent transactions relative to total transactions over time]

The fraud patterns learned between January 2011 and May 2012, therefore, could not serve as a good representation of what was happening at the end of 2012. This constant evolution of fraud and fraudsters is the biggest motivation for using a real-time learning system.

However, when we look at the academic literature on reinforcement learning (the technology behind online and real-time learning), we find that a system can only learn in real time if it receives immediate feedback. In fact, the existing theoretical models do not allow the system to make a new prediction until it has received feedback on the previous case. Feedback, in the sense of reinforcement learning, means telling the system when it has made an error. In transactional fraud detection, this consists of alerting the system when it has predicted a legitimate transaction to be fraud, or a fraudulent transaction to be legitimate. But is it really possible to give this feedback to the system in real time? First, let’s analyze a simplistic representation of a transaction flow:

[Figure: simplified transaction flow]

From this image, we can see that the security system decides whether to approve or decline a transaction but receives no immediate feedback on that decision. Normally, feedback reaches the system in two different stages. First, when there is a false positive, meaning a transaction was labeled as fraud when it was actually legitimate, the feedback arrives that same day: either an analyst contacts the customer to confirm the transaction, or the customer calls customer service to inquire about the blocked transaction.

On the other hand, when a transaction is predicted as legitimate but is actually fraudulent, the feedback takes between 15 and 45 days to arrive. This is because we must wait until the customer realizes that fraud has been committed and consequently informs the company.
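To make the two feedback stages concrete, here is a minimal sketch of a queue that only releases a transaction's true label on the day it would realistically arrive. The class and parameter names are hypothetical. The same-day and 15-to-45-day delays come from the description above; treating an approved transaction as legitimate once 45 days pass without a complaint is our own simplifying assumption.

```python
import heapq

SAME_DAY = 0                 # declined transactions are confirmed the same day
MAX_REPORT_DELAY_DAYS = 45   # latest a customer typically reports fraud


class FeedbackQueue:
    """Holds true labels until the day they would actually become known."""

    def __init__(self):
        # Heap entries: (day_label_available, transaction_id, is_fraud)
        self._heap = []

    def record(self, transaction_id, day, declined, is_fraud,
               report_delay_days=MAX_REPORT_DELAY_DAYS):
        if declined:
            # An analyst or the customer confirms the outcome that same day.
            delay = SAME_DAY
        elif is_fraud:
            # Missed fraud: we wait until the customer notices and reports it.
            delay = report_delay_days
        else:
            # Assumption: no complaint after 45 days means "legitimate".
            delay = MAX_REPORT_DELAY_DAYS
        heapq.heappush(self._heap, (day + delay, transaction_id, is_fraud))

    def labels_available(self, today):
        """Return (transaction_id, is_fraud) pairs whose labels are known by today."""
        ready = []
        while self._heap and self._heap[0][0] <= today:
            _, transaction_id, is_fraud = heapq.heappop(self._heap)
            ready.append((transaction_id, is_fraud))
        return ready
```

A model fed only from `labels_available(today)` sees its false positives almost immediately, while its missed frauds show up weeks later. That imbalance is exactly what makes the feedback asynchronous.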


These two issues leave us with a problem called ‘reinforcement learning with asynchronous feedback,’ which requires probabilistic classification methods to deal with the fact that we cannot be 100 percent sure of the results of the system’s algorithms.
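As one illustration of what a probabilistic treatment could look like (this is our own sketch, not a method described in the literature mentioned above), consider an approved transaction that nobody has reported yet. Bayes' rule lets us update the model's fraud score as days pass without a complaint. The sketch assumes, purely for illustration, that fraud reports arrive uniformly between day 15 and day 45 and that legitimate transactions are never reported as fraud; the function names are hypothetical.

```python
def report_cdf(days_elapsed, min_delay=15, max_delay=45):
    """Assumed probability that a real fraud has been reported by now
    (uniform between min_delay and max_delay days)."""
    if days_elapsed <= min_delay:
        return 0.0
    if days_elapsed >= max_delay:
        return 1.0
    return (days_elapsed - min_delay) / (max_delay - min_delay)


def posterior_fraud_probability(prior, days_elapsed):
    """P(fraud | no report after days_elapsed), starting from the model's score.

    Bayes' rule with P(no report | fraud) = 1 - report_cdf(days_elapsed)
    and P(no report | legitimate) = 1.
    """
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must be in [0, 1)")
    still_unreported = 1.0 - report_cdf(days_elapsed)
    numerator = prior * still_unreported
    return numerator / (numerator + (1.0 - prior))
```

Under these assumptions, the probability equals the model's original score for the first 15 days, then decays to zero by day 45. A soft label like this could be used to weight unconfirmed transactions during retraining instead of treating them as certainly legitimate.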

By highlighting why building real-time learning systems for fraud detection is impractical and unrealistic, we hope to help you better understand what machine learning can really do, and what can be done to actually incorporate recent fraud patterns into a model. Building reinforcement learning systems that handle asynchronous feedback remains an open problem. In the meantime, make sure to constantly evaluate how your fraud detection models are behaving, and be ready to retrain them as soon as you see early warning signs.
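One simple way to watch for early alerts is to track the signal that arrives fastest: same-day confirmations on declined transactions. The sketch below keeps a sliding window of those outcomes and flags when the false positive rate climbs past a threshold. The class name, window size, and thresholds are hypothetical and would need tuning against your own data.

```python
from collections import deque


class FalsePositiveMonitor:
    """Tracks how often recently declined transactions turned out to be legitimate."""

    def __init__(self, window_size=1000, alert_rate=0.5, min_samples=10):
        # True = the declined transaction was actually legitimate.
        self._outcomes = deque(maxlen=window_size)
        self.alert_rate = alert_rate
        self.min_samples = min_samples

    def record_decline(self, was_legitimate):
        self._outcomes.append(bool(was_legitimate))

    def rate(self):
        """False positive rate among declines in the current window."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_retrain(self):
        # Only alert once there is enough evidence in the window.
        enough = len(self._outcomes) >= self.min_samples
        return enough and self.rate() > self.alert_rate
```

This only covers false positives. Missed fraud still takes weeks to surface, so a rising chargeback count remains a late but necessary second signal.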
