Machine Learning in Anti-Money Laundering (AML) – Risks and opportunities

One of the biggest problems in AML today is the huge number of false positives. Estimates indicate that, across the industry, between 95% and 99% of the alerts generated are false positives, and this puts a great burden on Financial Institutions. Investigating false positives is time-consuming and costly: a recent study found that banks were spending close to €3.01 billion per year on it. Institutions are seeking more efficient approaches to combat crime and, in this context, Machine Learning can prove to be an invaluable tool, but…

We must not neglect the main purpose of AML Software: to prevent money laundering and the financing of terrorism! All actions should converge towards this goal and, for this reason, most software is overly conservative when generating alerts. It is better to analyze more alerts than necessary than to risk missing a single money laundering crime.

Next, we propose four steps, of increasing complexity, that could be used to integrate Machine Learning into an existing AML Software.

  1. Classify alerts according to their priority

The goal is to learn from alerts created in the past, predict the probability that each new alert is a true positive, and classify new alerts accordingly. Because the existing data is heavily imbalanced (false positives occur overwhelmingly more often than true positives), training adequate machine learning models is a serious challenge. Choosing a good metric to evaluate the performance of the model is paramount.

Accuracy is an obviously bad choice for a metric. If we consider that 99% of all alerts are false positives, then a model that simply labels all alerts as false positives will achieve 99% accuracy. That sounds great but… it would miss all the true positives, which would be catastrophic!

Precision measures how precise the model is: out of the alerts predicted to be positive, how many are actually positive. This measurement is important when trying to minimize false positives, which is an important goal in AML, but not the most important one.

Specificity measures the proportion of actual negatives that are correctly identified as such (e.g. the percentage of false positives that are correctly identified as not being suspicious). This does not seem like a very good metric for us because, although we are trying to minimize false positives, that is not the main goal. In this sense, precision is a more interesting metric.

Sensitivity/Recall calculates how many of the true positives were actually captured by the model. It is the most important metric because missing a true positive is the one thing that must be avoided at all costs. Simply put, it cannot happen! The consequences of missing criminal activity are too high. This does not mean that Sensitivity should be maximized at the expense of all other metrics, though. One could easily achieve 100% sensitivity by simply labeling all alerts as true positives, but this would yield an accuracy of 1%, since we would be mislabeling all the false positives. While not catastrophic, this is undesirable and would provide no additional benefit over traditional AML mechanisms.
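To make the comparison of metrics concrete, here is a minimal sketch that computes accuracy, precision, recall (sensitivity) and the true negative rate (specificity) for the two degenerate models discussed above. The data set is an assumption made purely for illustration: 1,000 historical alerts of which 1% are true positives. The function names are ours and do not come from any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = suspicious, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),        # sensitivity
        "specificity": safe_div(tn, tn + fp),   # true negative rate
    }


# Assumed data: 10 true positives among 1,000 alerts (1%).
y_true = [1] * 10 + [0] * 990

dismiss_all = [0] * 1000    # model that labels every alert a false positive
escalate_all = [1] * 1000   # model that labels every alert a true positive

m_dismiss = metrics(y_true, dismiss_all)
m_escalate = metrics(y_true, escalate_all)

for name, m in (("dismiss all", m_dismiss), ("escalate all", m_escalate)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Under these assumptions the first model reaches 99% accuracy with 0% recall, and the second reaches 100% recall with 1% accuracy, which is why neither metric can be read in isolation.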

In conclusion, it seems like the most interesting model would be one that achieves a good balance of Accuracy, Precision, and Sensitivity while maximizing Sensitivity.
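One possible way to express this balance, offered only as a sketch, is to fix a minimum acceptable sensitivity first and then choose the highest score threshold that still meets it, so that as few alerts as possible are flagged. The scores, labels and the `min_recall` parameter below are hypothetical values chosen for illustration.

```python
def recall_at(threshold, scores, labels):
    """Share of true positives whose score is at or above the threshold."""
    positives = sum(labels)
    caught = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return caught / positives if positives else 0.0


def pick_threshold(scores, labels, min_recall=1.0):
    """Highest threshold whose recall still meets the required floor."""
    for t in sorted(set(scores), reverse=True):
        if recall_at(t, scores, labels) >= min_recall:
            return t
    # Conservative fallback (e.g. no labelled positives): flag everything.
    return min(scores)


# Hypothetical model scores for seven historical alerts (1 = confirmed case).
scores = [0.95, 0.80, 0.40, 0.35, 0.30, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0]

threshold = pick_threshold(scores, labels, min_recall=1.0)
flagged = [s for s in scores if s >= threshold]
precision = sum(
    y for s, y in zip(scores, labels) if s >= threshold
) / len(flagged)

print(threshold, len(flagged), round(precision, 2))
```

In this toy example every confirmed case is kept while four of the seven alerts fall below the threshold. A threshold chosen this way is only as trustworthy as the labelled history it was tuned on.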

The overall risk of introducing Machine Learning at this point is very low, or even nonexistent. The alerts are still the result of traditional, rule-based transaction monitoring mechanisms, so all the alerts that were previously being created will still be created, and their analysis will still be performed entirely by humans. The only thing Machine Learning is doing in this regard is aiding the human in their analysis. It is probably a very slight help at this point, yes, but it is an easy introductory step with no real repercussions if the model proves to be inadequate.
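A minimal sketch of this first step might look as follows. Every alert stays in the queue for a human analyst; the model only decides the order and a priority label. The `Alert` class, the score values and the band cut-offs (0.5 and 0.1) are hypothetical and would need to be calibrated on real data.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: str
    score: float  # model-estimated probability of being a true positive


def priority_band(score, high=0.5, medium=0.1):
    """Map a score to a priority label (cut-offs are illustrative only)."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def build_queue(alerts):
    """Order alerts by descending score without dropping any of them."""
    return sorted(alerts, key=lambda a: a.score, reverse=True)


alerts = [Alert("A-1", 0.02), Alert("A-2", 0.71), Alert("A-3", 0.15)]
queue = build_queue(alerts)
bands = [(a.alert_id, priority_band(a.score)) for a in queue]
print(bands)
```

Because nothing is removed from the queue, a poorly performing model can only produce a poor ordering, not a missed alert.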

Question: Is it acceptable for an algorithm to autonomously close an alert if it considers there is enough evidence to dismiss suspicious activity? In practice, can the analysis, in some cases, be transferred from a human to an algorithm that emulates the human thought process?

  2. Classify customer activity and predict if alerts should be created

The goal is to learn from historical activity that has led to alerts in the past, and to predict the probability that future activity will lead to the creation of new alerts. Machine Learning techniques can be used to completely control and dictate the creation of alerts, or a more conservative, hybrid approach can combine machine learning with traditional techniques.
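The two options can be sketched as a single decision function. In the hybrid mode shown here, an alert is created when either the traditional rule fires or the model score is high enough, so the model can add alerts but cannot suppress one that the rules would have raised. The rule (a single amount of 10,000 or more), the score threshold of 0.3 and all names are hypothetical.

```python
def rule_fires(activity):
    """Hypothetical traditional rule: a single large transaction."""
    return activity["amount"] >= 10_000


def should_alert(activity, model_score, threshold=0.3, mode="hybrid"):
    """Decide whether to create an alert for a piece of customer activity."""
    if mode == "ml_only":
        # The model alone dictates alert creation.
        return model_score >= threshold
    # Hybrid: keep every rule-based alert and add the model's detections.
    return rule_fires(activity) or model_score >= threshold


large_low_score = {"amount": 15_000}
small_high_score = {"amount": 900}

print(should_alert(large_low_score, 0.05, mode="hybrid"))    # rule fires
print(should_alert(large_low_score, 0.05, mode="ml_only"))   # model decides
print(should_alert(small_high_score, 0.80, mode="hybrid"))   # model adds alert
```

This particular hybrid is the most conservative variant: it cannot reduce false positives on its own, which is the price paid for never losing a rule-based alert.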

In technical terms, this step is very similar to the previous one: the same kinds of algorithms and techniques will be used. The key difference is that, instead of analyzing alerts, we are now analyzing customer activity. This can have a very direct and impactful consequence on the creation of alerts, so the risks associated with having an inadequate model are bigger. Again, we want to maximize the detection of true positives, but this time we can no longer hide behind what was already being produced by traditional techniques. We want to be absolutely confident in the capacity of our model to capture suspicious activity.

The model should be able to capture at least the same suspicious activity as before but, ideally, it should also be able to discard activity that is not truly suspicious (fewer false positives).
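This requirement can be checked on historical data before the model is trusted. The sketch below compares, for a past period, the cases alerted by the rules, the cases the model would have alerted, and the cases later confirmed as suspicious. The identifiers and sets are made up for illustration.

```python
def coverage_report(rule_alerts, model_alerts, confirmed):
    """Compare model and rule-based alerts against confirmed cases."""
    rule_hits = rule_alerts & confirmed
    model_hits = model_alerts & confirmed
    return {
        # Confirmed cases the rules caught but the model would have missed.
        "missed_by_model": sorted(rule_hits - model_hits),
        "rule_false_positives": len(rule_alerts - confirmed),
        "model_false_positives": len(model_alerts - confirmed),
    }


# Hypothetical backtest data (case identifiers).
rule_alerts = {"c1", "c2", "c3", "c4", "c5"}
model_alerts = {"c2", "c4", "c9"}
confirmed = {"c2", "c9"}

report = coverage_report(rule_alerts, model_alerts, confirmed)
acceptable = (
    not report["missed_by_model"]
    and report["model_false_positives"] <= report["rule_false_positives"]
)
print(report, acceptable)
```

A backtest like this can only show that the model would not have missed cases that are already known; it says nothing about suspicious activity that was never detected in the first place.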

  3. Produce “explainable” conclusions

“The ability to demonstrate and audit compliance is a cornerstone of the current AML framework and, for that reason alone, the transparency of AI and the underlying algorithms is one of the key areas of debate. AI is inherently statistical in approach and hence can often be found to perform well in aggregate. However, this is often at the cost of greater complexity in the model and reduced ability to extract rationale and reasoning for outcomes”

Artificial Intelligence algorithms, and Machine Learning algorithms in particular, are typically black boxes: they can be very efficient and accurate in producing results, but they do not explain how those results were arrived at. This is problematic for several reasons. First, without a rationale behind the conclusions of the algorithm, it may prove exceedingly difficult to validate its results, even if they are very accurate. Second, compliance experts who deal with AML procedures daily must be able to understand the results in order to perform their job. Finally, reports delivered to competent authorities or regulators must present a clear explanation of the decisions that were made.

From a technical standpoint, this is probably one of the most challenging aspects of applying Machine Learning to AML. It is an area that has been gathering increasing interest and many advances are being made in the industry. Still, full interpretability of ML algorithms is considered to be at a very embryonic stage and there is a lot of room for progress.

Many of the algorithms that have been shown to have a high degree of performance (Random Forests, Deep Learning, Support Vector Machines, Neural Networks, K-Nearest Neighbors) either lack good interpretability or provide none at all. In some cases, interpretability comes at the expense of performance, with the more “explainable” algorithms often offering poorer accuracy.
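For contrast, the sketch below shows what an explanation can look like for one of the simpler, more interpretable model families: a logistic regression, where the contribution of each feature to the score is just its weight multiplied by its value. The weights, bias and feature names are invented for illustration and are not taken from any trained model.

```python
import math

# Hypothetical coefficients of an already-trained logistic regression.
WEIGHTS = {
    "cash_ratio": 2.0,           # share of activity made in cash
    "new_counterparties": 0.5,   # counterparties first seen this month
    "account_age_years": -0.3,   # older accounts lower the score
}
BIAS = -3.0


def explain(features):
    """Return the predicted probability and per-feature contributions."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Largest absolute contribution first: the main reasons for the score.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


probability, ranked = explain(
    {"cash_ratio": 0.9, "new_counterparties": 4, "account_age_years": 2}
)
print(round(probability, 2))
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

An analyst or regulator can read the ranked list directly as the reasons behind the score. Black-box models offer no such direct decomposition.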

  4. Find atypical activity patterns

Despite the availability of so many tools, data science teams are likely to find that the rate of fraud advancements will outpace the implementation of any singular solution.

We believe that there is no miraculous formula that enables the detection of all potential crimes. While Governments, Financial Institutions and Computer Scientists work to perfect their techniques for detecting suspicious activity, criminals are doing the exact opposite: they are constantly working to find ways to subvert known crime-detection mechanisms, and they are probably one step ahead of us… Legislative changes take time to implement, the measures employed to detect criminal activity also take time to adapt to new patterns, and it is hard to predict what criminals will do next. For this reason, neither traditional rule-based techniques nor supervised learning algorithms may be enough to combat crime efficiently.

Unsupervised learning algorithms may be the key to finding emergent crime patterns, but it may prove even harder to evaluate their performance. They may be able to detect complex activity and find patterns that a human would never be able to, and that is already an improvement in and of itself. But how do we prove that no suspicious activity has been missed? This is a great challenge to overcome, but one that can provide undeniable benefits.
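As a deliberately simple illustration of the unsupervised idea, the sketch below flags atypical values without using any labels, by measuring how far each value lies from the median in units of the median absolute deviation (the modified z-score, with the conventional constant 0.6745 and cut-off 3.5). The daily totals are invented, and real systems would use far richer features and algorithms.

```python
from statistics import median


def robust_scores(values):
    """Modified z-scores based on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against: treat nothing as atypical.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def atypical(values, cutoff=3.5):
    """Indices of values whose modified z-score exceeds the cut-off."""
    return [i for i, z in enumerate(robust_scores(values)) if abs(z) > cutoff]


# Hypothetical daily transaction totals for one customer.
daily_totals = [120, 95, 130, 110, 105, 9800, 125, 100]
flagged = atypical(daily_totals)
print(flagged)
```

The example also illustrates the evaluation problem raised above: the method reports what looks unusual, but nothing in its output tells us whether something suspicious and unremarkable-looking slipped through.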


Quidgest is a global technology company and a pioneer in automatic software modeling and generation. Through its extreme low-code platform, Genio, it offers a vast portfolio of solutions in different areas, aimed at the continuous improvement of the management of companies and public institutions of excellence.
