Why data is the backbone of predictive AI

07 March 2024

Dominique Dierks

Content Manager

Finextra

Contributed

This content is contributed or sourced from third parties but has been subject to Finextra editorial review.

With the launch of ChatGPT in November 2022, a new wave of AI enthusiasm kicked off, centred primarily on the capabilities and possibilities of generative AI. The launch marked a transformational moment in AI technology and made AI far easier to access, and many financial institutions have since started looking at how generative AI can be applied in their field.

Yet while generative AI may be the shiny new item on the menu, it’s another form of AI that is the cornerstone of financial services: predictive AI. Especially in fraud detection and prevention, predictive AI enables financial organisations to streamline their operations and free up resources that banks can redirect towards their wider business strategy.

Optimising their predictive AI capabilities before turning attention to generative AI will allow banks to better understand customer behaviour, streamline fraud mitigation, and yield more revenue. But in order to do that, banks need to get their data right.

I spoke to Colm Coughlan, director of data science at Outseer, about the importance of rich, unique data and how financial institutions can leverage it to optimise their AI and fraud mitigation capabilities.

How are financial institutions currently using AI?

The unsung hero of anti-fraud modelling is predictive AI. As generative AI’s less shiny older brother, predictive AI is a powerful tool that uses data, algorithms and machine learning to estimate the probability of future outcomes based on historical data. This helps financial organisations gain accurate insights into future scenarios and proactively identify fraudulent activity. By finding patterns in data, predictive AI can help create early warning systems, which in turn help organisations prevent threats, minimise financial losses and increase customer trust.

The public rollout of ChatGPT and other LLMs signified a leap forward in the perception and availability of AI across many industries. Coughlan highlights: “It's a paradigm shift, but it's also built very much on predictive AI and uses a lot of the same tooling. For financial institutions, it's going to be very useful for designing applications where we want to emulate human behaviour and creativity – such as chatbots, summarising tools, etc. In terms of anti-fraud modelling, people are currently looking at some interesting ideas for applying generative AI, but predictive modelling remains king as it is often the right tool for the job of statistical profiling and risk analysis.

“There is also a regulatory element: predictive AI currently has a strong regulatory basis and can be used in banking and financial contexts, whereas generative AI is inherently more creative and its rationale is more difficult to explain from a model governance perspective. There is, however, a strong and obvious use case for generative AI in the hands of fraudsters: it will help them conduct more advanced social engineering attacks at scale. Whereas a single fraudster could once hold only one phone conversation at a time, a generative AI scammer could call thousands of people simultaneously, holding realistic human conversations aimed at extracting information from customers such as banking details, one-time passwords and more.”

Yet while there might not be immediate use cases for generative AI in anti-fraud modelling, the introduction of ChatGPT has prompted many financial players to re-examine how they use new technologies such as AI and machine learning.

“Currently, predictive AI is used in fraud because you get a lot of data from a lot of different sources, and it can be quite hard to combine manually,” Coughlan says. “In my experience, organisations tend to follow their human bias when working manually on data, and also tend to create very complex systems that are then hard to maintain.”

“For example, you might create a magnum opus of fraud rules – tackling every fraud case that you have seen in the last couple of months – that works well, but then you move to a different project and six months later, it no longer works. This could be because human bias has been used instead of firm statistics. Whereas if you let an ML algorithm take in the data, calibrate and adapt it in a systematic and continuous way, it’s easier, cheaper and more maintainable, especially if you choose technologies that have a high degree of explainability. That’s what we’re currently seeing in predictive AI: spotting trends in data, whether they are anomalous patterns and deviations from the norm or combining different predictors to determine whether a transaction is statistically likely to be fraudulent based on past events.”
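To make “combining different predictors” more concrete, here is a minimal sketch of one common technique for it, logistic regression, trained by plain gradient descent on a handful of toy transactions. The three features (a new-device flag, a foreign-IP flag and the amount divided by 1,000), the data and the function names are all hypothetical illustrations, not Outseer's model. Logistic regression is used here partly because its learned weights can be read directly, which speaks to the explainability point above.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy history: [new_device, foreign_ip, amount / 1000]; label 1 = fraud.
HISTORY_ROWS: List[List[float]] = [
    [0, 0, 0.05], [0, 0, 0.20], [1, 0, 0.10], [0, 1, 0.08], [0, 0, 0.50],
    [1, 1, 0.90], [1, 1, 0.60], [1, 0, 0.95], [0, 1, 0.85],
]
HISTORY_LABELS: List[int] = [0, 0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fraud_probability(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Combine the predictors into a single probability that x is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit one weight per predictor plus a bias by batch gradient descent on log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = fraud_probability(weights, bias, x) - y  # predicted minus observed
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


weights, bias = train_logistic(HISTORY_ROWS, HISTORY_LABELS)
print(f"risky: {fraud_probability(weights, bias, [1, 1, 0.8]):.2f}")
print(f"usual: {fraud_probability(weights, bias, [0, 0, 0.1]):.2f}")
```

Recalibrating on newly marked cases here just means calling `train_logistic` again with the updated history, one simple form of the systematic, continuous adaptation Coughlan describes. In production, a maintained ML library would normally replace the hand-rolled training loop.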

Why optimising fraud capabilities is more crucial than ever

Today, we are more connected than at any previous point in time. The ease of cross-border payments, the uptake of instant payments and the growing popularity of open and embedded finance have reshaped what we expect from financial services and payment providers, but they have also opened the door to fraud.

UK Finance’s half-year fraud report shows that fraudsters stole £580 million in the first six months of 2023. In the US, the Federal Trade Commission puts fraud losses for the whole of 2023 at an astonishing $10 billion.

Upcoming regulation, such as the shift in liability mandated by the Payment Systems Regulator (PSR), introduces safeguards for customers, yet it doesn’t address fraud itself. However, legislation such as the Economic Crime and Corporate Transparency Act (ECCTA), passed in 2023, is directly designed to discourage organisations from turning a blind eye to fraud. For financial services, the offence of failure to prevent fraud is one of the more intimidating aspects of the ECCTA.

That’s why it’s crucial for financial services companies to optimise their fraud detection and mitigation strategies, taking advantage of the opportunities that predictive AI can offer.

The role of data in AI

This is where data science comes into play. Coughlan highlights that any fraud prevention model is only as good as the data it’s built on. “There's no algorithm that will give you good results if you’re using data that’s not suitable for attacking the problem that you're trying to solve. Getting the data, storing it correctly and having the right tools to access and understand it is a prerequisite to deploying any kind of AI solution.”

The key to a successful fraud mitigation strategy using predictive AI is a substantial amount of clean, contextual data. Context in fraud comes in different types, ranging from device and location to financial context. Which type is best for training an organisation’s AI model depends on the fraud MO (modus operandi) most relevant to the specific business, and data that is relevant for one organisation might be entirely unsuitable for another. User history is crucial for contextual data, but depending on its business model, one organisation might be more interested in IP or email addresses, while for another, device type might be more relevant for detecting account takeover.
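As a rough illustration of turning user history into context, the sketch below compares a new transaction with the same customer's past behaviour: how unusual the amount is (a z-score against their own history) and whether the device or country has been seen before. The dictionary keys (`amount`, `device_id`, `country`) and the choice of features are assumptions for illustration; which features matter depends on the fraud MO a given business faces.

```python
from statistics import mean, pstdev
from typing import Dict, List, Union

Txn = Dict[str, Union[float, str]]


def contextual_features(history: List[Txn], txn: Txn) -> Dict[str, Union[float, bool]]:
    """Describe a new transaction relative to the same customer's past behaviour.

    Assumes each transaction is a dict with the hypothetical keys
    'amount', 'device_id' and 'country'.
    """
    if not history:
        raise ValueError("no user history: fall back to other context")
    amounts = [float(h["amount"]) for h in history]
    mu = mean(amounts)
    sigma = pstdev(amounts) or 1.0  # flat history: avoid dividing by zero
    return {
        # Financial context: how unusual is this amount for this customer?
        "amount_zscore": (float(txn["amount"]) - mu) / sigma,
        # Device context: first time this customer has used this device?
        "new_device": txn["device_id"] not in {h["device_id"] for h in history},
        # Location context: first transaction from this country?
        "new_country": txn["country"] not in {h["country"] for h in history},
    }


history = [
    {"amount": 50.0, "device_id": "d1", "country": "GB"},
    {"amount": 60.0, "device_id": "d1", "country": "GB"},
    {"amount": 40.0, "device_id": "d1", "country": "GB"},
]
print(contextual_features(history, {"amount": 500.0, "device_id": "d2", "country": "GB"}))
```

Features like these would then be fed, alongside others, into a model such as the one sketched earlier in this article.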

In an environment of rising fraud, consortium data plays a crucial role in helping organisations minimise their losses. When a group of organisations decides to pool its data for shared access – creating a consortium – everyone benefits. Coughlan says: “If you take 10 financial institutions individually, the attacker has a first mover advantage against all 10 of them. But if you are part of a consortium, the attacker only gets a first mover advantage against the first organisation – the others get a warning. You might be able to reduce your number of successful attacks by up to 90%. You could get unlucky and be the very first person to be attacked, but most of the time, it'll be someone else.”
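The “up to 90%” figure follows from simple arithmetic. The sketch below assumes, for illustration only, that an attacker works through members one at a time, that the first encounter with a new pattern always succeeds, and that a consortium warning then stops it at every other member. Real consortia will fall short of this best case, which is why the figure is an upper bound.

```python
def successful_attacks(n_members: int, consortium: bool) -> int:
    """Members a brand-new attack pattern succeeds against (idealised model).

    Assumes the attacker hits members one after another, the first encounter
    always succeeds and, inside a consortium, the shared warning stops the
    pattern at every later member.
    """
    if n_members <= 0:
        return 0
    return 1 if consortium else n_members


def reduction(n_members: int) -> float:
    """Fractional drop in successful attacks from joining the consortium."""
    alone = successful_attacks(n_members, consortium=False)
    if alone == 0:
        return 0.0
    return 1 - successful_attacks(n_members, consortium=True) / alone


print(f"{reduction(10):.0%}")  # 90%
```

With 10 members, pooling cuts successful attacks from 10 to 1, a reduction of 1 - 1/10 = 90%. If the attacker picks its first target at random, each member has a 1-in-10 chance of being the unlucky one.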

The more data an organisation has, the more likely it is to successfully detect and mitigate fraud. Codifying what a human expert might already know when reviewing data, without trying to project future trends, lays the groundwork for a successful predictive AI tool.

Coughlan highlights: “Ideally, when the attack comes to you, even though you've never seen it before, the model will respond and say: I recognise this pattern. While this will offer more benefits to smaller banks, which don’t have as much fraud exposure as larger organisations, even tier-1 banks stand to gain from pooling data.”

Steps to effectively deploy AI to combat fraud

According to Coughlan, there are common missteps financial organisations can make when deploying AI to combat fraud.

The first thing organisations often misunderstand is the cycle of collecting data, marking it, training the AI and then going back to the start. Coughlan points out that people often believe an AI doesn’t need to keep learning, or, much as when using a consortium, they borrow other people’s understanding and models without adapting them to their own needs. Equally crucial is ensuring that organisations have a solid case management system. The type of fraud seen by one organisation might not be representative of another’s, and expecting high AI performance without adapting case markings will yield poor results.
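The collect, mark, train, repeat cycle can be sketched as a small bookkeeping class: every scored event is recorded, case management later marks it as fraud or genuine, and only marked, recent cases are handed to the next training run. The class name, its fields and the 90-day window in the usage example are hypothetical choices, not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    event_id: str
    features: Dict[str, float]
    seen_at: datetime
    label: Optional[bool] = None  # None until case management marks it


@dataclass
class FeedbackLoop:
    events: Dict[str, Event] = field(default_factory=dict)

    def collect(self, event: Event) -> None:
        """Step 1: record every event the model scores."""
        self.events[event.event_id] = event

    def mark(self, event_id: str, is_fraud: bool) -> None:
        """Step 2: an analyst (or a customer report) marks the case."""
        self.events[event_id].label = is_fraud

    def training_set(self, now: datetime,
                     window: timedelta) -> List[Tuple[Dict[str, float], bool]]:
        """Step 3: only marked, recent cases feed the next retrain."""
        cutoff = now - window
        return [
            (e.features, e.label)
            for e in self.events.values()
            if e.label is not None and e.seen_at >= cutoff
        ]


now = datetime(2024, 3, 7)
loop = FeedbackLoop()
loop.collect(Event("a", {"amount": 10.0}, now - timedelta(days=1)))
loop.collect(Event("b", {"amount": 900.0}, now - timedelta(days=2)))
loop.mark("b", True)
print(loop.training_set(now, window=timedelta(days=90)))  # step 4: retrain, redeploy, repeat
```

Unmarked events are deliberately left out: without case markings, the model has nothing to learn from, which is why a solid case management system matters.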

Secondly, measuring the performance of the AI model is crucial, yet the definition of high performance depends on an organisation's unique case markings. Coughlan says: “Some organisations want to capture as much fraud as they can by dollar value. For example, they want to see 99% of their dollar value blocked. Others want to stop 99% of fraudulent transactions and are not too sensitive about the dollar value of the transactions, but instead care about the customer experience. Do you want to block 10 customers with small fraudulent experiences or go after the whales where someone has been defrauded by a large amount? Organisations need to understand their KPIs for fraud and measure them regularly.”
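The difference between the two KPIs Coughlan describes is easy to see in numbers. This sketch, built around a hypothetical `Outcome` record, computes both the share of fraud cases blocked and the share of fraud value blocked from marked outcomes; genuine transactions are ignored here because they belong to the friction side of the ledger.

```python
from typing import List, NamedTuple, Tuple


class Outcome(NamedTuple):
    amount: float   # transaction value
    is_fraud: bool  # confirmed by case marking
    blocked: bool   # whether the model or rules stopped it


def detection_rates(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Return (share of fraud cases blocked, share of fraud value blocked)."""
    frauds = [o for o in outcomes if o.is_fraud]
    total_value = sum(o.amount for o in frauds)
    if not frauds or total_value <= 0:
        raise ValueError("need at least one marked fraud case with positive value")
    count_rate = sum(1 for o in frauds if o.blocked) / len(frauds)
    value_rate = sum(o.amount for o in frauds if o.blocked) / total_value
    return count_rate, value_rate


# Nine small frauds blocked, one whale missed, plus one genuine customer challenged.
outcomes = [Outcome(100.0, True, True) for _ in range(9)]
outcomes += [Outcome(10_000.0, True, False), Outcome(50.0, False, True)]
count_rate, value_rate = detection_rates(outcomes)
print(f"by count: {count_rate:.0%}, by value: {value_rate:.0%}")
```

With nine $100 frauds blocked and one $10,000 fraud missed, the count-based rate is 90% but the value-based rate is only about 8%; blocking only the large fraud reverses the picture (10% by count, about 92% by value).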

Yet even high-performing AI models will sometimes get it wrong. Organisations need to decide how much customer friction they are willing to introduce to compensate for these errors. How can they define this in a quantitative way? This is where it’s important to go back to the cycle: evaluate performance and recalibrate your AI model based on what you observe.
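One quantitative way to frame the friction trade-off, offered here as an assumption rather than a method from the article, is to set a budget for the share of genuine customers who may be challenged and then choose the score threshold that catches the most fraud within that budget. The function name and the toy scores below are hypothetical.

```python
from typing import List, Tuple


def pick_threshold(scores: List[float], is_fraud: List[bool],
                   max_false_positive_rate: float) -> Tuple[float, float, float]:
    """Choose the lowest score threshold whose customer friction stays within budget.

    Transactions scoring >= threshold get challenged. Returns
    (threshold, share of fraud caught, share of genuine customers challenged).
    """
    genuine = [s for s, y in zip(scores, is_fraud) if not y]
    fraud = [s for s, y in zip(scores, is_fraud) if y]
    if not genuine or not fraud:
        raise ValueError("need both genuine and fraudulent examples")
    # Raising the threshold can only lower the false-positive rate, so scan upwards
    # and stop at the first candidate within budget: it catches the most fraud.
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in genuine) / len(genuine)
        if fpr <= max_false_positive_rate:
            recall = sum(s >= t for s in fraud) / len(fraud)
            return t, recall, fpr
    # No observed score meets the budget: challenge nobody.
    return float("inf"), 0.0, 0.0


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [False] * 7 + [True, False, True]
print(pick_threshold(scores, labels, max_false_positive_rate=0.125))
print(pick_threshold(scores, labels, max_false_positive_rate=0.0))
```

On these toy scores, tolerating friction for one genuine customer in eight catches all the fraud, while a zero-friction budget halves the fraud caught. Re-running this after each recalibration makes the trade-off explicit rather than a matter of gut feel.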

Explainability plays an integral role here as well: the better an organisation understands its AI model, the better it will be at developing newer versions or making changes to adapt to new situations.

“Fraud is not static. So the idea that you could develop a model, or work with a partner to develop and release a model, and for that model to still work 10 years later is not very realistic,” Coughlan says. “And that's an expectation that a lot of financial institutions have for technology. They can be change averse, and when they want to deploy something, they want it to stay deployed for years without needing much care.

“However, the fraud landscape evolves quickly, even within a particular attack vector such as social engineering. And if you're not adapting your model and changing it on a timescale of months at the very least, ideally days, you're going to miss a lot of fraud.”
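Adapting on a timescale of days or weeks presupposes knowing when a model has started to slip. A minimal monitoring sketch, assuming detection rates are computed per period once that period's cases have been marked, flags a recalibration when the rate stays below a baseline for several consecutive periods. The function name and the `tolerance` and `patience` defaults are arbitrary placeholders.

```python
from typing import List


def needs_recalibration(period_rates: List[float], baseline: float,
                        tolerance: float = 0.05, patience: int = 2) -> bool:
    """Flag a retrain when the detection rate has sat more than `tolerance`
    below `baseline` for `patience` consecutive periods (e.g. days or weeks)."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    recent = period_rates[-patience:]
    return len(recent) == patience and all(r < baseline - tolerance for r in recent)


print(needs_recalibration([0.90, 0.89, 0.80, 0.78], baseline=0.90))
print(needs_recalibration([0.90, 0.80, 0.88], baseline=0.90))
```

Requiring several consecutive low periods avoids retraining on a single noisy week, at the cost of reacting a little later to a genuinely new attack.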

Getting data right to get AI right

AI by itself will not fix fraud. Data science, case marking, and the right infrastructure to monitor and evaluate the model’s performance are all crucial factors in how well an organisation’s anti-fraud model performs.

Finding the balance between data, predictive AI, rules-based methodologies and case marking will enable financial institutions to make their fraud prevention model as effective as possible. This, in turn, will free up internal resources and help organisations focus on using data to grow new business as well as innovate their products and services. Organisations that don’t use predictive AI to mitigate fraud risk losing out on these revenue streams. But it all starts and ends with understanding your problem and your data.