In an increasingly digital world, online fraud has risen considerably, highlighting the need for advanced tools that match the complexity of the challenge. Traditional fraud detection methods often fail to capture the interconnections between entities effectively, and many financial institutions are now turning to network science for a solution. Because the discipline is built around the analysis of connections, it can uncover hidden relationships, identify anomalous patterns, and strengthen fraud detection solutions.
What is network science?
Network science is the study of complex networks, which consist of nodes (individual elements) and edges (the connections between them). The field combines several disciplines, such as graph theory from mathematics, statistical mechanics from physics, data visualization from computer science, and theories from the social sciences, to analyze and understand the structure and dynamics of networks. The study of networks can have a significant impact on the fight against financial fraud, as it helps visualize and analyze the web of transactions and reveal suspicious patterns of behavior.
Figure 1: Sample graph data model for fraud detection
The network component of fraud detection
In a financial network, nodes can represent individuals or accounts, while edges represent the transactions between them. Once the network is constructed, a variety of metrics can be computed to identify its most important nodes. Centrality measures quantify each node's relative importance, highlighting potential key hubs in a money laundering scheme. Community detection algorithms identify clusters of nodes that are more densely connected to each other than to the rest of the network; this matters because fraudulent activities often occur within highly connected communities. Network algorithms can also estimate the probability that a fraudulent account is associated with other accounts (the link prediction problem). Finally, network-based metrics can serve as an additional measure of deviation from normal behavior, signaling potential fraud.
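The following is a minimal, dependency-free Python sketch of two of the steps described above, degree centrality and community detection, on a hypothetical seven-account network (the accounts and links are invented for illustration). Community detection here uses a simple synchronous label-propagation variant with deterministic tie-breaking; it is one of many possible algorithms, not necessarily what a production system would use. Libraries such as NetworkX provide ready-made versions (for example `degree_centrality` and `label_propagation_communities`).

```python
from collections import Counter, defaultdict

# Hypothetical toy transaction network: each pair means money moved between
# the two accounts (direction is ignored in this sketch).
TRANSACTIONS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("E", "G"), ("F", "G"),
    ("D", "E"),  # the only link between the two groups
]


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def label_propagation(adj, max_iter=20):
    """Synchronous label propagation with deterministic tie-breaking.

    Every node starts in its own community and repeatedly adopts the label
    most common among its neighbours (ties go to the smallest label).
    Synchronous updates can oscillate on some graphs, hence the cap.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        new_labels = {}
        for node in sorted(adj):
            counts = Counter(labels[nb] for nb in adj[node])
            if not counts:  # an isolated node keeps its own label
                new_labels[node] = labels[node]
                continue
            best = max(counts.values())
            new_labels[node] = min(lab for lab, c in counts.items() if c == best)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(sorted(g) for g in groups.values())


graph = build_graph(TRANSACTIONS)
centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print("Most central account:", hub, round(centrality[hub], 3))
print("Communities:", label_propagation(graph))
```

Account D, which connects the larger group to the smaller one, has the highest degree centrality (4/6), and label propagation separates the two densely connected groups into `["A", "B", "C", "D"]` and `["E", "F", "G"]`.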
Tools for network-based fraud detection
- Graph Databases: Graph databases such as Neo4j and OrientDB store and query large networks of data natively, which is fundamental to analyzing large-scale networks. Compared with traditional relational databases such as Oracle or MySQL, graph databases can be orders of magnitude faster and less resource-intensive for analyses that traverse many relationships.
- Visualization Tools: Software such as Gephi and Cytoscape, and Python libraries such as NetworkX and graph-tool, enable the visualization of complex networks, making it easier to spot anomalies. As a network grows, it becomes increasingly difficult to produce a meaningful visualization rather than a so-called “ridiculogram”, an impressive-looking but uninterpretable tangle of nodes and edges. Insightful visualizations are still achievable, though less straightforward, through appropriate filtering techniques, such as showing only a node's immediate neighborhood.
- Machine Learning: Graph embeddings encode the structural information surrounding each node into a vector that machine learning techniques can use. These vectors can then be used to train a link prediction model that detects duplicate accounts, or a classifier that labels potential fraudsters.
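To illustrate the machine learning step in a self-contained way, the sketch below replaces learned graph embeddings (e.g. node2vec) with three hand-crafted structural features per node: scaled degree, local clustering coefficient, and the share of neighbors already confirmed as fraudulent. A nearest-centroid rule stands in for a trained classifier. The network, the labels, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy network: undirected links between accounts.
EDGES = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),  # tightly knit group
    ("a3", "a4"),
    ("b1", "b2"), ("b2", "b3"), ("b3", "b4"),  # sparse chain
]
# Hypothetical known labels: 1 = confirmed fraud, 0 = confirmed legitimate.
LABELS = {"a1": 1, "a2": 1, "b1": 0, "b4": 0}


def build_graph(edges):
    """Undirected adjacency map {node: set(neighbours)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering(adj, node):
    """Fraction of a node's neighbour pairs that are linked to each other."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)


def node_features(adj, labels):
    """Stand-in for an embedding: (scaled degree, clustering, fraud-neighbour share)."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    features = {}
    for node, nbrs in adj.items():
        fraud_share = sum(labels.get(nb) == 1 for nb in nbrs) / len(nbrs)
        features[node] = (len(nbrs) / max_deg, clustering(adj, node), fraud_share)
    return features


def centroid(vectors):
    """Component-wise mean of equal-length tuples."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def classify_unlabelled(features, labels):
    """Nearest-centroid rule: give each unlabelled node the closest class mean."""
    centroids = {
        cls: centroid([features[n] for n, c in labels.items() if c == cls])
        for cls in set(labels.values())
    }
    return {
        node: min(centroids, key=lambda cls: math.dist(vec, centroids[cls]))
        for node, vec in features.items()
        if node not in labels
    }


graph = build_graph(EDGES)
predictions = classify_unlabelled(node_features(graph, LABELS), LABELS)
print(sorted(predictions.items()))
```

Account a3, which is linked to two confirmed fraudsters, lands closest to the fraud centroid, while the accounts in the sparse chain are classified as legitimate. In a real pipeline, the feature vectors would come from an embedding method and the classifier from a machine learning library.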
Figure 2: Example of community detection in a small network
Case studies
- BNP Paribas Personal Finance reduced fraud by 20% after adopting a graph database, largely thanks to a much shorter execution time for its automated fraud detection process: a graph database can compare all historical requests in milliseconds. Machine learning algorithms can then be applied to embeddings derived from the graph.
- Zurich Switzerland uses graph databases and network visualization tools to retrieve information much faster and to build a clearer picture of potentially fraudulent accounts. In this case, graph visualization supports analysts in manually reviewing suspected fraud.
- In a scientific publication, the authors use network analytics and machine learning techniques to identify tax evasion in Mexico. After identifying initial patterns of tax evasion, they applied network concepts such as loops and neighborhoods to expose previously undetected tax evasion estimated on the order of USD 10 billion per year.
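The publication's exact method is not described here. As a generic illustration of what a "loop" can mean in a transaction network, the sketch below finds simple directed cycles in which money returns to the account it started from. The payment data, company names, and the `max_len` limit are hypothetical; NetworkX also offers a general-purpose `simple_cycles` function for this task.

```python
from collections import defaultdict

# Hypothetical directed payments (payer, payee) between companies.
PAYMENTS = [
    ("co_A", "co_B"), ("co_B", "co_C"), ("co_C", "co_A"),  # funds loop back to co_A
    ("co_C", "co_D"), ("co_D", "co_E"),                     # ordinary onward payments
]


def find_cycles(payments, max_len=4):
    """Return simple directed cycles with at most max_len edges.

    Each cycle is reported once, starting (and ending) at its smallest
    node, so rotations of the same loop are not duplicated.
    """
    graph = defaultdict(list)
    for payer, payee in payments:
        graph[payer].append(payee)

    cycles = []

    def extend(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path + [start])  # the loop closes here
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(start, nxt, path + [nxt])

    for start in sorted(graph):
        extend(start, start, [start])
    return cycles


for loop in find_cycles(PAYMENTS):
    print(" -> ".join(loop))
```

Bounding the cycle length keeps the search tractable on large graphs, since the number of simple cycles can grow exponentially with network size.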
Conclusion
Network science offers a powerful framework for detecting and combating fraud across industries. By analyzing the complex web of relationships and interactions, organizations can uncover hidden patterns and identify anomalous behavior indicative of fraud. As technology evolves, the integration of network science with advanced analytics and machine learning is likely to play an increasingly vital role in safeguarding against fraudulent activities and building a more secure and trustworthy digital landscape.