Translator's foreword: Blockchain applications such as Bitcoin were originally intended to promote financial inclusion. Their use by criminals, however, has given them a bad reputation, which also highlights the importance of anti-money laundering work. Researchers at MIT, IBM, and the blockchain analysis firm Elliptic have jointly introduced methods for detecting illegal blockchain transactions. They also provide a labeled dataset of more than 200,000 bitcoin transactions, of which only a small fraction are classified as illegal.
The following is a translation of the paper:
Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics
Anti-money laundering (AML) regulation plays a key role in safeguarding the financial system, but it also imposes high costs on financial institutions and drives the financial exclusion of people on the socio-economic and international margins. The emergence of cryptocurrencies has brought an interesting paradox: pseudonymity allows criminals to hide in plain sight, while open data gives investigators more power and makes crowdsourced forensic analysis possible. At the same time, advances in learning algorithms show great promise for the AML toolkit. In this workshop tutorial, we provide the Elliptic data set, a time-series graph of more than 200,000 bitcoin transactions (nodes) and 234,000 directed payment flows (edges) with 166 node features, including features based on non-public data.
To the best of our knowledge, this is currently the largest labeled transaction data set associated with any cryptocurrency. We share results for the binary classification task of predicting illegal transactions, using variants of Logistic Regression, Random Forest (RF), the Multilayer Perceptron (MLP), and the Graph Convolutional Network (GCN).
Among these, the GCN is of special interest as an emerging method for capturing relational information.
The results demonstrate the superiority of the Random Forest (RF) algorithm and the potential for combining the capabilities of the Random Forest (RF) algorithm and the Graph Convolutional Network (GCN) approach.
Finally, we consider visualization for analysis and interpretation, which is difficult given the size and dynamics of real transaction graphs. We provide a simple prototype that can navigate the graph and show how well the model detects illegal activity over time.
With these methods and data sets, we hope to (1) invite feedback to support our ongoing investigations, and (2) motivate others to work hard to address this important challenge.
Keywords: graph convolutional networks, anomaly detection, financial forensics, cryptocurrency, anti-money laundering, visualization.
1. Conflicts between anti-money laundering (AML) and inclusive finance
“It is expensive to be poor” is a common creed among advocates of financial inclusion. It describes the fact that people on the margins of society face restricted access to the financial system and relatively higher costs of participation. Restricted access (for example, the inability to open a bank account) is to some extent an unintended consequence of increasingly stringent anti-money laundering (AML) regulation. Such regulation is essential for protecting the financial system, but it has a disproportionately negative impact on low-income people, immigrants, and refugees [16]. About 1.7 billion adults worldwide have no bank account [7].
The relatively higher costs are also partly a consequence of AML policy, which imposes high fixed compliance costs on money service businesses (MSBs); “low-value” customers are simply not worth the risk to the business.
Consider remittances to low- and middle-income countries: in 2018 they reached a record high of $529 billion, far exceeding the $153 billion in global aid contributions.
At present, the average cost of sending a $200 remittance is an expensive 7%, and in some countries it exceeds 10%. The UN Sustainable Development Goals aim to reduce this cost to 3% by 2030.
Despite these problems, AML regulation cannot simply be dismissed as overly burdensome. Many illegal industries, such as drug cartels, human trafficking, and terrorist organizations, cause human tragedies around the world. The recent 1Malaysia Development Berhad (1MDB) money laundering scandal robbed the Malaysian people of more than $11 billion in taxpayer funds earmarked for the country's development [22], and brought huge fines and criminal prosecutions for Goldman Sachs and other organizations involved. The more recent Danske Bank money laundering scandal, centered in Estonia, involved about $200 billion of illegal capital flows from Russia and Azerbaijan. It caused incalculable losses to innocent citizens of those countries and cost the institutions involved, such as Danske Bank and Deutsche Bank, billions of dollars [23].
Money laundering is not a victimless crime, and the current methods of the traditional financial system do a poor job of stopping it.
1.1 Anti-money laundering in the world of cryptocurrency
The cryptocurrency introduced by the Bitcoin network [17] has triggered an explosion of technical and business interest in payment processing.
Around the world, money transfer startups are starting to compete with traditional banks and money service businesses such as Western Union.
They focus on using bitcoin and other cryptocurrencies as “rails” (a term commonly used in this field) for low-cost, peer-to-peer cross-border money transfers.
Many explicitly target remittances and champion financial inclusion.
Growing alongside these entrepreneurs is a community of academics and policy advocates who support updated regulatory policies for cryptocurrency.
Dampening this excitement, however, is Bitcoin's bad reputation.
Many criminals use Bitcoin's pseudonymity to hide in plain sight while conducting ransomware attacks and operating dark markets for the exchange of illegal goods and services.
In May 2019, the US Financial Crimes Enforcement Network (FinCEN) published new guidance on how the Bank Secrecy Act of 1970 (BSA) applies to cryptocurrencies, which FinCEN calls convertible virtual currencies (CVC) [18].
Consistent with the BSA, the guidance requires money service businesses (MSBs) to produce risk assessments that measure their exposure to money laundering, terrorism financing, and other financial crimes. These assessments are based on customer composition, the areas served, and the financial products or services offered.
The assessments must inform the management of customer relationships, including the implementation of controls commensurate with the risk. In other words, an MSB must not only report suspicious accounts but also take action against them (for example, by freezing or closing them). The guidance describes a well-developed risk assessment as one that helps identify and comprehensively analyze an institution's individual risk profile. It reinforces the BSA's “Know Your Customer” (KYC) requirement, under which an MSB must know enough about the customers it serves to determine the level of risk they pose to the institution.
What it means to know a customer “well enough” is a subject of debate in compliance and policy circles. In practice, one of the most challenging aspects is an implicit but effectively enforced requirement to know not only your customer but also your customer's customer. In the fragmented data ecosystem of traditional finance, this aspect of compliance is usually handled through phone calls between MSBs. In Bitcoin's open system, by contrast, the entire transaction graph is public, albeit in pseudonymous and unlabeled form.
To seize the opportunity this open data presents, cryptocurrency intelligence companies have emerged to provide AML solutions tailored to the cryptocurrency sector. While Bitcoin's pseudonymity is an advantage for criminals, the openness of its data is a key advantage for investigators.
2. The Elliptic data set
Elliptic is a cryptocurrency intelligence company that works to protect the cryptocurrency ecosystem from criminal activity. As a contribution to the research community, we present the Elliptic data set, a graph network of bitcoin transactions, and share it publicly. To the best of our knowledge, it is the largest labeled transaction data set associated with any cryptocurrency.
2.1 Graph construction
The Elliptic data set maps bitcoin transactions to real entities that belong either to legal categories (exchanges, wallet providers, miners, legal services, etc.) or to illegal ones (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.). From the raw bitcoin data, a graph is constructed and labeled in which nodes represent transactions and edges represent the flow of bitcoin (BTC) from one transaction to the next.
A transaction is considered legal if the entity that initiated it (i.e., the entity controlling the private keys associated with the transaction's input addresses) belongs to a legal category, and illegal if that entity belongs to an illegal category. Importantly, all features are built using public information.
2.1.1 Nodes and edges: There are 203,769 node transactions and 234,355 directed edges (payment flows). For comparison, under the same graph representation the entire Bitcoin network has approximately 438 million nodes and 1.1 billion edges (as of this writing). In the Elliptic data set, about 2% of transactions (4,545) are labeled illegal and 21% (42,019) are labeled legal. The remaining transactions carry no legal or illegal label but still have the other features.
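The class imbalance implied by these counts can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted above; the helper name `share` is introduced here purely for illustration.

```python
# Counts reported in Section 2.1.1.
TOTAL_NODES = 203_769
ILLEGAL = 4_545
LEGAL = 42_019

# Everything that is neither labeled illegal nor labeled legal.
unlabeled = TOTAL_NODES - ILLEGAL - LEGAL
labeled = ILLEGAL + LEGAL


def share(count, total=TOTAL_NODES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


print(unlabeled)                # 157205
print(share(ILLEGAL))           # 2.2  (the "about 2%" in the text)
print(share(LEGAL))             # 20.6 (the "21%" in the text)
print(share(unlabeled))         # 77.1
print(share(ILLEGAL, labeled))  # 9.8  (illegal share among labeled nodes only)
```

Among labeled transactions alone, roughly one in ten is illegal, which is the imbalance a supervised classifier actually sees during training.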
2.1.2 Features: Each node is associated with 166 features. The first 94 features represent local information about the transaction, including the time step, the number of inputs/outputs, the transaction fee, the output volume, and aggregated figures such as the average BTC received (spent) by the inputs/outputs and the average number of incoming (outgoing) transactions associated with the inputs/outputs. The remaining 72 features, called aggregated features, are obtained by aggregating transaction information one hop backward/forward from the center node: the maximum, minimum, standard deviation, and correlation coefficients of the neighboring transactions for the same information (number of inputs/outputs, transaction fee, etc.).
Figure 1: (top) Fraction of illegal versus legal nodes at different time steps in the data set. (bottom) Number of nodes versus time step.
2.1.3 Temporal information: Each node is associated with a timestamp, an estimate of the time at which the Bitcoin network confirmed the transaction. There are 49 distinct time steps with an average interval of about two weeks. Each time step contains a single connected component of transactions that appeared on the blockchain within less than three hours of each other; no edges connect different time steps.
Because the timestamps of the nodes within a time step are so close together, each time step can effectively be treated as an instantaneous snapshot in time. The number of nodes per time step is fairly uniform over time, ranging from 1,000 to 8,000 nodes (see Figure 1).
2.2 Notes on feature construction
The labeling of transactions as legal or illegal is informed by a heuristic-based reasoning process. For example, a higher number of inputs and the reuse of the same address are typically associated with more effective address clustering [10], which reduces the anonymity of the entity signing the transaction. On the other hand, consolidating funds controlled by multiple addresses into a single transaction reduces transaction costs (fees). Therefore, entities that serve a large number of user requests and forgo anonymity-preserving measures are likely to be legal (for example, exchanges).
Conversely, illegal activity may favor transactions with fewer inputs in order to reduce the impact of de-anonymizing address clustering techniques.
In addition, building features for bitcoin transactions poses two major challenges. The first is the size of the Bitcoin blockchain, which amounts to 200 GB of compressed data and approximately 400 million addressed transactions. Although not all transactions are included in the subset used in this study, access to the complete blockchain is still needed to observe the full history of the transactions involved. To overcome this, Elliptic uses a high-performance all-in-memory graph engine to compute the features.
The second challenge stems from the underlying graph structure of the data and the heterogeneity in the number of neighbors a transaction can have. In constructing the 72 aggregated features, the problem of heterogeneous neighborhoods is addressed by simply computing statistical aggregates (minimum, maximum, etc.) of the local features of the neighboring transactions. In general, this solution is sub-optimal because it incurs a significant loss of information.
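To make the idea of one-hop statistical aggregation concrete, here is a minimal NumPy sketch. It is not Elliptic's feature pipeline, which is not described in detail here: the function name `aggregate_neighbors`, the choice of minimum, maximum, and standard deviation only (correlation coefficients are omitted), the pooling of backward and forward neighbors, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def aggregate_neighbors(features, edges):
    """Per node, compute min, max and std of the features of its one-hop neighbors.

    features: (n_nodes, n_feats) array of local features.
    edges:    iterable of (src, dst) directed edges.
    Returns an (n_nodes, 3 * n_feats) array; nodes without neighbors get zeros.
    """
    n_nodes, n_feats = features.shape
    neighbors = [[] for _ in range(n_nodes)]
    for src, dst in edges:
        neighbors[src].append(dst)  # one hop forward
        neighbors[dst].append(src)  # one hop backward
    out = np.zeros((n_nodes, 3 * n_feats))
    for node, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # isolated node: keep the zero row
        block = features[nbrs]  # (n_neighbors, n_feats)
        out[node] = np.concatenate(
            [block.min(axis=0), block.max(axis=0), block.std(axis=0)]
        )
    return out


# Toy graph: transaction 0 pays into transactions 1 and 2; node 3 is isolated.
features = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [8.0, 80.0]])
edges = [(0, 1), (0, 2)]
agg = aggregate_neighbors(features, edges)
print(agg[0])  # [ 2. 20.  4. 40.  1. 10.]
```

The fixed-size summary is what makes heterogeneous neighborhoods comparable, and it is also where information is lost: any two neighborhoods with the same minimum, maximum, and spread become indistinguishable, however different their structure.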
We return to this issue when discussing graph deep learning methods, which can better account for the local graph topology.
3. Tasks and methods
At a high level, AML analysis is an anomaly detection challenge: accurately classifying a small number of illegal transactions within massive, ever-growing data sets. Industry false positive rates of more than 90% inhibit this effort.
We want to reduce the false positive rate without increasing the false negative rate, that is, to clear more innocent users without letting more criminals through.
Logistic Regression and Random Forest (RF) are benchmark methods for this task, and graph deep learning has emerged as a potential tool for AML [21].
For the Elliptic data set, the task is transaction screening: assessing the risk associated with a given transaction to and from cryptocurrency wallets.
Specifically, each unlabeled bitcoin transaction is to be classified as illegal or legal.
3.1 Benchmark methods
The benchmark machine learning methods use the first 94 features in supervised learning for binary classification. These techniques include Logistic Regression, the Multilayer Perceptron (MLP), and Random Forest [2].
In the MLP, each input neuron takes one data feature, and the output is a softmax that gives a probability for each class. Logistic Regression and Random Forest are popular for AML, especially when used together for their respective advantages: Random Forest for accuracy and Logistic Regression for explainability. However, these methods do not use any graph information.
In the Elliptic data set, the local features are augmented by a set of 72 features that contain information about the immediate neighborhood. We will see that using these features improves performance. While this shows that graph structure can be folded into the binary classification problem and used with standard machine learning techniques, extending a purely feature-based approach beyond the immediate neighborhood is challenging. This shortcoming motivates the use of graph convolutional networks.
3.2 Graph Convolutional Networks (GCN)
Deep learning on graph-structured data is a rapidly growing research topic [3, 6, 8, 9, 14]. The inherent complexity of graph structures creates scalability challenges for practical applications, and researchers have made significant progress in addressing them [5, 11, 24].
Specifically, we consider the graph convolutional network (GCN). A GCN consists of multiple layers of graph convolution; each layer is similar to a perceptron but additionally performs a neighborhood aggregation step motivated by spectral convolution.
Consider the bitcoin transaction graph from the Elliptic data set as $G = (N, E)$, where $N$ is the set of node transactions and $E$ is the set of edges representing the flow of BTC. The $l$-th layer of the GCN takes the adjacency matrix $A$ and the node embedding matrix $H^{(l)}$ as input, and uses the weight matrix $W^{(l)}$ to update the node embedding matrix to $H^{(l+1)}$ as output. Mathematically, we write:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right),$$
where $\hat{A}$ is a normalization of $A$ defined as follows:
$$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \quad \tilde{A} = A + I, \quad \tilde{D} = \mathrm{diag}\Big(\sum_j \tilde{A}_{ij}\Big),$$
and $\sigma$ is the activation function (usually ReLU) for all layers except the output layer. The initial embedding matrix comes from the node features, i.e., $H^{(0)} = X$. Suppose there are $L$ graph convolution layers. For node classification, the output layer is a softmax, so $H^{(L)}$ consists of the predicted probabilities. A graph convolution layer is similar to a feed-forward layer; the only difference is the multiplication by $\hat{A}$ in front. This matrix is motivated by spectral graph filtering on the graph Laplacian and results from a linear function of the Laplacian. Alternatively, the multiplication by $\hat{A}$ can be interpreted as an aggregation of the transformed embeddings of neighboring nodes. The parameters of the GCN are the weight matrices $W^{(l)}$ of the different layers $l$.
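The neighborhood-aggregation reading of a graph convolution layer can be illustrated with a small NumPy sketch. It assumes the standard GCN normalization of [11] (self-loops added, then symmetric degree scaling) and an undirected toy graph; the function names `normalize_adjacency` and `gcn_layer` and all values are illustrative and not taken from the authors' implementation.

```python
import numpy as np


def normalize_adjacency(A):
    """Add self-loops, then scale symmetrically by the inverse square root of the degrees."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W, activation=True):
    """One graph convolution: aggregate neighbor embeddings, transform, then apply ReLU."""
    Z = A_hat @ H @ W
    return np.maximum(Z, 0.0) if activation else Z


# Toy path graph 0 - 1 - 2 with two features per node.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

A_hat = normalize_adjacency(A)
W0 = np.eye(2)  # identity weights, so the output shows pure aggregation
H1 = gcn_layer(A_hat, X, W0)
print(np.round(A_hat, 3))
print(np.round(H1, 3))
```

With identity weights, each row of `H1` is a degree-weighted mix of the node's own features and those of its neighbors, which is exactly the aggregation interpretation of the multiplication by the normalized adjacency matrix.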
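A commonly used 2-layer GCN can be written compactly as follows, with $\hat{A}$ the normalized adjacency matrix:
$$H^{(2)} = \mathrm{softmax}\left(\hat{A} \, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right).$$
A skip variant, which we found useful in practice, inserts a skip connection between the intermediate embedding $H^{(1)}$ and the input node features $X$, resulting in the architecture:
$$H^{(2)} = \mathrm{softmax}\left(\hat{A} \, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)} + X \tilde{W}^{(1)}\right),$$
where $\tilde{W}^{(1)}$ is the weight matrix of the skip connection. We call this architecture Skip-GCN. When $W^{(0)}$ and $W^{(1)}$ are zero, Skip-GCN is equivalent to Logistic Regression. Therefore, Skip-GCN should be at least as powerful as Logistic Regression.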
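The claim that Skip-GCN contains Logistic Regression as a special case can be checked numerically. The sketch below implements forward passes only (no training) for a 2-layer GCN and for the skip variant, where the skip term adds the input features times a separate weight matrix to the output-layer logits. The function names, the random weights, and the 4-node ring graph are assumptions for illustration; the softmax over `X @ W_skip` stands in for multinomial Logistic Regression without a bias term.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def gcn_forward(A_hat, X, W0, W1):
    """2-layer GCN: softmax(A_hat ReLU(A_hat X W0) W1)."""
    H1 = np.maximum(A_hat @ X @ W0, 0.0)
    return softmax(A_hat @ H1 @ W1)


def skip_gcn_forward(A_hat, X, W0, W1, W_skip):
    """Skip-GCN: the 2-layer GCN logits plus a direct linear term X W_skip."""
    H1 = np.maximum(A_hat @ X @ W0, 0.0)
    return softmax(A_hat @ H1 @ W1 + X @ W_skip)


rng = np.random.default_rng(0)
n_nodes, n_feats, n_hidden, n_classes = 4, 5, 3, 2

# Normalized adjacency of a 4-node ring: with self-loops every degree is 3,
# so the symmetric normalization reduces to dividing by 3.
ring = np.roll(np.eye(n_nodes), 1, axis=1) + np.roll(np.eye(n_nodes), -1, axis=1)
A_hat = (ring + np.eye(n_nodes)) / 3.0

X = rng.normal(size=(n_nodes, n_feats))
W0 = rng.normal(size=(n_feats, n_hidden))
W1 = rng.normal(size=(n_hidden, n_classes))
W_skip = rng.normal(size=(n_feats, n_classes))

probs = gcn_forward(A_hat, X, W0, W1)

# With the graph weights zeroed out, only the skip term remains.
lr_like = skip_gcn_forward(A_hat, X, np.zeros_like(W0), np.zeros_like(W1), W_skip)
print(np.allclose(lr_like, softmax(X @ W_skip)))  # True
```

Because the graph branch and the linear branch are summed before the softmax, training can fall back to the purely feature-based solution whenever the graph signal does not help.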
3.3 Temporal modeling
Financial data are inherently temporal because transactions are time-stamped. It is reasonable to assume that certain dynamics, albeit hidden, drive the evolution of the system. A prediction model is more useful if it is designed to capture these dynamics, because a model trained on a given time period can then generalize better to subsequent time steps. The better the model captures the system dynamics (which are themselves evolving), the longer the horizon over which it can forecast.
A temporal model that extends GCN is EvolveGCN [19], which computes a separate GCN model for each time step. These GCNs are then connected through a recurrent neural network (RNN) to capture the system dynamics.
The GCN model for a future time step is thus evolved from the past models, and this evolution captures the dynamics.
In EvolveGCN, the GCN weights are collectively treated as the system state. An RNN (e.g., a GRU) updates the model each time the system receives an input. The input is the graph information at the current time step. This graph information can be instantiated in a number of ways; in EvolveGCN, it is represented by the embeddings of the top-k influential nodes in the graph.
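The following NumPy sketch illustrates the idea of treating the GCN weight matrix as the hidden state of a GRU that is driven by a top-k summary of the current node embeddings. It is a simplified, untrained, single-layer illustration written under several assumptions: the matrix-valued GRU equations, the tanh-gated top-k selection with a scoring vector `p`, and all names and sizes are choices made here for exposition and should not be read as the reference EvolveGCN implementation [19].

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def normalize_adjacency(A):
    """Self-loops plus symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d[:, None] * d[None, :]


def summarize_top_k(H, p, k):
    """Keep the k highest-scoring node embeddings, gated by tanh(score); returns (d, k)."""
    scores = H @ p / np.linalg.norm(p)
    top = np.argsort(scores)[::-1][:k]
    return (H[top] * np.tanh(scores[top])[:, None]).T


def gru_update(W_prev, X_in, P):
    """One GRU step whose hidden state is the GCN weight matrix W_prev (d_in x d_out)."""
    Z = sigmoid(P["Wz"] @ X_in + P["Uz"] @ W_prev + P["Bz"])  # update gate
    R = sigmoid(P["Wr"] @ X_in + P["Ur"] @ W_prev + P["Br"])  # reset gate
    W_cand = np.tanh(P["Wh"] @ X_in + P["Uh"] @ (R * W_prev) + P["Bh"])
    return (1.0 - Z) * W_prev + Z * W_cand


rng = np.random.default_rng(0)
d_in, d_out, n_nodes, n_steps = 4, 3, 6, 3

# Randomly initialized (untrained) GRU parameters and scoring vector.
P = {name: rng.normal(scale=0.1, size=(d_in, d_in))
     for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
P.update({name: np.zeros((d_in, d_out)) for name in ("Bz", "Br", "Bh")})
p = rng.normal(size=d_in)

W = rng.normal(scale=0.1, size=(d_in, d_out))  # GCN weights = system state
W_initial = W.copy()

outputs = []
for t in range(n_steps):
    # Each time step has its own random toy graph and node features.
    upper = np.triu((rng.random((n_nodes, n_nodes)) < 0.4).astype(float), k=1)
    A_hat = normalize_adjacency(upper + upper.T)
    X_t = rng.normal(size=(n_nodes, d_in))
    # Evolve the weights from the previous step, driven by the top-k node summary.
    W = gru_update(W, summarize_top_k(X_t, p, d_out), P)
    outputs.append(np.maximum(A_hat @ X_t @ W, 0.0))

print(W.shape, len(outputs), outputs[0].shape)
```

The number of selected nodes is set equal to the output dimension so that the summary has the same shape as the weight matrix, which lets the gates operate element-wise on the weights.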
4. Experiments
Below are the experimental results we obtained on the Elliptic data set. We performed a 70:30 temporal split into training and test data: the first 34 time steps are used to train the model, and the last 15 time steps are used for testing. We use a temporal split because it reflects the nature of the task. Accordingly, the GCN is trained in an inductive setting.
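A temporal split of this kind takes only a few lines. The sketch below assumes time steps are numbered 1 through 49 and that each node carries its time step as an integer; the function name `temporal_split` and the toy input are illustrative.

```python
def temporal_split(time_steps, n_train_steps=34):
    """Split node indices by time step: steps 1..n_train_steps train, later steps test."""
    train_idx = [i for i, t in enumerate(time_steps) if t <= n_train_steps]
    test_idx = [i for i, t in enumerate(time_steps) if t > n_train_steps]
    return train_idx, test_idx


# Toy example: six nodes with their time steps.
node_time_steps = [1, 1, 20, 34, 35, 49]
train_idx, test_idx = temporal_split(node_time_steps)
print(train_idx)          # [0, 1, 2, 3]
print(test_idx)           # [4, 5]
print(round(34 / 49, 2))  # 0.69, i.e. roughly the 70:30 split by time steps
```

Unlike a random split, no test transaction precedes a training transaction in time, so the evaluation mimics screening future transactions with a model fitted on the past.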
We first tested standard classification models for legal/illegal prediction using three approaches: Logistic Regression (with the default parameters of the scikit-learn Python package [4]), Random Forest (also from scikit-learn, with 50 estimators and 50 max features), and a Multilayer Perceptron (implemented in PyTorch).
Our MLP has one hidden layer of 50 neurons and was trained for 200 epochs using the Adam optimizer with a learning rate of 0.001.
We evaluated these models using all 166 features (denoted AF) and using only the local features (i.e., the first 94 features, denoted LF). The results are summarized in the top part of Table 1.
The bottom part of Table 1 reports the results obtained when we exploit the graph structure of the data. We trained the GCN model for 1000 epochs using the Adam optimizer with a learning rate of 0.001. In our experiments we used a 2-layer GCN and, after hyperparameter tuning, set the size of the node embeddings to 100.
Figure 2: Illegal-class F1 scores over the test time span.
Table 1: Illegal classification results. The top part of the table shows results obtained without using graph information. Each model is shown with different inputs: AF refers to all features, LF refers to the local features (i.e., the first 94 features), and NE refers to the node embeddings computed by the GCN. The bottom part of the table shows the results obtained with GCN [11].
The task is a binary classification, and the two classes are imbalanced (see Figure 1). For AML, the minority class (i.e., the illegal class) is the more important one. We therefore train the GCN model with a weighted cross-entropy loss that gives higher importance to illegal samples. After hyperparameter tuning, we chose a weight ratio of 0.3/0.7 for the legal and illegal classes. Table 1 shows the test results in terms of precision, recall, and F1 score for the illegal class. For the sake of completeness, we also show the micro-averaged F1 score.
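The loss weighting and the reported metrics can be sketched in plain NumPy. The 0.3/0.7 weights come from the text; the class ordering (0 = legal, 1 = illegal), the normalization by the sum of sample weights, the function names, and the toy predictions are assumptions made for this illustration.

```python
import numpy as np

# Class weights from the text; index 0 = legal, 1 = illegal (assumed ordering).
CLASS_WEIGHTS = np.array([0.3, 0.7])


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """Weighted mean negative log-likelihood, normalized by the sum of sample weights."""
    labels = np.asarray(labels)
    w = weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float((w * nll).sum() / w.sum())


def illegal_class_scores(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (illegal) class, plus micro-averaged F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # For single-label classification, micro-averaged F1 equals plain accuracy.
    micro_f1 = sum(t == p for t, p in pairs) / len(pairs)
    return precision, recall, f1, micro_f1


labels = [0, 1]
# Same confidence error made on the illegal sample (a) versus the legal sample (b).
loss_a = weighted_cross_entropy(np.array([[0.9, 0.1], [0.9, 0.1]]), labels)
loss_b = weighted_cross_entropy(np.array([[0.1, 0.9], [0.1, 0.9]]), labels)
print(loss_a > loss_b)  # True: missing an illegal transaction costs more

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1, micro_f1 = illegal_class_scores(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3), micro_f1)  # 0.667 0.667 0.667 0.75
```

The toy scores also show why the illegal-class F1 is reported separately: the micro-averaged score (0.75) is dominated by the majority class and looks better than the performance on the class that matters for AML.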
Note that GCN and its variant Skip-GCN outperform Logistic Regression, which indicates that the graph-based approach is useful compared with a graph-agnostic one. On the other hand, in this case the input features are already quite informative, and using these features alone, Random Forest (RF) achieves the best F1 score.
Another insight from Table 1 comes from comparing methods trained on all features (AF) with methods trained only on the 94 local features (LF). For all three models evaluated, the aggregated information leads to higher accuracy, indicating the importance of the graph structure in this setting. Based on this observation, we further evaluated the methods with an enhanced input feature set. The purpose of this experiment is to show that graph information is useful for enriching the representation of a transaction. In this setup, we concatenate the node embeddings obtained from the GCN with the original features X. The results show that the enhanced feature set improves the accuracy of both the full-feature (AF + NE) and local-feature (LF + NE) models.
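Building the enhanced feature sets amounts to a column-wise concatenation. The sketch below uses random placeholders for both the features and the node embeddings; only the dimensions (166 features, the first 94 of them local, and embeddings of size 100) come from the text, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 5

# Placeholders: in practice X_all holds the 166 data set features and
# node_emb holds the embeddings produced by a trained GCN.
X_all = rng.normal(size=(n_nodes, 166))
node_emb = rng.normal(size=(n_nodes, 100))

X_local = X_all[:, :94]  # LF: the first 94 (local) features

X_af_ne = np.concatenate([X_all, node_emb], axis=1)    # AF + NE
X_lf_ne = np.concatenate([X_local, node_emb], axis=1)  # LF + NE

print(X_af_ne.shape)  # (5, 266)
print(X_lf_ne.shape)  # (5, 194)
```

The resulting matrices can be passed to any of the standard classifiers in place of the original features, which is how the graph information reaches models that have no notion of edges.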
Table 2 compares the prediction performance of the non-temporal GCN and the temporal EvolveGCN. The results show that EvolveGCN consistently outperforms GCN, although the improvement on this data set is not substantial. One avenue for further research is to use other forms of system input to drive the recurrent update inside the GRU.
Table 2: GCN vs EvolveGCN
Dark market shutdown: An important consideration in AML is the robustness of a prediction model to emerging events. An interesting aspect of this data set is the sudden shutdown of a dark market within the time span of the data (at time step 43). As shown in Figure 2, this event causes all methods to perform poorly after the shutdown. Even a Random Forest model that is retrained after every test time step, assuming that ground-truth information becomes available after each test, cannot reliably capture new illegal transactions after the dark market shutdown. Robustness to such events is a major challenge that needs to be addressed.
5. Discussion
We have seen that Random Forest (RF) significantly outperforms Logistic Regression and, in fact, also outperforms GCN, even though the latter is empowered by graph structure information. Random Forest uses a voting mechanism to ensemble the predictions of multiple decision trees, each trained on a subsample of the data set. In contrast, GCN, like most deep learning models, uses Logistic Regression as its final output layer and can therefore be seen as a nontrivial generalization of Logistic Regression.
The question is: can Random Forest be combined with graph neural network methods? A simple idea is to augment the node features with the embeddings computed by the GCN before running Random Forest. According to our earlier experiments, this helps only marginally. Another idea, proposed in [13], is to parameterize every node of the decision trees with a feed-forward neural network. This organically combines Random Forest and neural networks, but it does not suggest how to incorporate graph information. One possible approach is to replace the Logistic Regression output layer of the GCN with this differentiable version of decision trees, enabling end-to-end training.
We leave the implementation of this idea to future work.
6. Graph visualization
Finally, to support analysis and interpretation, which are important for AML compliance, we created a visualization prototype called Chronograph. Chronograph aims to let human analysts explore and analyze the transaction data clearly and easily through an integrated representation of the model.
6.1 Visual investigation of the Elliptic data set
In Chronograph, each transaction is visualized as a node in the graph, with edges representing the flow of BTC from one transaction to another. Node coordinates are computed for all time steps simultaneously using the projection algorithm UMAP [15]. This global computation makes the layouts comparable across time. A time-step slider at the top of the interface lets the user navigate through time by rendering only the nodes of the selected time step. Illegal transactions are colored red, legal transactions are colored blue, and unclassified transactions are left in the default black and gray.
When the user clicks a transaction node or enters a transaction ID in the control on the left, the system displays the selected transaction and highlights all adjacent transactions (inflows and outflows) in green. On the left side of the interface, the user can see general statistics about the graph and the number of transfers between transactions of different classes.
In this simple prototype, Chronograph supports simple exploration scenarios: visually inspecting clusters and their persistence over time, observing conspicuous transfer patterns, or detecting other deviations such as single outliers. As a more complex use case, we also varied the input to the UMAP computation: the raw transaction feature data (Figure 4a) and the neuron activations of the last network layer (Figure 4b) appear to be two interesting alternatives; Rauber et al. proposed a similar method for neural networks [20]. Differences between the resulting visualizations hint at the particularities of the model, i.e., we assume that changes in the similarity between data points can help explain which basic features are important to the model.
Figure 4 shows the results of the two alternative inputs for a single time step, with the raw feature data at the top and the model activations at the bottom. We further color the nodes by the actual labels in the left column and by the GCN-predicted labels in the right column, giving four network visualizations.
In the model-based layout, illegal nodes appear more concentrated, which seems a desirable property: illegal nodes should share certain important characteristics, and similar nodes are laid out closer together. However, since they do not collapse entirely into one location, there may well be qualitative differences within the set of illegal nodes. The visualization further reveals where the model fails to detect illegal nodes. Multiple false predictions in the same neighborhood may further indicate the model's lack of performance there. A detailed study of the features of these transactions could inspire discussion from a new perspective and lead to further model improvements.
Figure 3: The Chronograph user interface, where users can browse time-sliced transaction data and observe transaction patterns and how they change. Illegal transactions are colored red. Further statistics are shown on the left.
a) Projection of the original transaction feature vectors
b) Projection of the last GCN layer activations
Figure 4: Two alternative inputs for the UMAP projection. Left: colored by input labels; right: colored by GCN predictions.
In summary, we have presented cryptocurrency forensics, and Bitcoin forensics in particular, as a means of combating criminal activity. We have provided the AML community with a large, labeled transaction data set of a kind that has never been made public before. We have shared early experimental results obtained with various methods, including graph convolutional networks, and discussed possible algorithmic improvements as next steps. We have provided a prototype for visualizing these data and models in order to enhance human analysis and interpretation. Most importantly, we hope to inspire others to work on the major problem of anti-money laundering and to make our financial system safer and more inclusive.
The research was funded by the MIT-IBM Watson Artificial Intelligence Laboratory, a joint research project between MIT and IBM Research, and the data and domain knowledge involved in this study was provided by Elliptic.
[1] Christopher Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
[2] Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5–32.
[3] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral Networks and Locally Connected Networks on Graphs. In ICLR.
[4] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108–122.
[5] Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In ICLR.
[6] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NIPS.
[7] Asli Demirguc-Kunt, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2017. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution.
[8] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
[9] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS.
[10] Martin Harrigan and Christoph Fretter. 2016. The unreasonable effectiveness of address clustering. In 2016 IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld). IEEE, 368–373.
[11] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
[12] Knomad and World Bank Group. 2019. Migration and Remittances: Recent Developments and Outlook. Migration and Development Brief 31.
[13] Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. 2015. Deep Neural Decision Forests. In ICCV.
[14] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated Graph Sequence Neural Networks. In ICLR.
[15] Leland McInnes, John Healy, and James Melville. 2018. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
[16] Daniel J. Mitchell. 2012. World Bank Study Shows How Anti-Money Laundering Rules Hurt the Poor. Forbes.
[17] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. (2008).
[18] Financial Crimes Enforcement Network. 2019. Application of FinCEN's Regulations to Certain Business Models Involving Convertible Virtual Currencies. FIN-2019-G001 (May 2019).
[19] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, and Charles E. Leiserson. 2019. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Preprint arXiv:1902.10191.
[20] Paulo E. Rauber, Samuel G. Fadel, Alexandre X. Falcao, and Alexandru C. Telea. 2016. Visualizing the hidden activity of artificial neural networks. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 101–110.
[21] Mark Weber, Jie Chen, Toyotaro Suzumura, Aldo Pareja, Tengfei Ma, Hiroki Kanezashi, Tim Kaler, Charles E. Leiserson, and Tao B. Schardl. 2018. Scalable Graph Learning for Anti-Money Laundering: A First Look. CoRR abs/1812.00076 (2018). arXiv:1812.00076 http://arxiv.org/abs/1812.00076
[22] Wikipedia. [n.d.]. 1Malaysia Development Berhad scandal.
[23] Wikipedia. [n.d.]. Danske Bank money laundering scandal.
[24] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD.