# Credit Card Fraud Transaction: A Statistical Analysis

**Introduction**

Fraud is one of the major ethical issues in the credit card industry and constitutes a growing problem all over the world. Fraud involves transactions made for criminal purposes, which can be difficult to identify. Credit cards are one of the most common targets of fraud, but not the only one [2].

The use of credit cards has increased significantly in recent years, and unfortunately so has fraud. Every year, billions of euros are lost to credit card fraud. According to the European Central Bank, during 2012 the total level of fraud reached 1.33 billion euros in the Single Euro Payments Area, an increase of 14.8% compared with 2011. Moreover, payments across non-traditional channels (mobile, internet, etc.) accounted for 60% of the fraud, up from 46% in 2008. This opens new challenges, as new fraud patterns emerge and make fraud detection systems less successful at preventing these frauds [1].

Credit card fraud detection is a cost-sensitive problem. When a transaction is predicted as fraudulent when in fact it is not, the financial institution incurs an administrative cost. On the other hand, when a fraud goes undetected, the amount of that transaction is lost. Thus, when constructing a credit card fraud detection model, it is very important to use features that allow accurate classification. Typical models use only raw transactional features, such as the time, amount, and place of the transaction [1].

Several detection systems based on machine learning techniques have been applied successfully to the credit card fraud problem. The skewness of the data, the preprocessing of the features, and the dimensionality of the search space are important factors that influence the process [1].

# Data Sample Description

Universe **Ω**: credit card transactions that occurred over two days in September 2013, made by European cardholders.

• RV-function Time is the number of seconds elapsed between a transaction and the first transaction in the data set. The DVS St is the set of all such elapsed times. The distribution of this RV-function is Xt : Ω → St = [0, 172792].

• RV-function Amount is the transaction amount. The DVS Sa is the set of all money value transacted through different credit cards. The distribution of this RV-function is Xa : Ω → Sa = [0, 25691.16].

• RV-function V1 to RV-function V28 are principal components resulting from a PCA (Principal Component Analysis) transformation; their original names are hidden for confidentiality purposes. The DVS S for each of the features V1 to V28 is the set of all possible values of that feature. The continuous distributions of the RV-functions are as follows:

• V1 is XV1 : Ω → S1 = [ -56.407510, 2.45493]

• V2 is XV2 : Ω → S2 = [ -72.715728, 22.057729]

• V3 is XV3 : Ω → S3 = [ -48.325589, 9.382558]

• V4 is XV4 : Ω → S4 = [ -5.683171, 16.875344]

• V5 is XV5 : Ω → S5 = [ -113.743307, 34.801666]

• V6 is XV6 : Ω → S6 = [ -26.160506, 73.301626]

• V7 is XV7 : Ω → S7 = [ -43.557242, 120.589494]

• V8 is XV8 : Ω → S8 = [ -73.216718, 20.007208]

• V9 is XV9 : Ω → S9 = [ -13.434066, 15.594995]

• V10 is XV10 : Ω → S10 = [ -24.588262, 23.745136]

• V11 is XV11 : Ω → S11 = [ -4.797473, 12.018913]

• V12 is XV12 : Ω → S12 = [ -18.683715, 7.848392]

• V13 is XV13 : Ω → S13 = [ -5.791881, 7.126883]

• V14 is XV14 : Ω → S14 = [ -19.214325, 10.526766]

• V15 is XV15 : Ω → S15 = [ -4.498945, 8.877742]

• V16 is XV16 : Ω → S16 = [ -14.129855, 17.315112]

• V17 is XV17 : Ω → S17 = [ -25.162799, 9.253526]

• V18 is XV18 : Ω → S18 = [ -9.498746, 5.041069]

• V19 is XV19 : Ω → S19 = [ -7.213527, 5.591971]

• V20 is XV20 : Ω → S20 = [ -54.497720, 39.420904]

• V21 is XV21 : Ω → S21 = [ -34.830382, 27.202839]

• V22 is XV22 : Ω → S22 = [ -10.933144, 10.503090]

• V23 is XV23 : Ω → S23 = [ -44.807735, 22.528412]

• V24 is XV24 : Ω → S24 = [ -2.836627, 4.584549]

• V25 is XV25 : Ω → S25 = [ -10.295397, 7.519589]

• V26 is XV26 : Ω → S26 = [ -2.604551, 3.517346]

• V27 is XV27 : Ω → S27 = [ -22.565679, 31.612198]

• V28 is XV28 : Ω → S28 = [ -15.430084, 33.847808]

• RV Class is the response variable, representing the kind of transaction performed. The DVS Sc is the set of possible transaction categories: fraud, represented as 1, and non-fraud, represented as 0. The RV-function distribution XClass : Ω → Sc = {0, 1} is a Bernoulli distribution.
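The data value spaces listed above can be reproduced directly from the data. A minimal sketch, assuming the transactions are loaded into a pandas DataFrame (on the public creditcard.csv file this description matches, it yields the intervals above, e.g. Time in [0, 172792] and Amount in [0, 25691.16]; the toy frame here is only for illustration):

```python
import pandas as pd

def variable_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Return the data value space [min, max] of every numeric column."""
    return df.describe().loc[["min", "max"]].T

# Tiny illustrative frame standing in for the real transaction data.
df = pd.DataFrame({"Time": [0.0, 5.0, 12.0],
                   "Amount": [1.5, 250.0, 9.99],
                   "Class": [0, 1, 0]})
print(variable_ranges(df))
```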

The figure above suggests that these distributions may be skewed.

Distributions can present positive skewness (right skew) or negative skewness (left skew). If the skewness is less than -1 or greater than 1, the distribution is highly skewed. The skewness can also be 0, in which case the distribution is said to be symmetric.
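These rules of thumb can be wrapped in a small helper, sketched here with scipy.stats.skew (the label strings are our own naming, not from the analysis):

```python
import numpy as np
from scipy.stats import skew

def skew_label(x) -> str:
    """Classify a sample's skewness using the thresholds above."""
    s = skew(np.asarray(x, dtype=float))
    if s > 1:
        return "highly right-skewed"
    if s < -1:
        return "highly left-skewed"
    if abs(s) < 1e-8:
        return "symmetric"
    return "moderately skewed"

# A single long right tail produces strong positive skewness.
right_tailed = np.concatenate([np.zeros(99), [100.0]])
print(skew_label(right_tailed))  # highly right-skewed
```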

**Statistical analysis**

If we take a closer look at some features from (1), it can be observed that Class is highly skewed, with skewness Sc = 23.997579. The Xa distribution is also asymmetric, with skewness Sa = 16.977724. The Xt distribution is approximately symmetric, with skewness St = -0.035568; it is a bimodal distribution, as it appears to have two maxima.

In addition, the data contains far more outliers than a normal distribution would. The variable XV28 has kurtosis k28 = 933.397502, Xa has kurtosis ka = 845.092646, and XClass has kurtosis kc = 573.887843. These are the variables with the most outliers.
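These values can be checked with scipy.stats.kurtosis, which by default returns excess (Fisher) kurtosis: approximately 0 for a Gaussian, large and positive for heavy-tailed samples. A sketch on synthetic data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)                 # light tails
heavy = np.concatenate([np.zeros(999), [100.0]])    # one extreme outlier

print(round(kurtosis(gaussian), 2))  # close to 0
print(round(kurtosis(heavy), 2))     # very large
```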

Since XV1 to XV28 were obtained by a PCA transformation, it is assumed that those features were already scaled, so Xt and Xa are also scaled for this project. Taking a closer look at the distribution of XClass (from the original data set), it is obvious that the data is imbalanced and XClass is highly skewed. It can be concluded that, in order to capture more correlations between XClass and the other features, a subset of the data needs to be used. The subset is built by making the classes equally represented, so that the distribution of the RV-function XClass is balanced. See the figure below:

The whole data set contains 492 transactions labeled as fraud, with the rest labeled as non-fraud. Thus, the subset has 492 fraud and 492 non-fraud transactions. The reason behind the use of this subset is explained by the results of the correlation matrix computed from the entire data set, as shown here [B].
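A minimal pandas sketch of this undersampling step (the column name Class matches the data set; the toy frame and the seed are illustrative assumptions):

```python
import pandas as pd

def balanced_subset(df: pd.DataFrame, label: str = "Class",
                    seed: int = 42) -> pd.DataFrame:
    """Undersample the majority class so both classes have equal size."""
    n_minority = df[label].value_counts().min()
    parts = [g.sample(n=n_minority, random_state=seed)
             for _, g in df.groupby(label)]
    # Concatenate and shuffle so the classes are interleaved.
    return (pd.concat(parts)
              .sample(frac=1, random_state=seed)
              .reset_index(drop=True))

# Toy example: 10 non-fraud rows and 3 fraud rows -> 3 of each.
toy = pd.DataFrame({"Amount": range(13), "Class": [0] * 10 + [1] * 3})
print(balanced_subset(toy)["Class"].value_counts())
```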

To get a deeper understanding of the data, correlational information needs to be extracted from it. A correlation matrix is computed to point out correlations between variables; it supports the extraction of important features.

In this case, features that exhibit strong correlations with each other, as well as those that show strong correlations with Class, are recorded.
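A small pandas sketch of this selection step, using a correlation-magnitude threshold (the 0.5 cutoff and the toy data are our assumptions, not values from the analysis):

```python
import numpy as np
import pandas as pd

def strongly_correlated_with(df: pd.DataFrame, target: str,
                             threshold: float = 0.5) -> pd.Series:
    """Correlations with `target` whose magnitude exceeds `threshold`."""
    corr = df.corr()[target].drop(target)
    return corr[corr.abs() > threshold].sort_values()

# Toy frame where x1 tracks Class closely and x2 is pure noise.
rng = np.random.default_rng(1)
cls = rng.integers(0, 2, size=500)
toy = pd.DataFrame({"x1": cls + 0.1 * rng.normal(size=500),
                    "x2": rng.normal(size=500),
                    "Class": cls})
print(strongly_correlated_with(toy, "Class"))  # only x1 survives
```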

As seen in the figure below, there are both negative and positive correlations, and some features appear to influence whether a transaction is normal or fraudulent.

The variables XV10, XV12 and XV14 show a strong negative correlation with Class, while XV2, XV4 and XV11 show a strong positive correlation with it. These variables influence the outcome in the data value space of Class. Other strong negative correlations exist between XV2 and XV3, XV7 and XV2, XV12 and XV11, and XV14 and XV11. The variables XV3, XV5 and XV7 show a strong positive correlation with XV1.

After pointing out these important features, a deeper analysis is conducted with boxplots. The boxplots emphasize and explain more clearly the dependencies between the variables observed in the correlation matrix.
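Such a boxplot can be sketched as follows, on synthetic data that mimics the reported downward shift of XV14 for fraud transactions (the shift magnitude is illustrative, not taken from the real data set):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the balanced subset: V14 shifted down for fraud.
rng = np.random.default_rng(0)
df = pd.DataFrame({"V14": np.concatenate([rng.normal(0, 1, 492),
                                          rng.normal(-7, 2, 492)]),
                   "Class": [0] * 492 + [1] * 492})

fig, ax = plt.subplots(figsize=(4, 3))
ax.boxplot([df.loc[df["Class"] == 0, "V14"],
            df.loc[df["Class"] == 1, "V14"]])
ax.set_xticks([1, 2])
ax.set_xticklabels(["non-fraud (0)", "fraud (1)"])
ax.set_ylabel("V14")
fig.savefig("v14_by_class.png")
```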

First, the figure above confirms the presence of outliers, as noted previously; these outliers are strongly associated with the RV-function XClass. Next, it is observed that the lower the XV10, XV12 and XV14 distributions go, the higher the probability of a fraudulent transaction. This confirms the negative correlations between these variables and XClass deduced from the correlation matrix.

The presence of outliers, and their strong association with XClass, is observed in the figure above as well. The higher the distributions of XV2, XV4 and XV11 go, the higher the probability of a fraudulent transaction, confirming the positive correlations between these variables and XClass described previously.

In general, the distribution can be mathematically denoted in the following way:

XV14 **⊕** XClass : Ω → SV14 × SClass = [ -19.214325, 10.526766] × {0,1}, and the same holds for all the other variables. Below is the visualization of these distributions, this time only for fraud transactions.

Comparing these distributions, XV14 and XV2 are the only features that follow a Gaussian distribution, while XV11 and XV12 follow a Weibull distribution. The mean of the first three variables is mostly negative, which could mean that influential negative values occur with enough frequency that the mean becomes negative.

For dimensionality reduction, the t-SNE (t-Distributed Stochastic Neighbor Embedding) algorithm is used; here, dimensionality reduction serves to identify clusters.

Stochastic Neighbor Embedding starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities. The similarity of datapoint xj to datapoint xi is the conditional probability, pj|i, that xi would pick xj as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at xi [3].
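A minimal numpy sketch of these conditional probabilities, with a single fixed σ for simplicity (real t-SNE tunes a σi per point via the perplexity parameter):

```python
import numpy as np

def conditional_p(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """P[i, j] = p_{j|i}: probability that point i picks point j as its
    neighbour under a Gaussian centred at x_i (fixed sigma for brevity)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # p_{i|i} = 0 by definition
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
P = conditional_p(X)
print(P.sum(axis=1))  # each row sums to 1
```

Note that nearby points get most of the probability mass: in this example, point 0 is far more likely to pick point 1 than the distant point 2.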

Unlike PCA (Principal Component Analysis), t-SNE is not limited to linear dependencies; it can capture non-linear structure. This makes t-SNE well suited to this dataset and to many others. t-SNE creates a probability distribution, using the Gaussian distribution, that defines the relationships between the points in high-dimensional space [4].

For this subset, dimensionality reduction is used to cluster the transactions into fraud and non-fraud transactions.
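This step can be sketched with scikit-learn's TSNE, on synthetic data standing in for the 492 + 492 balanced subset (sizes and feature counts reduced here to keep the example fast; the separation between the two blobs is an assumption of the sketch):

```python
import numpy as np
from sklearn.manifold import TSNE

# Two well-separated synthetic "classes" standing in for the balanced
# fraud / non-fraud subset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 10)),
               rng.normal(8, 1, size=(100, 10))])

embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(X)
print(embedding.shape)  # (200, 2)
```

Plotting the two embedding columns, colored by class, would give the cluster picture described above.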

Even though the subset is small, t-SNE was able to detect the clusters accurately in this case, separating fraud transactions from non-fraud transactions. This indicates that the classes are well separable given the variables in the dataset, so a predictive model trained on the subset should perform well in distinguishing fraud from non-fraud transactions.

This result indicates the presence of dependencies between variables. Since XClass is a categorical variable, a logistic regression is computed in order to check how effective it is at classifying the transactions. An accuracy score of 94% is obtained with the Logistic Regression classifier.
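A hedged sketch of this classification step with scikit-learn (the data here is synthetic, mimicking a 492 + 492 balanced subset with shifted features, so the resulting accuracy is illustrative rather than the 94% reported above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced subset: class 1 shifted along
# a few features, mimicking the strong correlations found earlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(492, 5)),
               rng.normal(2, 1, size=(492, 5))])
y = np.array([0] * 492 + [1] * 492)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
```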

**Conclusion**

In general, the data set possesses some interesting structure. Simple inspection of the data suggested skewness, and computing the skewness confirmed that observation. The computation of kurtosis also highlighted the presence of more outliers than normal. Since outliers can distort the accuracy of a model, a reduction of outliers will be necessary for further work in order to improve the accuracy of the classifier. This reduction must be performed carefully, however, for fear of losing a large amount of information and causing the model to underfit. Finally, clusters could be identified within the subset after strong correlations between variables were found and important features were extracted from the information provided by the correlation matrix.

**References**

[1] Alejandro Correa Bahnsen, Djamila Aouada, Aleksandar Stojanovic, Björn Ottersten, Feature engineering strategies for credit card fraud detection, Expert Systems with Applications, 51:134–142, 2016.

[2] Linda Delamaire, Hussein Abdou, John Pointon, Credit Card fraud and detection techniques, 2009.

[3] Laurens van der Maaten, Geoffrey Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research, 9:2579–2605, 2008.

[4] https://mlexplained.com/2018/09/14/paper-dissected-visualizing-data-using-t-sne-explained/ . Retrieved on June 9, 2020.