# Credit Card Fraud Detection using Logistic Regression

This tutorial walks through credit card fraud detection with logistic regression in Python, using the scikit-learn machine learning library.

The goal is to classify each transaction in the given data as either legitimate or fraudulent.

So let’s implement the code.

Import the necessary Python libraries.

```
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```

The first step is data collection: load the dataset into a pandas DataFrame named `credit_data`.
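A minimal sketch of the loading step follows. The helper name `load_credit_data` and the path in the usage comment are assumptions for illustration; the tutorial does not state where the CSV file lives, so substitute your own path. The only thing the later steps rely on is a `Class` column.

```python
import pandas as pd


def load_credit_data(csv_path):
    """Read the transactions CSV into a DataFrame (hypothetical helper)."""
    data = pd.read_csv(csv_path)
    # The later steps rely on a 'Class' label column (0 = legit, 1 = fraud).
    if "Class" not in data.columns:
        raise ValueError("expected a 'Class' column in the dataset")
    return data


# Usage (the path below is a placeholder, not the tutorial's actual file):
# credit_data = load_credit_data("path/to/your_dataset.csv")
```

Failing early on a missing `Class` column gives a clearer error than a `KeyError` several steps later.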

To confirm that the dataset loaded correctly, print the first five rows with the `head` method.

`credit_data.head()`

Now print the last five rows with the `tail` method.

`credit_data.tail()`

Inspecting the first and last rows like this is a good habit whenever you start exploring data for a machine learning problem.

### DATASET INFORMATION

`credit_data.info()`

All columns hold float or integer values, so the model can use them directly without any encoding.

Let’s check whether there are any missing values present in the given dataset.

`credit_data.isnull().sum()`

The dataset contains no missing values, so there is no need to impute them with the mean or median.

In the `Class` column, 0 represents a legitimate transaction and 1 represents a fraudulent one.

Check how many zeros and ones the dataset contains.

`credit_data['Class'].value_counts()`

#### OUTPUT:-

```
0    284315
1       492
Name: Class, dtype: int64
```

The output shows only 492 fraudulent transactions (ones) against 284,315 legitimate ones (zeros), so the dataset is highly unbalanced.

If you train directly on this data, the model can reach a high accuracy simply by predicting every transaction as legitimate, while detecting almost no fraud.

To balance the dataset, keep all 492 fraudulent transactions and take a random sample of 492 legitimate ones.
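One way to quantify the imbalance is to look at a trivial baseline that labels every transaction as legitimate. The sketch below uses only the class counts printed above; the variable names are illustrative.

```python
# Class counts taken from the value_counts() output above.
n_legit = 284315
n_fraud = 492

# A baseline that always predicts "legit" (0) is right on every legit row
# and wrong on every fraud row.
baseline_accuracy = n_legit / (n_legit + n_fraud)
fraud_caught = 0

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Fraud cases caught: {fraud_caught} of {n_fraud}")
```

The baseline scores about 99.8% accuracy without catching a single fraud, which is why accuracy on the raw data says little about fraud detection.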

Separate the data for analysis.

```
# Split the rows by class: 0 = legit, 1 = fraud
legit = credit_data[credit_data.Class == 0]
fraud = credit_data[credit_data.Class == 1]
print(legit.shape)
print(fraud.shape)
```

#### OUTPUT:-

```
(284315, 31)
(492, 31)
```

Look at the statistical measures of the transaction amounts, starting with the legitimate transactions.

`legit.Amount.describe()`

Then do the same for the fraudulent transactions.

`fraud.Amount.describe()`

The mean amounts of the two groups differ noticeably.

Compare the mean of every column for both classes.

`credit_data.groupby('Class').mean()`

The feature means differ between the two classes, and the model relies on these differences to tell a fraudulent transaction from a legitimate one.

Next, deal with the unbalanced data.

### UNDERSAMPLING

Build a sample dataset containing an equal number of legitimate and fraudulent transactions.

```
# Randomly pick as many legit rows as there are fraud rows
legit_sample = legit.sample(n=492)
```

Concatenate the two data frames.

`new_dataset=pd.concat([legit_sample,fraud],axis=0)`
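The undersampling steps above can be wrapped in a small reusable function. This is a sketch, not the tutorial's own code: the helper name `undersample_majority` and the `random_state` argument (added so the sample is reproducible) are assumptions, and it derives the sample size from the minority class instead of hard-coding 492.

```python
import pandas as pd


def undersample_majority(data, label_col="Class", random_state=None):
    """Return a balanced frame: all minority rows plus an equal-sized
    random sample of majority rows (hypothetical helper)."""
    counts = data[label_col].value_counts()
    minority_label = counts.idxmin()
    minority = data[data[label_col] == minority_label]
    majority = data[data[label_col] != minority_label]
    # Sample without replacement so no majority row is duplicated.
    majority_sample = majority.sample(n=len(minority), random_state=random_state)
    return pd.concat([majority_sample, minority], axis=0)


# Tiny synthetic frame standing in for credit_data: 8 legit rows, 2 fraud rows.
toy = pd.DataFrame({"Amount": range(10), "Class": [0] * 8 + [1] * 2})
balanced = undersample_majority(toy, random_state=1)
print(balanced["Class"].value_counts().to_dict())
```

Fixing `random_state` means the same legitimate rows are drawn on every run, so the accuracy figures later in the tutorial become repeatable.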

To check the combined dataset, print the first five rows with the `head` method.

`new_dataset.head()`

Now print the last five rows with the `tail` method.

`new_dataset.tail()`

Count how many rows of each class the new dataset contains.

`new_dataset['Class'].value_counts()`

#### OUTPUT:-

```
0    492
1    492
Name: Class, dtype: int64
```

Let’s check the mean difference between the two classes.

`new_dataset.groupby('Class').mean()`

Compared with the full dataset, the difference between the class means is smaller in the balanced sample, but it is still there, and this difference is what the model learns from.

### SPLITTING THE DATA INTO FEATURES AND TARGET

```
# X holds the features, Y holds the target labels
X = new_dataset.drop(columns='Class')
Y = new_dataset['Class']
print(X)
```

Now print the target values.

`print(Y)`

Split the data into training and test sets.

`X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,stratify=Y,random_state=2)`

`print(X.shape,X_train.shape,X_test.shape)`

#### OUTPUT:-

`(984, 30) (787, 30) (197, 30)`
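To illustrate what `stratify=Y` does, the sketch below repeats the split on synthetic data with the same shape as the balanced dataset (984 rows, 30 features) and counts the classes in each part. The random features and the `_demo` names are placeholders, not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 984 rows, 30 random feature columns, 492 rows per class.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(984, 30)))
Y_demo = pd.Series([0] * 492 + [1] * 492, name="Class")

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=2
)

# With stratification, both splits keep the 50/50 class ratio
# (up to one row, because 787 and 197 are odd).
print(X_tr.shape, X_te.shape)
print(Y_tr.value_counts().to_dict(), Y_te.value_counts().to_dict())
```

Without `stratify`, the class ratio in a small test set can drift by chance, which makes the test accuracy harder to compare across runs.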

## MODEL TRAINING

`model=LogisticRegression()`

`print(model)`

#### OUTPUT:-

`LogisticRegression()`

Train the logistic regression model on the training data.

`model.fit(X_train,Y_train)`

#### OUTPUT:-

`LogisticRegression()`
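With unscaled features, `LogisticRegression()` may stop at its default iteration limit and emit a convergence warning. One common remedy, sketched below on synthetic data, is to standardize the features in a pipeline and raise `max_iter`. This is an optional variation, not the tutorial's method; the data from `make_classification` and the value `max_iter=1000` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balanced dataset (984 rows, 30 features).
X_demo, Y_demo = make_classification(
    n_samples=984, n_features=30, n_informative=10, random_state=2
)
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=2
)

# The scaler is fitted on the training split only, then reused on the test split.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, Y_tr)
print("Test accuracy on synthetic data:", scaled_model.score(X_te, Y_te))
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, because `fit` only ever sees the training split.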

## MODEL EVALUATION

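Compute the accuracy score on the training data.

```
# accuracy_score expects the true labels first, then the predictions
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print("Accuracy on training data : ", training_data_accuracy)
```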

#### OUTPUT:-

`Accuracy on training data :  0.9491740787801779`

Now compute the accuracy score on the test data.

```
# Accuracy score on test data (true labels first, then predictions)
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print("Accuracy on test data : ", test_data_accuracy)
```

#### OUTPUT:-

`Accuracy on test data :  0.9187817258883249`

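The accuracy is above 90 percent on both the training data (about 94.9%) and the test data (about 91.9%), and the two values are close, which suggests that the model is not badly overfitting. On this balanced sample, logistic regression is therefore a reasonable choice for credit card fraud detection.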
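Accuracy treats both kinds of error alike, but in fraud detection a missed fraud and a false alarm usually have different costs. As a hedged addition, the sketch below shows how scikit-learn's `confusion_matrix`, `precision_score`, and `recall_score` could complement the accuracy score. The small label arrays are made up for illustration; in the tutorial you would pass `Y_test` and `X_test_prediction` instead.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 1 = fraud, 0 = legit.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)

# Precision: share of predicted frauds that are real frauds.
precision = precision_score(y_true, y_pred)
# Recall: share of real frauds that the model caught.
recall = recall_score(y_true, y_pred)
print("Precision:", precision)
print("Recall:", recall)
```

Recall is often the more important of the two here, since each false negative is a fraudulent transaction that went through undetected.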