Phishing website detection using auto-encoders in Keras with Python
Phishing is one of the most common cyber-attacks of the modern era. Through such attacks, a phisher targets innocent users and steals their details. Detecting phishing websites is therefore an effective way of protecting ourselves.
In this article, you will learn about auto-encoders and how to implement one in Keras for phishing website detection in Python.
The first step is installing the required libraries.
# Type this in a command prompt or notebook cell.
# The code below also uses pandas, scikit-learn and tensorflow.
!pip install keras tensorflow pandas scikit-learn
Importing the dataset
The next step is to make sure you have the dataset in your working directory. You can download the dataset from here. After you download it, keep it in your working directory for easier access. The next snippet of code shows how to pull the dataset into your environment.
import pandas as pd

data0 = pd.read_csv(r'path to your dataset(urldata.csv)')

# The head will provide you with the first 5 data points in your dataset.
data0.head()

# Checking the shape of the dataset
data0.shape

# Listing the features of the dataset
data0.columns
"""OUTPUT:
Index(['Domain', 'Have_IP', 'Have_At', 'URL_Length', 'URL_Depth', 'Redirection',
       'https_Domain', 'TinyURL', 'Prefix/Suffix', 'DNS_Record', 'Web_Traffic',
       'Domain_Age', 'Domain_End', 'iFrame', 'Mouse_Over', 'Right_Click',
       'Web_Forwards', 'Label'],
      dtype='object')"""

# Information about the dataset
data0.info()
"""<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Domain         10000 non-null  object
 1   Have_IP        10000 non-null  int64
 2   Have_At        10000 non-null  int64
 3   URL_Length     10000 non-null  int64
 4   URL_Depth      10000 non-null  int64
 5   Redirection    10000 non-null  int64
 6   https_Domain   10000 non-null  int64
 7   TinyURL        10000 non-null  int64
 8   Prefix/Suffix  10000 non-null  int64
 9   DNS_Record     10000 non-null  int64
 10  Web_Traffic    10000 non-null  int64
 11  Domain_Age     10000 non-null  int64
 12  Domain_End     10000 non-null  int64
 13  iFrame         10000 non-null  int64
 14  Mouse_Over     10000 non-null  int64
 15  Right_Click    10000 non-null  int64
 16  Web_Forwards   10000 non-null  int64
 17  Label          10000 non-null  int64
dtypes: int64(17), object(1)
memory usage: 1.4+ MB"""
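Before training, it is also worth checking how the labels are distributed, since a heavily skewed split would make raw accuracy misleading. The snippet below is a small sketch of that check; it uses a tiny synthetic stand-in for `urldata.csv` (the column values here are made up for illustration, the real file has 10,000 rows).

```python
import pandas as pd

# Tiny synthetic stand-in for urldata.csv, for illustration only.
data0 = pd.DataFrame({
    'Have_IP':    [0, 1, 0, 0],
    'URL_Length': [1, 0, 1, 1],
    'Label':      [0, 1, 0, 1],  # 0 = legitimate, 1 = phishing
})

# Count how many samples fall into each class.
counts = data0['Label'].value_counts()
print(counts)
```

On the real dataset, a roughly even split between the two classes means an accuracy score well above 0.5 reflects genuine learning rather than class imbalance.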
Creation of the auto-encoder network
An auto-encoder compresses its input down to a smaller representation, tries to retain the important content, and then scales it back up to see how closely it can reproduce the original input. Even though auto-encoders are lossy, their ability to learn this compression automatically is a useful property. The below code shows the implementation of the auto-encoder; do check the comments for a better understanding of the code.
# The 'Domain' column is a non-numeric string, so drop it before modelling
data = data0.drop('Domain', axis=1)

# Separating & assigning features and target columns to X & y
y = data['Label']
X = data.drop('Label', axis=1)

# Splitting the dataset into train and test sets: 80-20 split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)

# Importing required packages
import keras
from keras.layers import Input, Dense
from keras import regularizers
import tensorflow as tf
from keras.models import Model
from sklearn import metrics

# Building the autoencoder model
input_dim = X_train.shape[1]  # number of input features
encoding_dim = input_dim

input_layer = Input(shape=(input_dim, ))

# Encoder: a layer with the input dimensions, relu activation
# and an L1 activity regularizer
encoder = Dense(encoding_dim, activation="relu",
                activity_regularizer=regularizers.l1(10e-4))(input_layer)
encoder = Dense(int(encoding_dim), activation="relu")(encoder)
encoder = Dense(int(encoding_dim - 2), activation="relu")(encoder)

# Bottleneck ("code") layer: the compressed representation
code = Dense(int(encoding_dim - 4), activation='relu')(encoder)

# From here the decoding part starts, where the model tries to
# reconstruct the original input from the code
decoder = Dense(int(encoding_dim - 2), activation='relu')(code)
decoder = Dense(int(encoding_dim), activation='relu')(decoder)  # chain from the previous decoder layer, not the encoder
decoder = Dense(input_dim, activation='relu')(decoder)

# Create a model of our developed auto-encoder architecture
autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.summary()  # displays the summary of our model

# Compiling the model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the model: input and target are both X_train, since an
# auto-encoder learns to reconstruct its own input
history = autoencoder.fit(X_train, X_train,
                          epochs=10, batch_size=64,
                          shuffle=True, validation_split=0.2)
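The `history` object returned by `fit` records per-epoch metrics, and comparing training and validation loss is a quick way to judge whether the model is over- or under-fitting. The sketch below uses made-up illustrative numbers in place of actual training output (the real values come from `history.history` after the run above).

```python
# Stand-in shaped like Keras's history.history dict; the numbers are
# illustrative, not actual training output.
history_dict = {
    'loss':     [0.62, 0.48, 0.41, 0.37, 0.35],
    'val_loss': [0.60, 0.50, 0.45, 0.44, 0.44],
}

# Compare training and validation loss epoch by epoch.
gaps = []
for epoch, (tr, va) in enumerate(zip(history_dict['loss'],
                                     history_dict['val_loss']), start=1):
    gap = va - tr
    gaps.append(gap)
    print(f"epoch {epoch}: loss={tr:.2f} val_loss={va:.2f} gap={gap:+.2f}")

# A widening gap between val_loss and loss suggests over-fitting;
# both curves still falling suggests more epochs could help.
```

The same comparison can of course be plotted with matplotlib; inspecting the raw numbers keeps the sketch dependency-free.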
The above code shows how to create an auto-encoder model, and how to prepare the data for training and testing. After training and testing are done, you can check the accuracy score your model achieved. Also, try manipulating the parameters and the dataset; this will give you a better understanding of auto-encoders and neural networks in general. The next snippet shows how to compute the accuracy scores for your developed model. This is important, as the accuracy scores tell you whether your model is actually functioning properly or not.
acc_train_auto = autoencoder.evaluate(X_train, X_train)[1]
acc_test_auto = autoencoder.evaluate(X_test, X_test)[1]

print('\nAutoencoder: Accuracy on training Data: {:.3f}'.format(acc_train_auto))
print('Autoencoder: Accuracy on test Data: {:.3f}'.format(acc_test_auto))

# The output which I received:
# Autoencoder: Accuracy on training Data: 0.817
# Autoencoder: Accuracy on test Data: 0.818
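Beyond raw accuracy, a common way to use a trained auto-encoder for detection is to threshold the per-sample reconstruction error: inputs the model reconstructs poorly are flagged as anomalous. The sketch below uses synthetic NumPy arrays as stand-ins for `X_test` and `autoencoder.predict(X_test)`, and the 95th-percentile threshold is an arbitrary choice for illustration, not a value from this article.

```python
import numpy as np

rng = np.random.default_rng(12)

# Stand-ins: X_test would come from the dataset, and X_pred from
# autoencoder.predict(X_test).
X_test = rng.random((100, 16))
X_pred = X_test + rng.normal(0, 0.05, size=(100, 16))  # small reconstruction noise

# Per-sample mean squared reconstruction error.
mse = np.mean((X_test - X_pred) ** 2, axis=1)

# Flag the worst-reconstructed 5% of samples as suspicious
# (the 95th percentile is an assumed cut-off, to be tuned in practice).
threshold = np.percentile(mse, 95)
flagged = mse > threshold
print(f"threshold={threshold:.4f}, flagged {flagged.sum()} of {len(mse)} samples")
```

In a real pipeline the threshold would be calibrated on a validation set of known-legitimate URLs, so that phishing pages stand out as the poorly reconstructed outliers.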
So, basically, this article has helped you understand the implementation of an auto-encoder for a real-world application in Python using Keras. You can also extend this code further and build it into a real-time product.