Skin disease classification using random forest algorithm in Python

Hello everyone! In this post, we will learn how to classify various skin diseases from a set of attributes using the random forest algorithm.

By the end of the post, you will understand the implementation of the random forest algorithm.

Random Forest

  • Random forest is a supervised machine learning algorithm used for classification and regression problems.
  • In this algorithm, n random subsets (bootstrap samples, drawn with replacement) are taken from the training dataset, and a decision tree is built on each of them.
  • Each record of the testing dataset is passed to every decision tree, and the class predicted by the majority of the trees is taken as the final output.
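
To make the two ideas above concrete, here is a minimal sketch in plain Python of drawing a bootstrap sample and taking a majority vote. The helper names bootstrap_sample and majority_vote, the ten stand-in records, and the class labels are all made up for illustration; they are not part of scikit-learn or of the dataset used in this post.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) records at random WITH replacement (one sample per tree)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(tree_predictions):
    """Return the class label that the largest number of trees predicted."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)           # fixed seed so the sample is repeatable
rows = list(range(10))           # stand-in for 10 training records
sample = bootstrap_sample(rows, rng)

# hypothetical predictions from five trees for a single test record
votes = ["psoriasis", "eczema", "psoriasis", "psoriasis", "eczema"]
final_label = majority_vote(votes)
print(final_label)               # psoriasis (3 votes against 2)
```

In practice RandomForestClassifier handles the sampling and the combining of tree outputs internally, so you would not normally write these helpers yourself.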

Implementation of Random Forest Algorithm

  • To implement the random forest algorithm in Python, we need a few libraries and modules.
  • The first step is importing the dataset.
    import pandas as pd
    import numpy as np
    data=pd.read_csv('dataset.csv')
    data.head()
    
  • The above code will display the first 5 rows of the dataset.
  • The next step is separating the output column from the other attributes.
    # features: every column except the last; target: the last column (assumed to hold the class label)
    x = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values
  • Now split the given dataset into training and testing datasets.
  • This can be done using the train_test_split function, which takes the independent variables, the dependent variable, and the test_size as parameters.
  • For example, test_size=0.30 means that 30% of the records are used as the testing dataset and 70% as the training dataset.
    from sklearn.model_selection import train_test_split  
    x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.30, random_state=0)
  • The next step is feature scaling, followed by fitting the random forest classifier on the training dataset.
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier
    from sklearn import tree  # used later for plot_tree
    # fit the scaler on the training data only, then apply it to both sets
    st_x = StandardScaler()
    x_train = st_x.fit_transform(x_train)
    x_test = st_x.transform(x_test)
    # 25 decision trees, with entropy as the split criterion
    classifier = RandomForestClassifier(n_estimators=25, criterion="entropy")
    classifier.fit(x_train, y_train)
  • The parameter n_estimators sets the number of decision trees, and criterion="entropy" means that the quality of each split is measured by information gain (the reduction in entropy).
  • In a notebook, the output of the above code is the description of the fitted random forest classifier.
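  • For a graphical representation of the collection of decision trees, we use the matplotlib library.
    import matplotlib.pyplot as mtp
    # draw each tree in its own figure; plot_tree clears the current axes,
    # so reusing one figure would leave only the last tree visible
    for estimator in classifier.estimators_:
        mtp.figure(figsize=(5,5))
        tree.plot_tree(estimator, filled=True)
    mtp.show()
  • The output of the above code is one tree diagram per decision tree in the forest.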
  • The last step is evaluating the classifier with a confusion matrix.
  • It counts the correct and wrong predictions by comparing the variables y_pred and y_test.
    y_pred= classifier.predict(x_test)
    from sklearn.metrics import confusion_matrix  
    cm= confusion_matrix(y_test, y_pred)
    cm
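  • Running the above code displays the confusion matrix, where rows correspond to the actual classes and columns to the predicted classes.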
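
As a small illustration of how to read a confusion matrix, the sketch below uses two short hand-made label lists. These are hypothetical values, not results from the dataset in this post, and the variable names ending in _demo are made up for the example. The diagonal of the matrix counts the correct predictions, so dividing its sum by the total number of records gives the accuracy.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# hypothetical labels for 8 test records (three classes: 1, 2, 3)
y_true_demo = [1, 1, 2, 2, 3, 3, 3, 1]
y_pred_demo = [1, 2, 2, 2, 3, 3, 1, 1]

# rows are actual classes, columns are predicted classes
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# diagonal entries are correct predictions, every other entry is an error
correct = np.trace(cm_demo)
total = cm_demo.sum()
accuracy = correct / total
print(accuracy)                                   # 0.75 (6 correct out of 8)

# the same figure from scikit-learn's own helper
print(accuracy_score(y_true_demo, y_pred_demo))   # 0.75
```

The same two lines (np.trace and sum) can be applied to the cm variable from the post to get the accuracy of the fitted classifier on the real testing dataset.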

    Thank you!!
