Decision tree regression using AdaBoost with scikit-learn

Introduction

In machine learning, boosting methods train a model sequentially: each new weak learner is fit while paying extra attention to the mistakes of the previous ones. Generally, we use two types of boosting algorithms: the first is gradient boosting and the second is AdaBoost.

In this article, I will cover the basics of the AdaBoost algorithm with hands-on Python code using the scikit-learn module.

Before we get started, you need to know what boosting means. Whenever we train our model sequentially instead of fitting it all at once, that is boosting. This blog shows a hands-on implementation with decision tree regression, since we mostly use AdaBoost with the decision tree algorithm.

AdaBoost

  • First, we figure out which features are actually useful for predicting the target.
  • Then we fit the data normally in the algorithm; in this blog, we fit it in a decision tree regressor.
  • In practice, some particular points are hard for the algorithm to learn; they are difficult to fit in any classification or regression model.
  • AdaBoost pays extra attention to these difficult points: after each round, the samples with the largest errors get their weights increased, so the next weak learner focuses on them (see the sketch after this list).
  • So, in the end, we build a strong classifier out of weak classifiers; in the regression case, a strong regressor out of weak regressors.
  • AdaBoost has some really big advantages, which are mentioned after the coding part.
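To make the reweighting idea concrete, here is a minimal, simplified sketch of one boosting round in the style of AdaBoost.R2 (the variant scikit-learn implements for regression). The function name and the choice of a depth-1 tree are illustrative assumptions, not part of the original article:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def one_boosting_round(X, y, sample_weight):
    # fit a weak learner using the current sample weights
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weight)

    # per-sample loss, normalized to [0, 1] (the "linear" loss of AdaBoost.R2)
    abs_error = np.abs(stump.predict(X) - y)
    loss = abs_error / (abs_error.max() + 1e-12)

    # weighted average loss and the resulting update factor
    avg_loss = np.sum(sample_weight * loss) / np.sum(sample_weight)
    beta = avg_loss / (1.0 - avg_loss)

    # easy samples (small loss) are multiplied by beta < 1 and shrink;
    # hard samples (loss near 1) keep almost all of their weight
    sample_weight = sample_weight * beta ** (1.0 - loss)
    return stump, sample_weight / sample_weight.sum()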

Let us get through the Python coding part and see how to apply it in our regression:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
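# The code below assumes X_train, y_train, X_test, y_test already exist.
# As a purely illustrative setup, we can generate a noisy sine curve and
# split it (this dataset and the random seed are assumptions, not part of
# the original article):
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = np.linspace(0, 6, 500)[:, np.newaxis]
y = np.sin(X).ravel() + np.sin(6 * X).ravel() + rng.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)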

# Now we will make two regressors instead of one:
# reg1 for a plain decision tree regression
# reg2 for AdaBoost

reg1 = DecisionTreeRegressor(max_depth=4)
reg2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng)

reg1.fit(X_train, y_train)
reg2.fit(X_train, y_train)

predictions1 = reg1.predict(X_test)
predictions2 = reg2.predict(X_test)

# Regression is scored with regression metrics; a classification report
# or confusion matrix does not apply to continuous predictions
from sklearn.metrics import mean_squared_error, r2_score
print("Decision tree - MSE:", mean_squared_error(y_test, predictions1), "R2:", r2_score(y_test, predictions1))
print("AdaBoost      - MSE:", mean_squared_error(y_test, predictions2), "R2:", r2_score(y_test, predictions2))

Output:

The script prints the mean squared error and the R² score of both regressors. The exact numbers depend on your data, but the AdaBoost ensemble will typically show a lower error and a higher R² than the single decision tree.

 

–> In the first part, we import all the necessary libraries.

–> In the second part, we create two regressors: one plain decision tree and one AdaBoost regressor.

–> In the third part, we train both regressors on the same training data.

–> The last part is about predicting and scoring, where we can compare the results of the plain decision tree with AdaBoost.
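Since matplotlib is already imported, a quick plot makes the comparison visual. This is an optional sketch that assumes the one-dimensional dataset generated above:

plt.figure(figsize=(8, 5))
plt.scatter(X_test, y_test, c="k", s=10, label="test data")
order = X_test.ravel().argsort()
plt.plot(X_test[order], predictions1[order], c="g", label="decision tree")
plt.plot(X_test[order], predictions2[order], c="r", label="AdaBoost")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()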

 

Benefits

  • Boosts the performance of a machine learning model
  • Improves weak models
  • Increases the accuracy (you can verify this with the cross-validation sketch below)
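To check these benefits on your own data, you can compare cross-validated scores of the single tree against the boosted ensemble. This is a minimal sketch reusing the regressors and the X, y defined above:

from sklearn.model_selection import cross_val_score

# cross_val_score uses R^2 by default for regressors
tree_score = cross_val_score(reg1, X, y, cv=5).mean()
boost_score = cross_val_score(reg2, X, y, cv=5).mean()
print("single tree R2:", tree_score)
print("AdaBoost    R2:", boost_score)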

Limitations

  • Needs high-quality data
  • Outliers get ever-increasing weights, so a few bad points can dominate training (see the note below)
  • Noisy data will give us a noisy output.
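One practical way to soften the outlier problem is the loss parameter of AdaBoostRegressor, which accepts "linear", "square", or "exponential". The "linear" loss (the default) grows slowest with the error, so outlier weights are increased less aggressively; the sketch below simply spells this out:

# "linear" loss punishes large errors least aggressively, which makes
# the reweighting gentler on outliers than "square" or "exponential"
robust_reg = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=4),
    n_estimators=300,
    loss="linear",
    random_state=42,
)
robust_reg.fit(X_train, y_train)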

Conclusion: AdaBoost is really beneficial on a large, high-quality dataset. It does not work well with noisy or poor-quality data; applying it there means spending a lot of effort on cleaning, with a real risk of throwing away genuine data. Also, although it can be wrapped around almost any machine learning algorithm, you will get satisfying results with only a few of them, which is a drawback.
