Decision tree regression using AdaBoost with scikit-learn
Introduction
In machine learning, boosting methods train a model sequentially: each new learner is fitted with extra attention to the examples the previous learners got wrong. The two boosting algorithms used most often are gradient boosting and AdaBoost.
In this article, I will cover the basics of the AdaBoost algorithm with hands-on Python code using the scikit-learn module.
Before we get started, what does boosting mean? Whenever we train learners sequentially instead of fitting a single model in one pass, we are boosting. This blog shows a hands-on implementation with decision tree regression, since AdaBoost is most often used with decision trees.
How AdaBoost works
- First, we figure out which columns (features) are useful for predicting the target.
- Then we fit the data normally in a base algorithm; in this blog, that is a decision tree regressor.
- In any dataset there are some points that are genuinely hard to fit, whether the task is classification or regression.
- AdaBoost gives those hard points increased sample weights, so each subsequent learner concentrates on the examples its predecessors got wrong.
- In the end we obtain a strong classifier built out of weak classifiers (in the regression case, a strong regressor out of weak regressors); a minimal sketch of this loop follows the list.
- AdaBoost has some big advantages, which are covered after the coding part.
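To make the reweighting idea concrete, here is a minimal sketch of the AdaBoost.R2-style loop that scikit-learn's AdaBoostRegressor is based on. This is an illustration under simplifying assumptions, not the library's actual code; the function name and hyperparameters are mine.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def adaboost_r2_sketch(X, y, n_rounds=10):
    """Simplified AdaBoost.R2 loop: fit a weak tree each round, then
    raise the relative weight of the points it predicted badly."""
    n = len(y)
    w = np.full(n, 1.0 / n)                # start with uniform sample weights
    learners, betas = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, y, sample_weight=w)
        err = np.abs(tree.predict(X) - y)
        err = err / max(err.max(), 1e-12)  # linear loss, normalised to [0, 1]
        avg_loss = np.sum(w * err)
        if avg_loss >= 0.5:                # learner too weak; stop boosting
            break
        beta = avg_loss / (1.0 - avg_loss)
        w = w * beta ** (1.0 - err)        # small error -> weight shrinks more
        w = w / w.sum()                    # renormalise to a distribution
        learners.append(tree)
        betas.append(beta)
    return learners, betas
The final AdaBoost.R2 prediction is a weighted median of the individual trees' predictions, with each tree weighted by log(1/beta); scikit-learn's AdaBoostRegressor handles all of this internally.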
Let's walk through the Python code and see how to apply AdaBoost to a regression problem:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# A small synthetic dataset (a noisy sine curve) so the example runs
# end to end; swap in your own X and y here.
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(300, 1), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)
# Now we will make two regressors instead of one:
# reg1 for normal regression, reg2 for AdaBoost on top of the same tree.
reg1 = DecisionTreeRegressor(max_depth=4)
reg2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                         n_estimators=300, random_state=rng)
reg1.fit(X_train, y_train)
reg2.fit(X_train, y_train)
pred1 = reg1.predict(X_test)
pred2 = reg2.predict(X_test)
# This is a regression task, so we evaluate with regression metrics
# (a classification report and confusion matrix do not apply here).
print("Decision tree MSE:", mean_squared_error(y_test, pred1),
      "R2:", r2_score(y_test, pred1))
print("AdaBoost      MSE:", mean_squared_error(y_test, pred2),
      "R2:", r2_score(y_test, pred2))
Output: the script prints the test-set mean squared error and R² for both regressors (the exact numbers depend on your data and random seed). On this synthetic data, the boosted ensemble typically reaches a noticeably lower MSE than the single tree.
- In the first part, we import all the necessary libraries and build a small dataset.
- In the second part, we make two regressors: one for plain decision tree regression and one for AdaBoost regression.
- In the third part, we train both regressors on the same data.
- The last part predicts on the test set and compares the two models with MSE and R², so we can see what boosting buys us; the plot sketch below makes the comparison visual.
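If you want to see the comparison rather than just read the metrics, a quick plot works well. This snippet assumes the X_test, y_test, pred1 and pred2 variables from the script above:
import matplotlib.pyplot as plt
# Sort by x so the prediction curves plot as smooth lines.
order = X_test[:, 0].argsort()
plt.scatter(X_test, y_test, c="k", s=10, label="test data")
plt.plot(X_test[order], pred1[order], c="g", label="single decision tree")
plt.plot(X_test[order], pred2[order], c="r", label="AdaBoost, 300 trees")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()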
Benefits
- Boosts the performance of a machine learning model
- Improves weak models by combining them into a stronger ensemble (the sketch after this list shows this round by round)
- Often increases accuracy
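The "improves weak models" point is easy to verify with scikit-learn's staged_predict, which yields the ensemble's prediction after each boosting round. This assumes the reg2, X_test and y_test variables from the script above:
from sklearn.metrics import mean_squared_error
# Watch the test error evolve as boosting rounds are added.
for i, pred in enumerate(reg2.staged_predict(X_test), start=1):
    if i % 50 == 0:
        print(f"after {i:3d} rounds: MSE = {mean_squared_error(y_test, pred):.4f}")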
Limitations
- Needs high-quality data
- Is sensitive to outliers, because hard-to-fit points keep receiving higher weights
- Noisy data gives us a noisy output, as the sketch after this list illustrates
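A rough way to see the noise sensitivity for yourself, assuming the train/test splits from the script above: corrupt a few training targets and compare the resulting test error against the clean run.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
# Inject a handful of outliers into a copy of the training targets.
y_noisy = y_train.copy()
idx = np.random.RandomState(0).choice(len(y_noisy), size=10, replace=False)
y_noisy[idx] += 5.0
noisy_model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                                n_estimators=300, random_state=0)
noisy_model.fit(X_train, y_noisy)
print("MSE with corrupted targets:",
      mean_squared_error(y_test, noisy_model.predict(X_test)))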
Conclusion: AdaBoost is most beneficial on a large, high-quality dataset. It does not cope well with noisy or poor-quality data; to apply it there we have to invest heavily in cleaning, and aggressive cleaning risks throwing away real signal. Also, although AdaBoost can in principle wrap many base algorithms, only a few of them (decision trees above all) give satisfying results, which is a drawback.