Implementing a decision tree with the ID3 algorithm
Hello everyone! In this tutorial, we will learn how a decision tree is built with the ID3 algorithm and apply it to the iris dataset in Python.
- ID3 stands for Iterative Dichotomiser 3. It is a classification algorithm that builds the tree greedily: at each node it picks the attribute that gives the maximum information gain, that is, the largest reduction in entropy.
- Entropy measures the impurity (randomness) of the class labels in a sample, and information gain is the reduction in entropy achieved by splitting the sample on an attribute.
- A decision tree is a tree structure in which the root and internal nodes test attributes, the branches correspond to the outcomes of those tests, and the leaf nodes hold the predicted class (the output column).
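The two definitions above can be made concrete with a small sketch. It is not the scikit-learn implementation, only an illustration of how ID3 would score categorical attributes; the helper names `entropy` and `information_gain` and the toy weather-style data are assumptions introduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    # Group the labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the partitions after the split.
    weighted = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - weighted


# Hypothetical toy data: each row is (outlook, windy), the label is "play".
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no"]

gains = [information_gain(rows, labels, i) for i in range(2)]
# ID3 greedily picks the attribute with the largest gain for the current node.
best_attribute = max(range(2), key=lambda i: gains[i])
print("gain per attribute:", gains)
print("best attribute index:", best_attribute)
```

In this toy sample the first attribute separates the classes perfectly, so it receives the larger gain and would become the root of the tree. ID3 would then repeat the same scoring on each resulting subset.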
Implementation using iris dataset in Python
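- Here we use the iris dataset, which ships with scikit-learn. For the implementation, we will use scikit-learn's DecisionTreeClassifier. Note that scikit-learn implements an optimized version of CART rather than ID3 itself; passing criterion="entropy" makes it choose splits by information gain, which is the closest match to ID3.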
- Let’s import the required libraries.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import plot_tree
from sklearn.tree import export_text
- Now, let’s create a decision tree using the sklearn library and display the iris dataset.
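# criterion="entropy" selects splits by information gain (the default is Gini impurity)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0, max_depth=2)
iris = load_iris()
iris  # in a notebook, this displays the whole dataset object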
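- The output is a dictionary-like Bunch object whose fields include data (the feature matrix), target (the class labels), feature_names and target_names.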
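- The next step is splitting the dataset into training and test sets and fitting the DecisionTreeClassifier on the training set.
import matplotlib.pyplot as plt
# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)
clf.fit(X_train, y_train)
# Draw the fitted tree (plt.show() is needed when running as a script)
plot_tree(clf)
plt.show()
# Print the same tree as indented text rules
r = export_text(clf, feature_names=iris['feature_names'])
print(r)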
- The final output of the above code is the fitted tree: plot_tree draws it as a diagram, and export_text prints the same splits as indented text rules.
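- A decision tree makes it possible to visualize every possible outcome of a decision, since each path from the root to a leaf is an explicit rule.
- It can also handle data that is not linearly separable, because it partitions the feature space with a sequence of threshold tests rather than a single linear boundary.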
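The tutorial imports cross_val_score but never evaluates the fitted tree. As a rough sketch of how that evaluation might look, the block below repeats the same split and measures accuracy on the held-out 30% and with 5-fold cross-validation. The choice of criterion="entropy" and cv=5 is an assumption made here, not something the original code specifies.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# Assumption: entropy-based splits, to stay close to the ID3 description.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Fraction of held-out samples that are classified correctly.
test_accuracy = clf.score(X_test, y_test)

# cross_val_score clones the estimator and refits it on each of the 5 folds.
cv_scores = cross_val_score(clf, iris.data, iris.target, cv=5)

print(f"Test accuracy: {test_accuracy:.3f}")
print(f"Mean 5-fold CV accuracy: {cv_scores.mean():.3f}")
```

Comparing the two numbers gives a quick check on whether the single train/test split was unusually easy or hard. Raising max_depth and rerunning shows how tree depth trades off against overfitting.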