Implementing the Find-S Algorithm Using Python
Hello Everyone,
Let’s learn how to find the most specific hypothesis for a given dataset.
Find-S Algorithm
The Find-S algorithm finds the most specific hypothesis that is consistent with the positive examples of a given dataset. It starts from the first positive example and generalizes only as far as needed to cover every other positive example; negative examples are ignored.
Consider the EnjoySport dataset:
Example | Sky | Temp | Humidity | Wind | Water | Forecast | EnjoySport |
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes |
2 | Sunny | Warm | High | Strong | Warm | Same | Yes |
3 | Rainy | Cold | High | Strong | Warm | Change | No |
4 | Sunny | Warm | High | Strong | Cool | Change | Yes |
Implementation of Find-S algorithm
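This dataset has seven columns: six attributes and the output column, EnjoySport. (The Example column in the table above is only a row number and is not part of the CSV file.) Let’s import the required libraries.
import pandas as pd
import numpy as np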
- Next, let us read the data from the CSV file (the dataset).
- Let the name of the CSV file be “dataset.csv”.
d = pd.read_csv("dataset.csv")
print(d)
This prints the EnjoySport dataset.
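If you do not already have dataset.csv, the following is a minimal sketch that creates it with Python's standard csv module. The rows are copied from the table above; the column names in the header and the helper name write_dataset are assumptions made for illustration.

```python
import csv

# Rows copied from the EnjoySport table, without the Example column.
# The header names are an assumption; any names work for this tutorial.
header = ["Sky", "Temp", "Humidity", "Wind", "Water", "Forecast", "EnjoySport"]
rows = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

def write_dataset(path="dataset.csv"):
    """Write the header and the four examples to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

write_dataset()
```

After running this once, pd.read_csv("dataset.csv") should load the same four rows shown in the table.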
The next step is to build an array of all attribute values, excluding the output column.
a = np.array(d)[:, :-1]
print("The attributes are:", a)
The attributes are: [['Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
 ['Sunny' 'Warm' 'High' 'Strong' 'Warm' 'Same']
 ['Rainy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
 ['Sunny' 'Warm' 'High' 'Strong' 'Cool' 'Change']]
The next step is to extract only the output values of the dataset.
t = np.array(d)[:, -1]
print("The target is:", t)
The target is: ['Yes' 'Yes' 'No' 'Yes']
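The function later in this article compares each label with the exact string "Yes", so a label such as 'yes' or 'Yes ' in the CSV file would be skipped silently. A minimal sketch of cleaning the labels first, assuming they are plain strings; the helper name normalize_labels and the messy sample values are made up for illustration.

```python
def normalize_labels(labels):
    """Strip surrounding whitespace and capitalize each label."""
    return [str(label).strip().capitalize() for label in labels]

raw = ["yes", "Yes ", "NO", " Yes"]   # made-up messy labels
print(normalize_labels(raw))          # ['Yes', 'Yes', 'No', 'Yes']
```

With the clean EnjoySport file this step changes nothing, so it can be left out when the labels are known to be consistent.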
- Initialize the variable specific_hypothesis with the first positive example.
- Then compare every positive example with specific_hypothesis, attribute by attribute.
- If an attribute does not match, replace it with ‘?’; otherwise leave it unchanged. Repeat this until the last positive example has been processed.
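The update in the last step can be tried on its own before writing the full function. The following is a minimal sketch on plain Python lists; the helper name generalize is made up for illustration and is not part of this article's code.

```python
def generalize(hypothesis, example):
    """Keep each attribute that matches the example; replace the rest with '?'."""
    return [h if h == x else "?" for h, x in zip(hypothesis, example)]

# Start from positive example 1, then fold in positive examples 2 and 4.
h = ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"]
h = generalize(h, ["Sunny", "Warm", "High", "Strong", "Warm", "Same"])
print(h)  # ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
h = generalize(h, ["Sunny", "Warm", "High", "Strong", "Cool", "Change"])
print(h)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Example 3 is negative, so it is never passed to the update. Once an attribute has become '?', it stays '?' for all later examples.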
def train(c, t):
    # Start from the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break

    # Generalize over every positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                # Replace a mismatching attribute with '?'
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'

    return specific_hypothesis

print("The final hypothesis is:", train(a, t))
- The final value in specific_hypothesis is the most specific hypothesis of the dataset.
OUTPUT:
The final hypothesis is: ['Sunny' 'Warm' '?' 'Strong' '?' '?']
This means that if Sky is Sunny, Temp is Warm, and Wind is Strong, the record is classified as positive (Yes), irrespective of the values of Humidity, Water, and Forecast.
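To apply a learned hypothesis to a new record, check every attribute that is not '?'. The following is a minimal sketch using the hypothesis that Find-S yields for the four rows in the table above; the function name matches and the first test record are made up for illustration.

```python
def matches(hypothesis, instance):
    """Return True if every non-'?' attribute of the hypothesis equals the instance's value."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

hypothesis = ["Sunny", "Warm", "?", "Strong", "?", "?"]

# A made-up record: Sky, Temp, and Wind agree, so it is classified as Yes.
print(matches(hypothesis, ["Sunny", "Warm", "Normal", "Strong", "Cool", "Same"]))   # True
# Example 3 from the table: Sky and Temp differ, so it is not covered.
print(matches(hypothesis, ["Rainy", "Cold", "High", "Strong", "Warm", "Change"]))   # False
```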
Thank You.