Implementing Find-S algorithm using Python

Hello Everyone,

Let’s learn how to find the most specific hypothesis for a given dataset.

Find-S Algorithm

The Find-S algorithm finds the most specific hypothesis that is consistent with the positive examples of a given dataset. It starts from the first positive example and generalizes an attribute only when a later positive example disagrees with it; negative examples are ignored.

Consider the EnjoySport dataset:

Example  Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm  Normal    Strong  Warm   Same      Yes
2        Sunny  Warm  High      Strong  Warm   Same      Yes
3        Rainy  Cold  High      Strong  Warm   Change    No
4        Sunny  Warm  High      Strong  Cool   Change    Yes
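
If you do not already have this data as a file, the table can be written to a CSV with a few lines of pandas. This is only a sketch: it assumes the file is named "dataset.csv" and that the Example number is left out, so that the output column EnjoySport is the last column.

```python
import pandas as pd

# Rows copied from the EnjoySport table above. The Example number is omitted
# (an assumption) so that the last column of the file is the output column.
columns = ["Sky", "Temp", "Humidity", "Wind", "Water", "Forecast", "EnjoySport"]
rows = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

df = pd.DataFrame(rows, columns=columns)
# index=False keeps pandas from writing an extra index column to the file.
df.to_csv("dataset.csv", index=False)
```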

Implementation of Find-S algorithm

This dataset has seven columns: six attributes and the output column, EnjoySport. Let’s import the required libraries.

import pandas as pd
import numpy as np
  • Next, read the data from the CSV file (the dataset).
  • Let the name of the CSV file be “dataset.csv”.
d = pd.read_csv("dataset.csv")
print(d)

The output of the above code is the EnjoySport dataset.

The next step is to build an array of all the attribute values, excluding the output column.

a = np.array(d)[:,:-1] 
print(" The attributes are: ",a)
The attributes are: [['Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
                     ['Sunny' 'Warm' 'High' 'Strong' 'Warm' 'Same']
                     ['Rainy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
                     ['Sunny' 'Warm' 'High' 'Strong' 'Cool' 'Change']]

The next step is getting only the output values of the dataset.

t = np.array(d)[:,-1] 
print("The target is: ",t) 
The target is: ['Yes' 'Yes' 'No' 'Yes']
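
One thing to watch for: the code in this post compares each label with the exact string "Yes", so a label such as "yes" or "Yes " in the CSV file would silently be treated as negative. A minimal sketch of a clean-up step is shown here; normalize_labels is a hypothetical helper name, not part of pandas or NumPy, and the sample labels are made up for illustration.

```python
import numpy as np
import pandas as pd

def normalize_labels(labels):
    """Strip surrounding whitespace and use 'Yes' / 'No' style capitalization."""
    return pd.Series(labels).astype(str).str.strip().str.capitalize().to_numpy()

# Made-up labels with inconsistent case and stray whitespace (illustration only).
raw = np.array(["yes", "Yes ", "NO", "Yes"], dtype=object)
t_clean = normalize_labels(raw)
print(list(t_clean))
```

With real data you would pass the target array t to the helper before running the algorithm.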
  • Initialize the variable specific_hypothesis with the first positive example.
  • Then compare every positive example with specific_hypothesis.
  • If an attribute does not match, replace it with ‘?’; otherwise leave it unchanged. Continue until the last positive example.
def train(c, t):
    # Start from the first positive example.
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # Generalize: an attribute that differs in any positive example becomes '?'.
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

print("The final hypothesis is:", train(a, t))
  • The final value in specific_hypothesis is the most specific hypothesis of the dataset.
The final hypothesis is: ['Sunny' 'Warm' '?' 'Strong' '?' '?']

This means that a record is classified as positive (Yes) when Sky is Sunny, Temp is Warm and Wind is Strong, irrespective of Humidity, Water and Forecast. Humidity becomes ‘?’ because the positive examples 1 and 2 disagree on it (Normal and High).
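
To see what such a hypothesis means in practice, here is a small self-contained sketch in plain Python (no NumPy) that derives the hypothesis from the four rows of the table and then checks each row against it. The names find_s and matches are hypothetical helpers written for this illustration, and find_s is a compact variant of the same idea rather than the exact function used in this post.

```python
def find_s(examples, labels, positive="Yes"):
    """Generalize the first positive example over the remaining positive ones."""
    hypothesis = None
    for example, label in zip(examples, labels):
        if label != positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(example)  # start from the first positive example
        else:
            # Keep attributes that agree, replace the rest with '?'.
            hypothesis = [h if h == e else "?" for h, e in zip(hypothesis, example)]
    return hypothesis

def matches(hypothesis, record):
    """A record matches if each attribute equals the hypothesis value or that value is '?'."""
    return all(h == "?" or h == r for h, r in zip(hypothesis, record))

examples = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change"],
]
labels = ["Yes", "Yes", "No", "Yes"]

hypothesis = find_s(examples, labels)
predictions = ["Yes" if matches(hypothesis, r) else "No" for r in examples]
print(hypothesis)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print(predictions)  # ['Yes', 'Yes', 'No', 'Yes']
```

The predictions agree with the EnjoySport column for all four rows, which is what we expect here: the hypothesis covers every positive example and happens to exclude the single negative one.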

Thank You.
