ANOVA Example using Python Pandas on Iris Dataset

We can use ANOVA (analysis of variance) to determine whether the means of three or more groups differ significantly.

Here’s an example of how to perform a one-way ANOVA on the Iris dataset using Python Pandas and the f_oneway() function from the scipy.stats module:

import pandas as pd
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Convert the dataset into a Pandas DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Group the data by species and extract the sepal length column
group1 = df[df['species'] == 0]['sepal length (cm)']
group2 = df[df['species'] == 1]['sepal length (cm)']
group3 = df[df['species'] == 2]['sepal length (cm)']

# Perform ANOVA to test for significant differences between group means
f, p = f_oneway(group1, group2, group3)

if p < 0.05:
    print('Reject null hypothesis: at least one group mean is different')
else:
    print('Fail to reject null hypothesis: no significant difference between group means detected')

Output: Reject null hypothesis: at least one group mean is different


In this example, we first load the Iris dataset using the load_iris() function from scikit-learn. We then convert the dataset into a Pandas DataFrame and extract the sepal length (cm) column for each of the three species of iris, which iris.target encodes as 0 (setosa), 1 (versicolor), and 2 (virginica).
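The per-species extraction can also be written without hardcoding the species codes. The following is a minimal sketch, not part of the original example: the helper name anova_by_group and the toy DataFrame are hypothetical and exist only to keep the snippet self-contained.

```python
import pandas as pd
from scipy.stats import f_oneway


def anova_by_group(frame, value_col, group_col):
    """Run a one-way ANOVA on value_col, split by the labels in group_col."""
    # groupby yields one Series of values per distinct label,
    # so no group codes need to be hardcoded
    samples = [values for _, values in frame.groupby(group_col)[value_col]]
    return f_oneway(*samples)


# Hypothetical toy data (not the Iris measurements)
toy = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'value': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

f_stat, p_value = anova_by_group(toy, 'value', 'label')
print(f'F = {f_stat:.3f}, p = {p_value:.4f}')
```

With the DataFrame built in the example above, the equivalent call would be anova_by_group(df, 'sepal length (cm)', 'species').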

Next, we perform ANOVA on the three groups of data using the f_oneway() function from scipy.stats. This function returns the F-statistic and the p-value for the test. If the p-value is less than 0.05, our chosen significance level, we reject the null hypothesis and conclude that there is evidence that at least one group mean differs from the others. If the p-value is greater than or equal to 0.05, we fail to reject the null hypothesis and conclude that there is not enough evidence to support a difference between the group means. Note that ANOVA by itself does not tell us which groups differ, only whether a difference exists.
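To make the F-statistic less of a black box, here is a rough sketch of the textbook one-way calculation in plain Python: the ratio of between-group to within-group mean squares. The function name one_way_f and the toy samples are assumptions for illustration, and this is not how scipy implements f_oneway() internally.

```python
from statistics import mean


def one_way_f(*groups):
    """Return the one-way ANOVA F-statistic for two or more samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)    # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)      # n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical toy samples (not the Iris measurements)
toy_groups = ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0])
f_manual = one_way_f(*toy_groups)
print(f_manual)
```

A large F means the group means are spread far apart relative to the variation inside each group, which is what leads to a small p-value.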
