Handling Missing Data with fillna, dropna and interpolate in Pandas – Lesson 6

Missing data is a routine problem in data analysis, and pandas provides three main methods for dealing with it: fillna replaces missing values, dropna removes them, and interpolate estimates them from neighboring values.

fillna method

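The fillna method fills missing values in a pandas DataFrame or Series. We can pass a single value, or a dict that maps column names to fill values. Older code also passes method='ffill' or method='bfill' to copy neighboring values, but recent pandas versions deprecate that argument in favor of the separate ffill and bfill methods. Here is an example that fills missing values with a specified value: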

import pandas as pd

# create a sample DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, None]})

# fill missing values with 0
df.fillna(0, inplace=True)

print(df)

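The above code creates a DataFrame with missing values and replaces each of them with 0. Because pandas stores None in a numeric column as NaN, both columns hold floats, so the filled positions print as 0.0. With inplace=True the original DataFrame is modified and the call returns None; without it, fillna returns a new DataFrame and leaves df unchanged.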
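A single constant is not always the right fill value. The sketch below, which reuses the sample DataFrame from above, shows two common alternatives: a per-column fill using a dict, and copying neighboring values with ffill and bfill. The variable names and the choice of the column mean for B are illustrative assumptions, not part of the original lesson.

```python
import pandas as pd

# same sample DataFrame as in the lesson
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, None]})

# fill each column with its own value: 0 for A, the column mean for B
# (using the mean here is an illustrative choice)
filled_per_column = df.fillna({'A': 0, 'B': df['B'].mean()})

# copy the last valid value forward into each gap
filled_forward = df.ffill()

# copy the next valid value backward into each gap;
# a trailing gap has no later value and stays NaN
filled_backward = df.bfill()

print(filled_per_column)
print(filled_forward)
print(filled_backward)
```

None of these calls uses inplace=True, so df itself is left untouched and each result can be compared side by side.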

dropna method

The dropna method drops rows or columns that contain missing values from a pandas DataFrame. By default it drops rows; passing axis=1 drops columns instead. Here is an example that drops rows with missing values:

import pandas as pd

# create a sample DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, None]})

# drop rows with missing values
df.dropna(inplace=True)

print(df)

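The above code creates a DataFrame with missing values and drops every row that contains at least one of them. In this sample only the first row (A=1, B=5) is complete, so it is the only row that remains. The inplace=True parameter modifies the original DataFrame instead of returning a new one.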
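Dropping every incomplete row can discard most of a dataset, as the example above shows. The following sketch illustrates some narrower options using the how, subset, and thresh parameters of dropna. The extra all-missing column C and the variable names are assumptions added for illustration.

```python
import pandas as pd

# sample DataFrame from the lesson, plus a hypothetical all-missing column C
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, 7, None],
    'C': [None, None, None, None],
})

# drop only the columns in which every value is missing
cols_dropped = df.dropna(axis=1, how='all')

# drop rows only when column A is missing; gaps in other columns are ignored
rows_subset = df.dropna(subset=['A'])

# keep rows that have at least 2 non-missing values
rows_thresh = df.dropna(thresh=2)

print(cols_dropped)
print(rows_subset)
print(rows_thresh)
```

Each call returns a new DataFrame, so the original df keeps all four rows and three columns.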

interpolate method

The interpolate method fills missing values in a pandas DataFrame or Series with values estimated from the surrounding data. Here is an example that fills missing values with linear interpolation:

import pandas as pd

# create a sample DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, None]})

# fill missing values with linear interpolation
df.interpolate(method='linear', inplace=True)

print(df)

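The above code creates a DataFrame with missing values and fills them by linear interpolation. In column A the gap between 2 and 4 becomes 3, and in column B the gap between 5 and 7 becomes 6. The last value in B has no later neighbor, so with the default settings it is filled with the last valid value, 7, rather than extrapolated. The method='linear' parameter specifies the interpolation method (it is also the default), and the inplace=True parameter modifies the original DataFrame.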
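How interpolate treats gaps at the start or end of the data depends on its limit_direction and limit_area parameters. The sketch below compares the default behavior with two alternatives on a small Series. The sample values and variable names are assumptions chosen to make the edge cases visible.

```python
import pandas as pd

# hypothetical Series with a leading gap, an interior gap, and a trailing gap
s = pd.Series([None, 1.0, None, None, 4.0, None])

# default: fills forward only, so the leading gap stays NaN
# and the trailing gap repeats the last valid value
default_fill = s.interpolate()

# fill in both directions, so the leading gap takes the first valid value
both_fill = s.interpolate(limit_direction='both')

# fill only gaps surrounded by valid values; leading and trailing gaps stay NaN
inside_fill = s.interpolate(limit_area='inside')

print(default_fill)
print(both_fill)
print(inside_fill)
```

Using limit_area='inside' is a reasonable choice when repeating an edge value would be misleading, since it leaves those positions missing for later handling with fillna or dropna.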

In summary, fillna, dropna, and interpolate cover the common ways of handling missing data in pandas. Use fillna to replace missing values with a chosen value, dropna to remove rows or columns that contain missing values, and interpolate to estimate missing values from neighboring data points.
