Merge in Python Pandas with examples

In Pandas, merge is a function that allows you to combine two or more data frames based on one or more common columns, similar to a SQL join.

Here’s an example of how to use merge in Pandas:

import pandas as pd

# create two data frames to merge
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# merge data frames based on the 'key' column
merged_df = pd.merge(df1, df2, on='key')

# print the merged data frame
print(merged_df)

Output:

key  value_x  value_y
0   B        2        5
1   D        4        6

In this example, we create two data frames df1 and df2 with a common column key. We then use the merge function to merge the two data frames based on the common column key. The resulting data frame merged_df contains only the rows where the key appears in both data frames, and includes columns from both data frames (value_x from df1 and value_y from df2).

Here are a few more examples to illustrate the different ways in which merge can be used:

Merge two data frames with different column names:

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)

Output:

key  value1  value2
0   B       2       5
1   D       4       6

Merge two data frames using multiple columns:

df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'], 'key2': ['W', 'X', 'Y', 'Z'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key1': ['B', 'D', 'E', 'F'], 'key2': ['X', 'Z', 'Y', 'W'], 'value': [5, 6, 7, 8]})
merged_df = pd.merge(df1, df2, on=['key1', 'key2'])
print(merged_df)

Output:

key1 key2  value_x  value_y
0    D    Z        4        6

Merge two data frames with different types of join (inner join, left join, right join, and outer join):

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# inner join
inner_join_df = pd.merge(df1, df2, on='key', how='inner')
left_join_df = pd.merge(df1, df2, on='key', how='left')
right_join_df = pd.merge(df1, df2, on='key', how='right')
outer_join_df = pd.merge(df1, df2, on='key', how='outer')

Here are a few more examples of using the merge function in Pandas:

Merge two data frames with different column names and fill missing values with zeros:

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})
merged_df = pd.merge(df1, df2, left_on='key', right_on='key', how='outer').fillna(0)
print(merged_df)

Output:

key  value1  value2
0   A     1.0     0.0
1   B     2.0     5.0
2   C     3.0     0.0
3   D     4.0     6.0
4   E     0.0     7.0
5   F     0.0     8.0

In this example, we use left_on and right_on parameters to specify the column names for merging instead of using the on parameter. We also use the outer join type to include all rows from both data frames in the merged result, and fill missing values with zeros using the fillna method.

Merge two data frames with different column names and rename the columns:

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})
merged_df = pd.merge(df1, df2, left_on='key', right_on='key').rename(columns={'value1': 'df1_value', 'value2': 'df2_value'})
print(merged_df)

Output:

key  df1_value  df2_value
0   B          2          5
1   D          4          6

In this example, we use the rename method to rename the columns of the merged result.

Merge Complete Explanation with Example ( Jupyter notebook Code )

Basic Merge Using a Dataframe Column

In [28]:

import pandas as pd
df1 = pd.DataFrame({
    "city": ["new york","chicago","orlando"],
    "temperature": [21,14,35],
})
df1

Out[28]:

	city	temperature
0	new york	21
1	chicago	14
2	orlando	35

In [29]:

df2 = pd.DataFrame({
    "city": ["chicago","new york","orlando"],
    "humidity": [65,68,75],
})
df2

Out[29]:

	city	humidity
0	chicago	65
1	new york	68
2	orlando	75

In [30]:

df3 = pd.merge(df1, df2, on="city")
df3

Out[30]:

	city	temperature	humidity
0	new york	21	68
1	chicago	14	65
2	orlando	35	75

Type Of DataBase Joins

In [31]:

df1 = pd.DataFrame({
    "city": ["new york","chicago","orlando", "baltimore"],
    "temperature": [21,14,35, 38],
})
df1

Out[31]:

	city	temperature
0	new york	21
1	chicago	14
2	orlando	35
3	baltimore	38

In [32]:

df2 = pd.DataFrame({
    "city": ["chicago","new york","san diego"],
    "humidity": [65,68,71],
})
df2

Out[32]:

	city	humidity
0	chicago	65
1	new york	68
2	san diego	71

In [33]:

df3=pd.merge(df1,df2,on="city",how="inner")
df3

Out[33]:

	city	temperature	humidity
0	new york	21	68
1	chicago	14	65

In [34]:

df3=pd.merge(df1,df2,on="city",how="outer")
df3

Out[34]:

	city	temperature	humidity
0	new york	21.0	68.0
1	chicago	14.0	65.0
2	orlando	35.0	NaN
3	baltimore	38.0	NaN
4	san diego	NaN	71.0

In [35]:

df3=pd.merge(df1,df2,on="city",how="left")
df3

Out[35]:

	city	temperature	humidity
0	new york	21	68.0
1	chicago	14	65.0
2	orlando	35	NaN
3	baltimore	38	NaN

In [36]:

df3=pd.merge(df1,df2,on="city",how="right")
df3

Out[36]:

	city	temperature	humidity
0	new york	21.0	68
1	chicago	14.0	65
2	san diego	NaN	71

indicator flag

In [37]:

df3=pd.merge(df1,df2,on="city",how="outer",indicator=True)
df3

Out[37]:

	city	temperature	humidity	_merge
0	new york	21.0	68.0	both
1	chicago	14.0	65.0	both
2	orlando	35.0	NaN	left_only
3	baltimore	38.0	NaN	left_only
4	san diego	NaN	71.0	right_only

suffixes

In [38]:

df1 = pd.DataFrame({
    "city": ["new york","chicago","orlando", "baltimore"],
    "temperature": [21,14,35,38],
    "humidity": [65,68,71, 75]
})
df1

Out[38]:

	city	humidity	temperature
0	new york	65	21
1	chicago	68	14
2	orlando	71	35
3	baltimore	75	38

In [39]:

df2 = pd.DataFrame({
    "city": ["chicago","new york","san diego"],
    "temperature": [21,14,35],
    "humidity": [65,68,71]
})
df2

Out[39]:

	city	humidity	temperature
0	chicago	65	21
1	new york	68	14
2	san diego	71	35

In [40]:

df3= pd.merge(df1,df2,on="city",how="outer", suffixes=('_first','_second'))
df3

Out[40]:

	city	humidity_first	temperature_first	humidity_second	temperature_second
0	new york	65.0	21.0	68.0	14.0
1	chicago	68.0	14.0	65.0	21.0
2	orlando	71.0	35.0	NaN	NaN
3	baltimore	75.0	38.0	NaN	NaN
4	san diego	NaN	NaN	71.0	35.0

join

In [58]:

df1 = pd.DataFrame({
    "city": ["new york","chicago","orlando"],
    "temperature": [21,14,35],
})
df1.set_index('city',inplace=True)
df1

Out[58]:

	temperature
city
new york	21
chicago	14
orlando	35

In [59]:

df2 = pd.DataFrame({
    "city": ["chicago","new york","orlando"],
    "humidity": [65,68,75],
})
df2.set_index('city',inplace=True)
df2

Out[59]:

	humidity
city
chicago	65
new york	68
orlando	75

In [60]:

df1.join(df2,lsuffix='_l', rsuffix='_r')

Out[60]:

	temperature	humidity
city
new york	21	68
chicago	14	65
orlando	35	75

Merge in Python Pandas with examples

Basic Merge Using a Dataframe Column

Type Of DataBase Joins

indicator flag

suffixes

join

Leave a Comment Cancel Reply

Do you want to learn?

We are here to help.

Our B2B Services

Course benefits

Quick Links

Resource

Contact Info