What Is a DataFrame in Pandas – Python Pandas Tutorial Part 4

The DataFrame object is the workhorse of data analysis in pandas. It is a spreadsheet-like table of values, similar to a database table or a worksheet in an Excel spreadsheet, but with more powerful indexing and data alignment features.

DataFrames can be constructed from various types of data distributed across any number of columns and rows.

They can also represent relational tables (each row is an observation, over which you can perform group-by operations), and they can be used as matrices (two-dimensional arrays).
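Both uses can be sketched briefly. The example below is a minimal illustration, not part of the original tutorial: the `sales` table and its column names are made up for the purpose.

```python
import pandas as pd

# Hypothetical data: each row is one observation (a sale in a city).
sales = pd.DataFrame({
    "city": ["London", "Paris", "London", "Paris"],
    "amount": [10, 20, 30, 40],
})

# Relational view: group the rows by city and total the amounts.
totals = sales.groupby("city")["amount"].sum()
print(totals)

# Matrix view: the numeric column(s) as a two-dimensional NumPy array.
matrix = sales[["amount"]].to_numpy()
print(matrix.shape)  # (4, 1)
```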

DataFrame objects can be created from structured data such as lists, dicts and NumPy arrays. They can also be created from external sources such as:

  • Excel
  • JSON
  • CSV
  • HTML Tables
  • SQL etc.
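As a rough sketch of the in-memory case, a DataFrame can be built from a list of rows or from a NumPy array. The variable and column names here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# A list of rows; the column names are supplied separately.
rows = [["John", 25], ["Jane", 30]]
df_from_lists = pd.DataFrame(rows, columns=["Name", "Age"])

# A 2-D NumPy array; each array column becomes a DataFrame column.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=["a", "b"])

print(df_from_lists)
print(df_from_array)
```

File-based sources are read with the matching reader function, for example pd.read_excel, pd.read_json, pd.read_csv, pd.read_html and pd.read_sql.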

 

In Python, the Pandas library provides a powerful data analysis toolkit, and one of its primary data structures is the DataFrame. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table, and it is commonly used for data manipulation and analysis in Python.
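To see what "columns of potentially different types" looks like in practice, the short sketch below inspects the shape and per-column types of a small DataFrame. The Height column is an assumption added only to show a floating-point column.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Jane"],   # text column
    "Age": [25, 30],            # integer column
    "Height": [1.80, 1.65],     # floating-point column (illustrative)
})

print(people.shape)   # (rows, columns)
print(people.dtypes)  # one dtype per column
```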

Here are some examples of how to create and work with a DataFrame in Pandas:

  1. Creating a DataFrame from a dictionary:

You can create a DataFrame by passing a dictionary of lists as input to the DataFrame constructor. Each key in the dictionary represents a column name, and the corresponding value is a list of values for that column. Here’s an example:

import pandas as pd

data = {'Name': ['John', 'Jane', 'Bob', 'Sara'],
        'Age': [25, 30, 21, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(data)
print(df)

Output:

   Name  Age      City
0  John   25  New York
1  Jane   30    London
2   Bob   21     Paris
3  Sara   28     Tokyo
  2. Reading a DataFrame from a CSV file:

You can also create a DataFrame by reading data from a CSV file using the read_csv function. Here’s an example, assuming a file named data.csv that holds the same records as the dictionary above:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Output:

   Name  Age      City
0  John   25  New York
1  Jane   30    London
2   Bob   21     Paris
3  Sara   28     Tokyo
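If no data.csv file is at hand, the same call can be tried against CSV text held in memory. This is a minimal sketch: the csv_text contents are an assumption standing in for the file.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (header row, then records).
csv_text = """Name,Age,City
John,25,New York
Jane,30,London
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```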
  3. Accessing columns and rows:

You can access an individual column of a DataFrame by putting the column name in square brackets. The result is a pandas Series. Here’s an example:

import pandas as pd

df = pd.read_csv('data.csv')
print(df['Name'])

Output:

0    John
1    Jane
2     Bob
3    Sara
Name: Name, dtype: object
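Selecting several columns works the same way with a list of names. The sketch below rebuilds the sample data inline so that it runs without data.csv.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Sara"],
    "Age": [25, 30, 21, 28],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

names = df["Name"]             # a single name returns a Series
subset = df[["Name", "City"]]  # a list of names returns a DataFrame

print(type(names).__name__)
print(subset)
```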

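You can also access individual rows using the .loc attribute, which selects rows by index label. Because this DataFrame has the default integer index, the label 1 refers to the second row. To select rows by integer position regardless of the labels, use .iloc instead. Here’s an example: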

import pandas as pd

df = pd.read_csv('data.csv')
print(df.loc[1])

Output:

Name       Jane
Age          30
City     London
Name: 1, dtype: object
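The difference between label-based and position-based selection is easier to see when the index is not the default range of integers. The sketch below uses made-up string labels "a", "b" and "c" for that purpose.

```python
import pandas as pd

labeled = pd.DataFrame(
    {"Name": ["John", "Jane", "Bob"], "Age": [25, 30, 21]},
    index=["a", "b", "c"],  # illustrative string labels
)

by_label = labeled.loc["b"]    # row whose index label is "b"
by_position = labeled.iloc[1]  # second row, counting from 0

print(by_label)
print(by_position)
```

Both lookups return the row for Jane; with this index, labeled.loc[1] would raise a KeyError because no row carries the label 1.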

These are just a few examples of how to create and work with a DataFrame in Pandas. The DataFrame is a powerful and flexible data structure that can be used for a wide variety of data analysis tasks in Python.
