The DataFrame object is the workhorse of data analysis in pandas. It is a spreadsheet-like table of values, similar to a database table or a worksheet in an Excel spreadsheet, but with more powerful indexing and data alignment features.
DataFrames can be constructed from various types of data distributed across any number of columns and rows.
They can also represent relational tables (each row is a separate observation, over which you can perform group-by operations), and they can be used as matrices (two-dimensional arrays).
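As a minimal sketch of the group-by use case mentioned above, the following treats each row as one observation and aggregates per group. The `sales` table and its `region` and `amount` columns are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical observations: one row per sale (illustrative data only).
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 150, 200, 50],
})

# Group the rows by region and sum the amounts within each group.
totals = sales.groupby("region")["amount"].sum()
print(totals)
```

The result is a Series indexed by the group keys, so `totals["East"]` returns the total for that region.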
DataFrame objects can be created from structured data such as lists, dicts, and NumPy arrays. They can also be created from external sources such as:
- Excel
- JSON
- CSV
- HTML Tables
- SQL etc.
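The in-memory sources can be sketched as follows. This is an illustrative example rather than an exhaustive list; the variable names and sample values are assumptions made for the demonstration.

```python
import numpy as np
import pandas as pd

# From a list of rows, with the column names supplied explicitly.
from_lists = pd.DataFrame([["John", 25], ["Jane", 30]], columns=["Name", "Age"])

# From a list of dicts: the keys become the column names.
from_records = pd.DataFrame([{"Name": "John", "Age": 25},
                             {"Name": "Jane", "Age": 30}])

# From a 2-D NumPy array, again naming the columns explicitly.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

print(from_lists)
print(from_records)
print(from_array)
```

For the external formats in the list, pandas provides reader functions such as `pd.read_excel`, `pd.read_json`, `pd.read_csv`, `pd.read_html`, and `pd.read_sql`. Some of them depend on optional packages being installed.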
In Python, the pandas library provides a powerful data analysis toolkit, and one of its primary data structures is the DataFrame. A DataFrame is a two-dimensional labeled data structure whose columns can have different types. It is similar to a spreadsheet or a SQL table, and it is commonly used for data manipulation and analysis.
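To make "columns of potentially different types" concrete, here is a small sketch that inspects the type of each column. The `Height` column is a hypothetical addition used only to show a floating-point type.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane"],   # text
    "Age": [25, 30],            # integers
    "Height": [1.80, 1.65],     # floats (hypothetical column)
})

# dtypes reports one type per column; shape is (rows, columns).
print(df.dtypes)
print(df.shape)
```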
Here are some examples of how to create and work with a DataFrame in pandas:
- Creating a DataFrame from a dictionary:
You can create a DataFrame by passing a dictionary of lists as input to the DataFrame constructor. Each key in the dictionary represents a column name, and the corresponding value is a list of values for that column. Here’s an example:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Bob', 'Sara'],
        'Age': [25, 30, 21, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)
print(df)
Output:
   Name  Age      City
0  John   25  New York
1  Jane   30    London
2   Bob   21     Paris
3  Sara   28     Tokyo
- Reading a DataFrame from a CSV file:
You can also create a DataFrame by reading data from a CSV file using the read_csv function. Here’s an example, assuming a file named data.csv that contains the same data as above:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output:
   Name  Age      City
0  John   25  New York
1  Jane   30    London
2   Bob   21     Paris
3  Sara   28     Tokyo
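The example above depends on a file existing on disk. A self-contained sketch can pass an in-memory text buffer instead, since `read_csv` accepts file-like objects as well as paths. The CSV layout below is an assumption about what data.csv might contain.

```python
import io
import pandas as pd

# Stand-in for the contents of a file such as data.csv (assumed layout).
csv_text = """Name,Age,City
John,25,New York
Jane,30,London
Bob,21,Paris
Sara,28,Tokyo
"""

# The first line is used as the header row by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```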
- Accessing columns and rows:
You can access an individual column of a DataFrame by placing its name in square brackets, which returns the column as a Series. Here’s an example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df['Name'])
Output:
0    John
1    Jane
2     Bob
3    Sara
Name: Name, dtype: object
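Column selection also accepts a list of names. The sketch below rebuilds the same sample data in memory, so it does not need data.csv, and shows that a single name yields a Series while a list of names yields a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Sara"],
    "Age": [25, 30, 21, 28],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

names = df["Name"]              # single label -> Series
subset = df[["Name", "City"]]   # list of labels -> DataFrame

print(type(names).__name__)
print(type(subset).__name__)
```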
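You can also access individual rows using the .loc indexer, which selects rows by index label. Because this DataFrame uses the default integer index, the label 1 refers to the second row. To select rows by integer position regardless of their labels, use .iloc instead. Here’s an example: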
import pandas as pd
df = pd.read_csv('data.csv')
print(df.loc[1])
Output:
Name      Jane
Age         30
City    London
Name: 1, dtype: object
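The difference between selecting by label and selecting by position is easier to see when the index is not the default range of integers. The following sketch assumes a hypothetical DataFrame indexed by name and compares `.loc` with `.iloc`.

```python
import pandas as pd

# Same sample data, but with the names used as the row labels.
df = pd.DataFrame(
    {"Age": [25, 30, 21, 28],
     "City": ["New York", "London", "Paris", "Tokyo"]},
    index=["John", "Jane", "Bob", "Sara"],
)

by_label = df.loc["Jane"]   # the row whose index label is "Jane"
by_position = df.iloc[1]    # the second row, whatever its label

print(by_label)
print(by_position)
```

Both lookups return the same row here, because "Jane" is the label of the second row.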
These are just a few examples of how to create and work with a DataFrame in pandas. The DataFrame is a powerful and flexible data structure that can be used for a wide variety of data analysis tasks in Python.