Topics covered in this lesson are:
- Sorting and filtering data
- Removing duplicates
- Text functions: LEFT, RIGHT, MID, TRIM, CONCATENATE
Here’s a sample dataset for the Data Organization and Cleaning chapter and an explanation of the steps for sorting, filtering, removing duplicates, and using text functions:
Sample Data: Sales Records
ID | Name | Product | Sales | Region | Date |
---|---|---|---|---|---|
101 | John Smith | Laptop | 1200 | North | 2023-12-01 |
102 | Mary Johnson | Mobile | 900 | South | 2023-12-01 |
103 | John Smith | Laptop | 1200 | North | 2023-12-01 |
104 | Alex Brown | Monitor | 450 | West | 2023-12-02 |
105 | Chris Lee | Tablet | 600 | East | 2023-12-02 |
1. Sorting and Filtering Data
Sorting:
- Objective: Sort the data by Sales (highest to lowest) or by Region alphabetically.
- Steps:
- Select the dataset (A1:E6).
- Go to the Data Tab → Click Sort.
- Choose the column:
- For Sales, sort Largest to Smallest.
- For Region, sort A to Z.
Filtering:
- Objective: Filter records where Region is North.
- Steps:
- Select the dataset.
- Go to the Data Tab → Click Filter.
- Use the dropdown arrow on the Region column → Select North.
2. Removing Duplicates
- Objective: Remove duplicate rows from the dataset (e.g., duplicate sales records by ID).
- Steps:
- Select the dataset.
- Go to the Data Tab → Click Remove Duplicates.
- In the pop-up window, select columns to check for duplicates (e.g., ID, Name, Product, Sales, and Region).
- Click OK. The duplicate row for John Smith is removed.
3. Text Functions
Use Case: Cleaning and Organizing Text Data
Function | Formula | Output |
---|---|---|
LEFT | =LEFT(B2,4) | “John” |
RIGHT | =RIGHT(B2,5) | “Smith” |
MID | =MID(B2,6,5) | “Smith” |
TRIM | =TRIM(" Laptop ") | “Laptop” |
CONCATENATE | =CONCATENATE(C2, " - ", D2) | “Laptop – 1200” |
Steps:
- LEFT: Extracts a specified number of characters from the start of a text string.
- Formula:
=LEFT(B2,4)
gives “John” from Name.
- Formula:
- RIGHT: Extracts a specified number of characters from the end of a text string.
- Formula:
=RIGHT(B2,5)
gives “Smith” from Name.
- Formula:
- MID: Extracts characters from the middle of a text string.
- Formula:
=MID(B2,6,5)
gives “Smith” (starting at character 6, length 5).
- Formula:
- TRIM: Removes unnecessary spaces from text.
- Formula:
=TRIM(" Laptop ")
removes extra spaces around “Laptop”.
- Formula:
- CONCATENATE: Combines multiple text strings into one.
- Formula:
=CONCATENATE(C2, " - ", D2)
combines Product and Sales into “Laptop – 1200”.
- Formula:
Practical Example of Steps
Using the Sales Records dataset:
- Sorting: Arrange the dataset by Sales to find the highest-selling product.
- Filtering: Filter for the North Region to analyze its performance.
- Removing Duplicates: Ensure only unique sales entries by removing duplicates.
- Text Functions: Use
LEFT
,RIGHT
,MID
,TRIM
, andCONCATENATE
to clean and reorganize the Name and Product fields.