Data Organization and Cleaning in MS Excel

Topics covered in this lesson are:

  • Sorting and filtering data
  • Removing duplicates
  • Text functions: LEFT, RIGHT, MID, TRIM, CONCATENATE

Here’s a sample dataset for the Data Organization and Cleaning chapter and an explanation of the steps for sorting, filtering, removing duplicates, and using text functions:


Sample Data: Sales Records

IDNameProductSalesRegionDate
101John SmithLaptop1200North2023-12-01
102Mary JohnsonMobile900South2023-12-01
103John SmithLaptop1200North2023-12-01
104Alex BrownMonitor450West2023-12-02
105Chris LeeTablet600East2023-12-02

1. Sorting and Filtering Data

Sorting:

  • Objective: Sort the data by Sales (highest to lowest) or by Region alphabetically.
  • Steps:
    1. Select the dataset (A1:E6).
    2. Go to the Data Tab → Click Sort.
    3. Choose the column:
      • For Sales, sort Largest to Smallest.
      • For Region, sort A to Z.

Filtering:

  • Objective: Filter records where Region is North.
  • Steps:
    1. Select the dataset.
    2. Go to the Data Tab → Click Filter.
    3. Use the dropdown arrow on the Region column → Select North.

2. Removing Duplicates

  • Objective: Remove duplicate rows from the dataset (e.g., duplicate sales records by ID).
  • Steps:
    1. Select the dataset.
    2. Go to the Data Tab → Click Remove Duplicates.
    3. In the pop-up window, select columns to check for duplicates (e.g., ID, Name, Product, Sales, and Region).
    4. Click OK. The duplicate row for John Smith is removed.

3. Text Functions

Use Case: Cleaning and Organizing Text Data

FunctionFormulaOutput
LEFT=LEFT(B2,4)“John”
RIGHT=RIGHT(B2,5)“Smith”
MID=MID(B2,6,5)“Smith”
TRIM=TRIM(" Laptop ")“Laptop”
CONCATENATE=CONCATENATE(C2, " - ", D2)“Laptop – 1200”

Steps:

  1. LEFT: Extracts a specified number of characters from the start of a text string.
    • Formula: =LEFT(B2,4) gives “John” from Name.
  2. RIGHT: Extracts a specified number of characters from the end of a text string.
    • Formula: =RIGHT(B2,5) gives “Smith” from Name.
  3. MID: Extracts characters from the middle of a text string.
    • Formula: =MID(B2,6,5) gives “Smith” (starting at character 6, length 5).
  4. TRIM: Removes unnecessary spaces from text.
    • Formula: =TRIM(" Laptop ") removes extra spaces around “Laptop”.
  5. CONCATENATE: Combines multiple text strings into one.
    • Formula: =CONCATENATE(C2, " - ", D2) combines Product and Sales into “Laptop – 1200”.

Practical Example of Steps

Using the Sales Records dataset:

  1. Sorting: Arrange the dataset by Sales to find the highest-selling product.
  2. Filtering: Filter for the North Region to analyze its performance.
  3. Removing Duplicates: Ensure only unique sales entries by removing duplicates.
  4. Text Functions: Use LEFT, RIGHT, MID, TRIM, and CONCATENATE to clean and reorganize the Name and Product fields.