Types of Data in a Dataset

Data in a dataset can be classified along several independent dimensions: its nature, its role in an analysis, its measurement scale, its source, and its structure. A single variable falls into one category along each dimension. The main types are:

1. Based on Nature of Data

a. Quantitative (Numerical) Data

  • Represents quantities that are counted or measured, so arithmetic on the values is meaningful.
  • Types:
    • Discrete Data: Countable values (e.g., number of students, number of sales).
    • Continuous Data: Measured values that can take any value within a range (e.g., height, weight, temperature).

b. Qualitative (Categorical) Data

  • Represents categories or labels. The values may be coded as numbers (e.g., 0/1 flags, postal codes), but arithmetic on those codes is not meaningful.
  • Types:
    • Nominal Data: Categories with no specific order (e.g., gender, colors, country names).
    • Ordinal Data: Categories with a meaningful order, but the difference between them is not quantifiable (e.g., satisfaction level: Poor, Good, Excellent).
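
The distinction matters in practice because each type supports different summaries. The following is a minimal sketch using only the Python standard library; the records, field names, and the SATISFACTION_ORDER list are invented for illustration and do not come from any real dataset.

```python
from statistics import mean, mode

# Hypothetical survey records (illustrative values only).
records = [
    {"students": 28, "height_cm": 171.5, "country": "Kenya", "satisfaction": "Good"},
    {"students": 31, "height_cm": 164.2, "country": "Peru", "satisfaction": "Poor"},
    {"students": 28, "height_cm": 180.0, "country": "Kenya", "satisfaction": "Excellent"},
]

# Ordinal data needs its order stated explicitly; alphabetical order would be wrong.
SATISFACTION_ORDER = ["Poor", "Good", "Excellent"]

# Discrete: counts can be summed.
total_students = sum(r["students"] for r in records)

# Continuous: the mean is meaningful.
avg_height = mean([r["height_cm"] for r in records])

# Nominal: no order, so the mode is the only sensible "average".
most_common_country = mode([r["country"] for r in records])

# Ordinal: values can be ranked, so a median exists, but not a mean.
ranked = sorted((r["satisfaction"] for r in records), key=SATISFACTION_ORDER.index)
median_satisfaction = ranked[len(ranked) // 2]  # middle item; assumes an odd count

print(total_students, round(avg_height, 1), most_common_country, median_satisfaction)
```

Sorting by SATISFACTION_ORDER.index is what encodes the ordinal ranking here; in a data-frame library the same role would typically be played by an ordered categorical type.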

2. Based on Role in Analysis

a. Independent Variables

  • Variables that are manipulated or observed in order to explain or predict changes in other variables (e.g., study hours when modeling exam scores).

b. Dependent Variables

  • Variables whose values are expected to change in response to the independent variables (e.g., exam scores as a function of study hours).

c. Feature Variables (Predictor Variables)

  • Inputs a machine learning model uses to make predictions. This is the machine learning term for independent variables.

d. Target Variables (Response Variables)

  • The output variable a model is trying to predict. This is the machine learning term for the dependent variable.
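
To make the roles concrete, here is a small sketch that separates a feature from a target and fits a straight line by ordinary least squares. The data values, the names X, y, and fit_line, and the choice of a one-variable linear model are all assumptions made for illustration.

```python
# Hypothetical observations: study hours (feature) and exam scores (target).
rows = [
    {"study_hours": 1.0, "exam_score": 52.0},
    {"study_hours": 2.0, "exam_score": 57.0},
    {"study_hours": 3.0, "exam_score": 62.0},
    {"study_hours": 4.0, "exam_score": 67.0},
]

X = [row["study_hours"] for row in rows]  # feature / independent variable
y = [row["exam_score"] for row in rows]   # target / dependent variable


def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(X, y)

# Predict the target for an unseen feature value (5 study hours).
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

By convention the feature collection is named X and the target y; with more than one feature, X would hold one row of inputs per observation.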

3. Based on Measurement Scale (Stevens’ Classification)

a. Nominal Scale

  • Categories with no ranking (e.g., blood type, marital status).

b. Ordinal Scale

  • Categories with a ranking, but the gaps between ranks are not guaranteed to be equal (e.g., a 1-5 star rating).

c. Interval Scale

  • Numeric data with equal intervals but no true zero, so differences are meaningful but ratios are not (e.g., temperature in Celsius or Fahrenheit).

d. Ratio Scale

  • Numeric data with equal intervals and a true zero, so both differences and ratios are meaningful (e.g., weight, height, salary).
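
The practical difference between the interval and ratio scales can be checked with a little arithmetic. This sketch uses made-up readings; celsius_to_kelvin is a helper defined here for illustration, using the standard offset of 273.15.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Hypothetical temperature readings in Celsius.
morning_c, noon_c = 10.0, 20.0

# Differences are meaningful on an interval scale: noon is 10 degrees warmer.
difference = noon_c - morning_c

# A ratio of Celsius values is arithmetic on an arbitrary zero point:
# 20 / 10 = 2.0, but noon is not "twice as hot" as morning.
naive_ratio = noon_c / morning_c

# On the Kelvin scale zero means "no thermal energy", so the ratio is valid.
true_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

# Weight is already on a ratio scale: 80 kg really is twice 40 kg.
weight_ratio = 80.0 / 40.0

print(difference, naive_ratio, round(true_ratio, 3), weight_ratio)
```

The Kelvin ratio comes out at roughly 1.035 rather than 2, which is why statements such as "twice as much" are reserved for ratio-scale data.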

4. Based on Data Source

a. Primary Data

  • Collected firsthand through surveys, experiments, or direct measurement.

b. Secondary Data

  • Collected from existing sources like databases, reports, or published research.

5. Based on Structure

a. Structured Data

  • Organized in rows and columns under a fixed schema (e.g., relational database tables, spreadsheets).

b. Unstructured Data

  • Data with no predefined data model or schema (e.g., free text, images, videos, social media posts).

c. Semi-Structured Data

  • Carries organizing markers such as keys or tags, but does not conform to a rigid tabular schema (e.g., JSON, XML files).
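
As an illustration of how semi-structured data relates to structured data, the sketch below flattens a JSON document into a CSV table using only the standard library. The records, the field names, and the decision to join list values with ";" are assumptions chosen for the example, not a general-purpose conversion.

```python
import csv
import io
import json

# Hypothetical semi-structured input: records share some keys but not all,
# and one field holds a nested list.
raw = (
    '[{"id": 1, "name": "Ada", "tags": ["admin", "dev"]},'
    ' {"id": 2, "name": "Lin", "email": "lin@example.com"}]'
)
records = json.loads(raw)

# A table needs one fixed set of columns: take the union of all keys,
# preserving first-seen order.
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

# Write rows to an in-memory CSV; missing fields become empty cells.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
for record in records:
    # Nested lists do not fit in a single cell, so join them into one string.
    row = {k: ";".join(v) if isinstance(v, list) else v for k, v in record.items()}
    writer.writerow(row)

table = buffer.getvalue()
print(table)
```

The empty cells and the joined "tags" column show what is lost or forced when flexible records are pressed into a strict row-and-column layout.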