Data in a dataset can be classified along several dimensions: its nature, its role in analysis, its measurement scale, its source, and its structure. Here are the main types under each:
1. Based on Nature of Data
a. Quantitative (Numerical) Data
- Represents measurable quantities and can be expressed in numbers.
- Types:
  - Discrete Data: Countable values (e.g., number of students, number of sales).
  - Continuous Data: Measured values that can take any value within a range (e.g., height, weight, temperature).
b. Qualitative (Categorical) Data
- Represents categories or labels rather than measured quantities. Values may be coded as numbers, but arithmetic on those codes is not meaningful.
- Types:
  - Nominal Data: Categories with no specific order (e.g., gender, colors, country names).
  - Ordinal Data: Categories with a meaningful order, but the difference between them is not quantifiable (e.g., satisfaction level: Poor, Good, Excellent).
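The four kinds of data above differ mainly in which summaries make sense for them. The following is a minimal Python sketch, using only the standard library, that illustrates this. All sample values and names (`students_per_class`, `heights_cm`, `favorite_colors`, `Satisfaction`) are invented for illustration and are not taken from any real dataset.

```python
from enum import IntEnum
from statistics import mean, median_low, mode

# Hypothetical sample data, invented for illustration.
students_per_class = [28, 31, 25, 31]               # discrete: counts
heights_cm = [162.5, 170.2, 158.9, 181.0]           # continuous: measurements
favorite_colors = ["red", "blue", "red", "green"]   # nominal: unordered labels


class Satisfaction(IntEnum):
    """Ordinal categories: the order matters, the spacing does not."""
    POOR = 1
    GOOD = 2
    EXCELLENT = 3


ratings = [
    Satisfaction.GOOD,
    Satisfaction.POOR,
    Satisfaction.EXCELLENT,
    Satisfaction.GOOD,
    Satisfaction.EXCELLENT,
]

# Quantitative data supports arithmetic summaries such as sums and means.
total_students = sum(students_per_class)
average_height = mean(heights_cm)

# Nominal data supports counting and the mode, but not ordering or averaging.
most_common_color = mode(favorite_colors)

# Ordinal data can be sorted, so a median is meaningful.
# median_low always returns one of the observed categories.
median_rating = median_low(ratings)

print(total_students, most_common_color, median_rating.name)
```

Using `IntEnum` here is one design choice among several: it makes the ordering explicit so that sorting works, while the numeric codes 1 to 3 remain arbitrary labels rather than measured quantities.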
2. Based on Role in Analysis
a. Independent Variables
- Variables that influence or predict changes in other variables (e.g., study hours affecting exam scores).
b. Dependent Variables
- Variables whose values are expected to change in response to the independent variables (e.g., exam scores as a function of study hours).
c. Feature Variables (Predictor Variables)
- Inputs a machine learning model uses to make predictions; the machine learning counterpart of independent variables.
d. Target Variables (Response Variables)
- The output variable a model is trained to predict; the machine learning counterpart of the dependent variable.
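In practice, the feature/target distinction usually shows up as a step that separates each record into model inputs and the value to predict. The sketch below shows one way to do that in plain Python. The records, the `sleep_hours` column, and the helper `split_features_target` are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical records, invented for illustration.
records = [
    {"study_hours": 2.0, "sleep_hours": 6.0, "exam_score": 58.0},
    {"study_hours": 5.0, "sleep_hours": 7.5, "exam_score": 74.0},
    {"study_hours": 8.0, "sleep_hours": 7.0, "exam_score": 89.0},
]

FEATURE_NAMES = ["study_hours", "sleep_hours"]  # independent / feature variables
TARGET_NAME = "exam_score"                      # dependent / target variable


def split_features_target(rows, feature_names, target_name):
    """Return (X, y): one feature list per row, and one target value per row."""
    features = [[row[name] for name in feature_names] for row in rows]
    targets = [row[target_name] for row in rows]
    return features, targets


X, y = split_features_target(records, FEATURE_NAMES, TARGET_NAME)

print(X)  # [[2.0, 6.0], [5.0, 7.5], [8.0, 7.0]]
print(y)  # [58.0, 74.0, 89.0]
```

Keeping the feature names in an explicit list fixes the column order of `X`, which matters once the rows are handed to a model.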
3. Based on Measurement Scale (Stevens’ Classification)
a. Nominal Scale
- Categories with no ranking (e.g., blood type, marital status).
b. Ordinal Scale
- Categories with a ranking but no fixed interval (e.g., rating scale: 1-5 stars).
c. Interval Scale
- Numeric data with equal intervals but no true zero (e.g., temperature in Celsius or Fahrenheit).
d. Ratio Scale
- Numeric data with equal intervals and a true zero (e.g., weight, height, salary).
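The practical difference between the interval and ratio scales is whether ratios survive a change of unit. The sketch below illustrates this under simple assumptions: the temperatures and weights are invented, the helper functions are hypothetical, and `KG_TO_LB` is an approximate conversion factor. Kelvin is included as a temperature scale that does have a true zero.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0  # hypothetical temperatures in Celsius

# Interval scale: differences are meaningful, but ratios depend on the unit,
# because the zero point is arbitrary.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Kelvin has a true zero, so this ratio is the physically meaningful one.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale: weight has a true zero, so the ratio is the same in any unit.
KG_TO_LB = 2.20462  # approximate conversion factor
light_kg, heavy_kg = 40.0, 80.0  # hypothetical weights
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print(f"Celsius ratio:    {ratio_celsius:.3f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")
print(f"kg ratio: {ratio_kg:.3f}, lb ratio: {ratio_lb:.3f}")
```

The Celsius and Fahrenheit ratios disagree for the same pair of temperatures, so "twice as hot" is not a valid statement on those scales, while "twice as heavy" holds in both kilograms and pounds.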
4. Based on Data Source
a. Primary Data
- Collected firsthand through surveys, experiments, or direct measurement.
b. Secondary Data
- Collected from existing sources like databases, reports, or published research.
5. Based on Structure
a. Structured Data
- Organized in rows and columns under a fixed schema (e.g., relational database tables, Excel sheets).
b. Unstructured Data
- Data with no predefined data model or schema, so it cannot be queried by field without further processing (e.g., text, images, videos, social media posts).
c. Semi-Structured Data
- Carries organizing markers such as keys or tags but does not conform to a strict tabular schema; records may differ in their fields (e.g., JSON, XML files).
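The three structural categories can be contrasted by how much a program knows about the data before reading it. The following sketch uses Python's standard `csv` and `json` modules. The sample CSV, JSON, and review text are invented for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns, known up front.
csv_text = "name,age\nAda,36\nLinus,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
column_names = list(rows[0].keys())

# Semi-structured: keys label the values, but records need not share
# the same fields and may nest.
json_text = """
[
  {"name": "Ada", "age": 36, "tags": ["math"]},
  {"name": "Linus", "contact": {"email": "linus@example.com"}}
]
"""
documents = json.loads(json_text)
all_keys = sorted({key for doc in documents for key in doc})

# Unstructured: no fields at all; any structure must be derived by processing.
review = "Great phone, but the battery drains too fast."
word_count = len(review.split())

print(column_names)  # ['name', 'age']
print(all_keys)      # ['age', 'contact', 'name', 'tags']
print(word_count)    # 8
```

Note that `csv.DictReader` returns every value as a string, so even structured data typically needs a type-conversion step before numeric analysis.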