Descriptive Statistics

Descriptive statistics is a branch of statistics that summarizes and describes the main features of a dataset.

  • It does not involve prediction or inference.
  • It focuses on understanding data characteristics through numerical summaries and visualizations.
  • It is a key step in Exploratory Data Analysis (EDA).

EDA = Descriptive Statistics + Data Visualization
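
As a rough illustration of the "numerical summaries" mentioned above, the following sketch computes a few common ones with Python's standard statistics module. The ages list is hypothetical sample data, not taken from the source, and the choice of summaries is only one reasonable selection.

```python
import statistics

# Hypothetical ages for five patients (illustrative data only).
ages = [21, 36, 29, 41, 36]

# A small set of numerical summaries describing the data; nothing is predicted.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "range": max(ages) - min(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each value describes the data at hand; none of them draws a conclusion about a larger population.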


Dataset and Data Types

A dataset is a collection of data objects (also called records, patterns, samples, or observations).
Each data object is described by attributes (features or properties).

Example: Sample Patient Table (Table 2.2)

Patient ID   Name    Age   Blood Test   Fever   Disease
1            John    21    Negative     Low     No
2            Andre   36    Positive     High    Yes
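
One possible way to hold this table in code is a list of dictionaries, where each dictionary is a data object and each key is an attribute. The snake_case key names below are an assumption made for illustration; only the values come from the table.

```python
# Each dict is one data object (record); each key is an attribute (feature).
patients = [
    {"patient_id": 1, "name": "John", "age": 21,
     "blood_test": "Negative", "fever": "Low", "disease": "No"},
    {"patient_id": 2, "name": "Andre", "age": 36,
     "blood_test": "Positive", "fever": "High", "disease": "Yes"},
]

# The attribute names are the same for every record in the dataset.
attributes = list(patients[0].keys())

print(len(patients), "data objects,", len(attributes), "attributes")
print(attributes)
```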

Types of Data (Figure 2.1)

Data is broadly classified into:

1. Categorical (Qualitative) Data

  • Describes qualities or labels rather than measurable quantities.
  • Further classified into:
    • Nominal Data: No inherent order.
      e.g., Patient ID, Blood Group, Gender.
      • Only equality comparisons (=, ≠) are valid.
    • Ordinal Data: Has a meaningful order.
      e.g., Fever = {Low, Medium, High}
      • Can be ranked, but exact differences are unknown.
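
The difference between nominal and ordinal data can be sketched as follows. The sample values and the FEVER_RANK mapping are hypothetical; the rank numbers only encode order, so the gaps between them carry no meaning.

```python
from collections import Counter

# Nominal: only equality checks and counting are meaningful.
blood_groups = ["A", "O", "B", "O", "AB", "O"]  # hypothetical values
counts = Counter(blood_groups)
most_common_group = counts.most_common(1)[0][0]

# Ordinal: an explicit rank makes the order usable for sorting and comparison.
# The numbers 0, 1, 2 are arbitrary labels for position, not measurements.
FEVER_RANK = {"Low": 0, "Medium": 1, "High": 2}
fevers = ["High", "Low", "Medium", "Low"]  # hypothetical values

sorted_fevers = sorted(fevers, key=FEVER_RANK.get)
highest_fever = max(fevers, key=FEVER_RANK.get)

print(most_common_group)
print(sorted_fevers)
print(highest_fever)
```

Sorting blood groups by any such rank would be meaningless, which is the practical consequence of having no inherent order.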

2. Numerical (Quantitative) Data

  • Represents measurable quantities.
  • Subtypes:
    • Interval Data: Numeric values with meaningful differences, but no true zero.
      e.g., Temperature in Celsius or Fahrenheit.
      • Operations allowed: +, −
    • Ratio Data: Has a meaningful zero and allows all arithmetic operations (+, −, ×, ÷).
      e.g., Age, Height, Weight.
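
A small sketch of why ratios are not meaningful for interval data but are for ratio data. The temperatures and ages are hypothetical, and c_to_f is a helper defined here only for this illustration.

```python
def c_to_f(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval data: the zero point is arbitrary, so ratios depend on the unit.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c                        # 2.0
ratio_fahrenheit = c_to_f(t2_c) / c_to_f(t1_c)     # 68 / 50, not 2.0

# Differences stay consistent: they only scale by the unit factor 9/5.
diff_celsius = t2_c - t1_c
diff_fahrenheit = c_to_f(t2_c) - c_to_f(t1_c)

# Ratio data: zero means "none", so ratios survive a change of unit.
age1_years, age2_years = 21, 42
ratio_years = age2_years / age1_years
ratio_months = (age2_years * 12) / (age1_years * 12)

print(ratio_celsius, ratio_fahrenheit)
print(diff_celsius, diff_fahrenheit)
print(ratio_years, ratio_months)
```

Saying 20 °C is "twice as hot" as 10 °C therefore has no physical meaning, while saying a 42-year-old is twice as old as a 21-year-old does.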

Based on Values

Type         Description                                Example
Discrete     Countable integers                         Employee ID, Survey scores
Continuous   Values with decimal precision, measurable  Age (e.g., 12.5), Height, Weight

Based on Number of Variables (Figure 2.2)

Type           Description
Univariate     One variable per record
Bivariate      Two variables
Multivariate   Three or more variables
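
The three cases can be sketched by selecting one, two, or three attributes from the same records. The records and the attribute names (age, height_cm, weight_kg) are hypothetical.

```python
# Hypothetical records with three numerical attributes each.
records = [
    {"age": 21, "height_cm": 175.0, "weight_kg": 68.5},
    {"age": 36, "height_cm": 168.0, "weight_kg": 74.0},
    {"age": 29, "height_cm": 181.5, "weight_kg": 80.2},
]

# Univariate: one variable per record.
univariate = [r["age"] for r in records]

# Bivariate: two variables per record, e.g. to study how they relate.
bivariate = [(r["age"], r["weight_kg"]) for r in records]

# Multivariate: three or more variables per record.
multivariate = [(r["age"], r["height_cm"], r["weight_kg"]) for r in records]

print(univariate)
print(bivariate)
print(multivariate)
```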
