Lecture 3: Descriptive Statistics – Mean, Median, and Variance

1. Introduction to Measures of Central Tendency

Mean (Average):

Real-Life Example: Average test scores of students in a class. If students scored 70, 75, 80, and 85:

$\text{Mean} = \frac{70 + 75 + 80 + 85}{4} = 77.5$
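The same calculation can be checked in a few lines of Python. This is a minimal sketch using the standard library's `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# Test scores from the example above
scores = [70, 75, 80, 85]

# Mean = sum of the values divided by how many values there are
manual_mean = sum(scores) / len(scores)

print(manual_mean)   # 77.5
print(mean(scores))  # 77.5
```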

Median:

Examples:

  1. For the dataset [1, 4, 5, 6, 9], the median is 5.
  2. For the dataset [1, 4, 5, 6, 9, 10], the median is $\frac{5+6}{2} = 5.5$.
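Both cases (odd and even number of values) can be sketched in Python. The helper `manual_median` below is a hypothetical name introduced for illustration; it sorts the data, then takes the middle value or the average of the two middle values, and is compared against the standard library's `statistics.median`.

```python
from statistics import median

odd_data = [1, 4, 5, 6, 9]
even_data = [1, 4, 5, 6, 9, 10]

def manual_median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median(odd_data), median(odd_data))    # 5 5
print(manual_median(even_data), median(even_data))  # 5.5 5.5
```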

2. Introduction to Variability Measures

Variance:

Standard Deviation:

Real-Life Example: Consistency of delivery times for a courier company. Two delivery teams may have the same mean time, but very different consistency (standard deviation).

[Figure: Delivery times]
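To make the courier example concrete, here is a small sketch with two made-up teams. The delivery times are hypothetical values chosen for illustration, not data from the lecture, and the population standard deviation (`pstdev`) is an assumption; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev

# Hypothetical delivery times in minutes (illustrative only)
team_a = [28, 30, 32, 30]   # tightly clustered around 30
team_b = [10, 50, 20, 40]   # same average, far more spread out

for name, times in [("Team A", team_a), ("Team B", team_b)]:
    print(f"{name}: mean = {mean(times):.1f}, SD = {pstdev(times):.2f}")

# Team A: mean = 30.0, SD = 1.41
# Team B: mean = 30.0, SD = 15.81
```

Both teams average 30 minutes, but a customer of Team B could plausibly wait anywhere from 10 to 50 minutes.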

3. Why Do These Measures Matter in Real Life?


4. Visualizing Probability Distributions with Different Variances

We'll plot normal distributions with:

This shows how data spreads more as variance increases.
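The effect can also be seen numerically, without a plot. The sketch below uses `statistics.NormalDist` from the Python standard library; the mean of 0 and the variances 1, 4, and 9 are assumed values picked for illustration, since the specific parameters are not listed here.

```python
from statistics import NormalDist

# Assumed parameters for illustration only
mu = 0
variances = [1, 4, 9]

peaks = []
for var in variances:
    dist = NormalDist(mu=mu, sigma=var ** 0.5)  # sigma is the square root of the variance
    peak = dist.pdf(mu)                          # height of the curve at its mean
    within_one = dist.cdf(mu + 1) - dist.cdf(mu - 1)  # probability of landing within 1 unit of the mean
    peaks.append(peak)
    print(f"variance = {var}: peak height = {peak:.3f}, "
          f"P(|X - mu| < 1) = {within_one:.3f}")
```

As the variance grows, the peak gets lower and less probability sits close to the mean, which is the flattening and widening the graph is meant to show.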

Graph Sketches:

Normal distributions visualization



Exercises

1. Outlier Impact Analysis

You are given two datasets:

Tasks:

Solution
a) Calculate mean and median for both datasets:

b) The outlier (100) greatly increases the mean from 13.5 to 27.5, but does not affect the median.

c) The median better represents the "typical" value here because it is resistant to outliers, unlike the mean, which is pulled up by the extreme value.
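The reasoning in (b) and (c) can be reproduced with a short sketch. The two lists below are hypothetical: they were chosen only so that their means match the 13.5 and 27.5 quoted in the solution, and they should not be read as the exercise's actual datasets.

```python
from statistics import mean, median

# Hypothetical datasets (assumed for illustration)
without_outlier = [10, 12, 13, 14, 16, 16]
with_outlier = [10, 12, 13, 14, 16, 100]   # one value replaced by the outlier 100

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")

# without outlier: mean = 13.5, median = 13.5
# with outlier: mean = 27.5, median = 13.5
```

Replacing a single value with 100 roughly doubles the mean, while the two middle values, and therefore the median, stay where they were.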

2. Comparing Two Schools

School X and School Y both claim to have "the same average SAT score." Their score distributions are:

Solution
a) Calculate mean and standard deviation (SD) for both schools:

b) School X is more consistent because its standard deviation (~14.14) is much smaller than School Y's (~158.11).

3. Mean vs Median

Suppose that the average income in a city is 10,000 Riyals, and the median income is 5,000 Riyals.

What can you say about the distribution of income in this city? Are more people richer or poorer than the average?

Solution
The mean income (10,000 Riyals) is much higher than the median income (5,000 Riyals).
This suggests the income distribution is right-skewed, meaning that a few people earn much more than most others.
Because the mean is pulled upward by these high incomes, while at least half of the population earns no more than the median (5,000 Riyals), more people actually earn less than the average (mean).
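A small numeric sketch shows how this can happen. The seven incomes below are hypothetical, constructed only so that the mean is 10,000 and the median is 5,000; they are not data about any real city.

```python
from statistics import mean, median

# Hypothetical incomes in Riyals (illustrative only): six modest earners and one very high earner
incomes = [3000, 4000, 5000, 5000, 5000, 6000, 42000]

avg = mean(incomes)
mid = median(incomes)
below_avg = sum(1 for x in incomes if x < avg)

print(f"mean = {avg}, median = {mid}")
print(f"{below_avg} of {len(incomes)} people earn less than the mean")

# mean = 10000, median = 5000
# 6 of 7 people earn less than the mean
```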

4. Reverse Engineering a Dataset

You are told that a dataset of 5 numbers has:

Tasks:

Solution
Given mean = 10, median = 8, SD ≈ 4, and 5 numbers in the dataset:

a) One possible dataset: [5, 7, 8, 14, 16]

b) Another dataset: [6, 6, 8, 12, 18]

c) This shows that summary statistics such as the mean, median, and standard deviation do not determine a dataset uniquely: many different datasets can share the same summary statistics while containing very different values.
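The two proposed datasets can be checked with the standard library. This sketch reports both the population SD (`pstdev`) and the sample SD (`stdev`), since it is not stated here which one is intended; the dictionary keys are just labels.

```python
from statistics import mean, median, pstdev, stdev

datasets = {
    "a": [5, 7, 8, 14, 16],
    "b": [6, 6, 8, 12, 18],
}

for label, data in datasets.items():
    print(f"{label}) mean = {mean(data)}, median = {median(data)}, "
          f"population SD = {pstdev(data):.2f}, sample SD = {stdev(data):.2f}")

# a) mean = 10, median = 8, population SD = 4.24, sample SD = 4.74
# b) mean = 10, median = 8, population SD = 4.56, sample SD = 5.10
```

Both datasets hit the mean and median exactly, while their standard deviations match the target of about 4 only approximately.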

Additional Advanced Exercises

1. Predicting the Effect of a Data Shift

Suppose a dataset has:

If each value in the dataset is multiplied by 2 and then 10 is added, what happens to:

Follow-up Challenge: Generalize a rule for what happens to mean, median, and standard deviation under:

2. Prove or Disprove: The Mean is Always Closer to the Data

Statement: "The mean of a dataset is always closer to the data points than the median."

Task: