GradeGen.AI

What are Measures of Spread?

Two classes can have the exact same median test score of 50%, but in one class everyone scored between 45% and 55%, while in the other scores ranged from 10% to 90%. Measures of spread describe this "dispersion" or "consistency" of a data set. A smaller spread means the data points are more consistent. Both the range and the interquartile range have the same units as the original data.

The Range is the difference between the highest (maximum) value and the lowest (minimum) value in a data set.
The Interquartile Range (IQR) is the difference between the upper quartile and the lower quartile ( $Q_3 - Q_1$ ), representing the spread of the middle 50% of the data.

The Impact of Outliers

An outlier is an extreme value that does not fit the general pattern of the data. The range is highly sensitive to outliers because it relies exclusively on the highest and lowest values.

When a data set contains extreme outliers, the range "distorts" the view of the data by stretching the distance without reflecting the rest of the set. The IQR is much more robust because it ignores the extreme top and bottom 25%.

Data Set A: $2, 4, 5, 6, 8, 10, 12$
- $\text{Range} = 12 - 2 = 10$
- $\text{IQR} = 10 - 4 = 6$
Data Set B (with an outlier): $2, 4, 5, 6, 8, 10, 80$
- $\text{Range} = 80 - 2 = 78$
- $\text{IQR} = 10 - 4 = 6$

The single outlier increases the range from 10 to 78, but the IQR stays at 6.
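The comparison between Data Set A and Data Set B can be checked with a short Python sketch. It assumes the median-of-halves method for the quartiles, and the helper names `median` and `spread` are illustrative only.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def spread(data):
    """Return (range, IQR) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + (n % 2):]  # values above the median position
    q1, q3 = median(lower), median(upper)
    return ordered[-1] - ordered[0], q3 - q1


set_a = [2, 4, 5, 6, 8, 10, 12]
set_b = [2, 4, 5, 6, 8, 10, 80]
print(spread(set_a))  # (10, 6)
print(spread(set_b))  # (78, 6)
```

Replacing 12 with 80 changes the range but not the IQR, because neither quartile depends on the largest value.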

Calculating Quartiles from Discrete Data

Quartiles divide an ordered data set into four equal parts. The Lower Quartile ( $Q_1$ ) is the value one-quarter of the way through the data, and the Upper Quartile ( $Q_3$ ) is three-quarters of the way through.

For discrete lists of data, Edexcel accepts finding the median of the lower and upper halves of the data, or using the position formulas:

Position of $Q_1$ : $\frac{n+1}{4}\text{th term}$
Position of $Q_3$ : $\frac{3(n+1)}{4}\text{th term}$
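As a rough illustration, the position formulas can be written as a small Python helper. It uses $\frac{n+1}{4}$ for $Q_1$ and $\frac{3(n+1)}{4}$ for $Q_3$. The function name `quartile_positions` is hypothetical, and the value lookup shown only covers the case where both positions are whole numbers.

```python
def quartile_positions(n):
    """Positions of Q1 and Q3 in an ordered list of n values (counting from 1)."""
    return (n + 1) / 4, 3 * (n + 1) / 4


data = sorted([2, 4, 5, 6, 8, 10, 12])
q1_pos, q3_pos = quartile_positions(len(data))
print(q1_pos, q3_pos)  # 2.0 6.0

# Positions count from 1 but Python indexes from 0, so subtract 1.
# This lookup only applies when both positions are whole numbers.
if q1_pos.is_integer() and q3_pos.is_integer():
    q1 = data[int(q1_pos) - 1]
    q3 = data[int(q3_pos) - 1]
    print(q1, q3)  # 4 10
```

The last step is the one that is easy to forget: the formula gives a position, and the quartile is the data value found at that position.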

Worked Example: Discrete Data

Find the range, $Q_1$ , $Q_3$ , and the IQR for the following data set: $3, 5, 8, 12, 15, 18, 20$ .

Step 1: Calculate the Range.

$\text{Range} = \text{Maximum} - \text{Minimum}$
$\text{Range} = 20 - 3 = 17$

Step 2: Find the median to split the data into halves.

There are 7 values ( $n=7$ ).
The median is the middle value: $12$ .

Step 3: Find $Q_1$ and $Q_3$ using the lower and upper halves.

Lower half is $3, 5, 8$ . The median of this half ( $Q_1$ ) is $5$ .
Upper half is $15, 18, 20$ . The median of this half ( $Q_3$ ) is $18$ .

Step 4: Calculate the IQR.

$\text{IQR} = Q_3 - Q_1$
$\text{IQR} = 18 - 5 = 13$
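The worked example can be cross-checked with Python's standard library. This is only a sketch: `statistics.quantiles` with `method="exclusive"` (Python 3.8+) places the quartiles at multiples of $\frac{n+1}{4}$, which agrees with the median-of-halves answer for this data set but is not guaranteed to match every exam convention for other values of $n$.

```python
import statistics

data = [3, 5, 8, 12, 15, 18, 20]

data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 17 5.0 18.0 13.0
```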

Estimating Spread from Grouped Data

When data is grouped into classes, the exact individual values are lost. We can only calculate an Estimated Range by finding the difference between the upper boundary of the highest class and the lower boundary of the lowest class.

Worked Example: Estimating Range from a Table

Find the estimated range for the following masses:

Mass ( $m$ kg)	Frequency
$0 < m \leq 10$	5
$10 < m \leq 20$	12

Step 1: Identify the lowest and highest boundaries.

Lower boundary of the lowest class $= 0$
Upper boundary of the highest class $= 30$

Step 2: Calculate the Estimated Range.

$\text{Estimated Range} = 30 - 0 = 30\text{ kg}$
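The same calculation as a minimal Python sketch. The class boundaries are entered as (lower, upper) pairs. The top class $20 < m \leq 30$ is an assumption inferred from the stated upper boundary of 30, and frequencies are left out because the estimated range does not use them.

```python
# (lower boundary, upper boundary) of each class, in kg.
# The last class is assumed from the upper boundary of 30 used in Step 1.
classes = [(0, 10), (10, 20), (20, 30)]

lowest_boundary = min(lower for lower, _ in classes)
highest_boundary = max(upper for _, upper in classes)

estimated_range = highest_boundary - lowest_boundary
print(estimated_range)  # 30
```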

Estimating Quartiles from a Cumulative Frequency Graph

To find quartiles for grouped data, you must use a Cumulative Frequency Graph.

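Find the $Q_1$ position at $\frac{n}{4}$ and the $Q_3$ position at $\frac{3n}{4}$ on the cumulative frequency ( $y$ ) axis. From each position, read across to the curve and then down to the $x$ -axis to estimate the quartile.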

Worked Example: IQR from a Cumulative Frequency Graph

A cumulative frequency graph shows the test marks of 80 students. Estimate the IQR if the curve crosses the $\frac{n}{4}$ frequency line at 32 marks and the $\frac{3n}{4}$ frequency line at 58 marks.

Step 1: Identify the positions on the $y$ -axis.

$n = 80$
$Q_1$ position $= \frac{80}{4} = 20\text{th value}$
$Q_3$ position $= \frac{3 \times 80}{4} = 60\text{th value}$

Step 2: Read the corresponding values from the $x$ -axis.

Reading across from 20 on the $y$ -axis gives $Q_1 = 32\text{ marks}$ .
Reading across from 60 on the $y$ -axis gives $Q_3 = 58\text{ marks}$ .

Step 3: Calculate the IQR.

$\text{IQR} = Q_3 - Q_1$
$\text{IQR} = 58 - 32 = 26\text{ marks}$
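The arithmetic in this example can be sketched in Python. The function name `cf_quartile_positions` is illustrative, and the two readings (32 and 58 marks) are the values given in the question, not values computed from a graph.

```python
def cf_quartile_positions(n):
    """Cumulative frequency positions of Q1 and Q3 for n data values."""
    return n / 4, 3 * n / 4


n = 80
q1_pos, q3_pos = cf_quartile_positions(n)
print(q1_pos, q3_pos)  # 20.0 60.0

# Readings taken from the x-axis at those positions (given in the question).
q1_reading, q3_reading = 32, 58
iqr = q3_reading - q1_reading
print(iqr)  # 26
```

Note that cumulative frequency graphs use $\frac{n}{4}$ and $\frac{3n}{4}$, not the $\frac{n+1}{4}$ positions used for discrete lists.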

Identifying Outliers Mathematically (Higher Tier)

At Higher Tier, you must use a specific mathematical rule to provide an objective boundary, determining exactly which values are outliers.

An extreme value is an outlier if it falls outside the Lower Boundary or the Upper Boundary:

$\text{Lower Boundary} = Q_1 - (1.5 \times \text{IQR})$

$\text{Upper Boundary} = Q_3 + (1.5 \times \text{IQR})$

If asked to draw a box plot, any calculated outliers must be plotted as an 'x' or an asterisk (*). The whiskers should only extend to the smallest and largest values that are not outliers.

Worked Example: Identifying Outliers

A data set has $Q_1 = 14$ and $Q_3 = 17$ . The maximum value in the data set is $22$ . Determine mathematically if $22$ is an outlier.

Step 1: Calculate the IQR.

$\text{IQR} = 17 - 14 = 3$

Step 2: Calculate the upper boundary.

$\text{Upper Boundary} = Q_3 + (1.5 \times \text{IQR})$
$\text{Upper Boundary} = 17 + (1.5 \times 3)$
$\text{Upper Boundary} = 17 + 4.5 = 21.5$

Step 3: Compare the value to the boundary.

$22 > 21.5$
Therefore, $22$ is an outlier.
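The outlier rule can be sketched as a pair of small Python functions. The names `outlier_boundaries` and `is_outlier` are illustrative, and the code simply applies the $1.5 \times \text{IQR}$ rule to the quartiles from this example.

```python
def outlier_boundaries(q1, q3):
    """Return (lower boundary, upper boundary) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if value lies below the lower or above the upper boundary."""
    lower, upper = outlier_boundaries(q1, q3)
    return value < lower or value > upper


print(outlier_boundaries(14, 17))  # (9.5, 21.5)
print(is_outlier(22, 14, 17))      # True
print(is_outlier(21, 14, 17))      # False
```

A value of 21 is close to the maximum but still inside the upper boundary of 21.5, so it would not be treated as an outlier.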

Students often identify the position of the quartile (e.g., the 5th term) but forget to go back to the data set to find the actual value at that position.

When asked to compare two data sets in an exam, you must explicitly comment on one measure of average AND one measure of spread, making sure to relate your answer back to the real-life context of the question.

Use phrases like 'more consistent' or 'less spread out' to describe the data set that has the lower IQR or range.

In cumulative frequency graphs, always double-check the scale on the x and y axes before reading your quartiles. A common error is assuming one small square equals 1 unit when it actually equals 2.

If an exam question asks you to justify if a value is an outlier, you must explicitly show the Q₃ + (1.5 x IQR) calculation. Stating it is an outlier 'by inspection' will score zero method marks.

Range: The difference between the highest (maximum) value and the lowest (minimum) value in a data set.

Interquartile Range (IQR): The difference between the upper quartile and the lower quartile (Q₃ - Q₁), representing the spread of the middle 50% of the data.

Lower Quartile (Q₁): The value one-quarter (25%) of the way through an ordered data set.

Upper Quartile (Q₃): The value three-quarters (75%) of the way through an ordered data set.

Outlier: An extreme value that does not fit the general pattern of the data.

Cumulative Frequency Graph: A running total graph used to estimate medians and quartiles for grouped continuous data.

Estimated Range: The difference between the upper boundary of the highest class and the lower boundary of the lowest class in grouped data.

Lower Boundary: The threshold calculated as Q₁ - (1.5 x IQR), below which a data value is considered a small outlier.

Upper Boundary: The threshold calculated as Q₃ + (1.5 x IQR), above which a data value is considered a large outlier.