GradeGen.AI

Measures of Central Tendency

When investigating the size of pebbles on a beach, you will collect dozens of measurements that are hard to interpret just by looking at the raw numbers. Measures of central tendency help by identifying the "midpoint" or most typical value in your geographical dataset.

The mean ( $\bar{x}$ ) is the arithmetic average. It uses every single piece of data, making it useful but highly sensitive to extreme values.
The median is the middle value when all the data is ranked in ascending or descending order.
The mode is the specific value that occurs most frequently in a dataset.
For data that has been grouped into broad categories (such as " $10$ – $19$ mm"), the modal class is the category with the highest frequency.
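
As a minimal sketch of finding a modal class, the snippet below picks the category with the highest frequency from a grouped table. The class labels and frequencies are hypothetical values invented for illustration, not data from these notes.

```python
# Hypothetical grouped pebble-size data: class label -> frequency.
# These numbers are made up purely to illustrate the idea.
pebble_classes = {
    "0-9 mm": 4,
    "10-19 mm": 11,
    "20-29 mm": 7,
    "30-39 mm": 3,
}

# The modal class is the category (the key), not the frequency (the value).
modal_class = max(pebble_classes, key=pebble_classes.get)
highest_frequency = pebble_classes[modal_class]

print("Modal class:", modal_class)
print("Its frequency:", highest_frequency)
```

The answer to report is the category name held in `modal_class`, not the number held in `highest_frequency`.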

Worked Example: Measures of Central Tendency

Calculate the mean, median, and mode for these river depth measurements (in meters): 0.4, 0.5, 0.5, 0.7, 0.9

Step 1: Calculate the mean by summing all values and dividing by the total number of items ( $n$ ).

$\text{Mean} = \frac{0.4 + 0.5 + 0.5 + 0.7 + 0.9}{5}$

Step 2: Calculate the final mean answer with units.

$\text{Mean} = \frac{3.0}{5} = 0.60$ m

Step 3: Find the median by ranking the data and finding the middle position.

$\text{Position} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3^{\text{rd}}$ value

Step 4: Identify the 3rd value in the ranked list.

$0.4, 0.5, \mathbf{0.5}, 0.7, 0.9$
$\text{Median} = 0.5$ m

Step 5: Identify the mode by finding the most frequent value.

$0.5$ appears twice, more than any other value.
$\text{Mode} = 0.5$ m
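
The three results above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative choices, not part of any exam method.

```python
from statistics import mean, median, mode

# River depth measurements in meters, taken from the worked example.
depths = [0.4, 0.5, 0.5, 0.7, 0.9]

mean_depth = mean(depths)      # sum of values divided by n
median_depth = median(depths)  # middle value of the ranked data
modal_depth = mode(depths)     # most frequent value

print(f"Mean = {mean_depth:.2f} m")
print(f"Median = {median_depth} m")
print(f"Mode = {modal_depth} m")
```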

Measures of Spread: Range and Inter-quartile Range

Knowing the average temperature of a city does not tell you if the climate is stable year-round or features freezing winters and scorching summers. Measures of dispersion describe how spread out your data is around the central value.

The range is the difference between the maximum and minimum values in the dataset.
The lower quartile represents the 25th percentile (the median of the lower half of the data).
The upper quartile represents the 75th percentile (the median of the upper half).
The inter-quartile range (IQR) is the difference between the upper and lower quartiles ( $Q3 - Q1$ ).

The IQR is a highly reliable measure because it effectively ignores the top 25% and bottom 25% of values, meaning it is not distorted by extreme values.

Worked Example: Measures of Spread

Calculate the range and IQR for these pebble roundness scores: 2, 3, 5, 6, 7, 10, 11

Step 1: Ensure the data is ranked in ascending order and identify $n$ .

The data is already ranked, and $n = 7$ .

Step 2: Find the lower quartile (LQ) using the position formula.

$\text{LQ Position} = \frac{n + 1}{4} = \frac{7 + 1}{4} = 2^{\text{nd}}$ value
$\text{LQ} = 3$

Step 3: Find the upper quartile (UQ) using the position formula.

$\text{UQ Position} = \frac{3(n + 1)}{4} = \frac{3(8)}{4} = 6^{\text{th}}$ value
$\text{UQ} = 10$

Step 4: Calculate the inter-quartile range (IQR).

$\text{IQR} = \text{UQ} - \text{LQ}$
$\text{IQR} = 10 - 3 = 7$

Step 5: Calculate the range.

$\text{Range} = \text{Maximum} - \text{Minimum}$
$\text{Range} = 11 - 2 = 9$
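
The position formulas used in this example can be sketched in Python as follows. The helper `value_at_position` is a hypothetical name, and its handling of fractional positions (straight-line interpolation between neighbouring values) is an assumption; the notes only cover the case where the position is a whole number.

```python
def value_at_position(ranked, position):
    """Return the value at a 1-based position in ranked data.

    Assumption: a fractional position is resolved by interpolating
    between the two neighbouring values.
    """
    whole = int(position)
    fraction = position - whole
    value = ranked[whole - 1]  # convert 1-based position to 0-based index
    if fraction:
        value += fraction * (ranked[whole] - ranked[whole - 1])
    return value

# Pebble roundness scores from the worked example.
scores = sorted([2, 3, 5, 6, 7, 10, 11])
n = len(scores)

lower_quartile = value_at_position(scores, (n + 1) / 4)      # 2nd value
upper_quartile = value_at_position(scores, 3 * (n + 1) / 4)  # 6th value
iqr = upper_quartile - lower_quartile
data_range = scores[-1] - scores[0]

print("LQ =", lower_quartile, "UQ =", upper_quartile)
print("IQR =", iqr, "Range =", data_range)
```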

Cumulative Frequency

Counting individual pebbles along a 5 km stretch of coastline generates far too much data to analyze number by number. Cumulative frequency is used to handle large grouped datasets by keeping a running total of the frequencies.

When plotted on a graph, cumulative frequency typically forms an 'S-shaped' curve called an ogive. You must always plot the points at the upper class boundary (the end of the interval) of each group, never at the midpoint. You will often need to find percentiles like the median from these graphs.
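
A running total of this kind can be sketched with `itertools.accumulate`. The class boundaries and frequencies below are hypothetical values chosen for illustration; they are not taken from any table in these notes.

```python
from itertools import accumulate

# Hypothetical grouped wind speed data (illustrative values only).
upper_boundaries = [10, 20, 30, 40, 50]  # upper class boundary in km/h
frequencies = [5, 9, 14, 8, 4]           # frequency of each class

# Cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Each point is plotted at the upper class boundary, not the midpoint.
plot_points = list(zip(upper_boundaries, cumulative))

for boundary, total in plot_points:
    print(f"Up to {boundary} km/h: cumulative frequency {total}")
```

The final cumulative frequency should always equal the total number of observations, which is a quick way to check a table.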

Worked Example: Cumulative Frequency

Construct a cumulative frequency table for the following wind speed data (in km/h):

Wind Speed ( $\text{km/h}$ )	Frequency	Cumulative Frequency
$0 < x \le 10$	5	5
$10 < x \le 20$

To find values from a cumulative frequency graph (where total $n = 40$ ):

Step 1: Find the median ( $Q2$ ) by locating the 50th percentile.

$\text{Position} = n / 2 = 40 / 2 = 20^{\text{th}}$ value. Draw a horizontal line from 20 on the $y$ -axis to the curve, then down to the $x$ -axis.

Step 2: Find the lower quartile (LQ).

$\text{Position} = n / 4 = 40 / 4 = 10^{\text{th}}$ value.

Step 3: Find the upper quartile (UQ).

$\text{Position} = 3n / 4 = 3(40) / 4 = 30^{\text{th}}$ value.
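
Reading across from the $y$ -axis and down to the $x$ -axis can be approximated in code by interpolating between plotted points. This is only a sketch: the function name `read_from_ogive` is hypothetical, the cumulative frequencies are invented values that total 40, and straight lines between points stand in for the hand-drawn curve, so results read from a real graph may differ slightly.

```python
def read_from_ogive(boundaries, cumulative, position):
    """Estimate the x-value at a cumulative position.

    boundaries: class boundaries, starting with the lowest boundary.
    cumulative: cumulative frequency at each boundary, starting at 0.
    Assumption: points are joined by straight lines.
    """
    for i in range(1, len(boundaries)):
        if position <= cumulative[i]:
            lower_x, upper_x = boundaries[i - 1], boundaries[i]
            lower_cf, upper_cf = cumulative[i - 1], cumulative[i]
            fraction = (position - lower_cf) / (upper_cf - lower_cf)
            return lower_x + fraction * (upper_x - lower_x)
    raise ValueError("position exceeds the total frequency")

# Hypothetical wind speed data (km/h) with a total of n = 40.
boundaries = [0, 10, 20, 30, 40, 50]
cumulative = [0, 5, 14, 28, 36, 40]
n = cumulative[-1]

median_speed = read_from_ogive(boundaries, cumulative, n / 2)        # 20th value
lower_quartile = read_from_ogive(boundaries, cumulative, n / 4)      # 10th value
upper_quartile = read_from_ogive(boundaries, cumulative, 3 * n / 4)  # 30th value
iqr = upper_quartile - lower_quartile

print(f"Median = {median_speed:.1f} km/h")
print(f"LQ = {lower_quartile:.1f} km/h, UQ = {upper_quartile:.1f} km/h")
print(f"IQR = {iqr:.1f} km/h")
```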

Choosing the Appropriate Measure (Outliers and Skew)

If ten people in a room earn an average wage and one billionaire walks in, the average wealth of the room skyrockets, even though nobody else got richer. This highlights why choosing the correct statistical measure is critical in geographical analysis.

An outlier (or anomaly) is an extreme data value that does not fit the general pattern. A dataset with data clustered at one end is known as a skewed distribution.

The mean uses every value, making it highly sensitive to outliers. A single day of extreme rainfall will pull the mean away from the center, creating a misleading average. Therefore, the mean is only appropriate for normal (symmetrical) distributions with a small range.

The median and IQR are position-based, meaning they are resistant to extreme outliers. They depend on where values sit in the ranked data rather than on how large the highest and lowest values are, making them much more appropriate when analyzing skewed data or data containing anomalies, such as urban household income or pebble size. The range is heavily distorted by outliers, so the IQR is usually a more reliable measure of spread.
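
The effect of a single outlier on each measure can be illustrated with a short sketch. It reuses the pebble roundness scores from the earlier worked example and then swaps the largest score for a hypothetical anomalous reading of 50. The helper `summarise` is an illustrative name, and its quartile shortcut assumes that $n + 1$ is divisible by 4, as it is here with $n = 7$ .

```python
from statistics import mean, median

def summarise(values):
    """Return mean, median, range and IQR for a small dataset.

    Assumption: (n + 1) is divisible by 4, so the quartile
    positions are whole numbers.
    """
    ranked = sorted(values)
    n = len(ranked)
    lq = ranked[(n + 1) // 4 - 1]      # value at position (n + 1) / 4
    uq = ranked[3 * (n + 1) // 4 - 1]  # value at position 3(n + 1) / 4
    return {
        "mean": mean(ranked),
        "median": median(ranked),
        "range": ranked[-1] - ranked[0],
        "iqr": uq - lq,
    }

original = [2, 3, 5, 6, 7, 10, 11]
with_outlier = [2, 3, 5, 6, 7, 10, 50]  # 50 is a hypothetical anomaly

before = summarise(original)
after = summarise(with_outlier)

for measure in ("mean", "median", "range", "iqr"):
    print(f"{measure}: {before[measure]:.2f} -> {after[measure]:.2f}")
```

In this sketch the mean and range change once the anomaly is included, while the median and IQR stay the same.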

When identifying the modal class, candidates often mistakenly write down the highest frequency number (e.g., '18') instead of the actual category name. Always write the category name (e.g., '20–30 km/h').

In Paper 3 fieldwork questions, use the terms 'atypical' or 'unrepresentative' to describe a mean that has been heavily distorted by extreme outliers.

When reading percentiles from a cumulative frequency graph, examiners require clear evidence of your method; always draw dashed lines from the y-axis to the curve and down to the x-axis, and include units in your final answer.

If asked to justify why the inter-quartile range (IQR) is 'better' than the range, state that the IQR ignores the top and bottom 25% of data, making it resistant to distortion from outliers.

Do not waste time trying to calculate standard deviation, as it is not required for OCR Geography B; focus exclusively on the IQR and range for your measures of spread.

Measures of central tendency: Statistical values that identify the midpoint or most typical value in a dataset.

Mean: The arithmetic average calculated by summing all data values and dividing by the total number of values.

Median: The middle value in a dataset when all values are arranged in rank order.

Mode: The specific value that occurs most frequently in a dataset.

Modal class: The class interval or category that contains the highest frequency in a grouped frequency table.

Measures of dispersion: Statistical values that describe how spread out data is around a central value.

Range: The difference between the maximum and minimum values in a dataset.

Lower quartile: The value representing the 25th percentile of the data, effectively the median of the lower half.

Upper quartile: The value representing the 75th percentile of the data, effectively the median of the upper half.

Inter-quartile range (IQR): The difference between the upper quartile and the lower quartile, representing the spread of the middle 50% of the dataset.

Cumulative frequency: A running total of the frequencies in a dataset, typically plotted as an S-shaped curve.

Outlier: An extreme data value that does not fit the general pattern or trend of the dataset.

Anomaly: A result that does not fit the pattern, which could be a valid extreme event or a measurement error.

Skewed distribution: An asymmetrical distribution where data is clustered heavily at one end rather than evenly spread.