When investigating the size of pebbles on a beach, you will collect dozens of measurements that are hard to interpret just by looking at the raw numbers. Measures of central tendency help by identifying the "midpoint" or most typical value in your geographical dataset.
Calculate the mean, median, and mode for these river depth measurements (in meters): 0.4, 0.5, 0.5, 0.7, 0.9
Step 1: Calculate the mean by summing all values and dividing by the total number of items (n).
Step 2: Calculate the final mean answer with units.
Step 3: Find the median by ranking the data and finding the middle position.
Step 4: Identify the 3rd value in the ranked list.
Step 5: Identify the mode by finding the most frequent value.
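The five steps above can be checked with a short script. This is a minimal sketch using the river depth values from the question; the variable names are illustrative, and the middle-position formula (n + 1) ÷ 2 is the usual one for an odd number of values, assumed here because the notes do not spell it out.

```python
from statistics import mode

# River depth measurements in metres, taken from the worked example above.
depths = [0.4, 0.5, 0.5, 0.7, 0.9]

n = len(depths)                      # total number of items (5)
ranked = sorted(depths)              # rank the data before finding the median

# Mean: sum of all values divided by n, rounded to tidy up float arithmetic.
depth_mean = round(sum(depths) / n, 2)

# Median: middle position (n + 1) / 2 = 3rd value. This shortcut assumes n is odd.
middle_position = (n + 1) // 2
depth_median = ranked[middle_position - 1]   # lists count from 0, ranks from 1

# Mode: the most frequent value.
depth_mode = mode(depths)

print(f"mean = {depth_mean} m, median = {depth_median} m, mode = {depth_mode} m")
```

For this dataset the mean is 0.6 m, and both the median and the mode are 0.5 m. Remember to quote the units in every answer.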
Knowing the average temperature of a city does not tell you if the climate is stable year-round or features freezing winters and scorching summers. Measures of dispersion describe how spread out your data is around the central value.
The IQR is a highly reliable measure because it effectively ignores the top 25% and bottom 25% of values, meaning it is not distorted by extreme values.
Calculate the range and IQR for these pebble roundness scores: 2, 3, 5, 6, 7, 10, 11
Step 1: Ensure the data is ranked in ascending order and identify the number of values (n = 7).
Step 2: Find the lower quartile (LQ) using the position formula.
Step 3: Find the upper quartile (UQ) using the position formula.
Step 4: Calculate the inter-quartile range (IQR).
Step 5: Calculate the range.
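The steps above can be sketched in code. The position formulas used here, (n + 1) ÷ 4 for the lower quartile and 3(n + 1) ÷ 4 for the upper quartile, are an assumption: they are a common convention, but the notes do not show which formula is intended. The helper `value_at_position` is a hypothetical name introduced only for this illustration.

```python
# Pebble roundness scores from the worked example above.
scores = [2, 3, 5, 6, 7, 10, 11]

ranked = sorted(scores)   # Step 1: rank in ascending order
n = len(ranked)           # n = 7


def value_at_position(ranked_values, position):
    """Return the value at a 1-based rank position, interpolating between
    neighbouring values when the position is not a whole number."""
    lower = int(position)             # rank at or just below the position
    fraction = position - lower
    if fraction == 0:
        return ranked_values[lower - 1]
    below, above = ranked_values[lower - 1], ranked_values[lower]
    return below + fraction * (above - below)


# Steps 2 and 3: quartile positions (ASSUMED convention, see lead-in).
lq_position = (n + 1) / 4         # 2nd value
uq_position = 3 * (n + 1) / 4     # 6th value
lq = value_at_position(ranked, lq_position)
uq = value_at_position(ranked, uq_position)

# Step 4: inter-quartile range. Step 5: range.
iqr = uq - lq
data_range = ranked[-1] - ranked[0]

print(f"LQ = {lq}, UQ = {uq}, IQR = {iqr}, range = {data_range}")
```

With these seven scores the LQ is 3, the UQ is 10, the IQR is 7 and the range is 9. Taking the median of the lower and upper halves of the data gives the same quartiles for this dataset.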
Counting individual pebbles along a 5 km stretch of coastline generates far too much data to analyze number by number. Cumulative frequency is used to handle large grouped datasets by keeping a running total of the frequencies.
When plotted on a graph, cumulative frequency typically forms an 'S-shaped' curve called an ogive. You must always plot the points at the upper class boundary (the end of the interval) of each group, never at the midpoint. You will often need to find percentiles like the median from these graphs.
Construct a cumulative frequency table for the following wind speed data (in km/h):
| Wind Speed (km/h) | Frequency | Cumulative Frequency |
|---|---|---|
| 5 | 5 | |
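The running-total idea behind the cumulative frequency column can be sketched as follows. The class intervals and frequencies in this example are hypothetical values made up for illustration; they are not the figures from the table above.

```python
from itertools import accumulate

# HYPOTHETICAL grouped data: (class interval in km/h, frequency).
# These values are invented for illustration and are not the source's table.
wind_speed_classes = [
    ("0-10", 4),
    ("10-20", 9),
    ("20-30", 15),
    ("30-40", 2),
]

frequencies = [freq for _, freq in wind_speed_classes]

# Each cumulative frequency is the current frequency plus everything before it.
cumulative = list(accumulate(frequencies))

for (interval, freq), running_total in zip(wind_speed_classes, cumulative):
    print(f"{interval:>6} km/h | frequency {freq:>2} | cumulative {running_total:>2}")
```

A quick check on any cumulative frequency table is that the final running total equals the sum of all the frequencies, which is 30 in this made-up example.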
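To find values from a cumulative frequency graph (where n is the total frequency):
Step 1: Find the median by locating the 50th percentile (50% of n on the cumulative frequency axis).
Step 2: Find the lower quartile (LQ) at the 25th percentile (25% of n).
Step 3: Find the upper quartile (UQ) at the 75th percentile (75% of n).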
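Reading a percentile off the graph means going across from the cumulative frequency axis to the curve and then down to the x-axis. The sketch below imitates that process, assuming straight lines between the plotted points, so it only approximates a smooth hand-drawn curve. The plotted points and the helper `read_from_graph` are hypothetical and used only for illustration.

```python
# HYPOTHETICAL plotted points for a cumulative frequency graph:
# (upper class boundary in km/h, cumulative frequency). The first point
# anchors the curve at the lowest boundary with a cumulative frequency of 0.
points = [(0, 0), (10, 4), (20, 13), (30, 28), (40, 30)]


def read_from_graph(plotted, target_cf):
    """Estimate the x-value where the cumulative frequency reaches target_cf,
    using straight lines between plotted points (an approximation)."""
    for (x0, cf0), (x1, cf1) in zip(plotted, plotted[1:]):
        if cf0 <= target_cf <= cf1 and cf1 > cf0:
            share = (target_cf - cf0) / (cf1 - cf0)   # how far along the segment
            return x0 + share * (x1 - x0)
    raise ValueError("target is outside the plotted range")


n = points[-1][1]                             # total frequency (30 here)
median = read_from_graph(points, 0.50 * n)    # 50th percentile
lq = read_from_graph(points, 0.25 * n)        # 25th percentile
uq = read_from_graph(points, 0.75 * n)        # 75th percentile

print(f"median = {median:.1f} km/h, LQ = {lq:.1f} km/h, UQ = {uq:.1f} km/h")
```

In an exam you would read these values by eye from your drawn curve, so small differences from a calculated figure are expected.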
If ten people in a room earn an average wage and one billionaire walks in, the average wealth of the room skyrockets, even though nobody else got richer. This highlights why choosing the correct statistical measure is critical in geographical analysis.
An outlier (or anomaly) is an extreme data value that does not fit the general pattern. A dataset with data clustered at one end is known as a skewed distribution.
The mean uses every value, making it highly sensitive to outliers. A single day of extreme rainfall will pull the mean away from the center, creating a misleading average. Therefore, the mean is only appropriate for normal (symmetrical) distributions with a small range.
The median and IQR are position-based, meaning they are resistant to extreme outliers. They are not affected by how extreme the highest and lowest values are, making them much more appropriate when analyzing skewed data or data containing anomalies, such as urban household income or pebble size. The range is heavily distorted by outliers, so the IQR is the more reliable measure of spread when anomalies are present.
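A small comparison makes the point concrete. The rainfall figures below are hypothetical, and the quartile positions again assume the (n + 1) ÷ 4 and 3(n + 1) ÷ 4 convention, which is not confirmed by these notes. The function `summarise` is an illustrative helper.

```python
from statistics import mean, median

# HYPOTHETICAL daily rainfall totals in mm, invented for illustration.
typical_days = [2, 3, 3, 4, 5, 6, 7]
with_storm = typical_days[:-1] + [70]   # swap the largest value for one extreme storm


def summarise(values):
    ranked = sorted(values)
    n = len(ranked)
    # ASSUMED quartile positions (n + 1)/4 and 3(n + 1)/4. They are whole
    # numbers here because n = 7; other sample sizes need interpolation.
    lq = ranked[(n + 1) // 4 - 1]
    uq = ranked[3 * (n + 1) // 4 - 1]
    return {
        "mean": round(mean(ranked), 1),
        "median": median(ranked),
        "range": ranked[-1] - ranked[0],
        "iqr": uq - lq,
    }


before = summarise(typical_days)
after = summarise(with_storm)

for measure in before:
    print(f"{measure:>6}: {before[measure]} -> {after[measure]}")
```

In this made-up dataset the single storm lifts the mean from 4.3 mm to 13.3 mm and the range from 5 mm to 68 mm, while the median stays at 4 mm and the IQR stays at 3 mm.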
When identifying the modal class, candidates often mistakenly write down the highest frequency number (e.g., '18') instead of the actual category name. Always write the category name (e.g., '20–30 km/h').
In Paper 3 fieldwork questions, use the terms 'atypical' or 'unrepresentative' to describe a mean that has been heavily distorted by extreme outliers.
When reading percentiles from a cumulative frequency graph, examiners require clear evidence of your method; always draw dashed lines from the y-axis to the curve and down to the x-axis, and include units in your final answer.
If asked to justify why the inter-quartile range (IQR) is 'better' than the range, state that the IQR ignores the top and bottom 25% of data, making it resistant to distortion from outliers.
Do not waste time trying to calculate standard deviation, as it is not required for OCR Geography B; focus exclusively on the IQR and range for your measures of spread.
Measures of central tendency
Statistical values that identify the midpoint or most typical value in a dataset.
Mean
The arithmetic average calculated by summing all data values and dividing by the total number of values.
Median
The middle value in a dataset when all values are arranged in rank order.
Mode
The specific value that occurs most frequently in a dataset.
Modal class
The class interval or category that contains the highest frequency in a grouped frequency table.
Measures of dispersion
Statistical values that describe how spread out data is around a central value.
Range
The difference between the maximum and minimum values in a dataset.
Lower quartile
The value representing the 25th percentile of the data, effectively the median of the lower half.
Upper quartile
The value representing the 75th percentile of the data, effectively the median of the upper half.
Inter-quartile range (IQR)
The difference between the upper quartile and the lower quartile, representing the spread of the middle 50% of the dataset.
Cumulative frequency
A running total of the frequencies in a dataset, typically plotted as an S-shaped curve.
Outlier
An extreme data value that does not fit the general pattern or trend of the dataset.
Anomaly
A result that does not fit the pattern, which could be a valid extreme event or a measurement error.
Skewed distribution
An asymmetrical distribution where data is clustered heavily at one end rather than evenly spread.
| 12 |
| 18 |
| 5 |