Descriptive Statistics: Central Tendency, Spread and Percentages

Measures of Central Tendency

If you measure the depth of a river at ten different points, how do you summarize the whole river with just one number? Geographers use measures of central tendency to find the "typical" or middle value in large datasets, helping to spot trends in factors like population density or rainfall.

The mean is the mathematical average, calculated by adding all values together and dividing by the total number of items ($n$). The median is the middle value when all numbers are arranged in order. The mode is the most frequently occurring value in a dataset.

When dealing with grouped data in a frequency table, we cannot find an exact mode; instead, we identify the modal class, which is the category or interval with the highest frequency.

Worked Example: Mean

Data: Monthly rainfall in mm: 10, 12, 10, 15, 13.

Step 1: Write the formula:

$$\text{Mean} = \frac{\sum x}{n}$$

Step 2: Substitute the values: $(10 + 12 + 10 + 15 + 13) \div 5$
Step 3: Calculate the final answer: $60 \div 5 = \mathbf{12\text{ mm}}$

Worked Example: Median

Data: River pebble sizes in mm: 5, 8, 12, 16.
Step 1: Order the data (already done) and locate the middle. Since there is an even number of values, take the mean of the two middle values.
Step 2: Substitute the values: $(8 + 12) \div 2$
Step 3: Calculate the final answer: $\mathbf{10\text{ mm}}$

Worked Example: Mode

Data: Number of flood events per year over a decade: 1, 0, 2, 1, 1, 3, 0, 1, 2, 1.
Step 1: Count the frequency of each value. The number 1 appears five times, 0 appears twice, 2 appears twice, and 3 appears once.
Step 2: Identify the most frequently occurring value.
Step 3: State the final answer: the mode is $\mathbf{1}$ flood event per year.
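The three worked examples above can be checked with a short Python sketch. It assumes the standard-library `statistics` module is an acceptable stand-in for the hand calculations; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Data copied from the three worked examples above
rainfall_mm = [10, 12, 10, 15, 13]
pebble_mm = [5, 8, 12, 16]
floods_per_year = [1, 0, 2, 1, 1, 3, 0, 1, 2, 1]

mean_rainfall = mean(rainfall_mm)      # sum of the values divided by n
median_pebble = median(pebble_mm)      # even n: mean of the two middle values
modal_floods = mode(floods_per_year)   # most frequently occurring value

print(mean_rainfall, median_pebble, modal_floods)
```

The results should match the hand-worked answers of 12 mm, 10 mm and 1.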

Worked Example: Modal Class

Data: Elevation categories: 0–10m (Frequency: 5), 11–20m (Frequency: 12), 21–30m (Frequency: 7).
Step 1: Look for the highest frequency. The highest frequency is 12.
Step 2: Identify the corresponding class. The modal class is 11–20m.
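A minimal sketch of the same lookup in Python, assuming the frequency table is stored as a dictionary keyed by class label (the name `elevation_freq` is hypothetical):

```python
# Frequencies from the worked example, keyed by class label
elevation_freq = {"0–10m": 5, "11–20m": 12, "21–30m": 7}

# max() with key=dict.get returns the label with the largest frequency
modal_class = max(elevation_freq, key=elevation_freq.get)

print(modal_class, elevation_freq[modal_class])
```

If two classes tie for the highest frequency, `max()` returns only the first one it meets, so a tie would need to be checked separately.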

Measures of Spread (Range and IQR)

Two cities might both have an average temperature of $15^{\circ}\text{C}$, but one ranges from $14^{\circ}\text{C}$ to $16^{\circ}\text{C}$ while the other swings from $-5^{\circ}\text{C}$ to $35^{\circ}\text{C}$. A measure of dispersion describes how spread out a set of data is, which helps geographers compare variability between different sites.

The range is the simplest measure of spread, calculated as the difference between the maximum and minimum values. However, because it depends only on the two most extreme values, a single outlier can distort it.

The interquartile range (IQR) is a more robust measure because it covers only the middle 50% of the data, so extreme values in the top and bottom quarters do not affect it. It is the difference between the upper quartile (the $75\text{th}$ percentile) and the lower quartile (the $25\text{th}$ percentile).

Worked Example: Range

Data: Weekly high temperatures: $16^{\circ}\text{C}$, $18^{\circ}\text{C}$, $22^{\circ}\text{C}$, $19^{\circ}\text{C}$, $17^{\circ}\text{C}$.
Step 1: Write the formula:

$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$

Step 2: Identify max and min: Max is $22^{\circ}\text{C}$, Min is $16^{\circ}\text{C}$.
Step 3: Calculate: $22 - 16 = \mathbf{6^{\circ}\text{C}}$

Worked Example: Interquartile Range (IQR)

Data: River velocities in m/s (ordered): 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9.
Step 1: Find the median (middle value): $0.5\text{ m/s}$.
Step 2: Find the lower quartile (middle of the lower half): $0.3\text{ m/s}$.
Step 3: Find the upper quartile (middle of the upper half): $0.8\text{ m/s}$.
Step 4: Calculate:

$$\text{IQR} = \text{Upper Quartile} - \text{Lower Quartile}$$

Step 5: Final answer: $0.8 - 0.3 = \mathbf{0.5\text{ m/s}}$
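The range and IQR examples above can be sketched in Python as follows. The helper names are hypothetical, and the quartile rule is an assumption taken from the worked example: each quartile is the median of one half of the ordered data, with the overall median left out of both halves when the number of values is odd.

```python
from statistics import median

def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

def quartiles_by_halves(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: quartiles are the medians of the lower and upper halves,
    excluding the overall median when n is odd. Needs at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

temps_c = [16, 18, 22, 19, 17]                     # range example
velocities = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]   # IQR example

temp_range = value_range(temps_c)
q1, q2, q3 = quartiles_by_halves(velocities)
iqr = round(q3 - q1, 2)   # rounding tidies any floating-point noise

print(temp_range, q1, q2, q3, iqr)
```

Library quantile functions often interpolate between values, so they may return slightly different quartiles for small datasets than this halves method.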

Cumulative Frequency Graphs

Understanding exactly where the middle 50% of your data lies is crucial when analyzing thousands of pebble sizes on a beach. Cumulative frequency is a running total of frequencies up to the end of a specific class interval, allowing you to estimate medians and quartiles for grouped data.

When plotted, this data forms a stretched-S shape called an ogive. The curve must always start at the lower boundary of the first interval, where the cumulative frequency is 0.

A steeper section of the curve indicates a higher frequency of data clustered in that range. Because we do not know the exact raw data values within the groups, the values we extract from the graph are only estimates.

Interpreting a Cumulative Frequency Curve (Pebble Sizes)

Table Data (class interval, frequency): 0–20 mm (10), 20–40 mm (25), 40–60 mm (30), 60–80 mm (15).
Step 1: Calculate the running totals: 10, 35, 65, 80. The total frequency ($n$) is 80.
Step 2: Plot the points. Start the curve at (0, 0), the lower boundary of the first class, then plot each running total at the upper class boundary for the x-coordinate: (20, 10), (40, 35), (60, 65), (80, 80).
Step 3: To find the median, calculate the position: $n \div 2 = 80 \div 2 = 40$.
Step 4: Draw a clear construction line from 40 on the y-axis across to the curve, then down to the x-axis. The estimate is approximately $\mathbf{44\text{ mm}}$.
Step 5: To find quartiles, use $n \div 4$ for the lower quartile (position 20) and $3n \div 4$ for the upper quartile (position 60), reading from the graph in the exact same way.
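The graph reading in the steps above can be approximated in Python. This is a sketch under one loud assumption: the plotted points are joined by straight lines, whereas a hand-drawn ogive is a smooth curve. The function name `read_off` is hypothetical.

```python
from itertools import accumulate

upper_bounds = [20, 40, 60, 80]   # upper class boundaries in mm
frequencies = [10, 25, 30, 15]

cum_freq = list(accumulate(frequencies))   # running totals
# The curve starts at the lower boundary of the first class with total 0
points = [(0, 0)] + list(zip(upper_bounds, cum_freq))

def read_off(position, points):
    """Estimate the x-value where the cumulative frequency reaches `position`.

    Assumption: straight lines join neighbouring points (linear interpolation).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= position <= y1:
            return x0 + (position - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("position lies outside the cumulative frequency range")

n = cum_freq[-1]
lower_q = read_off(n / 4, points)         # position 20
median_est = read_off(n / 2, points)      # position 40
upper_q = read_off(3 * n / 4, points)     # position 60

print(round(lower_q, 1), round(median_est, 1), round(upper_q, 1))
```

Straight-line interpolation puts the median at about 43.3 mm, which is consistent with the reading of roughly 44 mm from a smooth curve; both are estimates because the raw values inside each class are unknown.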

Percentage Change and Proportion

How do we fairly compare population growth between a tiny village and a massive megacity? Geographers use relative change to compare the magnitude of shifts across different scales, usually expressing it as a percentage.

Percentage change shows the relative difference between an old and new value, expressed as a fraction of the original value multiplied by 100. A positive result indicates an increase, while a negative result indicates a decrease.

Proportion describes the relationship of a specific "part" to the "whole" dataset, usually expressed as a percentage: $(\text{Part} \div \text{Whole}) \times 100$.
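As a quick illustration of the proportion formula, here is a minimal Python sketch. The function name and the survey figures are hypothetical and are not taken from any real dataset.

```python
def proportion_percent(part, whole):
    """Proportion of `part` within `whole`, as a percentage."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100 * part / whole

# Hypothetical survey: 12 of 48 sampled sites are woodland
woodland_share = proportion_percent(12, 48)

print(f"{woodland_share:.1f}%")
```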

Worked Example: Percentage Increase

Data: In 2001, UK GDP was £1.4 trillion. In 2015, it reached £1.64 trillion.
Step 1: Write the formula:

$$\text{Percentage Change} = \left( \frac{\text{Difference}}{\text{Original Value}} \right) \times 100$$

Step 2: Calculate the difference (New - Original): $1.64 - 1.4 = 0.24$.
Step 3: Divide by the original value: $0.24 \div 1.4 = 0.1714...$
Step 4: Multiply by 100: $\mathbf{17.14\% \text{ increase}}$.

Worked Example: Percentage Decrease

Data: A village population falls from 850 to 720.
Step 1: Calculate the difference: $720 - 850 = -130$.
Step 2: Divide by the original starting value: $-130 \div 850 = -0.1529...$
Step 3: Multiply by 100: $-15.29...$, which is a $\mathbf{15.29\% \text{ decrease}}$.
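Both percentage change examples follow the same formula, which can be sketched as a small Python function. The function name is hypothetical; the sign of the result separates an increase (positive) from a decrease (negative).

```python
def percentage_change(original, new):
    """Percentage change = (new - original) / original * 100."""
    if original == 0:
        raise ValueError("original value must be non-zero")
    return 100 * (new - original) / original

gdp_change = percentage_change(1.4, 1.64)      # GDP example, £ trillion
village_change = percentage_change(850, 720)   # village population example

print(f"{gdp_change:+.2f}%  {village_change:+.2f}%")
```

Note that the divisor is always the original (starting) value, never the new one.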

Understanding Percentiles

Being in the 90th percentile for a maths test is great, but being in the 90th percentile for river pollution is disastrous. A percentile is a measure indicating the value below which a given percentage of observations fall.

Percentiles divide a dataset into 100 equal parts to show a data point's relative position compared to the rest of the distribution. For example, a country in the 10th percentile for Gross National Income (GNI) is among the poorest 10% globally.

Quartiles are simply specific benchmark percentiles. The lower quartile is the 25th percentile, the median is the 50th percentile, and the upper quartile is the 75th percentile. Hydrologists also use percentiles to predict hazards, such as using the Q95 (the river flow level equalled or exceeded 95% of the time) to identify drought conditions.
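To make the definition concrete, the sketch below computes the percentage of observations that fall below a chosen value. The function name and the pollution readings are hypothetical, and it assumes the simple "strictly below" convention; other percentile conventions exist.

```python
def percentile_rank(values, x):
    """Percentage of observations in `values` that fall strictly below `x`."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical pollution readings (mg/l) from ten river sites
readings = [2, 3, 3, 4, 5, 6, 7, 8, 9, 12]

rank_of_worst = percentile_rank(readings, 12)
rank_of_median_site = percentile_rank(readings, 6)

print(rank_of_worst, rank_of_median_site)
```

Here the site with a reading of 12 has nine of the ten readings below it, which places it at the 90th percentile in this made-up dataset.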

Exam Tips

Tip 1 (Common Mistake): When calculating percentage change, students often divide the difference by the new value instead of the original (starting) value. Always divide by the earliest year's data.
Tip 2: In 'Assess the usefulness' questions, examiners expect you to point out that the mean is easily distorted by extreme outliers, making the median a much more representative measure for skewed data.
Tip 3: When plotting a cumulative frequency graph, you MUST plot your points at the upper class boundary (the end of the interval) for the x-coordinate, never the midpoint.
Tip 4: Examiners require you to draw 'clear construction lines' (dotted lines from axes to the curve) when estimating medians or quartiles from a cumulative frequency graph to award full marks.
Tip 5: When calculating the Interquartile Range from a list, do not just subtract the position numbers (e.g., 6th minus 2nd in a seven-value list); you must subtract the actual data values located at those positions.

Key Terms (15)

Mean: The mathematical average, calculated by adding all values together and dividing by the total number of items.
Median: The middle value in a dataset when all numbers are arranged in ascending or descending order.
Mode: The most frequently occurring value in a dataset.
Modal class: The category or interval with the highest frequency in a grouped dataset.
Measure of dispersion: A statistical term describing how spread out a set of data is.
Range: The difference between the highest and lowest values in a dataset.
Interquartile range (IQR): A measure of spread focusing on the middle 50% of the data, calculated as the upper quartile minus the lower quartile.
Upper quartile: The value at the 75th percentile of an ordered dataset, separating the highest 25% of data from the rest.
Lower quartile: The value at the 25th percentile of an ordered dataset, separating the lowest 25% of data from the rest.
Cumulative frequency: The running total of frequencies up to the end of a specific class interval.
Relative change: A term used to compare the magnitude of change across different scales, usually between an old and new value.
Percentage change: The relative change between an old and new value, expressed as a fraction of the original value multiplied by 100.
Proportion: A part, share, or number considered in comparative relation to a whole dataset.
Percentile: A measure indicating the value below which a given percentage of observations fall.
Relative position: The standing of a data point compared to the rest of the distribution, expressed as a percentage.
