You can easily see a river getting wider as you walk downstream, but proving that relationship scientifically requires plotting the data. A scatter graph is used to plot bivariate data (data involving two numerical variables) to see if a relationship exists between them.
The independent variable (the factor chosen or controlled, like distance from the river source) goes on the x-axis. The dependent variable (the factor being measured, like river depth) is plotted on the y-axis. To highlight the central trend of the data points, you must draw a line of best fit using a ruler. This must be a single, straight line with roughly an equal number of points above and below it along its entire length; it does NOT connect the dots in a zigzag and explicitly ignores any anomaly.
To describe a relationship on a scatter graph step by step, use the PDA (Pattern, Data, Anomalies) technique. First, state the pattern by identifying the direction (positive correlation runs from bottom-left to top-right; negative correlation runs from top-left to bottom-right) and the strength (strong if the points cluster closely around the line, weak if they are spread out). Then, quote specific data from both axes to support the pattern. Finally, identify any anomalies that do not fit the general trend.
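The "Pattern" step can also be checked with a number. The sketch below is only an illustration: it uses Pearson's correlation coefficient, which is not part of the PDA technique itself, and the 0.7 cut-off for "strong" is an arbitrary assumption chosen for this example. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data (between -1 and +1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_pattern(xs, ys, strong_cutoff=0.7):
    """State direction and strength. strong_cutoff is an assumed, arbitrary threshold."""
    r = pearson_r(xs, ys)
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {direction} correlation"


# Invented data: distance from source (km) against river width (m).
distance_km = [1, 2, 3, 4, 5]
width_m = [2, 4, 6, 8, 10]
print(describe_pattern(distance_km, width_m))
```

A value of r close to +1 or -1 corresponds to points clustered tightly around a straight line, which matches the visual idea of a strong correlation.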
How can geographers guess the velocity of a river at a location they haven't even visited? By using the line of best fit, you can make a trend prediction for unmeasured sites.
Interpolation involves estimating a value inside the range of your existing data points, which is generally considered reliable. Conversely, extrapolation is estimating a value outside the measured range by extending the line of best fit. Extrapolation is unreliable because you cannot guarantee the trend will continue; it might flatten or curve entirely.
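To position your line of best fit accurately before predicting, draw it so that it passes through the double mean point $(\bar{x}, \bar{y})$. You can calculate these coordinates using the formula:
$$(\bar{x}, \bar{y}) = \left( \frac{\sum x}{n}, \frac{\sum y}{n} \right)$$
where $n$ is the number of data points, $\sum x$ is the total of all the x values and $\sum y$ is the total of all the y values.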
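As a rough sketch, the double mean point is the mean of all the x values paired with the mean of all the y values. The Python below shows the arithmetic; the function name and the river data are invented purely for illustration.

```python
from statistics import mean


def double_mean_point(xs, ys):
    """Return (mean of x values, mean of y values) for paired data."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    return mean(xs), mean(ys)


# Hypothetical fieldwork data (not real measurements).
distance_km = [2, 4, 6, 8, 10, 12]            # independent variable (x-axis)
velocity_ms = [0.2, 0.3, 0.5, 0.6, 0.7, 0.9]  # dependent variable (y-axis)

x_bar, y_bar = double_mean_point(distance_km, velocity_ms)
print(f"Plot the double mean point at ({x_bar}, {y_bar:.2f})")
```

Plotting this point first gives the ruler a fixed pivot, so the only remaining choice is the angle of the line.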
Worked Example: Calculating Predicted Values
Part A: Interpolation
Question: Estimate the river velocity at a distance of 6 km from the source.
Step 1: Locate the known value of the independent variable on the x-axis (e.g., 6 km) which falls inside the measured data range.
Step 2: Use a ruler to draw a perfectly vertical line straight up to the line of best fit.
Step 3: Draw a perfectly horizontal line from that exact point on the line of best fit across to the y-axis.
Step 4: Read the predicted value of the dependent variable from the y-axis scale and state it with its units (e.g., m/s).
Part B: Extrapolation
Question: Estimate the river velocity at a distance of 15 km from the source, assuming the furthest measured data point was at 12 km.
Step 1: Use a ruler to explicitly extend the straight line of best fit past the last plotted data point.
Step 2: Locate the unmeasured independent variable on the x-axis (e.g., 15 km).
Step 3: Draw a perfectly vertical line straight up to the newly extended line of best fit.
Step 4: Draw a perfectly horizontal line from that exact point across to the y-axis.
Step 5: Read the predicted value of the dependent variable from the y-axis scale and state it with its units (e.g., m/s).
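The two readings in the worked example can be mimicked numerically. The sketch below is a stand-in for the ruler method, not a replacement for it: it assumes a least-squares line (which always passes through the double mean point) in place of a hand-drawn line of best fit, and it uses invented data. It labels each prediction as interpolation or extrapolation by checking whether the x value lies inside the measured range.

```python
from statistics import mean

# Hypothetical fieldwork data (not real measurements).
distance_km = [2, 4, 6, 8, 10, 12]
velocity_ms = [0.2, 0.3, 0.5, 0.6, 0.7, 0.9]


def fit_line(xs, ys):
    """Least-squares slope and intercept; the line passes through (x_bar, y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(xs, ys, x_new):
    """Return (predicted y, 'interpolation' or 'extrapolation')."""
    slope, intercept = fit_line(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x_new + intercept, kind


for site_km in (6, 15):
    value, kind = predict(distance_km, velocity_ms, site_km)
    print(f"{site_km} km: {value:.2f} m/s ({kind})")
```

The label matters more than the number: the 15 km estimate depends on the trend continuing unchanged beyond 12 km, which the data cannot confirm.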
Understanding data presentation matters because statistics can easily be manipulated to tell a false story. Selective statistical presentation is the deliberate or accidental choice of data, scales, or sampling methods to support a specific narrative while hiding contradictions.
One common form of data manipulation is using a truncated y-axis, which starts at a number other than zero to exaggerate small differences. Misleading scales can also flatten a trend or make minor changes look dramatic through unequal intervals. Data can also be manipulated through omission (cherry-picking time periods) or by using misleading averages, such as a mean GNI per capita that hides massive internal wealth inequalities.
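Two of these effects can be shown with simple arithmetic. The sketch below uses invented figures and hypothetical function names: the first part compares how tall two bars look when the y-axis starts at zero and when it is truncated, and the second part shows how a mean can sit far above what most people in a skewed dataset actually earn.

```python
from statistics import mean, median


def apparent_ratio(a, b, axis_start=0.0):
    """How many times taller bar b is drawn than bar a when the y-axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (b - axis_start) / (a - axis_start)


# Invented values: two sites that differ by only 4%.
site_a, site_b = 50, 52
print(apparent_ratio(site_a, site_b))      # y-axis starts at 0
print(apparent_ratio(site_a, site_b, 48))  # y-axis truncated to start at 48

# Invented incomes: nine low earners and one very high earner.
incomes = [1000] * 9 + [91000]
print(mean(incomes), median(incomes))
```

With the axis starting at 48, the second bar is drawn twice as tall as the first even though the real values differ by 4%. In the income example the mean is ten times the median, so quoting the mean alone hides the inequality.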
Sampling methods can introduce severe statistical bias that undermines the data's reliability and validity. Convenience sampling (choosing easy-to-reach sites) or using a very small sample size means the data is not statistically representative. When evaluating a geographical conclusion, you must weigh up these weaknesses; if a conclusion relies on biased sampling or a truncated axis, that conclusion is fundamentally invalid.
Students often confuse 'correlation' with 'causation' — just because two variables show a strong trend on a scatter graph does not prove that one directly causes the other.
When asked to 'Describe' a relationship on a scatter graph, you must explicitly state both the direction (positive or negative) and the strength (strong or weak) to secure full marks.
In Paper 3 fieldwork questions, examiners frequently ask you to evaluate unfamiliar data; always check if the y-axis starts at zero, as truncated axes are a very common presentation weakness.
If asked why an extrapolated prediction is unreliable, state clearly that the trend may flatten, curve, or change entirely beyond the limits of the measured data.
Bivariate data
Data that consists of pairs of numerical observations for two variables, used to determine if a relationship exists.
Scatter graph
A graphical presentation that plots coordinate points for two variables to identify correlations.
Independent variable
The variable that stands alone and is not changed by the other variables you are measuring, usually plotted on the x-axis.
Dependent variable
The variable being tested and measured in an experiment, which responds to changes in the independent variable and is plotted on the y-axis.
Line of best fit
A single straight line drawn through the center of a group of data points on a scatter graph to show the general trend.
Anomaly
A data point that deviates significantly from the general trend or pattern shown by the rest of the data.
Positive correlation
A relationship where one variable increases as the other variable increases.
Negative correlation
A relationship where one variable decreases as the other variable increases.
Trend prediction
Using established patterns in data, often via a line of best fit, to estimate unknown values.
Interpolation
Estimating a value that falls strictly within the range of the existing data points.
Extrapolation
Estimating a value outside the range of existing data by extending the established trend line.
Selective statistical presentation
The deliberate or accidental choice of data, scales, or sampling methods to support a specific narrative.
Data manipulation
Altering or cherry-picking data to mislead the audience or hide contradictions.
Misleading scales
Axes that use unequal intervals, non-zero origins, or inappropriate sizes to distort the visual impact of data.
Statistical bias
A systematic error in data collection or sampling that results in an unfair representation of the population.
Reliability
The extent to which an investigation or data collection method would produce the same consistent results if repeated.
Validity
The extent to which collected data accurately reflects the true geographical reality.