When geographers analyse data, they often look at bivariate data to see if two different variables are connected. The independent (explanatory) variable is plotted on the x-axis, while the dependent (response) variable goes on the y-axis.
When examining a table, you must identify patterns, the data range, and any clear deviations (anomalies).
| Site Number | Distance from Source | River Velocity |
|---|---|---|
| 1 | 0.5 | 0.22 |
| 2 | 2.4 | 0.45 |
| 3 | 5.1 | 0.38 (Anomaly) |
| 4 | 8.2 | 0.72 |
| 5 | 11.5 | 0.89 |
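As a rough illustration of reading a table, the short Python sketch below works out the range of the velocity values and flags any site that breaks the downstream increase. The rule used to flag a site (velocity lower than at the previous site) is an assumption made for this example, not an official definition of an anomaly.

```python
# Values copied from the table above.
sites = [1, 2, 3, 4, 5]
distance = [0.5, 2.4, 5.1, 8.2, 11.5]
velocity = [0.22, 0.45, 0.38, 0.72, 0.89]

# Range = highest value minus lowest value.
velocity_range = max(velocity) - min(velocity)

# Illustrative rule (an assumption): the overall pattern is an increase,
# so flag any site where velocity is lower than at the previous site.
flagged = [
    site
    for site, previous, current in zip(sites[1:], velocity, velocity[1:])
    if current < previous
]

print(f"Velocity range: {velocity_range:.2f}")
print(f"Sites breaking the downstream increase: {flagged}")
```

With this data the only site flagged is Site 3, which matches the anomaly marked in the table.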
To describe a relationship, state whether a correlation exists and, if it does, give its direction (positive or negative) and its strength (strong or weak).
Correlation vs Causality: It is vital to distinguish between correlation (a mathematical link) and causality (where one variable directly causes the other to change). For example, there may be a strong positive correlation between ice cream sales and shark attacks, but ice cream does not cause shark attacks. Instead, a third factor—warm weather—causes both to increase.
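Direction and strength are usually judged by eye from the scatter plot. One common way to put a number on them is Pearson's correlation coefficient, r, which is not named in these notes and is used here only as an assumed illustration. The sketch below applies it to the distance and velocity values from the table.

```python
from math import sqrt

# Distance from source and river velocity, copied from the table.
distance = [0.5, 2.4, 5.1, 8.2, 11.5]
velocity = [0.22, 0.45, 0.38, 0.72, 0.89]


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


r = pearson_r(distance, velocity)

# The sign of r gives the direction; values near +1 or -1 suggest a strong link.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f} ({direction} correlation)")
```

A value of r close to 1 describes how tightly the points follow a rising trend. It says nothing about whether distance causes the change in velocity.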
A line of best fit (or trend line) represents the "mean path" of data points. When you are asked to sketch a scatter plot, it must be an approximate diagram showing key features with labels.
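In the exam a line of best fit is drawn by eye. The sketch below uses least-squares regression instead, a calculated technique that is not part of these notes, purely to show how much one anomaly can shift a trend line. It fits the velocity table twice: once with every site and once with Site 3 left out.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Distance from source and river velocity, copied from the table.
distance = [0.5, 2.4, 5.1, 8.2, 11.5]
velocity = [0.22, 0.45, 0.38, 0.72, 0.89]
anomaly_index = 2  # Site 3 is the third row of the table

slope_all, intercept_all = least_squares(distance, velocity)

# Refit with the anomaly removed.
kept_x = [x for i, x in enumerate(distance) if i != anomaly_index]
kept_y = [y for i, y in enumerate(velocity) if i != anomaly_index]
slope_kept, intercept_kept = least_squares(kept_x, kept_y)

print(f"All sites:      slope {slope_all:.3f}, intercept {intercept_all:.3f}")
print(f"Without Site 3: slope {slope_kept:.3f}, intercept {intercept_kept:.3f}")
```

Both lines slope upwards, but the line fitted to all five sites sits lower because the low Site 3 reading pulls it down.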
Key Features of a Scatter Plot Sketch:
Statistical presentations and calculations often have limitations that reduce their reliability:
To evaluate a conclusion or hypothesis, you must provide arguments for and against it using numerical evidence, ending with a balanced judgement.
Hypothesis: "River bedload size decreases as distance from the source increases."
| Evidence For the Hypothesis | Evidence Against the Hypothesis |
|---|---|
| At Site 1 (), the bedload was , decreasing to at Site 5 (). | At Site 3 (), the bedload increased to , contradicting the downstream trend. |
Balanced Concluding Judgement: The hypothesis is mostly supported because the overall percentage change between Site 1 and Site 5 was a decrease in size. However, the anomaly at Site 3 suggests that local factors can interrupt the trend. In this case, the median bedload size across all sites would give a more reliable "typical" value than the mean, because the median is far less affected by the Site 3 outlier.
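The point about the median can be checked with a quick Python sketch. The bedload sizes below are hypothetical values invented for this example, not the fieldwork data, with the third value standing in for the Site 3 anomaly.

```python
from statistics import mean, median

# Hypothetical bedload sizes for five sites (illustrative values only).
# The third value plays the role of the Site 3 anomaly.
bedload = [120, 95, 300, 60, 40]
anomaly_index = 2

without_anomaly = [v for i, v in enumerate(bedload) if i != anomaly_index]

# How far does each average move when the anomaly is removed?
mean_shift = abs(mean(bedload) - mean(without_anomaly))
median_shift = abs(median(bedload) - median(without_anomaly))

print(f"Mean:   {mean(bedload)} with anomaly, {mean(without_anomaly)} without")
print(f"Median: {median(bedload)} with anomaly, {median(without_anomaly)} without")
print(f"Mean moves by {mean_shift}, median moves by {median_shift}")
```

With these made-up values the mean moves much further than the median when the anomaly is removed, which is what "sensitive to outliers" means in practice.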
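Formula: $\text{Percentage change} = \dfrac{\text{new value} - \text{original value}}{\text{original value}} \times 100$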
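Percentage change, as used in the concluding judgement, can be sketched in Python using the standard definition: the difference between the new and original values, divided by the original value, multiplied by 100. The function name `percentage_change` is made up for this example, and it is applied to the Site 1 and Site 5 velocities from the river velocity table.

```python
def percentage_change(original, new):
    """Return the change from original to new as a percentage of original."""
    if original == 0:
        raise ValueError("original value must be non-zero")
    return (new - original) / original * 100


# Velocity at Site 1 (original) and Site 5 (new), from the velocity table.
change = percentage_change(0.22, 0.89)
print(f"Velocity change from Site 1 to Site 5: {change:.1f}%")

# A negative result means a decrease, e.g. a value that halves.
print(f"Halving a value: {percentage_change(50, 25):.1f}%")
```

A positive answer is an increase and a negative answer is a decrease, so always state the direction as well as the number.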
Don't just say a graph is 'wrong'. Use specific OCR terms like 'truncated axis' or 'unrepresentative sample size' to explain WHY it is misleading.
In 'Evaluate' questions, you must provide a 'Concluding Judgement'. State clearly whether you 'fully support', 'partially support', or 'reject' the hypothesis based on the data provided.
When sketching a line of best fit, ignore any obvious anomalies. If you include the anomaly in your 'average', your line will be skewed and inaccurate.
Always check if a prediction is 'extrapolation'. If the question asks why a prediction might be inaccurate, 'extrapolation' is almost always the key reason.
If a dataset has an extreme anomaly, explain that the median is more reliable than the mean because the mean is 'sensitive to outliers'.
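The tip on extrapolation comes down to one check: is the value you are predicting for inside or outside the range of the data collected? A minimal sketch, using the distances from the river velocity table and a made-up helper called `prediction_type`:

```python
def prediction_type(x, observed_xs):
    """Classify a prediction at x as interpolation or extrapolation."""
    if min(observed_xs) <= x <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


# Distances at which velocity was actually measured.
distance = [0.5, 2.4, 5.1, 8.2, 11.5]

print(prediction_type(6.0, distance))   # inside the measured range
print(prediction_type(15.0, distance))  # beyond the last measured site
```

A prediction at 15.0 lies beyond the last site measured, so it relies on the trend continuing unchanged, which is why it is less reliable.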
Bivariate data
Data involving two variables where one is believed to influence or have a relationship with the other.
Correlation
A mathematical relationship between two variables, showing how they change together (positive, negative, or no correlation).
Causality
The principle that a change in one variable directly causes the change in another variable; correlation does not prove this.
Line of best fit
A line drawn through the centre of data points on a scatter plot to represent the general trend or 'mean path'.
Anomaly
A data point that deviates significantly from the general trend shown by the rest of the dataset.
Interpolation
Estimating an unknown value that falls within the range of existing plotted data points.
Extrapolation
Predicting a value beyond the range of collected data by extending the trend line; often unreliable.
Reliability
The extent to which an investigation would produce consistent results if repeated.
Truncated axis
A graph axis that does not start at zero, visually exaggerating differences in data.
Data bias
When a dataset is unrepresentative, often due to poor sampling strategies or small sample sizes.
Mean
The total sum of all values divided by the number of values; highly sensitive to anomalies.
Median
The middle value of a dataset when ordered; more reliable than the mean if anomalies are present.
Range
The difference between the highest and lowest values in a dataset.
Percentage change
A measure of how much a value has increased or decreased relative to its original starting value.
| Evidence For the Hypothesis | Evidence Against the Hypothesis |
|---|---|
| There is a strong negative correlation shown on the scatter plot. | Correlation does not prove causality; the decrease might be caused by attrition rather than just "distance". |