Every time you check a weather app, you are reading graphs chosen specifically for continuous data like temperature or discrete data like the number of rainy days. Selecting the correct representation for univariate data (data involving a single variable) depends entirely on the type of data you have.
Discrete data can only take specific, separate values (e.g., shoe size, number of children). Continuous data can take any value within a range and is usually measured (e.g., time, mass, height).
When graphing these datasets, Edexcel requires specific formats:
To draw a frequency polygon, points are plotted at the midpoint of the class interval. The midpoint is found by adding the lower and upper bounds and dividing by two: $\text{midpoint} = \frac{\text{lower bound} + \text{upper bound}}{2}$.
These points are joined with straight lines to create an "open" polygon. You should not join the first point to the last point, nor to the x-axis, unless the frequency is exactly 0 at that boundary.
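The midpoint rule can be sketched in a few lines of Python. The grouped table below is hypothetical data invented for illustration, and `polygon_points` is an assumed helper name rather than anything Edexcel prescribes.

```python
# Hypothetical grouped frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6)]

def polygon_points(table):
    """Return the (midpoint, frequency) pairs to plot for a frequency polygon."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in table]

points = polygon_points(classes)
print(points)  # [(5.0, 4), (15.0, 9), (25.0, 6)]
```

Each pair is one plotted point. Joining them in order with straight lines gives the open polygon.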
In contrast, cumulative frequency graphs are plotted at the upper boundary of the class interval. The points are joined with a smooth, S-shaped curve (ogive) rather than straight lines.
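As a rough sketch of where cumulative frequency points go, the code below pairs each upper boundary with a running total. The table is the same kind of hypothetical data, and `cumulative_points` is an assumed helper name.

```python
from itertools import accumulate

# Hypothetical grouped frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6)]

def cumulative_points(table):
    """Pair each class's upper boundary with the running total of frequencies."""
    upper_bounds = [upper for _, upper, _ in table]
    running_totals = accumulate(freq for _, _, freq in table)
    return list(zip(upper_bounds, running_totals))

cf_points = cumulative_points(classes)
print(cf_points)  # [(10, 4), (20, 13), (30, 19)]
```

The final running total equals the total frequency, which is a quick check that the table has been added up correctly.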
Why do some bar-style graphs have unequal widths? For continuous data with unequal groups, we use histograms where the area of the bar represents the frequency, rather than the height.
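The height of the bar in a histogram represents the frequency density, and the vertical axis must be explicitly labelled with this term. Frequency density is calculated by dividing the frequency by the class width: $\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$.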
Worked Example: Finding Frequency Density
Calculate the frequency density for the class interval with a frequency of 12.
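Step 1: Calculate the class width: $\text{class width} = \text{upper bound} - \text{lower bound}$.
Step 2: Substitute into the formula: $\text{frequency density} = \frac{12}{\text{class width}}$.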
Step 3: Calculate the final answer.
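The three steps can be sketched in Python. Only the frequency of 12 comes from the example; the class bounds of 20 and 26 are assumed values chosen purely for illustration, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    class_width = upper - lower        # Step 1: class width
    return frequency / class_width     # Steps 2 and 3: substitute and evaluate

# Assumed class interval 20 <= x < 26 (illustrative only), frequency 12
fd = frequency_density(20, 26, 12)
print(fd)  # 2.0

# Check: bar area (height x width) gives back the frequency
print(fd * (26 - 20))  # 12.0
```

The second print shows why the area of the bar, not its height, represents the frequency.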
You can list a hundred data points, but a box plot summarizes the entire spread of the data using just five numbers. This is known as the five-number summary.
To construct a box plot, you need the minimum, the lower quartile (Q₁), the median, the upper quartile (Q₃), and the maximum.
Here is an approximate diagram of a box plot showing these key features:

            Lower Quartile                    Upper Quartile
    Minimum      (Q1)           Median             (Q3)        Maximum
    |--------------+---------------|-----------------+---------------|
                   |                                 |
                   +---------------------------------+
                                   IQR

A rectangular box is drawn from the lower quartile to the upper quartile, showing the middle 50% of the data. A vertical line is drawn inside the box at the median, and straight "whiskers" extend from the box outwards to the minimum and maximum values.
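A minimal sketch of finding the five values from a raw list is shown below. It assumes the common school convention of placing the quartiles at positions $\frac{n+1}{4}$ and $\frac{3(n+1)}{4}$ in the ordered list, which is an assumption here and not the only convention in use. The marks and the helper names are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position, interpolating between neighbours if needed."""
    lower_index = int(position) - 1
    fraction = position - int(position)
    if fraction == 0:
        return ordered[lower_index]
    gap = ordered[lower_index + 1] - ordered[lower_index]
    return ordered[lower_index] + fraction * gap

def five_number_summary(data):
    """Assumes at least 3 values and the (n + 1)/4 quartile convention."""
    ordered = sorted(data)
    n = len(ordered)
    return {
        "minimum": ordered[0],
        "Q1": value_at_position(ordered, (n + 1) / 4),
        "median": value_at_position(ordered, (n + 1) / 2),
        "Q3": value_at_position(ordered, 3 * (n + 1) / 4),
        "maximum": ordered[-1],
    }

marks = [7, 3, 12, 9, 5, 15, 8, 10, 4, 11, 6]  # hypothetical data, n = 11
summary = five_number_summary(marks)
print(summary)
# {'minimum': 3, 'Q1': 5, 'median': 8, 'Q3': 11, 'maximum': 15}
```

With 11 values the positions are 3, 6 and 9, so no interpolation is needed and each quartile is an actual data value.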
If you are extracting these values from a cumulative frequency curve with a total frequency of $n$, the median is read at $\frac{n}{2}$, the lower quartile at $\frac{n}{4}$, and the upper quartile at $\frac{3n}{4}$. For discrete data, the median position is calculated as $\frac{n+1}{2}$.
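The reading positions can be sketched as below, assuming the usual conventions of one quarter, one half and three quarters of the total frequency for a curve, and $\frac{n+1}{2}$ for a discrete list. The function names and the totals 80 and 11 are hypothetical.

```python
def curve_positions(n):
    """Cumulative frequency values at which to read across on the curve."""
    return {"Q1": n / 4, "median": n / 2, "Q3": 3 * n / 4}

def discrete_median_position(n):
    """1-based position of the median in an ordered list of n values."""
    return (n + 1) / 2

print(curve_positions(80))           # {'Q1': 20.0, 'median': 40.0, 'Q3': 60.0}
print(discrete_median_position(11))  # 6.0
```

For a curve with total frequency 80, you would read across from 20, 40 and 60 on the vertical axis and then down to the horizontal axis.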
Sometimes the highest or lowest values in a dataset do not fit the general pattern and are classified as outliers. Edexcel commonly uses the "1.5 IQR" rule to identify them, where the interquartile range (IQR) is calculated as $\text{IQR} = Q_3 - Q_1$.
A value is an outlier if it falls outside these boundaries: below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$.
On a box plot, outliers are marked with a distinct cross (X). If an outlier is present, the whisker must end at the smallest or largest value in the dataset that is not an outlier, rather than extending to the outlier boundary itself.
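The outlier rule and the whisker adjustment can be sketched as follows. The data, the quartile values and the helper names are hypothetical, and values lying exactly on a boundary are treated as not being outliers.

```python
def outlier_boundaries(q1, q3):
    """Lower and upper boundaries under the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(data, q1, q3):
    """Return the outliers and the values where the whiskers should end."""
    low, high = outlier_boundaries(q1, q3)
    outliers = [x for x in data if x < low or x > high]
    inside = [x for x in data if low <= x <= high]
    return outliers, min(inside), max(inside)

data = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 30]  # hypothetical data
outliers, whisker_low, whisker_high = split_outliers(data, q1=5, q3=11)

print(outlier_boundaries(5, 11))  # (-4.0, 20.0)
print(outliers)                   # [30]
print(whisker_low, whisker_high)  # 3 12
```

Here 30 is marked with a cross, and the upper whisker stops at 12, the largest value that is not an outlier, not at the boundary value of 20.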
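The position of the median inside the box also indicates skewness: a median closer to the lower quartile suggests positive skew, a median closer to the upper quartile suggests negative skew, and a median in the middle of the box suggests a roughly symmetrical distribution.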
When asked to compare two distributions of data, Edexcel mark schemes strictly require exactly two distinct points of comparison. You must interpret these comparisons in the context of the original problem.
Use the following paired structure to ensure you secure full marks:
| Comparison Point | Metric to Use | What It Means in Context |
|---|---|---|
| Central Tendency (Average) | Compare the Medians. Do not use the mean when comparing box plots. | States which group was higher/lower on average. Example: "The median for Class A (15) is higher than Class B (12), so on average, Class A scored more marks." |
| Spread (Consistency) | Compare the Interquartile Range (IQR) or the Range. | A smaller IQR/Range means the data is more consistent. Example: "The IQR for Class A (4) is lower than Class B (7), so Class A's marks were more consistent." |
Students often lose marks by plotting cumulative frequency at the midpoint of the interval. It must always be plotted at the upper boundary.
In 'Compare' questions, simply listing the values of the median or IQR will score 0 marks. You must use comparative words like 'higher than' or 'smaller than' alongside the specific numbers.
When asked to compare the spread, always explicitly state that a smaller IQR or range means the data is 'more consistent' or has 'less variation'. Avoid vague terms like 'better' or 'worse'.
For Higher Tier histograms, you will often be awarded a specific B1 or C1 mark just for correctly labeling the y-axis as 'Frequency Density'.
Edexcel mark schemes generally require a smooth curve for cumulative frequency graphs. Drawing straight lines between the points with a ruler may result in losing marks.
Univariate data
Data that involves a single variable, such as the heights of a group of students.
Discrete data
Data that can only take specific, separate values, such as shoe sizes or the number of children in a family.
Continuous data
Data that can take any value within a range and is usually measured, such as time, mass, or height.
Histogram
A graphical representation of continuous grouped data where the area of the bar represents the frequency and there are no gaps between bars.
Frequency density
The height of a bar in a histogram, calculated by dividing the frequency by the class width.
Class interval
The range of values defining a specific group in a grouped frequency table.
Midpoint
The exact halfway value of a class interval, used for plotting frequency polygons.
Cumulative frequency
A running total of frequencies, plotted at the upper boundary of class intervals to form a smooth S-shaped curve.
Box plot
A diagram that summarizes the distribution of a dataset using a five-number summary.
Five-number summary
The five key values needed to draw a box plot: minimum, lower quartile, median, upper quartile, and maximum.
Lower quartile (Q₁)
The value that is one-quarter of the way through an ordered dataset.
Median
The middle value of an ordered dataset, representing the central tendency.
Upper quartile (Q₃)
The value that is three-quarters of the way through an ordered dataset.
Interquartile range (IQR)
A measure of spread representing the middle 50% of the data, calculated by subtracting the lower quartile from the upper quartile.
Spread
A measure of how scattered or varied the data points are, typically described using the range or interquartile range.
Outlier
An extreme value that does not fit the general pattern of a dataset.
Skewness
A measure of the asymmetry of a distribution, visually indicated on a box plot by the position of the median relative to the quartiles.