Populations and Samples

OCR GCSE Mathematics (J560) — Statistics

Populations, Samples, and Censuses

Broadcast networks decide whether to renew or cancel a television show by tracking the viewing habits of just a few thousand homes, rather than the whole country.

The population is the entire group of people, objects, or items that are being studied or from which data could be collected.
A sample is a selection of items taken from the population that is used to represent the whole.
A census is a survey or investigation that includes every single member of the population.

A census covers every member of the population, so it gives the most complete and accurate results, but it is often impractical due to high costs, excessive time requirements, or situations where testing destroys the item (such as measuring the lifespan of batteries). Because of this, we use samples. For a sample to be reliable, it must be a representative sample, meaning it closely matches the characteristics and proportions of the population. Larger samples are generally more reliable, provided the sampling method is unbiased; OCR mark schemes generally consider samples of fewer than 30 items to be unreliable due to natural variation.
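
The effect of sample size on natural variation can be illustrated with a small simulation. The sketch below is illustrative only: the population of 10,000 members, the 30% proportion, and the function name `sample_proportions` are all assumptions made for the demonstration, not figures from OCR.

```python
import random

def sample_proportions(population, sample_size, trials, seed=0):
    """Draw repeated simple random samples and return each sample's proportion of 1s."""
    rng = random.Random(seed)  # fixed seed so the demonstration is repeatable
    results = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        results.append(sum(sample) / sample_size)
    return results

# Hypothetical population: 10,000 members, 30% of whom have the property (coded as 1).
population = [1] * 3000 + [0] * 7000

for n in (10, 30, 1000):
    props = sample_proportions(population, n, trials=200)
    spread = max(props) - min(props)
    print(f"sample size {n}: estimates range over {spread:.2f}")
```

The estimates from very small samples should scatter much more widely around 0.30 than those from large samples, which is the natural variation referred to above.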

Simple Random Sampling

If a teacher asks for volunteers to answer a question, the eager students who raise their hands do not represent the whole class's level of understanding.

Simple random sampling is a sampling method where every member of the population has an equal probability of being chosen.
This method is specifically designed to prevent selection bias, where the person picking the sample might subconsciously or purposefully choose certain types of members.

To carry out a simple random sample, you must use a rigorous methodology:

Obtain a sampling frame, which is a complete, numbered list of every member of the population.
Assign a unique number to every member on the list (for example, from $1$ to $N$ ).
Use a random number generator (such as the $\text{RanInt}$ function on a scientific calculator) or a random number table to generate the required quantity of numbers.
Crucially, ignore any duplicate numbers generated and continue until your full sample size is reached.
Match the randomly selected numbers back to the names or items in your sampling frame to form your sample.
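
The steps above can be sketched in Python. This is a minimal illustration, assuming a hypothetical sampling frame of ten names; `random.randint` stands in for the calculator's random integer function, and the function name `simple_random_sample` is made up for this example.

```python
import random

def simple_random_sample(sampling_frame, sample_size, rng=random):
    """Pick sample_size distinct members from a numbered sampling frame."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the population size")
    n = len(sampling_frame)
    chosen_numbers = []
    while len(chosen_numbers) < sample_size:
        number = rng.randint(1, n)       # random integer from 1 to N inclusive
        if number in chosen_numbers:
            continue                      # ignore duplicates and generate again
        chosen_numbers.append(number)
    # Match each number back to the member at that position (numbering starts at 1).
    return [sampling_frame[number - 1] for number in chosen_numbers]

# Hypothetical sampling frame: member 1 is "Aisha", member 2 is "Ben", and so on.
frame = ["Aisha", "Ben", "Chloe", "Dev", "Ella", "Finn", "Grace", "Hari", "Isla", "Jack"]
print(simple_random_sample(frame, 4))
```

In practice, Python's built-in `random.sample(frame, 4)` does the same job in one call; the loop is written out here only to mirror the by-hand steps.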

Understanding Sampling Bias

A restaurant asking lunchtime diners to rate their new evening menu will quickly find that their feedback is heavily skewed and not useful.

Sampling bias is a systematic error where the sampling method results in certain members of a population being more or less likely to be selected than others.

When evaluating or criticising a data collection method, look for these common sources of bias:

Location bias: Sampling in only one place, which excludes people who never go there (e.g., asking about fitness habits outside a gym).
Time bias: Sampling at only one specific time, missing people who work or shop on different schedules (e.g., interviewing people at 10 AM on a Tuesday).
Self-selection bias: Using volunteers for an online poll or radio phone-in, which disproportionately attracts people with extreme or very strong opinions.
Convenience bias: Choosing the easiest people to reach, such as the first 10 people to walk past a shop, which does not give every member of the population an equal chance of selection.
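
A small worked illustration shows how location bias distorts an estimate. The figures below are invented for the example: a hypothetical town of 10 people, three of whom are gym members, with made-up weekly exercise hours.

```python
# Hypothetical data: (weekly exercise hours, is the person a gym member?)
town = [
    (6, True), (5, True), (7, True),
    (1, False), (2, False), (0, False), (3, False),
    (1, False), (2, False), (3, False),
]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Census of the whole town: the true population mean.
population_mean = mean([hours for hours, _ in town])

# Location-biased sample: only people found outside the gym are asked.
gym_sample_mean = mean([hours for hours, is_member in town if is_member])

print(f"True mean: {population_mean} hours")
print(f"Mean from gym-only sample: {gym_sample_mean} hours")
```

Here the gym-only sample gives a mean of 6 hours against a true mean of 3 hours, because non-members had no chance of being selected.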

Inferring Population Properties

Knowing that 6 out of 30 surveyed teenagers prefer a specific clothing brand allows retailers to predict how many shirts to ship to a city of a million people.

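If a sample is representative, the proportions found within it can be scaled up to estimate properties of the whole population. This process is known as statistical inference.
You can use a scaling factor to estimate totals; a sample mean can be used directly as an estimate of the population mean.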

$\text{Estimated Population Frequency} = \frac{\text{Frequency in Sample}}{\text{Sample Size}} \times \text{Total Population Size}$
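
The same estimate can be written in terms of a scaling factor. As a sketch, assuming the scaling factor is taken to be the total population size divided by the sample size (the notes do not state this explicitly):

$\text{Scaling Factor} = \frac{\text{Total Population Size}}{\text{Sample Size}}$

$\text{Estimated Population Frequency} = \text{Frequency in Sample} \times \text{Scaling Factor}$

Both forms give the same answer, because multiplying by the population size and dividing by the sample size can be done in either order.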

Worked Example

In a random sample of 60 gym members, 18 said they use the swimming pool. The gym has 1,500 members in total. Estimate the total number of members who use the swimming pool.

Step 1: Identify the sample frequency, sample size, and total population.

Frequency in sample = $18$
Sample size = $60$
Total population = $1500$

Step 2: Substitute the values into the scaling formula.

$\text{Estimated Total} = \frac{18}{60} \times 1500$

Step 3: Calculate the final answer.

$\text{Estimated Total} = 0.3 \times 1500 = 450$ members

(Note: If your final estimated frequency is a decimal, you must round it to the nearest whole number, as you cannot have a fraction of a person or item.)
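
The calculation in the worked example can be checked with a short Python sketch. The function name `estimate_population_frequency` is hypothetical, and rounding to the nearest whole number follows the note above.

```python
def estimate_population_frequency(sample_frequency, sample_size, population_size):
    """Scale a sample frequency up to an estimated population frequency."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    estimate = sample_frequency * population_size / sample_size
    return round(estimate)  # whole number of people or items

# Worked example: 18 of 60 sampled gym members use the pool; 1,500 members in total.
print(estimate_population_frequency(18, 60, 1500))  # 450
```

One caveat: Python's `round` sends exact halves to the nearest even number, which can differ from the usual classroom rule of rounding halves up in that rare case.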

Exam Tips

Common Mistake: When describing how to take a simple random sample, students often forget to write 'ignore any duplicate numbers'; this is frequently a required marking point in OCR exams.
When asked to 'criticise a sampling method', do not just say 'it is biased'. You must state specifically WHY it is biased (e.g., 'it was only conducted at 10 AM, which misses people who work').
In scaling calculations estimating population totals, always round your final answer to the nearest whole number if it represents discrete items like people, animals, or cars.
Examiners frequently look for the exact term 'sampling frame' when you are asked how to set up a sample from a large population. Use it instead of just saying 'a list'.

Key Terms

Population: The entire group of people, objects, or items that are being studied or from which data could be collected.
Sample: A selection of items taken from the population that is used to represent the whole.
Census: A survey or investigation that includes every single member of the population.
Representative sample: A sample that closely matches the characteristics and proportions of the population, allowing for accurate inferences.
Simple random sampling: A sampling method where every member of the population has an equal probability of being chosen.
Selection bias: A systematic error where the person picking the sample subconsciously or purposefully chooses certain types of members.
Sampling frame: A complete, numbered list of every member or item in the population from which a sample is selected.
Random number generator: A tool, such as a calculator function or computer program, used to produce numbers entirely by chance to ensure unbiased selection.
Sampling bias: A systematic error where the sampling method results in certain members of a population being more or less likely to be selected than others.
Location bias: A bias occurring when data is collected from only one specific place, excluding parts of the population who do not visit that location.
Time bias: A bias occurring when data is collected at a specific time, excluding members of the population who are unavailable at that time.
Self-selection bias: A bias that occurs when participants volunteer themselves for a study, often resulting in a sample with over-represented extreme opinions.
Convenience bias: A bias where the sample is chosen based on ease of access rather than random selection.
Statistical inference: The process of using data collected from a representative sample to make estimates or predictions about the whole population.
Scaling factor: A multiplier used to scale up the proportions found in a sample to estimate totals for the entire population.
