Higher – Cumulative frequency and box plots


Key points about cumulative frequency and box plots

  • Cumulative frequency graphs and box plots are two methods used to display the distribution of continuous data.

  • A cumulative frequency graph is used to estimate the median and calculate the interquartile range. The interquartile range is calculated by finding the difference between the upper quartile and the lower quartile.

  • Box plots present the same data in a different way and can be used to make comparisons between two similar datasets.

Make sure you are confident in finding the median for a list of numbers when working with cumulative frequency graphs and box plots.


How to construct and interpret a cumulative frequency graph

Figure: a cumulative frequency graph showing the distance between home and college. A cumulative frequency graph often has a characteristic S shape.

Data is required to produce a cumulative frequency graph. The data should be provided in the form of a grouped frequency table.

Creating a cumulative frequency graph

  1. Add an additional column, if not provided, to the grouped frequency table. Label the column ‘Cumulative frequency’.

  2. The cumulative frequency is a running total of the frequencies. For each row, add the frequency of that group to the frequencies of all the previous groups. The final value should match the total frequency.

  3. Draw a horizontal axis. This should be a continuous number line. The values used to label the axis should match the numbers from the class intervals of the groups. Decide whether you need to use a broken axis.

  4. Draw a vertical axis. The vertical axis is always cumulative frequency. Choose an appropriate scale for this axis which includes the highest value in the cumulative frequency column.

  5. Plot each data point. Each data point is plotted by using the value at the end of the class interval and the cumulative frequency.

  6. The graph can be completed in one of two ways:

    i. Join each consecutive point with straight lines. The graph is then called a cumulative frequency polygon.

    ii. Draw a smooth curve passing through each point. The graph is then called a cumulative frequency curve.

  7. Check you have labelled each axis correctly and give your cumulative frequency graph a title.
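
The calculation behind the steps above can be sketched in a few lines of Python. This is only an illustration: the grouped frequency table below is hypothetical and is not taken from any graph on this page, and the variable names are made up for the example.

```python
from itertools import accumulate

# Hypothetical grouped frequency table: (start of class, end of class, frequency).
# These values are illustrative only.
table = [
    (0, 4, 4),
    (4, 8, 12),
    (8, 12, 40),
    (12, 16, 32),
    (16, 20, 8),
    (20, 24, 4),
]

frequencies = [freq for _, _, freq in table]

# The cumulative frequency column is a running total of the frequencies.
cumulative = list(accumulate(frequencies))

# The final cumulative frequency should match the total frequency.
assert cumulative[-1] == sum(frequencies)

# Each point uses the END of the class interval and the cumulative frequency.
points = [(end, cf) for (_, end, _), cf in zip(table, cumulative)]

for end, cf in points:
    print(f"plot ({end}, {cf})")
```

Joining these points with straight lines would give a cumulative frequency polygon; drawing a smooth curve through them would give a cumulative frequency curve.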


GCSE exam-style questions

  1. The cumulative frequency graph shows information about the ages of 60 cooking club members.

Use the graph to estimate the median age of the members.

A cumulative frequency graph titled ‘A cumulative frequency graph to show the ages of cooking club members’. The horizontal axis is labelled ‘Age (years)’ and marked at 0, 30, 40, 50, 60, 70 and 80. The vertical axis is labelled ‘Cumulative frequency’ with values 0, 10, 20, 30, 40, 50 and 60. An orange cumulative frequency curve begins near age 20 with a small rise, then increases gradually through the 30s and 40s. It becomes much steeper between ages 45 and 65. The curve then flattens between ages 65 and 75, ending at a cumulative frequency of about 60.

  2. The cumulative frequency graph shows information about the ages of 60 cooking club members.

Use the graph to estimate the interquartile range of the members.


  3. This cumulative frequency graph shows information about the distance between the home and college of 100 students.

Use the graph to estimate the percentage of students that live more than 14 miles from their college.

A cumulative frequency graph titled ‘A cumulative frequency graph to show the distance between home and college’. The horizontal axis is labelled ‘Miles’ and marked at 0, 4, 8, 12, 16, 20 and 24. The vertical axis is labelled ‘Cumulative frequency’ with values 0, 20, 40, 60, 80 and 100. An orange cumulative frequency curve begins close to (0, 0), rises slowly until around 8 miles, becomes steep between about 8 and 16 miles, and then flattens between 16 and 24 miles, finishing at a cumulative frequency of 100.
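
Reading values off a cumulative frequency graph, as the questions above require, can be imitated numerically. The sketch below assumes a cumulative frequency polygon (straight lines between points), so the estimates come from linear interpolation; a smooth curve read by eye may give slightly different answers. The points are hypothetical and do not come from the graphs in these questions, and the function names `value_at` and `cum_freq_at` are made up for the example.

```python
# Hypothetical points on a cumulative frequency polygon: (value, cumulative frequency).
# The first point is the start of the first class interval, where the total is 0.
points = [(0, 0), (4, 4), (8, 16), (12, 56), (16, 88), (20, 96), (24, 100)]


def value_at(cum_freq, points):
    """Read across from the vertical axis and down to the horizontal axis."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= cum_freq <= c1 and c1 > c0:
            return x0 + (cum_freq - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative frequency is outside the graph")


def cum_freq_at(value, points):
    """Read up from the horizontal axis and across to the vertical axis."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return c0 + (value - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("value is outside the graph")


total = points[-1][1]  # total frequency

median = value_at(total / 2, points)              # half of the total
lower_quartile = value_at(total / 4, points)      # one quarter of the total
upper_quartile = value_at(3 * total / 4, points)  # three quarters of the total
iqr = upper_quartile - lower_quartile

# Percentage of values greater than 14:
# subtract the cumulative frequency at 14 from the total.
percent_above_14 = (total - cum_freq_at(14, points)) / total * 100

print(f"median ≈ {median:.1f}, IQR ≈ {iqr:.1f}, above 14: {percent_above_14:.0f}%")
```

For a "more than" question the cumulative frequency read from the graph counts the values up to that point, so it has to be subtracted from the total before converting to a percentage.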


How to construct a box plot

A box plot (or box and whisker diagram) is another way of presenting the distribution of a continuous data set. In addition to showing the median and quartiles, it also displays the smallest and largest values.

The length of the box, which represents the interquartile range, shows how spread out the middle 50% of the data is.

The horizontal scale below the box plot allows for the values to be accurately recorded or read.

  1. Create a box plot from a list of raw data by ordering the numbers and identifying the median and quartiles.

  2. Find the positions of the quartiles and median in the ordered list using the following formulae, where \(n\) is the number of pieces of data:

  • Lower quartile position = \( \frac{1}{4} (n + 1) \)
  • Median position = \( \frac{1}{2} (n + 1) \)
  • Upper quartile position = \( \frac{3}{4} (n + 1) \)
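
The position formulae above can be tried out with a short Python sketch. The data set is hypothetical, and the function names are made up for the example. One assumption is labelled in the code: when a position falls between two values, the sketch interpolates linearly between them, so a position ending in .5 gives the mean of the two neighbouring values. Other conventions exist. The sketch also assumes at least three pieces of data.

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in an ordered list.

    Assumption: if the position falls between two values, interpolate
    linearly between them (a position ending in .5 gives their mean).
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def five_number_summary(data):
    """Find the five values needed to draw a box plot."""
    ordered = sorted(data)  # always order the numbers first
    n = len(ordered)
    return {
        "smallest": ordered[0],
        "lower quartile": value_at_position(ordered, (n + 1) / 4),
        "median": value_at_position(ordered, (n + 1) / 2),
        "upper quartile": value_at_position(ordered, 3 * (n + 1) / 4),
        "largest": ordered[-1],
    }


# Hypothetical data with n = 7, so the positions are the 2nd, 4th and 6th values.
summary = five_number_summary([14, 3, 8, 5, 13, 7, 12])
print(summary)
```

With seven values the ordered list is 3, 5, 7, 8, 12, 13, 14, so the lower quartile is the 2nd value (5), the median is the 4th value (8) and the upper quartile is the 6th value (13).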


GCSE exam-style questions

  1. The table shows information about the cost of lunches (£) of pupils at a school.

Draw a box plot to show this information.

Smallest value   Lower quartile   Median   Upper quartile   Largest value
2·80             3·20             3·60     3·90             4·80

  2. The masses (kg) of 11 sofas are recorded below.

49, 56, 57, 63, 67, 67, 71, 75, 81, 82, 93

Draw a box plot for this information.

Figure: a blank horizontal axis labelled ‘Mass (kg)’, running from 40 to 100 and marked at intervals of 10, on which the box plot is to be drawn.

  3. The cumulative frequency graph shows information about the distance between home and college of 100 students.

The closest student lives 3·6 miles from college.

The furthest student lives 20·8 miles from college.

Using the graph, create a box plot for this information.

A cumulative frequency graph titled ‘A cumulative frequency graph to show the distances between home and college’. On the top graph, the horizontal axis is labelled ‘Miles’ and marked from 0 to 24 at intervals of 2. The vertical axis on the left is labelled ‘Cumulative frequency’ and marked from 0 to 100 at intervals of 20. An orange cumulative frequency curve begins at (0, 0), rises slowly until about 8 miles, becomes steep between 8 and 16 miles, and then flattens between 16 and 24 miles, ending just below 100. Below this, a second horizontal axis is shown for reference. It is also labelled ‘Miles’ and marked from 0 to 24 at intervals of 2, but with no curve drawn.


Using the median and interquartile range to make comparisons

For two sets of similar data, it is possible to draw two box plots on the same set of axes.

Comparisons can be made between the data, often by using the median and interquartile range.

For example, when comparing the prices of flights offered by two airlines, a higher median means that, on average, that airline's flights are more expensive. A smaller IQR means the prices are more consistent, because the middle 50% of prices vary less.

When making a comparison, always quote numerical values and explain what they mean.
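
A comparison of this kind can be sketched in Python. The airline names and all of the figures below are hypothetical, chosen only to illustrate the flight-price example; they are not taken from any box plot on this page.

```python
# Hypothetical values (in £) read from two box plots on the same axes.
# "Airline A" and "Airline B" are made-up names for illustration.
airlines = {
    "Airline A": {"lower quartile": 80, "median": 120, "upper quartile": 150},
    "Airline B": {"lower quartile": 95, "median": 105, "upper quartile": 125},
}


def iqr(summary):
    """Interquartile range = upper quartile - lower quartile."""
    return summary["upper quartile"] - summary["lower quartile"]


# A higher median means the prices are higher on average.
higher_median = max(airlines, key=lambda name: airlines[name]["median"])

# A smaller IQR means the prices are more consistent.
smaller_iqr = min(airlines, key=lambda name: iqr(airlines[name]))

# Always quote the numerical values alongside the conclusion.
for name, summary in airlines.items():
    print(f"{name}: median £{summary['median']}, IQR £{iqr(summary)}")

print(f"{higher_median} is more expensive on average (higher median).")
print(f"{smaller_iqr} has more consistent prices (smaller IQR).")
```

The printed lines mirror the structure of a good written answer: state each median and IQR, then say what the difference means in context.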


GCSE exam-style questions

  1. The box plots show the annual salaries of the workforce at two companies, White’s and Underwood’s.

On average, which company pays the higher salary?

A box plot comparison titled ‘A box plot to show the annual salaries for two companies’ is shown on a blue square grid background. The horizontal axis at the bottom is labelled ‘Salary (£1000’s)’ and marked at 8, 10, 12, 14, 16, 18, 20 and 22. Two box plots appear.

Top box plot (White’s), drawn in orange:
  • The left whisker extends from £10,200 to the lower quartile at £11,800.
  • The box spans from £11,800 to the median line at £14,200.
  • The box spans from £14,200 to the upper quartile at £15,600.
  • The right whisker extends from £15,600 to the largest value at £19,800.

Bottom box plot (Underwood’s), drawn in blue:
  • The left whisker extends from £10,600 to the lower quartile at £13,400.
  • The box spans from £13,400 to the median line at £15,000.
  • The box spans from £15,000 to the upper quartile at £15,600.
  • The right whisker extends from £15,600 to the largest value at £18,400.

  2. The box plots show the annual salaries of the workforce at two companies, White’s and Underwood’s.

Which company is more consistent with its salaries?


  3. The box plots show the waiting times of patients for two doctors.

Which doctor is more consistent with their waiting times?

A box plot diagram titled ‘A box plot to show the waiting times of patients for two doctors’. Two box plots appear one above the other, both using a horizontal axis labelled ‘Minutes’, marked from 0 to 35. The upper box plot, shown in orange, is labelled ‘Dr Woods’ to the right. The left whisker begins at 2. The box stretches from 8 to 18, with a median line at 11.5. The right whisker extends to 22. The lower box plot, shown in blue, is labelled ‘Dr Goldberg’ to the right. The left whisker begins at 3. The box spans from 10 to 16.5, with a median line at 13.5. The right whisker extends to 23.
