Maths

Inter-quartile range, cumulative frequency, box and whisker plots - Higher

If you are studying the higher paper, you will need to know the difference between discrete and continuous data, how to plot and interpret histograms, cumulative frequency diagrams and box and whisker plots, and how to calculate interquartile ranges.

Discrete and continuous data

Raw data is the information we get when we do a survey. For example, we might have a list of heights or shoe sizes.

Data can either be discrete or continuous.

Discrete data

This data set shows a group of discrete data.

This is called discrete data because the units of measurement (for example, CDs) cannot be split up; there is nothing between 1 CD and 2 CDs.

Bitesize CD store, 31 January 2008

Music format    Number sold
CD albums       140
CD singles      70
Downloads       55
Vinyl           5
Total sales     270

Shoe sizes are a classic example of discrete data, because sizes 39 and 40 mean something, but size 39.2, for example, does not.

Continuous data

The data set shows a group of continuous data.

This data is called continuous because the scale of measurement - distance - has meaning at all points between the numbers given, eg we can travel a distance of 1.2 and 1.85 and even 1.632 miles.

Continuous data can be shown on a number line, and all points on the line have meaning and are different, but with discrete data only certain values have meaning.

Length of journey to work

Distance in miles0.1 0.2 0.6 1.1 1.2 1.8 2.0 2.7 3.4 4.6 6.2 8.0 12.1 14.2

For each question, decide whether the data set is discrete or continuous.

Question

The heights of pupils in class 3A.

Answer

Height is continuous. For example, a pupil could be 152.3cm.

Question

The number of chocolates in various 500g boxes.

Answer

The number of chocolates is discrete. There would not be half chocolates in a box.

Question

The times taken for athletes to run 100m.

Answer

Time is continuous. For example, an athlete may run 100m in 10.37 seconds.

Grouping data

It is often better to display data in a table. This section will look at different ways to organise data, and revise the following terms:

  1. Classes are the groups the data is organised into.
  2. Class boundaries are the boundaries between one class and the next.
  3. Each class has an upper class boundary and lower class boundary. This is a practical application of lower and upper bounds that you met in Number.
  4. The class width is the difference between the upper class boundary and lower class boundary.
  5. The midpoint of a class is the value halfway between its lower and upper class boundaries (that is, their mean).

Grouping discrete data

The table shows the number of people on 12 different buses. Discrete data is normally grouped in the following way:

 

Number of people on a bus    4-6    7-9    10-12    13-15
Frequency                    2      7      2        1

From the table we can see that there were 7 buses with 7-9 people on them. But we have no way of telling exactly how many people were on each bus.

Now look at the class widths: they are all 3.

image: class widths

The midpoints of the classes are 5, 8, 11 and 14, as shown in red.

Grouping continuous data

There are many ways to represent continuous data in a table.

Example 1

This table shows the heights (h) of 25 people. The class widths are all 10.

 

Height (cm)    120 ≤ h < 130    130 ≤ h < 140    140 ≤ h < 150    150 ≤ h < 160
Frequency      4                6                10               5
image: class widths

The class boundaries are 120, 130, 140, 150 and 160.

The midpoints of the classes are 125, 135, 145 and 155, shown in red.

Example 2

 

Length (cm)    110-129    130-149    150-169    170-189
Frequency      5          3          1          1
image: class widths

  • The class boundaries are 109.5, 129.5, 149.5, 169.5 and 189.5.
  • The class widths are all 20.
  • The midpoints of the classes are 119.5, 139.5, 159.5 and 179.5.

Example 3

 

Height (cm)    2-4    5-7    8-10    11-14
Frequency      7      6      2       5
image: class boundaries

  • The class boundaries are 1.5, 4.5, 7.5, 10.5 and 14.5.
  • The class widths are all 3, except the last one which is 4.
  • The midpoints of the classes are 3, 6, 9 and 12.5.

Example 4

 

Age (years)    21-30    31-40    41-50    51-60
Frequency      8        10       3        4

This table looks very similar to the table in example 2, and you might assume that the class boundaries are 20.5, 30.5 etc - but they are not! Remember that if you are 16 now, you will be 16 right up until your 17th birthday.

image: class boundaries

  • Therefore, the class boundaries are 21, 31, 41, 51 and 61.
  • The class widths are all 10.
  • The midpoints of the classes are 26, 36, 46 and 56 (each is halfway between its class boundaries, for example (21 + 31) ÷ 2 = 26).
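The two conventions used in Examples 2 to 4 can be compared in a short sketch. The function name class_summary and its rounded flag are made up for illustration. The sketch assumes that measured lengths were rounded to the nearest unit, and that ages are given in completed years.

```python
def class_summary(classes, rounded=True):
    """Return (lower boundary, upper boundary, width, midpoint) for each class.

    classes is a list of (low, high) labels as printed in the table.
    rounded=True  -> measurements rounded to the nearest unit,
                     so each label is extended by 0.5 at both ends.
    rounded=False -> ages in completed years, so a class runs from
                     low up to (but not including) high + 1.
    """
    summary = []
    for low, high in classes:
        if rounded:
            lower, upper = low - 0.5, high + 0.5
        else:
            lower, upper = low, high + 1
        summary.append((lower, upper, upper - lower, (lower + upper) / 2))
    return summary


# Example 3: heights recorded to the nearest cm.
heights = class_summary([(2, 4), (5, 7), (8, 10), (11, 14)])
print(heights[-1])  # (10.5, 14.5, 4.0, 12.5)

# Example 4: ages in completed years.
ages = class_summary([(21, 30), (31, 40), (41, 50), (51, 60)], rounded=False)
print([(lower, upper) for lower, upper, _, _ in ages])
```

The only difference between the two cases is where the class boundaries fall. Once the boundaries are known, the width and midpoint are worked out in the same way.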

Histograms

The following table shows the ages of 25 children on a school bus:

 

Age      Frequency
5-10     6
11-15    15
16-17    4
> 17     0

If we are going to draw a histogram to represent the data, we first need to find the class boundaries. In this case they are 5, 11, 16 and 18. The class widths are therefore 6, 5 and 2.

In a histogram, the area of each bar represents the frequency.

The areas of our bars should therefore be 6, 15 and 4.

image: bar graph

Remember that in a bar chart the height of the bar represents the frequency. It is therefore correct to label the vertical axis 'frequency'.

In a histogram, however, it is the area of the bar that represents the frequency.

It would therefore be incorrect to label the vertical axis 'frequency'; the label should be 'frequency density'.

Since area = frequency = frequency density × class width, it follows that:

Frequency density = frequency ÷ class width
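As a quick check, the formula can be applied to the school bus data above. This is a minimal sketch; the function name frequency_densities is made up for illustration.

```python
def frequency_densities(boundaries, frequencies):
    """Frequency density = frequency / class width, one value per class."""
    densities = []
    for lower, upper, frequency in zip(boundaries, boundaries[1:], frequencies):
        densities.append(frequency / (upper - lower))
    return densities


# School bus data: class boundaries 5, 11, 16, 18 and frequencies 6, 15, 4.
bus_boundaries = [5, 11, 16, 18]
bus_frequencies = [6, 15, 4]
bus_densities = frequency_densities(bus_boundaries, bus_frequencies)
print(bus_densities)  # [1.0, 3.0, 2.0]
```

Multiplying each density by its class width (1 × 6, 3 × 5, 2 × 2) gives back the frequencies 6, 15 and 4, which are the areas of the bars.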

Apply this formula to the following question.

Question

The ages of children entering a theme park in a 1-hour period are recorded in the table:

 

Age      Frequency
0-3      12
4-10     14
11-18    48
> 18     0

Find the class widths and frequency densities. Then draw a histogram to represent the data.

Answer
Class boundaries:

The class boundaries are 0, 4, 11 and 19 (remember that this is age in years).

Class widths:

The class widths are therefore 4, 7 and 8.

Frequency densities:

12/4 = 3

14/7 = 2

48/8 = 6

The histogram should look like this:

image: bar graph

Interquartile range

We know that the median divides the data into two halves. We also know that for a set of n ordered numbers the median is the (n + 1) ÷ 2 th value.

Similarly, the lower quartile divides the bottom half of the data into two halves, and the upper quartile also divides the upper half of the data into two halves.

Lower quartile is the (n + 1) ÷ 4 th value.

Upper quartile is the 3 (n + 1) ÷ 4 th value.

lower quartile - median - upper quartile

Question

Find the median, lower quartile and upper quartile for the following data:

11, 4, 6, 8, 3, 10, 8, 10, 4, 12 and 31.

Answer

Ordering the data, we get 3, 4, 4, 6, 8, 8, 10, 10, 11, 12 and 31.

The median is the (11 + 1) ÷ 2 = 6th value.

The lower quartile is the (11 + 1) ÷ 4 = 3rd value.

The upper quartile is the 3 (11 + 1) ÷ 4 = 9th value.

Therefore, the median is 8, the lower quartile is 4, and the upper quartile is 11.

3, 4, [4], 6, 8, [8], 10, 10, [11], 12, 31 (lower quartile, median and upper quartile shown in brackets)

The interquartile range is the difference between the upper quartile and lower quartile.

In this example, the interquartile range is 11 - 4 = 7.
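The position rules used above can be sketched in a few lines. The function names are made up for illustration. The text only uses whole-number positions, so the handling of a fractional position (interpolating between the two neighbouring values) is an assumption, and the sketch expects at least three values.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in an ordered list.

    A fractional position is interpolated between its two neighbours.
    That part is an assumption; the worked examples only need whole positions.
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def quartiles(data):
    """Return (lower quartile, median, upper quartile) using the (n + 1) rules."""
    ordered = sorted(data)
    n = len(ordered)
    lower = value_at_position(ordered, (n + 1) / 4)
    median = value_at_position(ordered, (n + 1) / 2)
    upper = value_at_position(ordered, 3 * (n + 1) / 4)
    return lower, median, upper


lower, median, upper = quartiles([11, 4, 6, 8, 3, 10, 8, 10, 4, 12, 31])
print(lower, median, upper)  # 4 8 11
print(upper - lower)         # 7
```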

Question

A survey was carried out to find the number of pets owned by each child in a class.

The results are shown in the table:

 

Number of pets    Frequency
0                 3
1                 5
2                 2
3                 7
4                 10
5                 3
6                 1
> 6               0

Find the interquartile range.

Answer

The interquartile range is 3.

Remember that there is a total of 31 children in the class.

  • The lower quartile is the 8th value, which is 1.
  • The upper quartile is the 24th value, which is 4.
  • Therefore, the interquartile range is 4 - 1 = 3.
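With a frequency table, we can find the 8th and 24th values by keeping a running total of the frequencies rather than writing out all 31 values. The sketch below does this for the pets table. The function name value_from_table is made up, and the whole-number division assumes that n + 1 is a multiple of 4, as it is here.

```python
def value_from_table(table, position):
    """Return the value at a 1-based position in the ordered data,
    using a running total of the frequencies."""
    running_total = 0
    for value, frequency in table:
        running_total += frequency
        if position <= running_total:
            return value
    raise ValueError("position is beyond the end of the data")


# (number of pets, frequency) pairs from the table.
pets = [(0, 3), (1, 5), (2, 2), (3, 7), (4, 10), (5, 3), (6, 1)]
n = sum(frequency for _, frequency in pets)                 # 31 children
lower_quartile = value_from_table(pets, (n + 1) // 4)       # 8th value
upper_quartile = value_from_table(pets, 3 * (n + 1) // 4)   # 24th value
print(lower_quartile, upper_quartile)   # 1 4
print(upper_quartile - lower_quartile)  # 3
```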

Note that the interquartile range ignores extreme values. The range includes extreme values.

Look at this set of data:

  • 1, 5, 7, 8, 9, 12, 13, 15, 17, 18, 35.
  • The interquartile range is 17 - 7 = 10.
  • The range is 35 - 1 = 34.

In cases such as these, it is often preferable to use the interquartile range when comparing the data.

Cumulative frequency

The cumulative frequency is obtained by adding up the frequencies as you go along, to give a 'running total'.

Drawing a cumulative frequency diagram

The table shows the lengths (in cm) of 32 cucumbers.

Before drawing the cumulative frequency diagram, we need to work out the cumulative frequencies. This is done by adding the frequencies in turn.

 

Length    Frequency    Cumulative frequency
21-24     3            3
25-28     7            10 (= 3 + 7)
29-32     12           22 (= 3 + 7 + 12)
33-36     6            28 (= 3 + 7 + 12 + 6)
37-40     4            32 (= 3 + 7 + 12 + 6 + 4)

The points are plotted at the upper class boundary. In this example, the upper class boundaries are 24.5, 28.5, 32.5, 36.5 and 40.5. Cumulative frequency is plotted on the vertical axis.
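The running total and the points to plot can be worked out as in the sketch below, which uses the cucumber data. The starting point (20.5, 0) is included because there are no cucumbers shorter than 20.5cm; the variable names are chosen for illustration.

```python
from itertools import accumulate

upper_boundaries = [24.5, 28.5, 32.5, 36.5, 40.5]
frequencies = [3, 7, 12, 6, 4]

# accumulate() adds the frequencies in turn to give the running total.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 10, 22, 28, 32]

# Each point is (upper class boundary, cumulative frequency).
points = [(20.5, 0)] + list(zip(upper_boundaries, cumulative))
for length, total in points:
    print(length, total)
```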

image: cumulative frequency graph

There are no values below 20.5cm, so the graph starts at (20.5, 0).

Cumulative frequency is always plotted against the upper class boundary of each group, because the running total counts every value below that boundary. Cumulative frequency always goes on the vertical axis, just as frequency does.

Cumulative frequency diagrams usually have this characteristic S-shape, called an ogive.

Finding the median and quartiles

When looking at a cumulative frequency curve, you will need to know how to find its median, lower and upper quartiles, and the interquartile range.

By drawing horizontal lines to represent 1/4 of the total frequency, 1/2 of the total frequency and 3/4 of the total frequency, we can read estimates of the lower quartile, median and upper quartile from the horizontal axis.

image: cumulative frequency graph

Quartiles are associated with quarters. The interquartile range is the difference between the lower and upper quartile.

In this graph the lower quartile is about 28 and the upper quartile is about 33, so we can estimate the interquartile range as 33 - 28 = 5.

Remember to take 1/4, 1/2 and 3/4 of the total frequency, not of the highest value marked on the vertical axis. The values are always read from the horizontal axis.
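Reading across and down can be imitated in a short sketch. It uses the cucumber points from the earlier table and joins them with straight lines, which is an assumption. A smooth hand-drawn curve will give slightly different readings, so these estimates need not match values read from a printed graph. The function name read_from_graph is made up.

```python
def read_from_graph(points, target):
    """Horizontal-axis value where the straight-line graph reaches `target`."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Go the same fraction of the way along the segment as up it.
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the plotted range")


# (upper class boundary, cumulative frequency) for the 32 cucumbers.
points = [(20.5, 0), (24.5, 3), (28.5, 10), (32.5, 22), (36.5, 28), (40.5, 32)]
total = points[-1][1]  # total frequency, 32

lower_quartile = read_from_graph(points, total / 4)      # read across from 8
median = read_from_graph(points, total / 2)              # read across from 16
upper_quartile = read_from_graph(points, 3 * total / 4)  # read across from 24

print(round(lower_quartile, 1), round(median, 1), round(upper_quartile, 1))
print(round(upper_quartile - lower_quartile, 1))
```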

Box and whisker plots

A box and whisker plot is used to display information about the range, the median and the quartiles. It is usually drawn alongside a number line, as shown:

image: box and whisker plot

Example

The oldest person in Mathsminster is 90. The youngest person is 15.

The median age of the residents is 44, the lower quartile is 25, and the upper quartile is 67.

Represent this information with a box-and-whisker plot.

Solution

image: box and whisker plot
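To see how the five values fit together, here is a rough text version of the Mathsminster plot. It is only a sketch: the function name box_plot_row and the one-character-per-year layout are made up for illustration, and it expects whole-number values.

```python
def box_plot_row(minimum, lower_q, median, upper_q, maximum, axis_max=100):
    """Draw a one-line text box plot on a scale from 0 to axis_max,
    using one character per unit."""
    row = [" "] * (axis_max + 1)
    for position in range(minimum, maximum + 1):
        row[position] = "-"      # whiskers span the whole range
    for position in range(lower_q, upper_q + 1):
        row[position] = "="      # box spans the interquartile range
    for position in (minimum, lower_q, upper_q, maximum):
        row[position] = "|"      # ends of the whiskers and of the box
    row[median] = "M"            # median marked inside the box
    return "".join(row)


plot = box_plot_row(15, 25, 44, 67, 90)
print(plot)
```

The whiskers run from 15 to 90, the box runs from 25 to 67, and the median is marked at 44.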

Moving averages

These may not be on your exam as it depends on the exam board your school uses. It is worth checking with your teacher.

This table shows the number of visitors to a seaside town:

 

Year    Quarter    Visitors (10,000s)
2005    1          14
2005    2          24
2005    3          9
2005    4          8
2006    1          12
2006    2          22
2006    3          11
2006    4          7
2007    1          11
2007    2          20

If this information is plotted on a graph, it looks like this:

image: line graph showing data from table

This shows that there is a wide variation in the number of visitors depending on the season. There are far fewer visitors in the autumn and winter than in the spring and summer.

However, if we wanted to see a trend in the number of visitors, we could calculate a 4-point moving average.

We do this by finding the average number of visitors in the four quarters of 2005:

(14 + 24 + 9 + 8) ÷ 4 = 13.75

Then we find the average number of visitors in the last three quarters of 2005 and first quarter of 2006:

(24 + 9 + 8 + 12) ÷ 4 = 13.25

Then the last two quarters of 2005 and the first two quarters of 2006:

(9 + 8 + 12 + 22) ÷ 4 = 12.75

And so on...

Note that the last average we can find is for the last two quarters of 2006 and the first two quarters of 2007.
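The whole set of 4-point moving averages can be worked out with a short sketch. The first six visitor figures appear in the calculations above. The last four (11, 7, 11 and 20) are our reading of the table and should be treated as an assumption. The function name moving_averages is made up.

```python
def moving_averages(values, points=4):
    """Average of each run of `points` consecutive values."""
    return [
        sum(values[start:start + points]) / points
        for start in range(len(values) - points + 1)
    ]


# Visitors per quarter (in tens of thousands), 2005 Q1 to 2007 Q2.
# The last four figures are an assumed reading of the table.
visitors = [14, 24, 9, 8, 12, 22, 11, 7, 11, 20]
averages = moving_averages(visitors)
print(averages)
```

Ten quarters give seven moving averages, because each average needs four consecutive quarters.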

We plot the moving averages on a graph, making sure that each average is plotted at the centre of the four quarters it covers:

image: line graph

We can now see that there is a very slight downward trend in visitors.
