Inter-quartile range, cumulative frequency, box and whisker plots - Higher
If you are studying the higher paper you will need to know the difference between discrete and continuous data, how to plot and interpret histograms, how to calculate inter-quartile ranges, cumulative frequency and box and whisker plots.
Raw data is the information we get when we do a survey. For example, we might have a list of heights or shoe sizes.
Data can either be discrete or continuous.
This data set shows a group of discrete data.
This is called discrete data because the units of measurement (for example, CDs) cannot be split up; there is nothing between 1 CD and 2 CDs.
|Music format||Number sold|
Shoe sizes are a classic example of discrete data, because sizes 39 and 40 mean something, but size 39.2, for example, does not.
The data set shows a group of continuous data.
This data is called continuous because the scale of measurement (distance) has meaning at all points between the numbers given. For example, we can travel a distance of 1.2, 1.85 or even 1.632 miles.
Continuous data can be shown on a number line, and all points on the line have meaning and are different, but with discrete data only certain values have meaning.
|Distance in miles||0.1 0.2 0.6 1.1 1.2 1.8 2.0 2.7 3.4 4.6 6.2 8.0 12.1 14.2|
For each question, decide whether the data set is discrete or continuous.
The heights of pupils in class 3A.
Height is continuous. For example, a pupil could be 152.3cm.
The number of chocolates in various 500g boxes.
The number of chocolates is discrete. There would not be half chocolates in a box.
The times taken for athletes to run 100m.
Time is continuous. For example, an athlete may run 100m in 10.37 seconds.
It is often better to display data in a table. This section will look at different ways to organise data, and revise the following terms:
The table shows the number of people on 12 different buses. Discrete data is normally grouped in the following way:
|Number of people on a bus||4-6||7-9||10-12||13-15|
From the table we can see that there were 7 buses with 7-9 people on them. But we have no way of telling exactly how many people were on each bus.
Now look at the class widths: they are all 3.
The midpoints of the classes are 5, 8, 11 and 14, as shown in red.
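As a quick check, the class widths and midpoints for the bus table can be worked out in a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes each class is given as a pair of whole-number limits such as 4-6, so that the width counts the whole numbers in the class.

```python
# Classes from the bus table, written as (lowest value, highest value).
classes = [(4, 6), (7, 9), (10, 12), (13, 15)]


def class_width(low, high):
    # Assumption: discrete whole-number classes, so 4-6 contains 4, 5 and 6.
    return high - low + 1


def midpoint(low, high):
    # The midpoint is halfway between the lowest and highest value in the class.
    return (low + high) / 2


widths = [class_width(low, high) for low, high in classes]
midpoints = [midpoint(low, high) for low, high in classes]

print(widths)     # [3, 3, 3, 3]
print(midpoints)  # [5.0, 8.0, 11.0, 14.0]
```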
There are many ways to represent continuous data in a table.
This table shows the heights (h) of 25 people. The class widths are all 10.
120 ≤ h < 130
130 ≤ h < 140
140 ≤ h < 150
150 ≤ h < 160
The class boundaries are 120, 130, 140, 150 and 160.
The midpoints of the classes are 125, 135, 145 and 155, shown in red.
This table looks very similar to the table in example 2, and you might assume that the class boundaries are 20.5, 30.5 etc - but they are not! Remember that if you are 16 now, you will be 16 right up until your 17th birthday.
The following table shows the ages of 25 children on a school bus:
If we are going to draw a histogram to represent the data, we first need to find the class boundaries. In this case they are 5, 11, 16 and 18. The class widths are therefore 6, 5 and 2.
The area of a histogram represents the frequency.
The areas of our bars should therefore be 6, 15 and 4.
Remember that in a bar chart the height of the bar represents the frequency. It is therefore correct to label the vertical axis 'frequency'.
However, in a histogram, it is the area of the bar which represents the frequency.
It would therefore be incorrect to label the vertical axis 'frequency' and the label should be 'frequency density'.
So we know that area = frequency = frequency density × class width, hence:
Frequency density = frequency ÷ class width
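The formula can be tried out on the school bus example above. The following Python sketch is illustrative only: the function name is made up, and it assumes the frequencies 6, 15 and 4 are listed in the same order as the classes with boundaries 5, 11, 16 and 18.

```python
def frequency_density(frequency, class_width):
    # Frequency density = frequency ÷ class width
    return frequency / class_width


# Class boundaries and frequencies (bar areas) from the school bus example.
boundaries = [5, 11, 16, 18]
frequencies = [6, 15, 4]  # assumed to be in the same order as the classes

# Each class width is the gap between neighbouring boundaries.
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
densities = [frequency_density(f, w) for f, w in zip(frequencies, widths)]

print(widths)     # [6, 5, 2]
print(densities)  # [1.0, 3.0, 2.0]
```

Multiplying each density back by its class width gives the bar areas 6, 15 and 4, which is a useful way to check a finished histogram.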
Apply this formula to the following question.
The ages of children entering a theme park in a 1-hour period are recorded in the table:
Find the class widths and frequency densities. Then draw a histogram to represent the data.
12/4 = 3
14/7 = 2
48/8 = 6
The histogram should look like this:
We know that the median divides the data into two halves. We also know that for a set of n ordered numbers the median is the (n + 1) ÷ 2 th value.
Similarly, the lower quartile divides the bottom half of the data into two halves, and the upper quartile divides the upper half of the data into two halves.
Lower quartile is the (n + 1) ÷ 4 th value.
Upper quartile is the 3 (n + 1) ÷ 4 th value.
Find the median, lower quartile and upper quartile for the following data:
11, 4, 6, 8, 3, 10, 8, 10, 4, 12 and 31.
Ordering the data, we get 3, 4, 4, 6, 8, 8, 10, 10, 11, 12 and 31.
The median is the (11 + 1) ÷ 2 = 6th value.
The lower quartile is the (11 + 1) ÷ 4 = 3rd value.
The upper quartile is the 3 (11 + 1) ÷ 4 = 9th value.
Therefore, the median is 8, the lower quartile is 4, and the upper quartile is 11.
3, 4, 4, 6, 8, 8, 10, 10, 11, 12, 31
The interquartile range is the difference between the upper quartile and lower quartile.
In this example, the interquartile range is 11 - 4 = 7.
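The position rules above can be sketched in Python to check the working. The function names here are illustrative, and one step is an assumption that goes beyond the text: when a position is not a whole number, the sketch interpolates in a straight line between the two neighbouring values. For this data set every position is a whole number, so that assumption is never used.

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in an ordered list.

    Assumption: for a position that is not a whole number, interpolate
    linearly between the two neighbouring values.
    """
    n = len(ordered)
    lower = int(position)        # whole-number part of the position
    fraction = position - lower  # fractional part, 0 when the position is whole
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


def quartiles(data):
    ordered = sorted(data)
    n = len(ordered)
    lower_quartile = value_at_position(ordered, (n + 1) / 4)
    median = value_at_position(ordered, (n + 1) / 2)
    upper_quartile = value_at_position(ordered, 3 * (n + 1) / 4)
    return lower_quartile, median, upper_quartile


data = [11, 4, 6, 8, 3, 10, 8, 10, 4, 12, 31]
lq, med, uq = quartiles(data)

print(lq, med, uq)  # 4.0 8.0 11.0
print(uq - lq)      # 7.0 (the interquartile range)
```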
A survey was carried out to find the number of pets owned by each child in a class.
The results are shown in the table:
|Number of pets||Frequency|
Find the interquartile range.
Remember that there is a total of 31 children in the class.
Note that the interquartile range ignores extreme values. The range includes extreme values.
Look at this set of data:
In cases such as these, it is often preferable to use the interquartile range when comparing the data.
The cumulative frequency is obtained by adding up the frequencies as you go along, to give a 'running total'.
The table shows the lengths (in cm) of 32 cucumbers.
Before drawing the cumulative frequency diagram, we need to work out the cumulative frequencies. This is done by adding the frequencies in turn.
|Length (cm)||Frequency||Cumulative frequency|
|21-24||3||3|
|25-28||7||10 (= 3 + 7)|
|29-32||12||22 (= 3 + 7 + 12)|
|33-36||6||28 (= 3 + 7 + 12 + 6)|
|37-40||4||32 (= 3 + 7 + 12 + 6 + 4)|
The points are plotted at the upper class boundary. In this example, the upper class boundaries are 24.5, 28.5, 32.5, 36.5 and 40.5. Cumulative frequency is plotted on the vertical axis.
There are no values below 20.5cm.
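The running total and the points to plot can be sketched in Python as follows. The variable names are illustrative. The frequencies and upper class boundaries are those of the cucumber example, and the starting point (20.5, 0) reflects the fact that there are no values below 20.5cm.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the cucumber example.
upper_boundaries = [24.5, 28.5, 32.5, 36.5, 40.5]
frequencies = [3, 7, 12, 6, 4]

# Add the frequencies in turn to get the running total.
cumulative = list(accumulate(frequencies))

# Each point is (upper class boundary, cumulative frequency).
# The curve starts at (20.5, 0) because no cucumber is shorter than 20.5cm.
points = [(20.5, 0)] + list(zip(upper_boundaries, cumulative))

print(cumulative)  # [3, 10, 22, 28, 32]
print(points)
```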
Cumulative frequency graphs are always plotted using the upper boundary of each group, because the table gives you the total number of values that are less than that boundary. The cumulative frequency is always plotted on the vertical axis.
Cumulative frequency diagrams usually have this characteristic S-shape, called an ogive.
When looking at a cumulative frequency curve, you will need to know how to find its median, lower and upper quartiles, and the interquartile range.
By drawing horizontal lines to represent 1/4 of the total frequency, 1/2 of the total frequency and 3/4 of the total frequency, we can read estimates of the lower quartile, median and upper quartile from the horizontal axis.
Quartiles are associated with quarters. The interquartile range is the difference between the lower and upper quartile.
From these values, we can also estimate the interquartile range: 33 - 28 = 5.
Remember to use the total frequency, not the maximum value, on the vertical axis. The values are always read from the horizontal axis.
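Reading across from the vertical axis can be imitated in Python. This is a rough sketch, not the method used to draw the diagram: the function name is made up, and it assumes the plotted points for the cucumber example are joined with straight lines. Estimates from a smooth hand-drawn curve can therefore differ a little from the values this code produces.

```python
def read_across(points, target):
    """Estimate the horizontal-axis value where the cumulative frequency
    reaches `target`.

    Assumption: neighbouring points are joined by straight lines.
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= target <= cf1:
            if cf1 == cf0:
                return x0
            return x0 + (target - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target is outside the range of the cumulative frequencies")


# (upper class boundary, cumulative frequency) for the cucumber example.
points = [(20.5, 0), (24.5, 3), (28.5, 10), (32.5, 22), (36.5, 28), (40.5, 32)]
total = points[-1][1]  # total frequency, 32

lower_quartile = read_across(points, total / 4)
median = read_across(points, total / 2)
upper_quartile = read_across(points, 3 * total / 4)

print(round(lower_quartile, 1), round(median, 1), round(upper_quartile, 1))
print(round(upper_quartile - lower_quartile, 1))  # interquartile range estimate
```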
A box and whisker plot is used to display information about the range, the median and the quartiles. It is usually drawn alongside a number line, as shown:
The oldest person in Mathsminster is 90. The youngest person is 15.
The median age of the residents is 44, the lower quartile is 25, and the upper quartile is 67.
Represent this information with a box-and-whisker plot.
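Before drawing the plot, it can help to collect the five values in one place. The short Python sketch below does this for the Mathsminster data. The dictionary keys are illustrative names, not standard terms from any library. The box runs from the lower quartile to the upper quartile with a line at the median, and the whiskers reach out to the youngest and oldest ages.

```python
# Five values needed for the Mathsminster box and whisker plot.
summary = {
    "minimum": 15,         # youngest person (end of the left whisker)
    "lower_quartile": 25,  # left edge of the box
    "median": 44,          # line inside the box
    "upper_quartile": 67,  # right edge of the box
    "maximum": 90,         # oldest person (end of the right whisker)
}

data_range = summary["maximum"] - summary["minimum"]
interquartile_range = summary["upper_quartile"] - summary["lower_quartile"]

print(data_range)           # 75
print(interquartile_range)  # 42
```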
Moving averages may not be on your exam, as this depends on the exam board your school uses. It is worth checking with your teacher.
This table shows the number of visitors to a seaside town:
If this information is plotted on a graph, it looks like this:
This shows that there is a wide variation in the number of visitors depending on the season. There are far fewer in the autumn and winter than in the spring and summer.
However, if we wanted to see a trend in the number of visitors, we could calculate a 4-point moving average.
We do this by finding the average number of visitors in the four quarters of 2005:
Then we find the average number of visitors in the last three quarters of 2005 and first quarter of 2006:
Then the last two quarters of 2005 and the first two quarters of 2006:
And so on...
Note that the last average we can find is for the last two quarters of 2006 and the first two quarters of 2007.
We plot the moving averages on a graph, making sure that each average is plotted at the centre of the four quarters it covers:
We can now see that there is a very slight downward trend in visitors.
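The calculation of a 4-point moving average can be sketched in Python. The visitor numbers below are made up purely for illustration and are not the figures from the table, and the function name is hypothetical. Each average covers four consecutive quarters, and the window then slides forward by one quarter.

```python
def moving_average(values, window=4):
    # Average each run of `window` consecutive values, sliding along one at a time.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly visitor numbers (in thousands) for two years.
# These are illustrative values only, not the data from the table.
visitors = [10, 40, 60, 20, 12, 38, 58, 18]

averages = moving_average(visitors)
print(averages)  # [32.5, 33.0, 32.5, 32.0, 31.5]
```

With eight quarters of data there are five averages, because the last window that fits starts at the fifth quarter. Each one would be plotted midway between the second and third quarters of the window it covers.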