Data is represented in many different forms. Using bar charts, pie charts and frequency diagrams can make information easier to digest.

**Scatter graphs** are a good way of displaying two sets of data to see if there is a **correlation**, or connection.

A scatter graph can show positive correlation, negative correlation or no correlation.

**Positive correlation** means as one variable increases, so does the other variable. They have a positive connection.

**Negative correlation** means as one variable increases, the other variable decreases. They have a negative connection.

**No correlation** means there is no connection between the two variables.

The number of umbrellas sold and the rainfall (mm) on 9 days is shown on the scatter graph and in the table.

Umbrellas sold | 1 | 10 | 25 | 0 | 1 | 32 | 47 | 8 | 15 |
---|---|---|---|---|---|---|---|---|---|
Rainfall (mm) | 3 | 2 | 4 | 0 | 0 | 5 | 6 | 1 | 1 |

The graph shows that there is a positive correlation between the number of umbrellas sold and the amount of rainfall. On days with higher rainfall, more umbrellas were sold.
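The strength of a correlation can also be put into a number. The sketch below works out Pearson's correlation coefficient for the nine days in the table. This measure is not mentioned in the text above and is used here only as an illustration; the function name `pearson_r` is made up for this example. A value close to 1 suggests a strong positive correlation, a value close to -1 a strong negative correlation, and a value close to 0 little or no linear correlation.

```python
from math import sqrt

# Data copied from the table above (one entry per day).
umbrellas = [1, 10, 25, 0, 1, 32, 47, 8, 15]
rainfall = [3, 2, 4, 0, 0, 5, 6, 1, 1]


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(rainfall, umbrellas)
print(round(r, 2))  # 0.88
```

The result is well above 0, which agrees with the positive correlation seen on the graph.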

However, it is important to remember that **correlation does not imply causation**. If data plotted on a scatter graph shows a correlation, we cannot assume that a change in one set of data caused the increase or decrease in the other. It might be a coincidence, or both sets of data may depend on some other factor.

A **line of best fit** is a sensible straight line that goes as centrally as possible through the coordinates plotted. It should show the general trend of the relationship between the two sets of data.
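A line of best fit drawn by eye is a judgement, so two people may draw slightly different lines. One common way to calculate a line instead is the least-squares method, sketched below for the umbrella data. This is an assumption made for illustration: the text does not say how its line was drawn, and the helper name `fit_line` is hypothetical.

```python
# Data copied from the table above (one entry per day).
rainfall = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # A least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rainfall, umbrellas)
print(round(slope, 2), round(intercept, 2))  # 6.49 -0.43
```

In this sketch, each extra millimetre of rainfall goes with roughly 6.5 more umbrellas sold.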

The line of best fit for the scatter graph would look like this:

From the diagram above, we can estimate how many umbrellas would be sold for different amounts of rainfall. For example, how many umbrellas would be sold if there was 3 mm of rainfall? What if there was 10 mm of rainfall?

To estimate the number sold for 3 mm of rainfall, we use a process called **interpolation**. The value of 3 mm is within the range of data values that were used to draw the scatter graph.

Find where 3 mm of rainfall is on the graph. Draw a line across from 3 mm to the line of best fit, then straight down to read off the number of umbrellas sold.

This gives an estimate of about 19 umbrellas sold if there was 3 mm of rainfall.

If there was 10 mm of rainfall, we could extend the graph and the line of best fit to read off the number of umbrellas sold. This gives a value of approximately 64 umbrellas sold.

This process is called **extrapolation**, because the value we are using is outside the range of data used to draw the scatter graph. Since 10 mm is much higher than the highest rainfall recorded, we cannot assume that the line of best fit would still follow the pattern when the rainfall is 10 mm, so the value of 64 umbrellas is not a reliable estimate.
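The difference between interpolation and extrapolation can be checked in a short sketch. It assumes a least-squares line through the nine data points, which may differ slightly from a line drawn by eye, and the function names `fit_line` and `estimate` are made up for this example. Each estimate is labelled according to whether the rainfall value lies inside the recorded range of 0 mm to 6 mm.

```python
# Data copied from the table above (one entry per day).
rainfall = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def estimate(x, xs, ys):
    """Estimate y at x from the fitted line and say which process was used."""
    slope, intercept = fit_line(xs, ys)
    # Inside the recorded range is interpolation; outside it is extrapolation.
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return slope * x + intercept, kind


for mm in (3, 10):
    value, kind = estimate(mm, rainfall, umbrellas)
    print(f"{mm} mm: about {value:.1f} umbrellas ({kind})")
# 3 mm: about 19.1 umbrellas (interpolation)
# 10 mm: about 64.5 umbrellas (extrapolation)
```

Both figures are close to the values read from the graph, but only the 3 mm estimate is backed by nearby data. The 10 mm figure relies on the trend continuing well beyond the highest recorded rainfall.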