Viewpoint: Challenger and the misunderstanding of risk

In the past week, the 30th anniversary of the Challenger shuttle disaster has been marked with tributes for the sacrifice of the crew. In the investigation that came after the tragedy, the brilliant physicist Richard Feynman identified a culture at Nasa where risk was not understood, writes mathematician Dr John Moriarty.

The Challenger was lost because one small part - an O-ring seal - failed during a launch in cold weather. The possibility of this part failing had been predicted long before, but Nasa managers chose to ignore the concerns.

The issue of "safety factors" was at the heart of the problem.

Suppose I tell you that a bridge you are about to drive over has a "safety factor of 2". Would you be reassured and put your foot down?

Perhaps you might hit the brake instead, and ask: "A factor of 2 with respect to what?" If the bridge was built twice as strong as is necessary for normal traffic, this sounds fine.

But if this "factor of 2" means something else - that half of the essential structural components of the bridge will break under normal traffic conditions - you might opt for an alternative crossing.
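The gulf between the two readings can be made concrete with a toy sketch - the numbers below are invented for illustration and come from no real bridge:

```python
# Two incompatible readings of "a safety factor of 2" (toy numbers).

normal_load = 10.0  # tonnes of typical traffic (invented)

# Reading 1: capacity is twice the normal load - a genuine margin.
capacity = 2 * normal_load
print(f"Reading 1: capacity {capacity} t vs load {normal_load} t")

# Reading 2: half of the essential components fail under normal load -
# the number 2 appears, but there is no margin at all.
components = 100
failing = components // 2
print(f"Reading 2: {failing} of {components} components fail under normal load")
```

The same phrase, "a factor of 2", describes both a bridge with a comfortable reserve and a bridge already half-broken by ordinary traffic.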

Before Challenger's final mission - listed as 51-L - it was known that the O-ring seals on a previous flight - 51-C - had eroded to a depth of one-third of their radius. Instead of regarding this little-understood problem as an unacceptable risk, it was interpreted as "a safety factor of three." Should the interpretation have been more pessimistic?

Pessimists, typically, are not popular people. The rest of us don't want to hear about bad things when they might not even happen.

Risk is about things that haven't actually happened. We can study the past of course, but - as the financial services industry constantly reminds us - past performance is not necessarily a guide to the future.

The thing is, the pessimists are actually right. To prove this we just need a monkey and a typewriter. Given enough time and paper, it's technically possible for a monkey to type out the complete works of Shakespeare. They simply need to press the keys in the right order.

The probability of this is very small of course, but it is not zero. It is possible to prove mathematically that if something - anything - is possible, and if it is tried repeatedly and independently, it is guaranteed to happen eventually.

Not just that it is possible, nor even likely - it is guaranteed.
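The step from "possible" to "guaranteed" rests on one line of probability: if an event has probability p greater than zero on each independent trial, the chance it has still not happened after n trials is (1 - p) to the power n, which shrinks towards zero as n grows. A minimal sketch, using an arbitrary illustrative p rather than the absurdly small Shakespeare-typing probability:

```python
# Probability that a rare event has occurred at least once after n
# independent trials, for an illustrative per-trial probability p.
# (1 - p)**n is the chance of n consecutive misses; its complement
# climbs towards 1 as n grows - the "guaranteed eventually" effect.

p = 0.001  # per-trial probability of the rare event (illustrative)

for n in (1_000, 10_000, 100_000):
    at_least_once = 1 - (1 - p) ** n
    print(f"after {n:>7,} trials: P(occurred at least once) = {at_least_once:.6f}")
```

With p = 0.001 the event is more likely than not to have occurred after a thousand trials, and all but certain after a hundred thousand - the monkey needs only patience.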

Richard Feynman, the great American physicist, wrote a detailed report on risk following the Space Shuttle Challenger disaster, entitled Personal Observations on the Reliability of the Shuttle.

His closing phrase was powerful: "Nature cannot be fooled."

One may argue whether the abstracted world of mathematics really corresponds to nature. When it comes to risk, though, we have just proved a long-term version of Murphy's Law - what can go wrong, will go wrong eventually. Just give it time.

Put like this, risk seems to be a depressing business. Properly understood though, risk is a useful tool - it can be good. If you've ever bought a standby theatre ticket, for example, you have traded a cheaper price for the risk of missing your preferred show and time.

Accepting risk can make our goals achievable. Of course nobody thought that taking humankind into space would be risk-free. The mathematical understanding that we cannot stop bad things happening leads us naturally to a much more important and helpful question - how much risk are we willing to accept?

This is the question asked of us by peddlers of investments. We must think hypothetically - if the value of our shiny new investment were suddenly to halve, they ask, would we be able to sleep at night? When it comes to investments we can retrospectively compare the performance of our chosen instrument with other choices we could have made. In general, though, such comparisons are not available.

If the cautionary notes of the pessimists help us to avoid bad things, can they really claim credit because of what didn't happen? How can we possibly prove that being more careful was worthwhile, or worth the extra inconvenience or expense?

According to Feynman, there may have been a shortage of such pessimists at the management levels of Nasa prior to the Challenger tragedy.

In his account, much of the reasoning about risk at Nasa effectively took the form that, if disaster hadn't happened yet, it probably wouldn't happen next time either.

As he points out, we only have to think of a game of Russian roulette to see the problem with that reasoning. Instead, Feynman recommended looking for warning signs. In Challenger's case the O-rings were known to erode, and this warning was arguably not given sufficient weight.

Another key criticism was that much of the Shuttle was designed "top down" - that is, in near-final form - rather than the design evolving incrementally, part by part, as engineering usually does.

The argument here is that Nasa introduced too much innovation all at the same time, thus obscuring problems with individual new components. The implication is that as a result some warning signs were not seen at all.

In the context of the final years of the space race between the US and USSR for supremacy in spaceflight capability, the proper investigation of all warnings was inevitably balanced against speed of progress.

From the perspective of the shuttle disasters, a principal failure was therefore arguably the failure to address the key question of how much risk was acceptable.

According to Feynman's report this issue was clouded by sharp differences between the risk assessments of engineers "on the ground" and those of managers. While some engineers put the chances of disaster around one in 100, some managers thought them to be closer to one in 100,000.

Given that the space shuttle Columbia was also lost 17 years later, it would appear that the engineers may have been closer to the truth. It stands to reason that the engineers at least had a more informed view of potential warning signs and their implications for risk.
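Those two estimates have very different implications over a whole programme. Assuming - simplistically, for illustration only - that each mission was an independent trial with the same per-mission failure probability, the chance of losing at least one orbiter across the shuttle's 135 missions works out as follows:

```python
# Rough comparison of the engineers' and managers' estimates, assuming
# (simplistically) that each of the programme's 135 missions was an
# independent trial with the same per-mission failure probability.

missions = 135

for camp, p in (("engineers (~1 in 100)", 1 / 100),
                ("managers (~1 in 100,000)", 1 / 100_000)):
    p_any_loss = 1 - (1 - p) ** missions
    print(f"{camp}: P(at least one loss in {missions} missions) = {p_any_loss:.4f}")
```

On the engineers' figure, losing at least one orbiter over the programme's lifetime comes out as more likely than not - roughly 74% - while on the managers' figure it is vanishingly unlikely, about 0.1%. Two orbiters were in fact lost.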

With predictions about the future being so difficult, there is a good argument that nobody ever has an exact picture of risk.

Statistician George Box once said: "All models are wrong but some are useful."

All models are wrong because they merely approximate the truth - this is why events such as flooding, accidents at nuclear reactors, and financial crises seem to happen much more often than they should.

The idea of a good risk model is not to be correct, but to be useful.

Usually we cannot show a claim about future risks to be true or false. And since the event we care about will either happen or not happen, even hindsight may not allow us to properly judge such claims. For this reason some suggest that underplayed models of risk can be very useful for public relations or political purposes.

But as the Challenger disaster showed, this is not real usefulness. To be useful, models of risk must be honest.

They should not simply look backwards and say that nothing bad has happened as yet. They should take into account warnings like climate change, financial products that many trade but few understand, or eroded O-rings.

We may not want to listen to the pessimists, but we should at least talk to the engineers.

John Moriarty is a reader in mathematics at Queen Mary University of London

The Space Shuttle programme

• US manned launch vehicle programme, which ran between 1981 and 2011
• Nasa's space shuttle fleet - Columbia, Challenger, Discovery, Atlantis and Endeavour - flew 135 missions, helped construct the International Space Station and inspired generations
• As well as Challenger, Columbia was lost - it disintegrated on re-entry on 1 February 2003 with the deaths of its seven crew members
• The final space shuttle mission took place in July 2011; the remaining shuttles are now on display at various sites across the US
