Bank's IT woes could be just the beginning
It's a well-worn saying, but the adage about trying to find a needle in a haystack describes exactly what the technical team at British bank NatWest faced at the end of June.
A major computing error meant many of its customers' bank accounts were not correctly updated - kicking off a chain reaction that led to unpaid bills, disrupted earnings and a PR disaster.
During what was arguably the biggest IT failure the UK banking sector has seen in the last 10 years (at least in terms of the visible impact on customers), the IT team had to dig through layer upon layer of complex technology to try to find the source of the problem.
It took some time, and while the share price tumbled and customer complaints echoed across every TV and radio network, the pressure to find and resolve the problem grew and grew.
In a sense, what triggered the problem may end up being largely irrelevant. No system is 100% foolproof. Large organisations of all kinds will always be complex and therefore so will the systems that support them.
The question is how to absorb today's rapidly evolving IT challenges into your business infrastructure in a way that lets you anticipate IT failures, and learn from them when they do occur.
The consumerisation of technology only adds to the management headache, as more of us access services through the internet on a growing number and variety of digital platforms.
This is not only because of the number of different systems involved and the difficulty in updating and maintaining them all, but also because of the way that these systems interface with customers.
A failure no longer causes problems only for the bank teller or the payments clerk; it also has an instant and direct impact on the customer.
So many of us rely on the internet, and increasingly our personal devices, for statement updates and payments services that when the system goes down we instantly feel the impact.
Brand is increasingly tied to digital experience, and the damage that a digital breakdown can inflict on brand trust and reputation is now visible for all to see.
The second significant consideration is the huge number of interdependencies within any given system.
The domino effect that can be triggered by a failure - whether it's a human or system error - means that by the time the impact is felt, tracing the problem back to its source is very difficult.
As the NatWest case showed, it can take days, even weeks, before the problem is isolated.
Meanwhile the problems stack up and the ripples multiply as the organisation gets further and further behind on its day-to-day transactions.
The speed at which our IT systems are expected to operate is the third point of this dangerous triangle. Ironically, it is consumer expectation that has, to a great extent, created this need for speed.
We are so used to a real-time environment - instant updates across multiple social media platforms and real-time order tracking online, for example - that we have moved these expectations into our interactions with the organisations we deal with every day, whether these be banks, retailers, public sector services or anything else.
This creates significant pressure and, to be brutally honest, IT departments are struggling to keep up.
So, what's the answer? Simplification is certainly part of it, but as we have already established, large organisations are never going to be simple.
What is needed is more effective management of these systems, management that combines intelligent software platforms with intelligent people.
The huge volume of updates and changes required to keep IT environments like this running has long made automation a necessity.
It is simply not possible for IT management teams to do all of this manually. Automation brings significant benefits, both in terms of cost and efficiency - but they come with a warning.
Without human intelligence, automation can cause as many problems as it solves. Once an action has been triggered, the process can be unstoppable and a failure along the way can create chaos before anyone has even realised that something is wrong.
Despite the increasing sophistication of our systems, the role of the human "handler" has never been more important. With thousands of interdependencies across these environments, there is no way that even the most brilliant IT manager can know the impact of every potential error.
This is where the intelligence of the system becomes important. Automation cannot be effective unless the automation platform is able to recognise that an error has occurred, where it occurred and what its impact will be, and to help make sure it does not happen again.
You could argue that none of this is rocket science - more common sense. That said, the drive towards efficiency and speed has sometimes been to the detriment of intelligence.
Buzzwords like "cloud computing" and even "automation" promise insight, savings and greater simplicity. While taking systems and applications out of the in-house architecture can certainly help reduce complexity, cloud computing is not a panacea.
Automation too is a no-brainer, but it has to be intelligent automation - the kind that works hand in hand with human intellect.
Without that, the NatWest scenario will be the first of many, and it won't just be the banking sector that suffers.
Jason Liu is the chief executive of UC4 Software, a company which specialises in implementing automated IT systems for businesses.