Data dilemma: Massive data is the challenge, says SGI CTO
- 31 March 2011
Each week we ask chief technology officers (CTOs) and other high-profile tech decision-makers three questions.
This week, Dr Eng Lim Goh, CTO of SGI, is providing the answers.
SGI provides high performance computing, servers, storage, data centre and cloud computing solutions. The company has over 1,300 employees worldwide and sells systems, technologies, software, and services to enterprises in over 26 countries.
What's your biggest technology problem right now?
The big challenge we are facing is the growing amount of data, and that data is coming from two different places.
The first place is high performance computing. These systems are getting bigger and faster, and the result is that they are generating more and more data.
So we've spent years refining the process to make them faster, and linking more and more systems together so they can process models faster, at higher resolutions and with better fidelity. In many cases the result is the generation of massive amounts of data.
The second big generator of data is scientific instruments - I like to say everything from microscopes all the way to telescopes and everything in between - from DNA sequencers to street cameras to sensors, and other instruments that are continuously recording and generating data.
We end up in a world where we have to deal with this deluge of data. This is the challenge.
We spent a lot of energy building bigger, better computers - now is the time to look at how we deal with the data generated from them.
What's the next big tech thing in your industry?
Following on from the challenge of big data, the next big thing for us has to be finding a breakthrough in how we deal with that data.
If one were to generalise, we could look at data in two ways: data that is easily divisible into small pieces, and data that is not.
In the former case we actually have some solutions: massive clusters built from server farms, with Hadoop distributed systems [software filesystems that let users work with large amounts of data] on top, and on top of that an application like MapReduce, which search engines use. Together these form a highly distributed application that allows you to process and search through the data.
It's not a perfect solution but at least we have a means of dealing with it.
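The "easily divisible" case can be illustrated with the classic MapReduce word-count pattern. The sketch below is a single-process Python illustration under stated assumptions, not Hadoop's actual API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen only to show how independent pieces of data are processed separately and their partial results then combined.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit (word, 1) pairs for one independent chunk of text (hypothetical helper)."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, sum the counts."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently of the others, which is what makes
    # the data "divisible": on a real cluster these calls could run on
    # different nodes. Here they simply run one after another.
    mapped = chain.from_iterable(map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data big servers", "big data from telescopes"]
    print(word_count(documents))
    # prints {'big': 3, 'data': 2, 'servers': 1, 'from': 1, 'telescopes': 1}
```

The design choice to keep the map step free of shared state is what lets such a job scale out across a server farm; data that cannot be cut into independent chunks like this is the harder case discussed next.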
I think the big breakthrough comes with the second part - big data not easily divisible.
One example is indexing the entire World Wide Web. Search engines don't do it every day; it takes a long time. It's not easily divisible - you can't use a massive server farm to split it.
Once you've indexed it, however, a search is easily divisible.
This is where we've been dedicating a lot of research and development: figuring out how to build not a server farm of many small nodes, but a system made up of many nodes that appears to the application as one big server.
The goal of this project - Project Ultraviolet - is to build a scale-up server that can keep growing with the massive amounts of data, so that the data can be processed as a whole, in a monolithic way.
That's what we've been spending our energies on, so that we could, for example, index the whole World Wide Web every hour - although there would be network implications.
What's the biggest technology mistake you've ever made - either at work or in your own life?
I've been with the company for more than 20 years, and CTO for more than 10 of them.
I think rather than a single mistake it is more a collection of things, or wisdom that you develop over the years.
In the past, one would lean towards trying to get a product complete with all the features we wanted, and in fact more than we initially wanted, because as we developed it we wanted to put more into it. Basically, it was the desire to get towards perfection.
But over the years I've learned that to be successful one has to have a triangle where, yes, feature richness is one point, but you need two other points - time to market and cost.
So over the 10 years I've been CTO, I've learnt that one has to weigh these three points as you develop products, rather than pursuing feature richness alone.
In the early days if you had something that was feature rich, you could still be very successful.
But over the years people have been able to come up with innovations much more quickly, so you have to start thinking about getting things to market in a timely manner. Times have changed, and we now have to look at things in this tripartite way rather than in a singular pursuit of feature richness.
That's the lesson I've learnt.