AI conquers challenge of 1980s platform games

By Paul Rincon
Science editor, BBC News website

[Image: Montezuma's Revenge is deceptively difficult for an algorithm to solve. Copyright: Go-Explore]

Scientists have come up with a computer program that can master a variety of 1980s exploration games, paving the way for more self-sufficient robots.

They created a family of algorithms (software-based instructions for solving a problem) able to complete classic Atari games, such as Pitfall.

These scrolling platform games had previously proved challenging for artificial intelligence (AI) to solve.

The algorithms could help robots better navigate real-world environments.

Navigating such environments remains a core challenge in robotics and artificial intelligence. Examples include disaster zones, where robots could be sent to search for survivors, or even just the average home.

The work in this study falls into an area of AI research known as reinforcement learning.

A number of the games used in the research require the player to explore mazes containing rewards, obstacles and hazards. The family of algorithms, known collectively as Go-Explore, produced substantial improvements on previous attempts to solve games such as the wittily titled Montezuma's Revenge (1984), Freeway (1981) and the aforementioned Pitfall (1982).

One way the researchers did this was by developing algorithms that build up archives of areas they have already visited.
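
The article does not say how an "area" is defined. As a rough illustration only, the Python sketch below assumes each area (a "cell") is identified by a coarse grid position plus a room number, and that the archive keeps, for each cell, the best score and the shortest action sequence found so far for reaching it. The names `CellEntry`, `to_cell` and `update_archive`, the grid size and the tie-breaking rule are hypothetical, not the researchers' code.

```python
from dataclasses import dataclass


@dataclass
class CellEntry:
    """What the (hypothetical) archive remembers about one visited cell."""
    score: float            # best game score seen on reaching this cell
    trajectory: list[int]   # action sequence that achieved that score
    visits: int = 0         # how often the cell has been reached


def to_cell(x: int, y: int, room: int, grid: int = 8) -> tuple[int, int, int]:
    """Assumed cell key: room number plus coarse position.

    Many nearby game states collapse into one cell, keeping the archive small.
    """
    return (room, x // grid, y // grid)


def update_archive(archive: dict, cell, score: float, trajectory: list[int]) -> None:
    """Add a newly discovered cell, or keep a better route to a known one."""
    entry = archive.get(cell)
    if entry is None:
        archive[cell] = CellEntry(score, list(trajectory), visits=1)
        return
    entry.visits += 1
    # Prefer a higher score; on a tie, prefer the shorter route.
    if score > entry.score or (
        score == entry.score and len(trajectory) < len(entry.trajectory)
    ):
        entry.score = score
        entry.trajectory = list(trajectory)


archive = {}
update_archive(archive, to_cell(12, 40, room=1), 0.0, [3, 3, 1])
update_archive(archive, to_cell(13, 41, room=1), 0.0, [3, 1])  # same cell, shorter route
print(archive[(1, 1, 5)].trajectory)  # [3, 1]
```

Storing the route rather than just the location is what later lets an agent get back to a cell on purpose.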

[Image: Pitfall was released by Activision in 1982. Copyright: Activision]

"Our method is indeed pretty simple and straightforward, although that is often the case with scientific breakthroughs," researchers Adrien Ecoffet, Joost Huizinga and Jeff Clune said in response to questions sent over email.

"The reason our approach hadn't been considered before is that it differs strongly from the dominant approach that has historically been used for addressing these problems in the reinforcement learning community, called 'intrinsic motivation'. In intrinsic motivation, instead of dividing exploration into returning and exploring like we do, the agent is simply rewarded for discovering new areas."
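
For contrast, one common form of intrinsic motivation adds a novelty bonus to the game's own reward, and that bonus shrinks each time a state is revisited. The sketch below assumes a simple count-based bonus of `beta / sqrt(visits)`; the function name `shaped_reward` and the value of `beta` are illustrative and are not taken from the study.

```python
import math
from collections import Counter


def shaped_reward(game_reward: float, state, counts: Counter, beta: float = 1.0) -> float:
    """Game reward plus a novelty bonus that fades as a state is revisited.

    Count-based bonus, one common form of intrinsic motivation;
    beta (the bonus weight) is an illustrative value.
    """
    counts[state] += 1
    return game_reward + beta / math.sqrt(counts[state])


counts = Counter()
first = shaped_reward(0.0, "room1", counts)   # 1.0: brand-new area
second = shaped_reward(0.0, "room1", counts)  # about 0.707: seen once before
```

Because the bonus for an area fades once the agent has been there, nothing in this scheme keeps a half-explored frontier on the agent's agenda, which is the weakness the researchers call detachment.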

A problem with the intrinsic motivation approach is that, while searching for a solution, the algorithm can "forget" about promising areas that still need to be explored. This is known as "detachment".

The team found a way to overcome this: by compiling an archive of the areas it has visited, the algorithm can return to a promising intermediate stage of the game and explore further from there.

[Image: The algorithms could help improve robot intelligence. Copyright: Getty Images]

But there was another problem with previous approaches to mastering these games. "They rely on random actions that may be taken at any point in time, including while the agent is still going towards the area that actually needs to be explored," the scientists told BBC News.

"If you have an environment where your actions have to be accurate and precise, such as a game with many hazards that can instantly kill you, such random actions can prevent you from reaching the area you actually want to explore."

The technical term for this is "derailment".
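
A back-of-the-envelope calculation shows why derailment bites. Assuming, purely for illustration, that the agent takes a random action with probability `epsilon` at each step, that a random action is fatal with probability `fatal_fraction`, and that steps are independent, the chance of surviving a route of `steps` moves is `(1 - epsilon * fatal_fraction) ** steps`. The function name and numbers below are made up, not taken from the paper.

```python
def arrival_probability(steps: int, epsilon: float, fatal_fraction: float) -> float:
    """Chance of reaching a far-off area while following a known-good route.

    Hypothetical model: at each step a random action is taken with
    probability `epsilon`, a random action kills the agent with probability
    `fatal_fraction`, and steps are independent.
    """
    survive_one_step = 1.0 - epsilon * fatal_fraction
    return survive_one_step ** steps


# 200 precise steps, 5% random actions, half of which are fatal.
with_noise = arrival_probability(200, epsilon=0.05, fatal_fraction=0.5)  # under 1%
# Returning with no random actions removes the risk en route.
no_noise = arrival_probability(200, epsilon=0.0, fatal_fraction=0.5)     # 1.0
```

Even a small rate of random actions compounds over a long, hazardous route, so the agent rarely arrives at the area it meant to explore.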

The new method, described in the prestigious journal Nature, resolves the derailment problem by separating the process of returning to previously visited areas from the process of exploring new ones, and tackling each in a different way.
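
The loop below is a minimal sketch of the go-then-explore idea on a tiny, hypothetical grid with pits. It assumes the environment is fully deterministic, so the agent can "return" simply by replaying the stored action sequence with no random actions, and only then explores randomly. The grid, the pit positions, the selection weights `1 / sqrt(1 + times_chosen)` and the function names `step` and `go_explore` are illustrative assumptions; the study's own cell representations and return methods are not reproduced here.

```python
import random

SIZE = 6
PITS = {(1, 1), (2, 3), (3, 1), (4, 4)}  # hypothetical hazards: entering one ends the episode
GOAL = (5, 5)
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def step(pos, action):
    """Deterministic toy environment: returns (new_pos, reward, done)."""
    dx, dy = MOVES[action]
    new_pos = (min(max(pos[0] + dx, 0), SIZE - 1), min(max(pos[1] + dy, 0), SIZE - 1))
    if new_pos in PITS:
        return new_pos, 0.0, True
    if new_pos == GOAL:
        return new_pos, 1.0, True
    return new_pos, 0.0, False


def go_explore(iterations=500, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = (0, 0)
    # Archive: cell -> (action sequence that reaches it, times chosen).
    archive = {start: ([], 0)}
    for _ in range(iterations):
        # 1. Select a cell, favouring ones chosen less often.
        cells = list(archive)
        weights = [1.0 / (1 + archive[c][1]) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights)[0]
        route, chosen = archive[cell]
        archive[cell] = (route, chosen + 1)

        # 2. Go: replay the stored route with no random actions.
        #    Stored routes never pass through a pit, so this always succeeds.
        pos = start
        for a in route:
            pos, _, _ = step(pos, a)
        trajectory = list(route)

        # 3. Explore: random actions from the returned-to cell.
        for _ in range(explore_steps):
            a = rng.randrange(len(MOVES))
            pos, reward, done = step(pos, a)
            trajectory.append(a)
            if reward > 0:
                return trajectory  # full action sequence from start to goal
            if done:
                break  # fell into a pit; nothing from here is archived
            if pos not in archive or len(trajectory) < len(archive[pos][0]):
                times = archive[pos][1] if pos in archive else 0
                archive[pos] = (list(trajectory), times)
    return None


path = go_explore()
```

Keeping the return phase free of random actions is what avoids derailment in this toy, while the archive guards against detachment because every discovered cell remains a candidate starting point.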

The team members, who carried out their work while employed by Uber AI Labs in California, said the work lends itself to algorithms used for guiding robots in the home or in industrial settings.

They say that Go-Explore is designed to tackle longstanding problems in reinforcement learning. "Think about asking a robot to get you a coffee: there is virtually no chance it will happen to operate the coffee machine by just acting randomly."

The scientists added: "In addition to robotics, Go-Explore has already seen some experimental research in language learning, where an agent learns the meaning of words by exploring a text-based game, and for discovering potential failures in the behaviour of a self-driving car."
