OpenAI’s new AI learned to play Minecraft by watching 70,000 hours of YouTube


In 2020, OpenAI’s GPT-3 machine learning algorithm shocked people when, after ingesting billions of words mined from the internet, it began spitting out well-crafted sentences. This year, DALL-E 2, a cousin of GPT-3 trained on text and images, caused a similar stir online when it started generating surreal images of astronauts riding horses and, more recently, weirdly photorealistic faces of people who do not exist.

Now the company says its latest AI has learned to play Minecraft after watching some 70,000 hours of video showing people playing it on YouTube.

School of Mines

Unlike numerous previous Minecraft algorithms, which work with much simpler versions of the game, the new AI plays in the same environment humans do, using standard keyboard and mouse commands.

In a blog post and preprint detailing the work, the OpenAI team says that, out of the box, the algorithm learned basic skills such as felling trees, making planks, and building crafting tables. They also watched it swim, hunt, cook, and “pillar jump.”

“To our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting,” the authors wrote in their paper.

After fine-tuning (that is, training the model on a more focused dataset), they found the algorithm performed all of these tasks more reliably. It also began to advance technologically, crafting wooden and stone tools, building basic shelters, exploring villages, and raiding chests.

After further fine-tuning with reinforcement learning, it learned to craft a diamond pickaxe, a task that takes skilled human players about 20 minutes and 24,000 actions.

This is a remarkable result. AI has long struggled with Minecraft’s open-ended gameplay. Games like chess and Go, which AI has already mastered, have clear goals, and progress toward those goals can be measured. To conquer Go, researchers used reinforcement learning, in which an algorithm is given a goal and rewarded for progress toward it. Minecraft, by contrast, has a plethora of possible goals, progress is less linear, and deep reinforcement learning algorithms are usually left spinning their wheels.
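To make the sparse-reward problem concrete, here is a minimal toy sketch in Python. The environment, action list, and agent are invented for illustration and are not OpenAI’s code; the point is that an agent acting at random almost never stumbles onto a reward that only arrives after one exact sequence of actions.

```python
import random

class SparseRewardEnv:
    """Toy environment: reward arrives only after one exact action sequence."""
    GOAL = ["chop", "craft", "mine"]  # hypothetical mini 'tech tree'

    def reset(self):
        self.history = []
        return tuple(self.history)

    def step(self, action):
        self.history.append(action)
        done = len(self.history) == len(self.GOAL)
        # Almost every step returns zero reward -- the agent gets no signal.
        reward = 1.0 if self.history == self.GOAL else 0.0
        return tuple(self.history), reward, done

class RandomAgent:
    """Placeholder policy: with no reward to learn from, it can only guess."""
    ACTIONS = ["chop", "craft", "mine", "swim", "jump"]

    def act(self, observation):
        return random.choice(self.ACTIONS)

env, agent = SparseRewardEnv(), RandomAgent()
successes = 0.0
for _ in range(10_000):
    observation, done = env.reset(), False
    while not done:
        observation, reward, done = env.step(agent.act(observation))
    successes += reward  # nonzero only if the whole goal sequence was hit
print(f"goal reached in {successes:.0f} of 10,000 episodes")
```

Even in this three-step toy world, random play succeeds less than one percent of the time; in real Minecraft, where a diamond pickaxe takes roughly 24,000 actions, pure reinforcement learning from scratch has essentially nothing to learn from.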

In the 2019 MineRL Minecraft competition for AI developers, for example, none of the 660 submissions achieved the competition’s relatively simple goal of mining a diamond.

It’s worth noting that, to reward creativity and show that throwing computing power at a problem isn’t always the answer, the organizers of MineRL placed strict limits on entrants: a single NVIDIA GPU and 1,000 hours of recorded gameplay. Although the contestants performed admirably, the OpenAI result, achieved with more data and 720 NVIDIA GPUs, seems to show that computing power still has its benefits.

AI gets crafty

With its video pretraining (VPT) algorithm for Minecraft, OpenAI returned to the approach behind GPT-3 and DALL-E: pretraining an algorithm on a massive dataset of human-created content. But computing power and data alone could not have made the algorithm a success; until now, training a Minecraft AI on raw video simply wasn’t practical.

Raw video streams are not as useful for behavioral AI as they are for content generators like GPT-3 and DALL-E. Video shows what people are doing, but it doesn’t capture how they are doing it. For the algorithm to link video to actions, it needs labels. A video frame showing a player’s inventory, for example, needs to be paired with the “E” key command that opens the inventory.
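As a concrete illustration, a single labeled training example might pair one frame with the keyboard and mouse state captured at the same instant. This is a sketch only; the field names are invented, not OpenAI’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class LabeledFrame:
    """Hypothetical record pairing a video frame with the action taken on it."""
    pixels: bytes                                       # raw frame from the video
    keys_pressed: list = field(default_factory=list)    # keyboard state that frame
    mouse_dx: int = 0                                   # mouse movement since the
    mouse_dy: int = 0                                   #   previous frame
    mouse_buttons: list = field(default_factory=list)   # e.g. ["left"] to mine

# The frame where the player opens the inventory carries the "E" keypress label:
example = LabeledFrame(pixels=b"...", keys_pressed=["E"])
```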

Labeling every frame in 70,000 hours of video by hand would be… impractical. So the team paid Upwork contractors to record and label themselves performing basic Minecraft skills. They used 2,000 hours of this video to teach a second algorithm, an inverse dynamics model (IDM), how to label Minecraft videos, and that model then labeled the 70,000 hours of YouTube footage. (The team says the IDM was more than 90 percent accurate at labeling keyboard and mouse commands.)
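In other words, a small, expensive human-labeled set trains the IDM, and the IDM then pseudo-labels the huge raw set essentially for free. Here is a toy sketch of that pipeline; the names and the stand-in “model” are invented for illustration, not OpenAI’s code.

```python
import random

ACTIONS = ["E", "W", "left_click", "space"]

def train_idm(labeled_pairs):
    """Stand-in inverse dynamics model: memorizes frame -> action.
    The real IDM is a neural net, and unlike the playing agent it can
    also look at future frames, which makes inferring the action far
    easier than choosing it."""
    table = dict(labeled_pairs)
    return lambda frame: table.get(frame, random.choice(ACTIONS))

# Step 1: small contractor set with ground-truth (frame, action) pairs.
contractor_data = [(f"frame_{i}", ACTIONS[i % len(ACTIONS)]) for i in range(8)]
idm = train_idm(contractor_data)

# Step 2: huge YouTube set -- frames only, no actions recorded.
youtube_frames = [f"frame_{i % 8}" for i in range(8_000)]

# Step 3: pseudo-label the raw footage; this is what the agent imitates.
pseudo_labeled = [(frame, idm(frame)) for frame in youtube_frames]
print(pseudo_labeled[0])  # ('frame_0', 'E')
```

The final agent is then trained on these pseudo-labeled pairs to predict the next human action from the frames so far, much as GPT-3 predicts the next word, but with keypresses and mouse movements instead of text.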

This approach, using humans to train a data-labeling algorithm that unlocks online behavioral datasets, could help AI learn other skills too. “VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI believes VPT could bring new real-world applications, like algorithms that operate computers in response to a prompt (imagine, for example, asking your laptop to find a document and email it to your boss).

Diamonds are not forever

Perhaps much to the chagrin of the organizers of the MineRL competition, the results seem to show that computing power and resources still move the needle on the most advanced AI.

Setting aside the cost of computing, OpenAI said the Upwork contractors alone cost $160,000. Though to be fair, manually labeling the entire dataset would have cost millions and taken considerable time. And while the computing power was not negligible, the model was actually quite small: VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s hundreds of billions.

Still, the push to find smart new approaches that use less data and computing is valid. A child can learn the basics of Minecraft by watching a video or two. Today’s AI requires much more to learn even simple skills. Making AI more efficient is a big and worthwhile challenge.

In any case, OpenAI is in a sharing mood this time. The researchers say VPT is not without risk (they have strictly controlled access to algorithms like GPT-3 and DALL-E partly to limit misuse), but they believe the risk is minimal for now. They have open-sourced the data, the environment, and the algorithm, and they are partnering with MineRL. This year’s contestants are free to use, modify, and fine-tune the latest in Minecraft AI.

Chances are good they will make it well past mining diamonds this time.

Image credit: Simon Reads / Unsplash
