In the article, the developers explained how they achieved the decrease and presented new features of the toolkit, including Generative Adversarial Imitation Learning (GAIL) and the Soft Actor-Critic (SAC) algorithm as an alternative to Proximal Policy Optimization (PPO). GAIL lets human demonstrations guide the learning process, which leads to higher efficiency. A key advantage of SAC over PPO is sample efficiency: the agent needs to run the game far less to learn a good policy.
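As an illustration, GAIL and SAC are typically enabled through the toolkit's trainer configuration file. The exact keys vary by ML-Agents release, so the snippet below is only a sketch in the style of the 0.9.x `trainer_config.yaml`; the behavior name `SnoopyPop` and the demo path are placeholders, not values from the article:

```yaml
# Hypothetical trainer_config.yaml entry -- keys follow the ML-Agents
# 0.9.x configuration style; names and values are illustrative only.
SnoopyPop:
  trainer: sac            # use Soft Actor-Critic instead of PPO
  buffer_size: 500000     # SAC is off-policy, so it reuses past experience
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    gail:                 # GAIL reward shaped by human demonstrations
      strength: 0.01
      gamma: 0.99
      demo_path: demos/expert.demo   # placeholder path to recorded demos
```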
They also illustrated the differences between versions of the toolkit and showed the performance results in Snoopy Pop.
Check the article for a more detailed review of the update!