Facebook AI Introduces ‘ReBeL’: An Algorithm That Generalizes The Paradigm Of Self-Play Reinforcement Learning And Search To Imperfect-Information…

Posted: December 14, 2020 at 1:55 am



Most AI systems excel at generating specific responses to a particular problem, and today AI can outperform humans in various fields. But for an AI to handle any task it is presented with, it needs to generalize, learn, and understand new situations as they occur without supplementary guidance. Humans recognize chess and poker both as games in the broadest sense, yet teaching a single AI to play both remains challenging.

Perfect-Information games versus Imperfect-Information games

AI systems are relatively successful at mastering perfect-information games like chess, where nothing is hidden from either player: each player can see the entire board and all possible moves at all times. Bots like AlphaZero even combine reinforcement learning with search (RL+Search) to teach themselves to master these games from scratch.

Unlike perfect-information games and single-agent settings, imperfect-information games pose a critical challenge: an action's value may depend on the probability with which it is chosen. Therefore, the team states, it is crucial to account not only for the sequences of actions that occurred but also for the probability that each sequence occurred.

ReBeL

Facebook has recently introduced Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that works in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms that have proved successful in perfect-information games. Unlike past AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the game's current state, which is called a public belief state (PBS). For example, ReBeL can assess how likely its poker opponent thinks it is that ReBeL holds a particular hand, such as a pair of aces.
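The belief in that example can be computed with Bayes' rule: once the opponent sees ReBeL act, the probability of each hand ReBeL might hold is reweighted by how likely ReBeL's policy was to take that action with that hand. The following is a minimal sketch, not ReBeL's actual code. It assumes a toy game with three possible private hands and a policy known to both sides, and the names `update_belief`, `Belief`, and `Policy` are hypothetical. A full PBS would hold such a distribution for each player, alongside the public state of the game.

```python
from typing import Dict

# Hypothetical types for a toy game: a belief maps each possible private hand
# to a probability; a policy maps each hand to a distribution over public actions.
Belief = Dict[str, float]
Policy = Dict[str, Dict[str, float]]


def update_belief(belief: Belief, policy: Policy, observed_action: str) -> Belief:
    """Bayes' rule: P(hand | action) is proportional to P(hand) * P(action | hand)."""
    unnormalized = {
        hand: prob * policy[hand].get(observed_action, 0.0)
        for hand, prob in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under this policy")
    return {hand: p / total for hand, p in unnormalized.items()}


# Toy usage: uniform prior over three hypothetical hands; strong hands bet more often.
prior = {"AA": 1 / 3, "KK": 1 / 3, "72o": 1 / 3}
policy = {
    "AA": {"bet": 0.9, "check": 0.1},
    "KK": {"bet": 0.6, "check": 0.4},
    "72o": {"bet": 0.3, "check": 0.7},  # the weak hand bets only as a bluff
}
posterior = update_belief(prior, policy, "bet")
print({hand: round(p, 3) for hand, p in posterior.items()})
# {'AA': 0.5, 'KK': 0.333, '72o': 0.167}
```

After seeing a bet, the opponent should believe ReBeL holds the pair of aces with probability 0.5, up from 1/3. Because the update depends on the policy, changing how often each hand bets changes what the opponent believes.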

Earlier RL+Search algorithms break down in imperfect-information games like poker, where players lack complete information (for example, players keep their cards secret in poker). These algorithms assign a fixed value to each action regardless of how often the action is chosen. In chess, for instance, a good move is good irrespective of whether it is played frequently or rarely. In poker, however, the more often a player bluffs, the less each bluff is worth, because opponents can adjust their strategy to call more of those bluffs. This is why the Pluribus poker bot was trained with an approach that uses search during actual gameplay and not before.
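The bluffing argument can be checked with a little arithmetic. The sketch below is a deliberately simplified illustration, not how ReBeL or Pluribus compute values. It assumes a single final betting decision with hypothetical pot and bet sizes, a bettor whose non-bluff bets always win at showdown, and an opponent who best-responds by calling only when calling has positive expected value. Calling is profitable exactly when the share of bluffs exceeds `bet / (pot + 2 * bet)`, so the same bluff flips from profitable to losing as it is used more often.

```python
def caller_calls(bluff_share: float, pot: float, bet: float) -> bool:
    """Opponent's best response to a bet: call only if calling has positive expected value.

    bluff_share is the fraction of the bettor's bets that are bluffs. The caller
    risks `bet` and wins `pot + bet` when the bettor is bluffing, and loses `bet`
    when the bettor holds a strong hand (assumed to always win at showdown).
    """
    ev_call = bluff_share * (pot + bet) - (1 - bluff_share) * bet
    return ev_call > 0


def bluff_value(bluff_share: float, pot: float, bet: float) -> float:
    """Chips a single bluff gains: the pot if the opponent folds, minus the bet if called."""
    return -bet if caller_calls(bluff_share, pot, bet) else pot


# With a pot-sized bet the calling threshold is 100 / (100 + 200) = 1/3.
for share in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(share, bluff_value(share, pot=100, bet=100))
```

Each bluff is worth +100 while bluffs make up at most a third of the bets, and -100 once they exceed that share, because the opponent's best response switches from folding to calling. A chess-style fixed value per action cannot express this dependence on frequency.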

ReBeL can treat imperfect-information games similarly to perfect-information games by accounting for the beliefs of each player. Facebook has developed a modified RL+Search algorithm that ReBeL leverages to handle the higher-dimensional state and action spaces of imperfect-information games.
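To make "higher-dimensional" concrete, here is a rough back-of-the-envelope count. It is a sketch assuming heads-up Texas Hold'em, where each player's private information is two hole cards from a 52-card deck, and it ignores cards ruled out by the board, so the figures are upper bounds. As we read the related paper, a PBS keeps a probability for every possible private hand of each player, and a player's "action" at the belief level is effectively a whole policy: an action distribution for every hand. The three-action menu (fold, call, raise) and the helper names are hypothetical simplifications.

```python
from math import comb

# Number of distinct two-card starting hands from a 52-card deck.
HANDS_PER_PLAYER = comb(52, 2)  # 1326


def pbs_size(num_players: int = 2) -> int:
    """Probabilities in a belief state holding one distribution over hands per player."""
    return num_players * HANDS_PER_PLAYER


def policy_size(num_actions: int) -> int:
    """Numbers describing one player's policy at a decision point:
    a probability for every (private hand, action) pair."""
    return HANDS_PER_PLAYER * num_actions


print(pbs_size())      # 2652
print(policy_size(3))  # 3978 with a hypothetical fold/call/raise menu
```

By contrast, a chess position is a single fully observed board and a chess action is a single move, which is why the belief-level "state" and "action" here are so much larger.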

Experiments show that ReBeL is effective in large-scale two-player zero-sum imperfect-information games such as Liar's Dice and poker. ReBeL achieves superhuman performance in the benchmark game of heads-up no-limit Texas Hold'em, even defeating a top human professional.

Earlier poker AIs have reached superhuman performance as well. However, ReBeL does so using considerably less expert domain knowledge than any previous poker AI. This is a crucial step toward building a generalized AI that can solve complex real-world problems involving hidden information, such as negotiations, fraud detection, and cybersecurity.

Limitations:

ReBeL is the first AI to enable RL+Search in imperfect-information games. However, its current implementation has some limitations, as listed below:

Nevertheless, ReBeL achieves low exploitability in benchmark games and is a significant step toward creating more general AI algorithms. To promote further research, Facebook has open-sourced the implementation of ReBeL for Liar's Dice.

GitHub (ReBeL for Liar's Dice): https://github.com/facebookresearch/rebel

Source: https://ai.facebook.com/blog/rebel-a-general-game-playing-ai-bot-that-excels-at-poker-and-more

Related Paper: https://arxiv.org/pdf/2007.13544.pdf


Written by admin

