GPT-4 uses Theory of Mind to play Texas Hold’em and successfully defeats humans

Author: Xin Zhiyuan

The Suspicion Agent from the University of Tokyo, using GPT-4, has demonstrated advanced Theory of Mind (ToM) capabilities in incomplete information games.

In complete information games, every player has access to all information about the game.

Incomplete information games are different: players must act without seeing everything (for example, an opponent’s cards), which mirrors the complexity of real-world decision-making under uncertainty.

GPT-4, one of the most capable models currently available, has exceptional knowledge retrieval and reasoning abilities.

But can GPT-4 use the knowledge it has learned to play incomplete information games?

To investigate this, researchers at the University of Tokyo introduced Suspicion Agent, a new agent that uses GPT-4’s capabilities to play incomplete information games.

Paper link: https://arxiv.org/abs/2309.17277

In their experiments, the GPT-4-based Suspicion Agent takes on different functions through appropriate prompt engineering and shows outstanding adaptability across a range of incomplete information games.

Most importantly, during the game, GPT-4 showed powerful Theory of Mind (ToM) abilities.

GPT-4 can utilize its understanding of human cognition to predict opponents’ thought processes, susceptibility, and actions.

This means that GPT-4 can understand others much as humans do and intentionally influence their behavior.

In addition, the GPT-4-based agent outperforms traditional algorithms in incomplete information games, which may inspire more applications of LLMs in this area.

01 Method

To enable an LLM to play various incomplete information games without task-specific training, the researchers divided the task into several modules, such as the Observation Interpreter, Game Mode Analysis, and Planning Module, as shown in the diagram below.

Furthermore, because an LLM can easily be misled in incomplete information games, the researchers first developed structured prompts to help it understand the game rules and the current state.

For each type of incomplete information game, the following structured rules description can be written:

General rules: game introduction, number of rounds, and betting rules;

Action description: (description of action 1), (description of action 2), etc.;

Single round win/loss rules: conditions for a single round win/loss or draw;

Win/loss payoff rules: rewards or penalties for a single round win/loss;

Overall win/loss rules: number of rounds and overall win/loss conditions.
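To make the five-part rule description concrete, here is a minimal Python sketch that renders it as prompt text. The `GameRules` class, the `build_rule_prompt` function, and the Leduc Hold’em wording are illustrative assumptions, not the researchers’ actual prompts.

```python
from dataclasses import dataclass, field


@dataclass
class GameRules:
    """Hypothetical container for the five-part structured rule description."""
    general: str
    actions: dict[str, str] = field(default_factory=dict)
    round_outcome: str = ""
    payoff: str = ""
    overall_outcome: str = ""


def build_rule_prompt(rules: GameRules) -> str:
    """Render each part of the rule description as one line of prompt text."""
    action_lines = "; ".join(f"{name}: {desc}" for name, desc in rules.actions.items())
    return "\n".join([
        f"General rules: {rules.general}",
        f"Action description: {action_lines}",
        f"Single round win/loss rules: {rules.round_outcome}",
        f"Win/loss payoff rules: {rules.payoff}",
        f"Overall win/loss rules: {rules.overall_outcome}",
    ])


# Illustrative wording only; not taken from the paper.
leduc = GameRules(
    general="Two-player Leduc Hold'em with a six-card deck (two Jacks, two Queens, two Kings).",
    actions={
        "call": "match the opponent's current bet",
        "raise": "increase the bet by a fixed amount",
        "fold": "give up the round and forfeit the pot",
        "check": "pass without betting when no bet is outstanding",
    },
    round_outcome="A private card that pairs the public card wins; otherwise the higher card wins; equal ranks draw.",
    payoff="The winner takes the pot; in a draw the pot is split.",
    overall_outcome="The player with more chips after all games wins.",
)
print(build_rule_prompt(leduc))
```

Keeping each part on its own labeled line mirrors the structure above, so the same template can be refilled for another game by swapping in a different `GameRules` instance.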

In most incomplete information game environments, game states are represented as low-level numerical values, such as one-hot vectors, to suit machine learning models.

With an LLM, however, low-level game states can be converted into natural-language text, which helps the model understand patterns. The conversion rules can be described in the same structured way:

Input description: the type of input received (dictionary, list, or another format), the number of elements in the game state, and the name of each element;

Element description: (description of element 1), (description of element 2), ...;

Conversion tips: further guidelines for converting low-level game states into text.
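The conversion can be sketched as a small observation interpreter. This assumes a hypothetical Leduc Hold’em state dictionary whose cards are one-hot vectors ordered Jack, Queen, King; the field names and wording are illustrative and do not come from the paper or any specific environment library.

```python
RANKS = ["Jack", "Queen", "King"]  # assumed order of the one-hot card encoding


def decode_card(one_hot: list[int]) -> str:
    """Map a one-hot card vector to a rank name; an all-zero vector means no card yet."""
    if not any(one_hot):
        return "not revealed yet"
    return RANKS[one_hot.index(1)]


def interpret_observation(state: dict) -> str:
    """Convert a hypothetical low-level Leduc Hold'em state into plain text."""
    return (
        f"Private card: {decode_card(state['hand'])}. "
        f"Public card: {decode_card(state['public_card'])}. "
        f"Chips you have put in the pot: {state['my_chips']}; "
        f"chips your opponent has put in: {state['opponent_chips']}. "
        f"Legal actions: {', '.join(state['legal_actions'])}."
    )


example_state = {
    "hand": [1, 0, 0],         # one-hot: Jack
    "public_card": [0, 0, 0],  # not dealt yet (first betting round)
    "my_chips": 1,
    "opponent_chips": 2,
    "legal_actions": ["call", "raise", "fold"],
}
print(interpret_observation(example_state))
```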


In incomplete information games, this form of representation makes the game state easier for the model to understand and act on.

The researchers introduced a vanilla planning method consisting of a Reflexion module, which automatically reviews the game history so the LLM can learn from past experience and improve its plans, and a separate planning module that makes the actual decisions.
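The planning loop described above, a Reflexion step that reviews the game history followed by a separate planning step, might be wired together roughly as follows. `LLM` stands for any hypothetical function that maps a prompt to a completion (for example, a wrapper around a GPT-4 API call), and the prompt wording is an assumption, not the paper’s.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to a model and returns its reply.
LLM = Callable[[str], str]


def reflect(llm: LLM, history: list[str]) -> str:
    """Reflexion-style step: ask the model to review past games and extract lessons."""
    prompt = (
        "History of previous games:\n"
        + "\n".join(history)
        + "\nWhat went wrong, and what should be done differently next time?"
    )
    return llm(prompt)


def vanilla_plan(llm: LLM, rules: str, observation: str, history: list[str]) -> str:
    """Planning step: choose an action from the rules, current state, and reflection."""
    lessons = reflect(llm, history)
    prompt = (
        f"{rules}\n\nCurrent state: {observation}\n"
        f"Lessons from past games: {lessons}\n"
        "Choose one legal action and explain why."
    )
    return llm(prompt)
```

Passing a stub function as `llm` makes the control flow testable without an API key: the model is called twice per decision, once to reflect and once to plan.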

However, vanilla planning often struggles with the inherent uncertainty of incomplete information games, especially against opponents who are skilled at exploiting others’ strategies.

To address this, the researchers designed a new planning method that uses the LLM’s Theory of Mind (ToM) ability to understand the opponent’s behavior and adjust its strategy accordingly.
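The paper implements ToM reasoning through prompting, but the first-order idea (inferring the opponent’s hidden card from its actions) has a simple numerical analogue: a Bayesian belief update. The sketch below swaps in that technique for illustration only; the opponent’s action probabilities in `OPPONENT_PROFILE` are invented for the example, and only calls and raises are modeled.

```python
from fractions import Fraction

# Hypothetical opponent profile: P(action | opponent's private card).
OPPONENT_PROFILE = {
    "Jack": {"raise": Fraction(1, 10), "call": Fraction(9, 10)},
    "Queen": {"raise": Fraction(3, 10), "call": Fraction(7, 10)},
    "King": {"raise": Fraction(4, 5), "call": Fraction(1, 5)},
}


def prior_over_opponent_card(my_card: str) -> dict[str, Fraction]:
    """Prior over the opponent's card from the cards left in the six-card Leduc deck."""
    deck = ["Jack", "Jack", "Queen", "Queen", "King", "King"]
    deck.remove(my_card)  # the agent's own card cannot be the opponent's
    return {rank: Fraction(deck.count(rank), len(deck)) for rank in dict.fromkeys(deck)}


def update_belief(prior, likelihood, action):
    """Bayes' rule: P(card | action) is proportional to P(action | card) * P(card)."""
    unnormalized = {card: p * likelihood[card][action] for card, p in prior.items()}
    total = sum(unnormalized.values())
    return {card: w / total for card, w in unnormalized.items()}


belief = update_belief(prior_over_opponent_card("Jack"), OPPONENT_PROFILE, "call")
print({card: str(p) for card, p in belief.items()})
```

With a Jack in hand and an observed call, the script prints `{'Jack': '1/3', 'Queen': '14/27', 'King': '4/27'}`: under these assumed probabilities, a call makes a Queen the most likely opposing card. A second-order planner would go one step further and ask how the opponent updates its belief about the agent’s card.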

02 Experimental Quantitative Evaluation

As shown in Table 1, the Suspicion Agent outperforms all baselines, and the GPT-4-based Suspicion Agent achieves the highest average chip count.

These findings strongly demonstrate the advantages of using large language models for incomplete information games, as well as the effectiveness of the proposed framework.

The following figure shows the action percentages of the Suspicion Agent and baseline models.

It can be observed:

Suspicion Agent vs CFR: CFR plays a conservative strategy and tends to fold when holding weak cards.

The Suspicion Agent successfully identified this pattern and strategically chose to raise more frequently, putting pressure on CFR to fold.

As a result, the Suspicion Agent accumulated more chips even when its cards were weak or comparable to CFR’s.

Suspicion Agent vs DMC: DMC is based on a search algorithm and adopts more diverse strategies, including bluffing. It often raises when holding either the weakest or the strongest hand.

In response, the Suspicion Agent lowers its raising frequency based on its own cards and DMC’s observed behavior, opting to call or fold more often.

Suspicion Agent vs DQN: DQN takes a more aggressive stance, almost always raising with strong or medium cards and never folding.

The Suspicion Agent notices this and minimizes its own raises, choosing to call or fold more often based on the community cards and DQN’s actions.

Suspicion Agent vs NFSP: NFSP follows a calling strategy, always calling and never folding.

The Suspicion Agent responds by raising less often and folding based on the community cards and NFSP’s observed actions.

Based on this analysis, the Suspicion Agent shows strong adaptability and can exploit the weaknesses of the strategies used by each algorithm.

This fully demonstrates the reasoning and adaptability of large language models in imperfect information games.
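The action-percentage analysis above amounts to profiling each opponent from its observed actions and picking a counter-strategy. A minimal sketch of that bookkeeping follows; the style labels and percentage thresholds in `describe_style` are hypothetical and not taken from the paper, which leaves this reasoning to GPT-4.

```python
from collections import Counter


def action_percentages(actions: list[str]) -> dict[str, float]:
    """Share of each action, in percent, over an opponent's observed actions."""
    counts = Counter(actions)
    return {action: 100 * n / len(actions) for action, n in counts.items()}


def describe_style(pct: dict[str, float]) -> str:
    """Label a play style and a counter-strategy (thresholds are illustrative)."""
    if pct.get("fold", 0.0) == 0.0 and pct.get("call", 0.0) >= 50.0:
        return "calling station: never folds, so raise less and avoid bluffing"
    if pct.get("raise", 0.0) >= 50.0:
        return "aggressive: raises often, so raise less and call or fold by hand strength"
    if pct.get("fold", 0.0) >= 30.0:
        return "conservative: folds weak hands, so raise more to apply pressure"
    return "balanced: no obvious weakness to exploit"


observed = ["fold", "fold", "call", "check", "fold", "raise", "call", "fold"]
pct = action_percentages(observed)
print(pct)
print(describe_style(pct))
```

The order of the checks matters: an opponent that never folds is classified before raise frequency is considered, which matches the idea that bluffing is pointless against a player who always calls.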

03 Qualitative Evaluation

In the qualitative evaluation, the researchers evaluated the Suspicion Agent on three incomplete information games: Coup, Texas Hold’em Limit, and Leduc Hold’em.

Coup is a card game where players assume the role of politicians and try to overthrow the governments of other players. The objective of the game is to survive and accumulate power.

Texas Hold’em Limit is a variant of the widely popular Texas Hold’em poker. “Limit” means that each betting round has a fixed limit, so players can only bet fixed amounts.

Leduc Hold’em is a simplified version of Texas Hold’em used for studying game theory and artificial intelligence.
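For reference, the standard Leduc Hold’em rules use a six-card deck (two Jacks, two Queens, two Kings): each player gets one private card, one public card is revealed after the first betting round, a private card that pairs the public card wins, and otherwise the higher private card wins. The showdown rule can be sketched in Python as below; the function name and return convention are illustrative.

```python
RANK_ORDER = {"Jack": 1, "Queen": 2, "King": 3}


def showdown(card_a: str, card_b: str, public: str) -> int:
    """Return 1 if player A wins, -1 if player B wins, 0 for a split pot."""
    a_pair = card_a == public
    b_pair = card_b == public
    if a_pair != b_pair:
        # Exactly one player pairs the public card, and a pair beats any high card.
        return 1 if a_pair else -1
    # Neither player pairs: compare private cards by rank.
    diff = RANK_ORDER[card_a] - RANK_ORDER[card_b]
    return (diff > 0) - (diff < 0)
```

This is why a Jack is the weakest holding in the example that follows: unless it pairs the public card, it can at best tie.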

In the example shown, the Suspicion Agent holds a Jack, while the opponent holds either a Jack or a Queen.

The opponent initially calls instead of raising, suggesting a weak hand. Under the vanilla planning strategy, the Suspicion Agent calls to see the public card.

Once the public card is revealed, the opponent quickly raises, putting the Suspicion Agent in a precarious position, since a Jack is the weakest card.

Under the first-order ToM strategy, the Suspicion Agent folds to minimize losses. This decision is based on the observation that the opponent usually calls when holding a Queen or a Jack.

However, neither strategy fully exploits the opponent’s presumed weak hand, because neither considers how the Suspicion Agent’s own moves may affect the opponent’s reactions.

In contrast, under the second-order ToM strategy shown in Figure 9, simple prompts allow the Suspicion Agent to reason about how to influence the opponent’s actions. By deliberately raising, it puts pressure on the opponent, prompting it to fold to minimize its losses.

As a result, even with similar hand strength, the Suspicion Agent wins many games and gains more chips than the baselines.
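The logic of raising to force a fold can be made concrete with a simple expected-value calculation. Assume, as a simplification, that a bluff raise wins the current pot when the opponent folds and loses the raised amount whenever the opponent continues; then raising beats folding when the fold probability exceeds bet / (pot + bet). In this framing, second-order ToM reasoning is what supplies the estimate of that fold probability. The chip amounts and probabilities below are hypothetical.

```python
def bluff_ev(pot: float, bet: float, p_fold: float) -> float:
    """Expected chip change of a bluff raise relative to folding now (folding = 0).

    Assumes the bluff wins the current pot if the opponent folds and loses the
    raised amount whenever the opponent continues (a deliberate simplification).
    """
    return p_fold * pot - (1.0 - p_fold) * bet


def break_even_fold_probability(pot: float, bet: float) -> float:
    """Fold probability at which bluffing and folding have equal expected value."""
    return bet / (pot + bet)


def choose(pot: float, bet: float, p_fold: float) -> str:
    """Raise only if the estimated fold probability makes the bluff profitable."""
    return "raise" if bluff_ev(pot, bet, p_fold) > 0 else "fold"


print(break_even_fold_probability(6, 4))
print(choose(6, 4, 0.6), choose(6, 4, 0.3))
```

With a pot of 6 and a raise of 4, the break-even fold probability is 0.4, so an opponent expected to fold 60% of the time makes the bluff worth about +2 chips, while one expected to fold only 30% of the time does not.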

In addition, as shown in Figure 10, when the opponent calls or raises in response to the Suspicion Agent’s raise (indicating a strong hand), the Suspicion Agent quickly adjusts its strategy and folds to prevent further losses.

This showcases the excellent strategic flexibility of the Suspicion Agent.

04 Ablation Study and Component Analysis

To explore how different levels of ToM-aware planning affect the behavior of large language models, the researchers ran comparison experiments on Leduc Hold’em, playing against CFR.

Figure 5 shows the action percentages of the Suspicion Agent with different levels of ToM planning, and Table 3 shows the resulting chip gains.

Table 3: Comparison of the Suspicion Agent with different levels of ToM against CFR in the Leduc Hold’em environment, with results quantified after 100 games.

Observations:

The vanilla plan, based on the Reflexion module, tends to call and check more often during play (it has the highest call and check ratios against both CFR and DMC), which fails to pressure opponents into folding and leads to many unnecessary losses.

Consequently, as shown in Table 3, the vanilla plan has the lowest chip gains.

Using first-order ToM, the Suspicion Agent is able to make decisions based on its own card strength and estimates of opponent card strength.

It therefore raises more often than the vanilla plan, but it also tends to fold more than the other strategies in an effort to minimize unnecessary losses. This cautious approach, however, can be exploited by cunning opponents.

For example, DMC often raises when holding the weakest hand, and CFR sometimes raises even with medium-strength hands to pressure the Suspicion Agent. In these situations, the Suspicion Agent’s tendency to fold results in losses.

In contrast, with second-order ToM, the Suspicion Agent excels at identifying and exploiting the behavioral patterns of opponent models.

Specifically, when CFR checks (usually indicating a weak hand) or DMC checks (indicating that its hand does not match the community cards), the Suspicion Agent bluffs with a raise to induce the opponent to fold.

As a result, second-order ToM planning has the highest raise rate of the three planning methods.

This aggressive strategy allows the Suspicion Agent to accumulate more chips even when holding weak cards, maximizing chip gains.

To evaluate the impact of hindsight observation, the researchers ran an ablation study in which the hindsight observation (the opponent’s information revealed at the end of a game) was omitted from the game history.
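The ablation can be pictured as toggling one field when the game history is written into the prompt. In this sketch, the `GameRecord` fields and the text format are hypothetical; `include_hindsight` controls whether the opponent’s card, revealed at the end of a game, appears in the history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameRecord:
    """One finished game, as it might be summarized for the prompt (hypothetical)."""
    my_card: str
    actions: list[str]
    chips_won: int
    opponent_card: Optional[str] = None  # hindsight observation, if revealed


def history_text(records: list[GameRecord], include_hindsight: bool) -> str:
    """Render past games as text, optionally including the opponent's revealed card."""
    lines = []
    for i, record in enumerate(records, start=1):
        line = (
            f"Game {i}: held {record.my_card}; "
            f"actions {' -> '.join(record.actions)}; chips {record.chips_won:+d}"
        )
        if include_hindsight and record.opponent_card is not None:
            line += f"; opponent revealed {record.opponent_card}"
        lines.append(line)
    return "\n".join(lines)


games = [GameRecord("Jack", ["call", "raise", "fold"], -2, "Queen")]
print(history_text(games, include_hindsight=True))
print(history_text(games, include_hindsight=False))
```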

As shown in Tables 4 and 5, even without hindsight observation, the Suspicion Agent maintains its performance advantage over the baseline methods.

Table 4: Comparison results highlighting the impact of incorporating opponent observation outcomes into game history in the Leduc Hold’em environment

Table 5: Impact of including opponent observation results in the game history when the Suspicion Agent plays against CFR in the Leduc Hold’em environment. The results show the number of chips won or lost after playing 100 games with different seeds, ranging from 1 to 14.

05 Conclusion

Without any specialized training, and relying solely on GPT-4’s prior knowledge and reasoning abilities, the Suspicion Agent was able to defeat algorithms trained specifically for incomplete information games, such as CFR and NFSP, in games like Leduc Hold’em.

This demonstrates the potential of large models to achieve strong performance in incomplete information games.

By integrating first-order and second-order theory of mind models, the Suspicion Agent can predict opponent actions and adjust its strategy accordingly. This allows it to adapt to different types of opponents.

The Suspicion Agent also demonstrates the ability to generalize across different incomplete information games, making decisions based on game rules and observation rules in games like Coup and Texas Hold’em.

However, the Suspicion Agent also has certain limitations. For example, due to computational cost constraints, the evaluation sample size for different algorithms is small.

In addition, inference is expensive, at nearly $1 per game, and the Suspicion Agent’s output is sensitive to prompts and prone to hallucination.

The Suspicion Agent also performs poorly on tasks that require complex reasoning and computation.

In the future, the Suspicion Agent will be improved in terms of computational efficiency, robustness in reasoning, and support for multimodal and multi-step reasoning, to better adapt to complex gaming environments.

Furthermore, the application of the Suspicion Agent in incomplete information games can be extended to the integration of multimodal information in the future, simulating more realistic interactions and expanding to multiplayer gaming environments.

References:

https://arxiv.org/abs/2309.17277
