Counterfactual Regret Minimization (CFR) on Kuhn Poker

This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.

Kuhn Poker is a two-player, three-card betting game. The players are dealt one card each out of Ace, King and Queen (no suits). There are only three cards in the pack, so one card is left out. Ace beats King and Queen, and King beats Queen, just like in the normal ranking of cards.

Both players ante $1$ chip (blindly bet $1$ chip). After looking at the cards, the first player can either pass or bet $1$ chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) $1$ chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. This game is played repeatedly, and a good strategy will optimize for the long-term utility (or winnings).

Here are some example games:

  • KAp - Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn’t get a betting chance and Player 2 wins the pot of $2$ chips.
  • QKbp - Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of $3$ chips (two antes plus Player 1's own bet) because Player 2 folded.
  • QAbb - Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of $4$.
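
The three example games can be checked with a small stand-alone sketch that computes Player 1's net winnings (chips won minus chips put in). The names `RANK` and `payoff_p1` are made up for this illustration and are not part of the implementation below.

```python
# Hypothetical helper, for illustration only: higher value beats lower.
RANK = {'Q': 0, 'K': 1, 'A': 2}


def payoff_p1(history: str) -> int:
    """Net chips won by Player 1 for a finished game such as 'QKbp'."""
    card1, card2, betting = history[0], history[1], history[2:]
    # +1 if Player 1 holds the higher card, -1 otherwise
    sign = 1 if RANK[card1] > RANK[card2] else -1
    if betting == 'p':
        # First player passed: showdown, the winner gains the opponent's ante
        return sign
    if betting == 'bp':
        # Second player folded: Player 1 gains the opponent's ante
        return 1
    if betting == 'bb':
        # Bet and call: showdown, the winner gains the opponent's ante and bet
        return 2 * sign
    raise ValueError(f'not a finished game: {history!r}')


for game in ['KAp', 'QKbp', 'QAbb']:
    print(game, payoff_p1(game))
```

Because the game is zero-sum, Player 2's net winnings are the negative of these values.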

Here we extend the InfoSet class and History class defined in __init__.py with Kuhn Poker specifics.


from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs
from labml_nn.cfr.infoset_saver import InfoSetSaver

Kuhn poker actions are pass (p) or bet (b)

ACTIONS = cast(List[Action], ['p', 'b'])

The three cards in play are Ace, King and Queen

CHANCES = cast(List[Action], ['A', 'K', 'Q'])

There are two players

PLAYERS = cast(List[Player], [0, 1])


class InfoSet(_InfoSet):

Does not support save/load

    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':
        pass

Return the list of actions. Terminal states are handled by History class.

    def actions(self) -> List[Action]:
        return ACTIONS

Human readable string representation - it gives the betting probability

    def __repr__(self):
        total = sum(self.cumulative_strategy.values())
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'
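
The betting probability shown by the representation above is the cumulative strategy normalized over both actions. A minimal stand-alone sketch of that arithmetic follows; `bet_probability` and the example numbers are hypothetical and only illustrate the normalization.

```python
from typing import Dict


def bet_probability(cumulative_strategy: Dict[str, float]) -> float:
    """Share of the cumulative strategy mass assigned to betting ('b')."""
    # Guard against division by zero before any strategy has been accumulated
    total = max(sum(cumulative_strategy.values()), 1e-6)
    return cumulative_strategy['b'] / total


# Made-up cumulative values: betting holds 10 of 40 units of mass
print(f"{bet_probability({'p': 30.0, 'b': 10.0}) * 100:.1f}%")
```

With an empty cumulative strategy the guard makes the result $0$ rather than raising an error.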

History

This defines when a game ends, calculates the utility, and samples chance events (dealing cards).

The history is stored in a string:

  • First two characters are the cards dealt to player 1 and player 2
  • The third character is the action by the first player
  • Fourth character is the action by the second player
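
Since a history is just a short string, every finished game can be listed by combining the six possible deals with the three betting sequences allowed by the rules described earlier (`p`, `bp` and `bb`). The sketch below is independent of the classes on this page; `CARDS`, `BETTING` and `all_terminal_histories` are names chosen for this illustration.

```python
from itertools import permutations
from typing import List

CARDS = ['A', 'K', 'Q']
# Betting sequences that end a game under the rules described above
BETTING = ['p', 'bp', 'bb']


def all_terminal_histories() -> List[str]:
    """Every finished game as a history string, e.g. 'QKbp'."""
    return [
        card1 + card2 + betting
        # Ordered pairs of distinct cards: six possible deals
        for card1, card2 in permutations(CARDS, 2)
        for betting in BETTING
    ]


histories = all_terminal_histories()
print(len(histories))
print(histories[:3])
```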

class History(_History):

History

    history: str

Initialize with a given history string

    def __init__(self, history: str = ''):
        self.history = history

Whether the history is terminal (game over).

    def is_terminal(self):

Players are yet to take actions

        if len(self.history) <= 2:
            return False

Last player to play passed (game over)

        elif self.history[-1] == 'p':
            return True

Both players called (bet) (game over)

        elif self.history[-2:] == 'bb':
            return True

Any other combination

        else:
            return False

Calculate the terminal utility for player $1$, $u_1(z)$

    def _terminal_utility_p1(self) -> float:

$+1$ if Player 1 has a better card and $-1$ otherwise. The string comparison works because 'A' < 'K' < 'Q' in character order, which matches the card ranking from best to worst.

        winner = -1 + 2 * (self.history[0] < self.history[1])

Second player passed (folded), so the first player wins $1$ chip

        if self.history[-2:] == 'bp':
            return 1

Both players called, the player with better card wins $2$ chips

        elif self.history[-2:] == 'bb':
            return winner * 2

First player passed, the player with better card wins $1$ chip

        elif self.history[-1] == 'p':
            return winner

History is non-terminal

        else:
            raise RuntimeError()

Get the terminal utility for player $i$

    def terminal_utility(self, i: Player) -> float:

If $i$ is Player 1

        if i == PLAYERS[0]:
            return self._terminal_utility_p1()

Otherwise, $u_2(z) = -u_1(z)$

        else:
            return -1 * self._terminal_utility_p1()

The first two events are card dealing; i.e. chance events

    def is_chance(self) -> bool:
        return len(self.history) < 2

Add an action to the history and return a new history

    def __add__(self, other: Action):
        return History(self.history + other)

Current player

    def player(self) -> Player:
        return cast(Player, len(self.history) % 2)

Sample a chance action

    def sample_chance(self) -> Action:
        while True:

Randomly pick a card

            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]

See if the card was dealt before

            for c in self.history:
                if c == chance:
                    chance = None
                    break

Return the card if it was not dealt before

            if chance is not None:
                return cast(Action, chance)

Human readable representation

    def __repr__(self):
        return repr(self.history)

Information set key for the current history. This is the part of the history that is visible to the current player: her own card and the betting actions.

    def info_set_key(self) -> str:

Get current player

        i = self.player()

Current player sees her card and the betting actions

        return self.history[i] + self.history[2:]

    def new_info_set(self) -> InfoSet:

Create a new information set object

        return InfoSet(self.info_set_key())
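
To see which information sets this key produces, the sketch below mirrors the key computation on plain strings and applies it to every history where a player has to act. It is a stand-alone illustration: the function names are hypothetical, and it assumes the rules described at the top of this page, where the second player only acts after a bet.

```python
from itertools import permutations
from typing import List


def info_set_key(history: str) -> str:
    """Own card plus betting actions, as seen by the player to act."""
    player = len(history) % 2  # 0 after the deal, 1 after the first action
    return history[player] + history[2:]


def decision_histories() -> List[str]:
    """Histories at which a player has to choose an action."""
    result = []
    for card1, card2 in permutations('AKQ', 2):
        result.append(card1 + card2)        # Player 1 acts right after the deal
        result.append(card1 + card2 + 'b')  # Player 2 acts after a bet
    return result


keys = sorted({info_set_key(h) for h in decision_histories()})
print(keys)
```

Histories that differ only in the opponent's card, such as `KQ` and `KA`, map to the same key, which is what makes them indistinguishable to the player.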

A function to create an empty history object

def create_new_history():
    return History()

Configurations extend the CFR configurations class

class Configs(CFRConfigs):
    pass

Set the create_new_history method for Kuhn Poker

@option(Configs.create_new_history)
def _cnh():
    return create_new_history

Run the experiment

def main():

Create an experiment. We only write tracking information to SQLite to speed things up. Since the algorithm iterates fast and we track data on each iteration, writing to other destinations such as TensorBoard can be relatively time-consuming. SQLite is enough for our analytics.

    experiment.create(name='kuhn_poker', writers={'sqlite'})

Initialize configuration

    conf = Configs()

Load configuration

    experiment.configs(conf)

Set models for saving

    experiment.add_model_savers({'info_sets': InfoSetSaver(conf.cfr.info_sets)})

Start the experiment

    with experiment.start():

Start iterating

        conf.cfr.iterate()


if __name__ == '__main__':
    main()