
Broader Insights into Exploitable Strategies in Zero-Sum Games

by Algorithmic Bias, January 24th, 2025

Too Long; Didn't Read

This section explores generalizing strategies to beat behaviorally-biased opponents in zero-sum games. Predicting actions remains feasible with a halving algorithm, but learning best responses depends on the opponent's strategy. A key condition is that opponents must eventually reveal best responses to actions after repeated play. Without this, consistent exploitation is impossible for certain strategies like Gambler’s Fallacy or Tie-Stay variants.

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

5.1 Other Behaviorally-Biased Strategies

A natural question to ask is: what kinds of behaviorally-biased strategies can we exploit to win nearly every round of a permissible game (Definition 1)? Clearly, if we can both predict the opponent's actions and learn best responses to the actions the opponent plays, we can achieve this goal.
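This two-part recipe — predict, then best-respond — can be sketched as a simple loop. The following is an illustrative toy (not the paper's construction): the game, the `copycat` opponent, and the assumption that prediction and best responses are already available are all hypothetical.

```python
# Toy illustration: if we can predict the opponent's next action AND know a
# best response to every action, we win every round. Game: rock-paper-scissors.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def play(opponent_strategy, rounds=10):
    """opponent_strategy maps the history of OUR past actions to its next action.
    Here prediction and best responses are assumed perfect (the two capabilities
    discussed in this section)."""
    our_history, wins = [], 0
    for _ in range(rounds):
        predicted = opponent_strategy(our_history)   # perfect prediction (assumed)
        our_action = BEATS[predicted]                # known best response (assumed)
        actual = opponent_strategy(our_history)      # deterministic opponent
        wins += our_action == BEATS[actual]
        our_history.append(our_action)
    return wins

# A hypothetical deterministic opponent: copy our last move.
copycat = lambda hist: hist[-1] if hist else "rock"
```

With both capabilities in hand, `play(copycat)` wins all ten rounds; the rest of this section asks when each capability is actually attainable.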


Predicting the Opponent’s Actions
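The mathematical content of this subsection did not survive extraction. As the overview notes, prediction remains feasible via a halving algorithm when the opponent's deterministic strategy is drawn from a known finite set; a hedged sketch follows, in which the strategy set and history encoding are illustrative assumptions rather than the paper's exact construction.

```python
def halving_predict(strategy_set, opponent_next, rounds=20):
    """Classic halving algorithm: keep every strategy consistent with the
    opponent's play so far, predict by plurality vote, and discard any
    strategy that predicted wrongly. Each mistake at least halves the
    consistent set, so mistakes <= log2(|strategy_set|)."""
    consistent = list(strategy_set)
    history, mistakes = [], 0
    for _ in range(rounds):
        votes = {}
        for s in consistent:
            votes[s(history)] = votes.get(s(history), 0) + 1
        prediction = max(votes, key=votes.get)       # plurality vote
        actual = opponent_next(history)
        if prediction != actual:
            mistakes += 1
        consistent = [s for s in consistent if s(history) == actual]
        history.append(actual)
    return mistakes

# Hypothetical finite strategy set: four constant-action strategies.
strategies = [lambda h, a=a: a for a in ["rock", "paper", "scissors", "lizard"]]
```

Running `halving_predict(strategies, strategies[2])` makes at most log2(4) = 2 prediction mistakes before the true strategy is isolated.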



Learning Best Responses


The more challenging question is which deterministic strategies we can exploit to learn best responses for any action the opponent might play. Note that we cannot do this for every deterministic strategy; consider a very simple opponent who always plays the same action a. In this case, the opponent’s play does not reveal any information about a best response to a.


▶ Observation 7. It is not always possible to exploit a known deterministic strategy to guarantee winning any rounds.
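A concrete way to see the observation, as an illustrative simulation: assuming (as in the setup above) that best responses must be inferred from the opponent's play alone, an always-play-a opponent produces the same transcript no matter what we do, so no algorithm can extract a best response to a from it. The helper names and action labels below are hypothetical.

```python
def transcript(opponent, our_actions):
    """The opponent's play as observed by the learner, given the learner's
    actions (illustrative assumption: this is all the learner sees)."""
    seen = []
    for ours in our_actions:
        seen.append(opponent(seen, ours))
    return seen

# An opponent that always plays the same action "a", ignoring everything.
constant_a = lambda seen, ours: "a"

# Whatever we try, the observed play is identical:
t1 = transcript(constant_a, ["x", "y", "z"])
t2 = transcript(constant_a, ["z", "z", "z"])
# t1 == t2 == ["a", "a", "a"]  ->  nothing learned about a best response to "a"
```

Since the transcript is independent of both the learner's actions and the underlying game, any two games whose best responses to a differ are indistinguishable to the learner.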




Note that not every reasonable behavioral strategy has this property; several of the biased opponents we discussed earlier do not: the Gambler’s Fallacy opponent, the Tie-Stay variant of the Win-Stay, Lose-Shift opponent, and the Highest Average Payoff opponent.


This paper is available on arXiv under the CC BY 4.0 DEED license.