paint-brush
The Key to Defeating Win-Stay, Lose-Shift Opponent Variantsby@algorithmicbias
New Story

The Key to Defeating Win-Stay, Lose-Shift Opponent Variants

by Algorithmic BiasJanuary 24th, 2025
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This section introduces algorithms to counter the Tie-Shift and Tie-Stay variants of the Win-Stay, Lose-Shift opponent. Against Tie-Shift, the strategy involves learning the opponent's action order, recording best responses, and predicting their next moves. For Tie-Stay, the approach is simpler, as the opponent only shifts actions after a loss. Both algorithms ensure nearly perfect wins after an initial learning phase.
featured image - The Key to Defeating Win-Stay, Lose-Shift Opponent Variants
Algorithmic Bias HackerNoon profile picture
0-item

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

A Appendix

A.1 Win-Stay Lose-Shift Variant: Tie-Stay

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

4.3 Win-Stay, Lose-Shift Opponent

Recall that the Win-Stay Lose-Shift opponent plays the same action immediately following a win, and switches to the next action in its action ordering immediately following a loss. The Tie-Shift variant of this opponent treats a tie like a loss and shifts, and the Tie-Stay variant treats a tie like a win and stays.

4.3.1 Variant: Tie-Shift



Proof. In the first phase, we record the correct action ordering: the opponent starts by playing the first action in their action ordering and always shifts to the next action in the ordering, so by observing n − 1 shifts we observe all n actions in the correct order. Because we play each action in succession against the opponent’s current action, they are guaranteed to shift after at most n − 1 rounds, since their action must tie with itself and lose to at least one other action.



At the start of phase 4, we correctly predict the opponent’s next action: we know we won the last round, so we know that the opponent will shift to the next action in its action order (which we have recorded correctly, as shown above). We showed above that we correctly recorded a best response to each action, so we win by playing the recorded best response to the predicted action. At the start of the next round, the same conditions hold (and will hold after each round), so we will win all subsequent rounds.


4.3.2 Variant: Tie-Stay

The main difference in beating the Tie-Stay variant compared to the Tie-Shift variant is that we can find a best response to each action by simply playing each action in succession until the opponent switches, since they only switch following a loss. For the algorithm, theorem and proof see A.1 in the Appendix.


This paper is available on arxiv under CC BY 4.0 DEED license.