Mixed Strategy Game

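Each player chooses among their strategies according to a probability distribution.

In some situations pure strategies are not enough, for example in zero-sum games such as matching pennies, which may have no pure-strategy Nash equilibrium, or in games with multiple Nash equilibria.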

A probability distribution for each player.
The distributions are mutual best responses to one another in the sense of expected payoffs.
It is a stochastic steady state.

Solving matching pennies

Let r be the probability that Player 1 plays Head and q the probability that Player 2 plays Head.
Player 1’s expected payoffs:
If Player 1 chooses Head, -q+(1-q)=1-2q
If Player 1 chooses Tail, q-(1-q)=2q-1
Player 1’s best response B1(q):
For q<0.5, Head (r=1)
For q>0.5, Tail (r=0)
For q=0.5, indifferent (0≤r≤1)
Player 2’s expected payoffs:
If Player 2 chooses Head, r-(1-r)=2r-1
If Player 2 chooses Tail, -r+(1-r)=1-2r
Player 2’s best response B2(r):
For r<0.5, Tail (q=0)
For r>0.5, Head (q=1)
For r=0.5, indifferent (0≤q≤1)
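The players reach a Nash equilibrium in probabilities: the two best-response curves intersect only at r = q = 0.5. Each player's decision depends not only on which strategies the opponent can play, but also on the probability assigned to each of them.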
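The best responses above can be reproduced with a minimal Python sketch. The payoff table is an assumption inferred from the expected-payoff expressions (Player 1 gets -1 when the coins match and +1 otherwise, Player 2 gets the opposite), and the function names are illustrative only.

```python
from fractions import Fraction

# Assumed payoffs, inferred from the formulas in the text. Index 0 = Head, 1 = Tail.
# U1[a][b] is Player 1's payoff when Player 1 plays a and Player 2 plays b.
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]  # zero-sum: U2 = -U1


def eu1(action, q):
    """Player 1's expected payoff of a pure action when Player 2 plays Head with probability q."""
    return q * U1[action][0] + (1 - q) * U1[action][1]


def eu2(action, r):
    """Player 2's expected payoff of a pure action when Player 1 plays Head with probability r."""
    return r * U2[0][action] + (1 - r) * U2[1][action]


def best_response_1(q):
    """Optimal values of r as an interval (low, high)."""
    head, tail = eu1(0, q), eu1(1, q)
    if head > tail:
        return (1, 1)
    if head < tail:
        return (0, 0)
    return (0, 1)  # indifferent: any r is optimal


def best_response_2(r):
    """Optimal values of q as an interval (low, high)."""
    head, tail = eu2(0, r), eu2(1, r)
    if head > tail:
        return (1, 1)
    if head < tail:
        return (0, 0)
    return (0, 1)  # indifferent: any q is optimal


half = Fraction(1, 2)
print(best_response_1(Fraction(1, 4)))  # (1, 1): Head is optimal for q < 0.5
print(best_response_1(half), best_response_2(half))  # both indifferent at 0.5
```

Only at r = q = 0.5 does each player's probability lie inside the other's best-response interval, which is the equilibrium found above.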

Example

Expected payoffs: 2 players each with two pure strategies.
Player 1 plays a mixed strategy (r, 1-r). Player 2 plays a mixed strategy (q, 1-q).

Player 1’s expected payoff of playing s11: EU1(s11, (q, 1-q))=q×u1(s11, s21)+(1-q)×u1(s11, s22)
Player 1’s expected payoff of playing s12: EU1(s12, (q, 1-q))= q×u1(s12, s21)+(1-q)×u1(s12, s22)
Player 1’s expected payoff from her mixed strategy: v1((r, 1-r), (q, 1-q))=r×EU1(s11, (q, 1-q))+(1-r)×EU1(s12, (q, 1-q))

Player 2’s expected payoff of playing s21: EU2(s21, (r, 1-r))=r×u2(s11, s21)+(1-r)×u2(s12, s21)
Player 2’s expected payoff of playing s22: EU2(s22, (r, 1-r))= r×u2(s11, s22)+(1-r)×u2(s12, s22)
Player 2’s expected payoff from her mixed strategy: v2((r, 1-r),(q, 1-q))=q×EU2(s21, (r, 1-r))+(1-q)×EU2(s22, (r, 1-r))
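The six expressions above translate directly into code. The following is a sketch, not a reference implementation: `expected_payoffs` is a hypothetical helper, `u1[i][j]` and `u2[i][j]` stand for u1(s1i, s2j) and u2(s1i, s2j) with zero-based indices, and the example payoffs are made up for illustration.

```python
from fractions import Fraction


def expected_payoffs(u1, u2, r, q):
    """Return (EU1 per pure strategy, EU2 per pure strategy, v1, v2) for a 2x2 game."""
    # EU1(s1i, (q, 1-q)): Player 1's pure strategy against Player 2's mix.
    eu1 = [q * u1[i][0] + (1 - q) * u1[i][1] for i in range(2)]
    # EU2(s2j, (r, 1-r)): Player 2's pure strategy against Player 1's mix.
    eu2 = [r * u2[0][j] + (1 - r) * u2[1][j] for j in range(2)]
    # v1 and v2: each player's payoff from her own mix.
    v1 = r * eu1[0] + (1 - r) * eu1[1]
    v2 = q * eu2[0] + (1 - q) * eu2[1]
    return eu1, eu2, v1, v2


# Hypothetical payoffs, for illustration only.
u1 = [[2, 0], [0, 1]]
u2 = [[1, 0], [0, 2]]
eu1, eu2, v1, v2 = expected_payoffs(u1, u2, Fraction(1, 2), Fraction(1, 4))
print(eu1, eu2, v1, v2)
```

Exact fractions are used so that later equality checks, such as indifference between two pure strategies, are not affected by floating-point rounding.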

Mixed strategy Nash equilibrium:
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if (r*,1-r*) is a best response to (q*, 1-q*), and (q*, 1-q*) is a best response to (r*,1-r*). That is,
v1((r*, 1-r*), (q*, 1-q*)) ≥ v1((r, 1-r), (q*, 1-q*)), for all 0≤ r ≤1
v2((r*, 1-r*), (q*, 1-q*)) ≥ v2((r*, 1-r*), (q, 1-q)), for all 0≤ q ≤1

Theorem

Theorem 1

A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if and only if
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s11, (q*, 1-q*))
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s12, (q*, 1-q*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s21, (r*, 1-r*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s22, (r*, 1-r*))

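When the opponent plays her equilibrium mixed strategy, a player's equilibrium mixed strategy earns at least as much as either of her pure strategies. A pure strategy is a special case of a mixed strategy, one that picks a single strategy with probability 1. It is enough to compare against the pure strategies because the payoff of any mixed strategy is a weighted average of the pure-strategy payoffs and so cannot exceed the larger of them.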
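Theorem 1 gives a finite test: four inequalities instead of a comparison against every r and q. A minimal sketch of that test follows; `is_mixed_ne` is a hypothetical name, and the matching-pennies payoffs are the ones implied by the expected-payoff formulas earlier (an assumption, since the original table is not available).

```python
from fractions import Fraction


def is_mixed_ne(u1, u2, r, q):
    """Check the four Theorem 1 inequalities for the profile ((r, 1-r), (q, 1-q))."""
    eu1 = [q * u1[i][0] + (1 - q) * u1[i][1] for i in range(2)]
    eu2 = [r * u2[0][j] + (1 - r) * u2[1][j] for j in range(2)]
    v1 = r * eu1[0] + (1 - r) * eu1[1]
    v2 = q * eu2[0] + (1 - q) * eu2[1]
    # Neither player can do better with a pure strategy.
    return v1 >= max(eu1) and v2 >= max(eu2)


# Matching pennies (assumed payoffs); index 0 = Head, 1 = Tail.
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]
half = Fraction(1, 2)
print(is_mixed_ne(U1, U2, half, half))         # True
print(is_mixed_ne(U1, U2, Fraction(1), half))  # False: Player 2 would switch to Head
```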

Theorem 2

Let ((r*, 1-r*), (q*, 1-q*)) be a pair of mixed strategies, where 0 <r*<1, 0<q*<1. Then ((r*, 1-r*), (q*, 1-q*)) is a mixed strategy Nash equilibrium if and only if
EU1(s11, (q*, 1-q*)) = EU1(s12, (q*, 1-q*))
EU2(s21, (r*, 1-r*)) = EU2(s22, (r*, 1-r*))
That is, each player is indifferent between her two strategies.
Significance: it gives conditions for a mixed strategy NE in terms of each player’s expected payoffs only to her pure strategies.
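Because each expected payoff is linear in the opponent's probability, the two indifference conditions can be solved in closed form. The sketch below does this for a 2x2 game; `interior_mixed_ne` is a hypothetical helper and the closed form is derived here from the two equations, not quoted from the source. It returns None when no solution with 0 < r*, q* < 1 exists, and it does not handle the degenerate case where a player is indifferent for every probability.

```python
from fractions import Fraction


def interior_mixed_ne(u1, u2):
    """Solve EU1(s11) = EU1(s12) for q and EU2(s21) = EU2(s22) for r.

    u1[i][j], u2[i][j] are integer payoffs for (s1(i+1), s2(j+1)).
    """
    d1 = u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1]
    d2 = u2[0][0] - u2[0][1] - u2[1][0] + u2[1][1]
    if d1 == 0 or d2 == 0:
        return None  # payoff difference is constant: no unique solution
    q = Fraction(u1[1][1] - u1[0][1], d1)
    r = Fraction(u2[1][1] - u2[1][0], d2)
    if 0 < r < 1 and 0 < q < 1:
        return r, q
    return None


# Matching pennies with the payoffs implied earlier in the text (assumed).
pennies_1 = [[-1, 1], [1, -1]]
pennies_2 = [[1, -1], [-1, 1]]
print(interior_mixed_ne(pennies_1, pennies_2))  # r* = q* = 1/2
```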

Mixed Strategy Nash Equilibrium

Mixed Strategy:
A mixed strategy of a player is a probability distribution over the player’s strategies.

Mixed strategy Nash equilibrium:
A probability distribution for each player
The distributions are mutual best responses to one another in the sense of expected payoffs

Employee Monitoring

Employee’s expected payoff of playing “work”
EU1(Work, (q, 1–q)) = q×50 + (1–q)×50=50

Employee’s expected payoff of playing “shirk”
EU1(Shirk, (q, 1–q)) = q×0 + (1–q)×100=100(1–q)

Employee is indifferent between playing Work and Shirk.
50=100(1–q)
q=1/2

Manager’s expected payoff of playing “Monitor”
EU2(Monitor, (r, 1–r)) = r×90+(1–r)×(-10) =100r–10

Manager’s expected payoff of playing “Not”
EU2(Not, (r, 1–r)) = r×100+(1–r)×(-100) =200r–100

Manager is indifferent between playing Monitor and Not
100r–10 =200r–100 implies that r=0.9.

Hence, ((0.9, 0.1), (0.5, 0.5)) is a mixed strategy Nash equilibrium by Theorem 2.
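The arithmetic can be checked numerically. In this sketch the payoff tables are reconstructed from the expected-payoff expressions above (an assumption, since the original table is not shown), with r the probability that the employee works and q the probability that the manager monitors.

```python
from fractions import Fraction

# Rows: employee plays Work, Shirk. Columns: manager plays Monitor, Not.
EMPLOYEE = [[50, 50], [0, 100]]
MANAGER = [[90, 100], [-10, -100]]


def employee_eu(q):
    """(EU of Work, EU of Shirk) when the manager monitors with probability q."""
    return tuple(q * EMPLOYEE[a][0] + (1 - q) * EMPLOYEE[a][1] for a in range(2))


def manager_eu(r):
    """(EU of Monitor, EU of Not) when the employee works with probability r."""
    return tuple(r * MANAGER[0][b] + (1 - r) * MANAGER[1][b] for b in range(2))


r_star, q_star = Fraction(9, 10), Fraction(1, 2)
print(employee_eu(q_star))          # both entries equal 50
print(manager_eu(r_star))           # both entries equal 80
print(employee_eu(Fraction(2, 5)))  # with less monitoring, Shirk pays 60 > 50
```

The last line shows why q = 1/2 is needed: at any lower monitoring probability the employee strictly prefers Shirk and would stop mixing.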

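The idea is to keep the opponent guessing as much as possible: a player mixes so that the opponent cannot infer a preference and therefore has no single pure strategy that is strictly best.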

Prisoners’ Dilemma

Here we treat the Prisoners’ Dilemma as a mixed strategy game: each prisoner chooses between mum (m) and confess (c) with some probability.
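Prisoner 1:
Let q be the probability that prisoner 2 plays mum and r the probability that prisoner 1 plays mum.
By Theorem 2, in an equilibrium with 0<q*<1 prisoner 1 must be indifferent between m and c (prisoner 2 sets q so that prisoner 1 cannot gain by favoring either one):
U1(m, q*) = U1(c, q*)
U1(m, q*) = q*×(-1)+(1-q*)×(-9) = 8q*-9
U1(c, q*) = q*×0+(1-q*)×(-6) = 6q*-6
=> 8q*-9 = 6q*-6
=> q* = 3/2
By the same argument, r* = 3/2.
Probabilities must satisfy 0≤q≤1 and 0≤r≤1, so the indifference conditions have no valid solution and the Prisoners’ Dilemma has no Nash equilibrium in which the players actually randomize. Confess is strictly better than mum for every q in [0, 1] (6q-6 > 8q-9 whenever q < 3/2), so the only equilibrium is the pure one, (confess, confess), which is the mixed strategy that puts probability 1 on confess.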
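A quick numerical check of why the indifference condition fails: with the payoffs used in the expressions above, confessing beats staying mum for every q between 0 and 1. The helper names below are illustrative only.

```python
from fractions import Fraction


def u1_mum(q):
    """Prisoner 1's expected payoff of mum when prisoner 2 plays mum with probability q."""
    return q * (-1) + (1 - q) * (-9)


def u1_confess(q):
    """Prisoner 1's expected payoff of confess when prisoner 2 plays mum with probability q."""
    return q * 0 + (1 - q) * (-6)


# Advantage of confess over mum on a grid of q values in [0, 1].
grid = [Fraction(k, 10) for k in range(11)]
gaps = [u1_confess(q) - u1_mum(q) for q in grid]
print(min(gaps))  # 1: the advantage equals 3 - 2q, so it is never below 1
```

Since the gap 3 - 2q is linear in q and positive at both endpoints, it is positive on the whole interval, not just on the grid.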

Existence of NE

Any finite game has a (mixed-strategy) NE.

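A strategy profile x* ∈ X is a NE if and only if any one of the following equivalent conditions holds:
1. Inequality constraints
Ui(xi*, x-i*) ≥ Ui(xi, x-i*) for all xi ∈ Xi and all i ∈ N
No player has an incentive to change strategy unilaterally.
2. Solution to a maximization problem
Ui(xi*, x-i*) = max Ui(xi, x-i*) over xi ∈ Xi, for all i ∈ N
Each player's strategy maximizes her payoff given the others' strategies.
3. Fixed point of the best response function
xi* ∈ BRi(x-i*) for all i ∈ N, where BRi(x-i*) = argmax Ui(xi, x-i*) over xi ∈ Xi
This is the form to which a fixed-point theorem can be applied.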
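For pure strategies in a two-player game, the best-response characterization can be checked by brute force: a cell is a NE exactly when each player's action is a best response to the other's. The sketch below assumes payoff matrices indexed as `u[i][j]`; `pure_nash_equilibria` is a hypothetical helper, and Player 2's Prisoners' Dilemma payoffs are assumed symmetric to Player 1's.

```python
def pure_nash_equilibria(u1, u2):
    """Return all (i, j) with i in BR1(j) and j in BR2(i)."""
    rows, cols = len(u1), len(u1[0])
    result = []
    for i in range(rows):
        for j in range(cols):
            best_for_1 = max(u1[k][j] for k in range(rows))  # best payoff against column j
            best_for_2 = max(u2[i][k] for k in range(cols))  # best payoff against row i
            if u1[i][j] == best_for_1 and u2[i][j] == best_for_2:
                result.append((i, j))
    return result


# Prisoners' Dilemma: index 0 = mum, 1 = confess (Player 2's payoffs assumed symmetric).
pd_1 = [[-1, -9], [0, -6]]
pd_2 = [[-1, 0], [-9, -6]]
print(pure_nash_equilibria(pd_1, pd_2))  # [(1, 1)]: both confess

# Matching pennies: no cell is a mutual best response.
mp_1 = [[-1, 1], [1, -1]]
mp_2 = [[1, -1], [-1, 1]]
print(pure_nash_equilibria(mp_1, mp_2))  # []
```

The empty result for matching pennies is consistent with the existence theorem, which only guarantees an equilibrium once mixed strategies are allowed.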

Fixed-point theorem

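Brouwer fixed-point theorem: Let S ⊂ Rn be nonempty, convex and compact. If T: S -> S is continuous, then there exists a fixed point, that is, there exists x* ∈ S such that x* = T(x*).
S is convex if x ∈ S, y ∈ S, 0≤α≤1 => αx + (1-α)y ∈ S, and compact if it is closed and bounded.
So a fixed point of the best response functions satisfies xa* = BRa(BRb(xa*)).
Player a picks, within a's strategy space, a best response to b's strategy, and b does the same against a; at a fixed point the two choices are consistent with each other. Because best responses can be set-valued, the proof works with a continuous function f instead.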
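In one dimension the theorem can be made concrete: a continuous T mapping [0, 1] into itself must cross the line y = x, and bisection on T(x) - x locates a crossing. This is only an illustration of the statement under those assumptions, not part of the existence proof; `fixed_point_1d` and the choice T = cos are made up for the example.

```python
import math


def fixed_point_1d(T, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on g(x) = T(x) - x, assuming T is continuous and maps [lo, hi] into itself.

    Then g(lo) >= 0 and g(hi) <= 0, so a sign change is always bracketed.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if T(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# cos maps [0, 1] into [cos(1), 1], which lies inside [0, 1].
x_star = fixed_point_1d(math.cos)
print(x_star, math.cos(x_star))  # the two values agree to high precision
```

In higher dimensions no such simple search is available, which is why the theorem is used as an existence result only.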

Proof

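We define a function f on the space Δ of mixed strategy profiles. We will argue that Δ is compact and convex and that f is continuous, hence f has a fixed point δ = f(δ) by Brouwer's theorem. We will also argue that every fixed point of f must be a NE.
Δ is clearly compact and convex, since it is a product of simplices: Δ = ∏ Δi over i ∈ N, where Δi = {δi : δi(s) ≥ 0 for all s ∈ Si, ∑ δi(s) = 1}.
The expected utility of player i if she were to play a particular pure strategy s ∈ Si instead of her mixed strategy δi would be
Ui(s, δ-i) = ∑ over s-i ∈ S-i of [∏ over j≠i of δj(sj)] × Ui(s, s-i);
Given a mixed strategy profile δ = (δ1, …, δn) ∈ Δ, the expected utility of player i is
Ui(δ) = ∑ over s ∈ Si of δi(s) × Ui(s, δ-i);
Define Pi(s, δ) = Ui(s, δ-i) - Ui(δ), the gain of player i from switching to the pure strategy s;
We define f componentwise by f(δ)i(s) = (δi(s) + max(Pi(s, δ), 0)) / (1 + ∑ over s' ∈ Si of max(Pi(s', δ), 0)). (If a pure strategy pays more than the current average, f raises its probability, which raises the average payoff.) Each component is nonnegative and the components of f(δ)i sum to 1, so f maps Δ into Δ; f is continuous because Ui is a polynomial in the probabilities, max(·, 0) is continuous, and the denominator is at least 1.
At a fixed point δ = f(δ):
δi(s) = (δi(s) + max(Pi(s, δ), 0)) / (1 + ∑ max(Pi(s', δ), 0))
=> δi(s) × ∑ max(Pi(s', δ), 0) = max(Pi(s, δ), 0) for every s ∈ Si
Since Ui(δ) is the δi-weighted average of the Ui(s, δ-i), some s with δi(s) > 0 has Pi(s, δ) ≤ 0. For that s the right-hand side is 0, so ∑ max(Pi(s', δ), 0) = 0.
So at any fixed point of f, max(Pi(s, δ), 0) = 0 for all s ∈ Si
=> Ui(s, δ-i) - Ui(δ) ≤ 0, that is, Ui(s, δ-i) ≤ Ui(δ) for all s ∈ Si and all i ∈ N, which is the definition of a NE (compare Theorem 1).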
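The map f from the proof can be written out for a two-player game. The sketch below is illustrative: `nash_map` is a hypothetical name, mixed strategies are lists of probabilities, and the matching-pennies payoffs are the ones implied earlier in the text (an assumption). It shows that the equilibrium profile is left unchanged while a non-equilibrium profile is moved.

```python
from fractions import Fraction


def nash_map(u1, u2, p1, p2):
    """Apply f once to the profile (p1, p2) of a two-player game."""
    n1, n2 = len(p1), len(p2)
    # Expected payoff of each pure strategy against the opponent's mix.
    eu1 = [sum(p2[j] * u1[i][j] for j in range(n2)) for i in range(n1)]
    eu2 = [sum(p1[i] * u2[i][j] for i in range(n1)) for j in range(n2)]
    # Expected payoff of the current mixed profile.
    v1 = sum(p1[i] * eu1[i] for i in range(n1))
    v2 = sum(p2[j] * eu2[j] for j in range(n2))
    # Positive part of the gain from switching to each pure strategy.
    gain1 = [max(e - v1, 0) for e in eu1]
    gain2 = [max(e - v2, 0) for e in eu2]
    # Shift probability toward strategies with positive gain, then renormalize.
    new1 = [(p1[i] + gain1[i]) / (1 + sum(gain1)) for i in range(n1)]
    new2 = [(p2[j] + gain2[j]) / (1 + sum(gain2)) for j in range(n2)]
    return new1, new2


U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]
half = Fraction(1, 2)
print(nash_map(U1, U2, [half, half], [half, half]))               # unchanged: a fixed point
print(nash_map(U1, U2, [Fraction(1), Fraction(0)], [half, half]))  # Player 2 shifts toward Head
```

Repeatedly applying f is not guaranteed to converge to an equilibrium; the proof relies only on the existence of a fixed point.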
