Mixed Strategy Game
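Each player chooses among her strategies according to a probability distribution.
In some situations pure strategies are inadequate, for example in some zero-sum games (which may have no pure-strategy equilibrium) or in games with multiple Nash equilibria.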
A probability distribution for each player.
The distributions are mutual best responses to one another in the sense of expected payoffs.
It is a stochastic steady state.
Solving matching pennies
Player 1’s expected payoffs:
If Player 1 chooses Head, -q+(1-q)=1-2q
If Player 1 chooses Tail, q-(1-q)=2q-1
Player 1’s best response B1(q):
For q<0.5, Head (r=1)
For q>0.5, Tail (r=0)
For q=0.5, indifferent (0≤r≤1)
Player 2’s expected payoffs:
If Player 2 chooses Head, r-(1-r)=2r-1
If Player 2 chooses Tail, -r+(1-r)=1-2r
Player 2’s best response B2(r):
For r<0.5, Tail (q=0)
For r>0.5, Head (q=1)
For r=0.5, indifferent (0≤q≤1)
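The result is a Nash equilibrium in probabilities: each player's decision depends not only on which strategies the opponent has, but also on the probability with which the opponent plays each of them. Here the two best responses cross only at r = q = 0.5, so ((0.5, 0.5), (0.5, 0.5)) is the mixed strategy Nash equilibrium of matching pennies.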
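As a sanity check on the best-response analysis above, the following Python sketch recomputes the expected payoffs 1-2q, 2q-1, 2r-1 and 1-2r and reads off each player's best response. The function names are illustrative and not part of the original notes; exact fractions are used only to avoid rounding at the indifference point.

```python
from fractions import Fraction

def best_response_1(q):
    """Player 1's best reply when Player 2 plays Head with probability q."""
    head, tail = 1 - 2 * q, 2 * q - 1  # expected payoffs taken from the notes
    if head > tail:
        return "Head"
    if head < tail:
        return "Tail"
    return "indifferent"

def best_response_2(r):
    """Player 2's best reply when Player 1 plays Head with probability r."""
    head, tail = 2 * r - 1, 1 - 2 * r  # expected payoffs taken from the notes
    if head > tail:
        return "Head"
    if head < tail:
        return "Tail"
    return "indifferent"

half = Fraction(1, 2)
# Both players are indifferent only at r = q = 1/2, where the best responses cross.
print(best_response_1(half), best_response_2(half))
```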
Example
Expected payoffs: 2 players each with two pure strategies.
Player 1 plays a mixed strategy (r, 1-r) over (s11, s12). Player 2 plays a mixed strategy (q, 1-q) over (s21, s22).
Player 1’s expected payoff of playing s11: EU1(s11, (q, 1-q))=q×u1(s11, s21)+(1-q)×u1(s11, s22)
Player 1’s expected payoff of playing s12: EU1(s12, (q, 1-q))= q×u1(s12, s21)+(1-q)×u1(s12, s22)
Player 1’s expected payoff from her mixed strategy: v1((r, 1-r), (q, 1-q))=r×EU1(s11, (q, 1-q))+(1-r)×EU1(s12, (q, 1-q))
Player 2’s expected payoff of playing s21: EU2(s21, (r, 1-r))=r×u2(s11, s21)+(1-r)×u2(s12, s21)
Player 2’s expected payoff of playing s22: EU2(s22, (r, 1-r))= r×u2(s11, s22)+(1-r)×u2(s12, s22)
Player 2’s expected payoff from her mixed strategy: v2((r, 1-r),(q, 1-q))=q×EU2(s21, (r, 1-r))+(1-q)×EU2(s22, (r, 1-r))
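The formulas above translate directly into code. The sketch below assumes a payoff layout that is not in the notes: u1[i][j] and u2[i][j] hold the payoffs when Player 1 plays her i-th pure strategy and Player 2 plays her j-th (0-based). The function names are illustrative, and the example values are the matching pennies payoffs used earlier.

```python
from fractions import Fraction

def pure_payoffs_1(u1, q):
    """[EU1(s11, (q, 1-q)), EU1(s12, (q, 1-q))]."""
    return [q * u1[i][0] + (1 - q) * u1[i][1] for i in range(2)]

def pure_payoffs_2(u2, r):
    """[EU2(s21, (r, 1-r)), EU2(s22, (r, 1-r))]."""
    return [r * u2[0][j] + (1 - r) * u2[1][j] for j in range(2)]

def mixed_payoffs(u1, u2, r, q):
    """(v1, v2) for the mixed profile ((r, 1-r), (q, 1-q))."""
    eu1, eu2 = pure_payoffs_1(u1, q), pure_payoffs_2(u2, r)
    v1 = r * eu1[0] + (1 - r) * eu1[1]
    v2 = q * eu2[0] + (1 - q) * eu2[1]
    return v1, v2

# Matching pennies payoffs from earlier in the notes (index 0 = Head, 1 = Tail).
u1 = [[-1, 1], [1, -1]]
u2 = [[1, -1], [-1, 1]]
print(mixed_payoffs(u1, u2, Fraction(1, 2), Fraction(1, 2)))
```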
Mixed strategy Nash equilibrium:
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if (r*,1-r*) is a best response to (q*, 1-q*), and (q*, 1-q*) is a best response to (r*,1-r*). That is,
v1((r*, 1-r*), (q*, 1-q*)) ≥ v1((r, 1-r), (q*, 1-q*)), for all 0≤ r ≤1
v2((r*, 1-r*), (q*, 1-q*)) ≥ v2((r*, 1-r*), (q, 1-q)), for all 0≤ q ≤1
Theorem
Theorem 1
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if and only if
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s11, (q*, 1-q*))
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s12, (q*, 1-q*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s21, (r*, 1-r*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s22, (r*, 1-r*))
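When the opponent plays her equilibrium mixed strategy, a player's equilibrium mixed strategy earns at least as much as either of her pure strategies. A pure strategy is the special case of a mixed strategy that puts probability 1 on one action, so enlarging the choice set from pure to mixed strategies can never lower the best attainable payoff.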
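Theorem 1 reduces the equilibrium check to four comparisons against pure strategies. A minimal sketch of that test follows, assuming the hypothetical layout u1[i][j], u2[i][j] (Player 1 plays her i-th pure strategy, Player 2 her j-th) and using the matching pennies payoffs as the example; the name is_nash is illustrative.

```python
from fractions import Fraction

def is_nash(u1, u2, r, q):
    """Theorem 1 test: neither player gains by deviating to a pure strategy."""
    # Expected payoff of each pure strategy against the opponent's mix.
    eu1 = [q * u1[i][0] + (1 - q) * u1[i][1] for i in range(2)]
    eu2 = [r * u2[0][j] + (1 - r) * u2[1][j] for j in range(2)]
    # Expected payoff of each player's own mixed strategy.
    v1 = r * eu1[0] + (1 - r) * eu1[1]
    v2 = q * eu2[0] + (1 - q) * eu2[1]
    return v1 >= max(eu1) and v2 >= max(eu2)

# Matching pennies (index 0 = Head, 1 = Tail).
u1 = [[-1, 1], [1, -1]]
u2 = [[1, -1], [-1, 1]]
half = Fraction(1, 2)
print(is_nash(u1, u2, half, half))  # True
print(is_nash(u1, u2, 1, 1))        # False
```

Exact fractions are used because the test relies on equalities at the indifference point, which floating-point rounding could break.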
Theorem 2
Let ((r*, 1-r*), (q*, 1-q*)) be a pair of mixed strategies, where 0 <r*<1, 0<q*<1. Then ((r*, 1-r*), (q*, 1-q*)) is a mixed strategy Nash equilibrium if and only if
EU1(s11, (q*, 1-q*)) = EU1(s12, (q*, 1-q*))
EU2(s21, (r*, 1-r*)) = EU2(s22, (r*, 1-r*))
That is, each player is indifferent between her two strategies.
Significance: it gives conditions for a mixed strategy NE in terms of each player’s expected payoffs only to her pure strategies.
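Because each expected payoff is linear in the opponent's probability, the two indifference conditions of Theorem 2 can be solved in closed form. The sketch below does so under assumptions that are not in the notes: integer payoffs stored as u1[i][j], u2[i][j] (Player 1 plays her i-th pure strategy, Player 2 her j-th), and a hypothetical function name. It returns None when the solution is not unique or falls outside the open interval (0, 1).

```python
from fractions import Fraction

def interior_equilibrium(u1, u2):
    """Solve EU1(s11) = EU1(s12) for q and EU2(s21) = EU2(s22) for r."""
    den_q = u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1]
    den_r = u2[0][0] - u2[0][1] - u2[1][0] + u2[1][1]
    if den_q == 0 or den_r == 0:
        return None  # no unique solution to the indifference equations
    q = Fraction(u1[1][1] - u1[0][1], den_q)  # makes Player 1 indifferent
    r = Fraction(u2[1][1] - u2[1][0], den_r)  # makes Player 2 indifferent
    if 0 < r < 1 and 0 < q < 1:
        return r, q
    return None

# Matching pennies (index 0 = Head, 1 = Tail).
u1 = [[-1, 1], [1, -1]]
u2 = [[1, -1], [-1, 1]]
print(interior_equilibrium(u1, u2))
```

Note that q is determined by Player 1's payoffs and r by Player 2's: each player's mix is chosen to make the other player indifferent.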
Mixed Strategy Nash Equilibrium
Mixed Strategy:
A mixed strategy of a player is a probability distribution over the player’s strategies.
Mixed strategy Nash equilibrium:
A probability distribution for each player
The distributions are mutual best responses to one another in the sense of expected payoffs
Employee Monitoring
Employee’s expected payoff of playing “work”
EU1(Work, (q, 1–q)) = q×50 + (1–q)×50=50
Employee’s expected payoff of playing “shirk”
EU1(Shirk, (q, 1–q)) = q×0 + (1–q)×100=100(1–q)
Employee is indifferent between playing Work and Shirk.
50=100(1–q)
q=1/2
Manager’s expected payoff of playing “Monitor”
EU2(Monitor, (r, 1–r)) = r×90+(1–r)×(-10) =100r–10
Manager’s expected payoff of playing “Not”
EU2(Not, (r, 1–r)) = r×100+(1–r)×(-100) =200r–100
Manager is indifferent between playing Monitor and Not
100r–10 =200r–100 implies that r=0.9.
Hence, ((0.9, 0.1), (0.5, 0.5)) is a mixed strategy Nash equilibrium by Theorem 2.
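Intuitively, each player randomizes so as to keep the opponent guessing: the opponent cannot infer the player's preference and is left with no single strictly best reply.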
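The arithmetic of the employee monitoring example can be verified with a few lines of Python. The payoff tables below are read off the expected-payoff expressions in the notes; the variable names and the table layout are illustrative assumptions.

```python
from fractions import Fraction

# Rows: employee plays Work, Shirk. Columns: manager plays Monitor, Not.
employee = [[50, 50], [0, 100]]
manager = [[90, 100], [-10, -100]]

r, q = Fraction(9, 10), Fraction(1, 2)  # P(Work), P(Monitor)

eu_work = q * employee[0][0] + (1 - q) * employee[0][1]
eu_shirk = q * employee[1][0] + (1 - q) * employee[1][1]
eu_monitor = r * manager[0][0] + (1 - r) * manager[1][0]
eu_not = r * manager[0][1] + (1 - r) * manager[1][1]

print(eu_work, eu_shirk)    # 50 50
print(eu_monitor, eu_not)   # 80 80
```

Both players are indifferent between their two pure strategies at r = 0.9 and q = 0.5, which is what Theorem 2 requires.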
Prisoners’ Dilemma
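Here we treat the Prisoners’ Dilemma as a mixed strategy game: each prisoner chooses between mum (m) and confess (c) with some probability.
Prisoner 1:
U1(m, q*) = U1(c, q*)
By Theorem 2, prisoner 1 must get the same expected payoff from m and from c (prisoner 2 chooses q, her probability of playing m, so that prisoner 1 cannot guess her preference).
U1(m, q*) = q*×(-1)+(1-q*)×(-9) = 8q*-9
U1(c, q*) = q*×0+(1-q*)×(-6) = 6q*-6
=> 8q*-9 = 6q*-6
=> q* = 3/2
By symmetry, r* = 3/2.
Since probabilities must satisfy 0 ≤ q ≤ 1 and 0 ≤ r ≤ 1, the Prisoners’ Dilemma has no Nash equilibrium in which the players strictly mix. Its only Nash equilibrium is the pure profile (c, c), because c strictly dominates m (0 > -1 and -6 > -9).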
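The same calculation in Python, as a quick check. The payoff table is taken from the two expected-payoff expressions for prisoner 1; the variable names are illustrative.

```python
from fractions import Fraction

# Prisoner 1's payoffs: u1[own action][other's action], 0 = mum, 1 = confess.
u1 = [[-1, -9], [0, -6]]

# Indifference: q*(-1) + (1-q)*(-9) = q*0 + (1-q)*(-6)  =>  q = 3/2
q_star = Fraction(u1[1][1] - u1[0][1],
                  u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1])
is_probability = 0 <= q_star <= 1

# Confess is strictly better than mum against either action of prisoner 2.
confess_dominates = all(u1[1][j] > u1[0][j] for j in range(2))

print(q_star, is_probability, confess_dominates)  # 3/2 False True
```

The indifference equation has no solution in [0, 1] because confess is strictly better whatever the other prisoner does, so no probability can make prisoner 1 indifferent.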
Existence of NE
Any finite game has a (mixed-strategy) NE.
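A strategy profile x* ∈ X is called an NE if and only if any one of the following equivalent conditions holds:
1. Inequality constraints
Ui(xi*, x-i*) ≥ Ui(xi, x-i*) for all xi ∈ Xi, all i ∈ N
No player has an incentive to change her strategy unilaterally.
2. Solution to a maximization problem
Ui(xi*, x-i*) = max Ui(xi, x-i*) over xi ∈ Xi, for all i ∈ N
Each player's strategy yields her best attainable payoff.
3. Fixed point of the best response function
xi* ∈ BRi(x-i*) for all i ∈ N, where BRi(x-i*) = argmax Ui(xi, x-i*) over xi ∈ Xi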
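For pure strategies in a two-player game, the mutual best-response condition can be checked by brute force. The sketch below assumes the hypothetical layout u1[i][j], u2[i][j] (row player's i-th action, column player's j-th), and it assumes prisoner 2's payoffs mirror prisoner 1's; the function name is illustrative.

```python
def pure_nash_profiles(u1, u2):
    """Return all pure profiles (i, j) that are mutual best responses."""
    rows, cols = len(u1), len(u1[0])
    found = []
    for i in range(rows):
        for j in range(cols):
            # Row player cannot do better by switching rows in column j.
            best_for_1 = u1[i][j] >= max(u1[k][j] for k in range(rows))
            # Column player cannot do better by switching columns in row i.
            best_for_2 = u2[i][j] >= max(u2[i][k] for k in range(cols))
            if best_for_1 and best_for_2:
                found.append((i, j))
    return found

# Prisoners' Dilemma (0 = mum, 1 = confess), symmetric payoffs assumed.
pd_u1 = [[-1, -9], [0, -6]]
pd_u2 = [[-1, 0], [-9, -6]]
# Matching pennies (0 = Head, 1 = Tail).
mp_u1 = [[-1, 1], [1, -1]]
mp_u2 = [[1, -1], [-1, 1]]

print(pure_nash_profiles(pd_u1, pd_u2))  # [(1, 1)]
print(pure_nash_profiles(mp_u1, mp_u2))  # []
```

Matching pennies has no pure fixed point of the best responses, which is why its equilibrium has to be found among mixed strategies.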
Fixed-point theorem
Brouwer fixed-point theorem: Let S ⊂ R^n be convex and compact. If T: S -> S is continuous, then there exists a fixed point, that is, there exists x* ∈ S such that x* = T(x*).
S is convex if x ∈ S, y ∈ S, 0 ≤ α ≤ 1 => αx + (1-α)y ∈ S, and compact if it is closed and bounded.
So a fixed point of the best response functions satisfies xa* = BRa(BRb(xa*)).
Player a picks, within her own strategy space, a best response to b's strategy, and b does the same against a; at a fixed point the two choices reproduce each other.
Proof
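We define a function f on the space Δ of mixed strategy profiles. We will argue that Δ is compact and convex and that f is continuous, so f has a fixed point by the Brouwer fixed-point theorem. (By compactness alone, the sequence Δ^0, …, Δ^n with Δ^n = f(Δ^(n-1)) has an accumulation point.) We will also argue that every fixed point of f must be an NE.
Δ is clearly compact and convex, since Δ = ∏ Δi over i ∈ N, where each Δi = {δi : δi(si) ≥ 0 for all si ∈ Si, ∑_{si} δi(si) = 1} is a simplex.
Δ = f(Δ) => Δ is an NE
The expected utility of player i if she were to play a particular pure strategy si ∈ Si instead of her mixed strategy Δi would be
Ui(si, Δ-i) = ∑_{s-i} (∏_{j≠i} δj(sj)) × Ui(si, s-i);
Given a mixed strategy profile Δ = ∏ Δi, the expected utility of player i is
Ui(Δ) = ∑_{s} (∏_{j} δj(sj)) × Ui(s) = ∑_{si} δi(si) × Ui(si, Δ-i);
Define Pi(si, Δ) = Ui(si, Δ-i) - Ui(Δ);
We define f componentwise by f(Δ)i(si) = (δi(si) + max(Pi(si, Δ), 0)) / (1 + ∑_{si'} max(Pi(si', Δ), 0)). (If a pure strategy earns more than the current average payoff, f raises its probability in order to raise the average payoff.) f is continuous because Pi and max are continuous and the denominator is at least 1.
At a fixed point Δ = f(Δ):
=> δi(si) = (δi(si) + max(Pi(si, Δ), 0)) / (1 + ∑_{si'} max(Pi(si', Δ), 0))
=> max(Pi(si, Δ), 0) = δi(si) × ∑_{si'} max(Pi(si', Δ), 0)
Because Ui(Δ) is the δi-weighted average of the Ui(si, Δ-i), some si with δi(si) > 0 has Pi(si, Δ) ≤ 0; for that si the left side is 0, which forces ∑_{si'} max(Pi(si', Δ), 0) = 0.
So at any fixed point Δ of f, max(Pi(si, Δ), 0) = 0 for every si
=> Ui(si, Δ-i) - Ui(Δ) ≤ 0 for every si and every i (the definition of an NE), that is, Ui(si, Δ-i) ≤ Ui(Δ)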
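To make the map f concrete, here is a minimal Python sketch of one application of f for a two-player game. The payoff layout u1[i][j], u2[i][j] and the name nash_map are assumptions for illustration. The sketch only checks whether a given profile is a fixed point; Brouwer's theorem guarantees that a fixed point exists, not that repeated application of f converges to one.

```python
from fractions import Fraction

def nash_map(u1, u2, x, y):
    """One application of f to the mixed profile (x, y)."""
    n, m = len(x), len(y)
    # Expected payoff of each pure strategy against the opponent's mix.
    e1 = [sum(y[j] * u1[i][j] for j in range(m)) for i in range(n)]
    e2 = [sum(x[i] * u2[i][j] for i in range(n)) for j in range(m)]
    # Expected payoff of the mixed profile itself.
    v1 = sum(x[i] * e1[i] for i in range(n))
    v2 = sum(y[j] * e2[j] for j in range(m))
    # Positive part of the gain from deviating to each pure strategy.
    g1 = [max(e - v1, 0) for e in e1]
    g2 = [max(e - v2, 0) for e in e2]
    # Shift weight toward profitable deviations, then renormalize.
    new_x = [(x[i] + g1[i]) / (1 + sum(g1)) for i in range(n)]
    new_y = [(y[j] + g2[j]) / (1 + sum(g2)) for j in range(m)]
    return new_x, new_y

# Matching pennies (index 0 = Head, 1 = Tail).
u1 = [[-1, 1], [1, -1]]
u2 = [[1, -1], [-1, 1]]
half = [Fraction(1, 2), Fraction(1, 2)]
heads = [Fraction(1), Fraction(0)]

print(nash_map(u1, u2, half, half) == (half, half))     # True: fixed point
print(nash_map(u1, u2, heads, heads) == (heads, heads)) # False: not an NE
```

At the profile where both play Head, Player 1 gains by switching to Tail, so f moves her weight toward Tail and the profile is not a fixed point.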