Mixed Strategy Game
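Each player chooses among her strategies according to a probability distribution.
In some situations pure strategies are not adequate, for example in zero-sum games or in games with multiple Nash equilibria.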
A probability distribution for each player.
The distributions are mutual best responses to one another in the sense of expected payoffs.
It is a stochastic steady state.
Solving matching pennies
Let r be the probability that Player 1 plays Head and q the probability that Player 2 plays Head.
Player 1’s expected payoffs:
If Player 1 chooses Head, -q+(1-q)=1-2q
If Player 1 chooses Tail, q-(1-q)=2q-1
Player 1’s best response B1(q):
For q<0.5, Head (r=1)
For q>0.5, Tail (r=0)
For q=0.5, indifferent (0≤r≤1)
Player 2’s expected payoffs:
If Player 2 chooses Head, r-(1-r)=2r-1
If Player 2 chooses Tail, -r+(1-r)=1-2r
Player 2’s best response B2(r):
For r<0.5, Tail (q=0)
For r>0.5, Head (q=1)
For r=0.5, indifferent (0≤q≤1)
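The best-response reasoning above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function names are illustrative, and each best response is represented as an interval (lo, hi) of optimal probabilities of playing Head.

```python
from fractions import Fraction

def p1_payoffs(q):
    """Player 1's expected payoffs (Head, Tail) when Player 2 plays Head with probability q."""
    return (1 - 2 * q, 2 * q - 1)

def p2_payoffs(r):
    """Player 2's expected payoffs (Head, Tail) when Player 1 plays Head with probability r."""
    return (2 * r - 1, 1 - 2 * r)

def best_response(head, tail):
    """Interval (lo, hi) of optimal probabilities of playing Head."""
    if head > tail:
        return (1, 1)   # Head is strictly better
    if head < tail:
        return (0, 0)   # Tail is strictly better
    return (0, 1)       # indifferent: any mix is optimal

def best_response_1(q):
    return best_response(*p1_payoffs(q))

def best_response_2(r):
    return best_response(*p2_payoffs(r))

def is_equilibrium(r, q):
    """True if r is a best response to q and q is a best response to r."""
    lo1, hi1 = best_response_1(q)
    lo2, hi2 = best_response_2(r)
    return lo1 <= r <= hi1 and lo2 <= q <= hi2

half = Fraction(1, 2)
print(is_equilibrium(half, half))  # True
print(is_equilibrium(1, 0))        # False: Player 2 would switch to Head
```

The two best-response correspondences intersect only at r = q = 1/2, which is why no pure-strategy pair passes the check.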
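This gives a Nash equilibrium in probabilities: r = q = 0.5. A player's decision depends not only on the opponent's strategies but also on the probability the opponent assigns to each of them.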
Example
Expected payoffs: 2 players each with two pure strategies.
Player 1 has pure strategies s11 and s12 and plays a mixed strategy (r, 1-r). Player 2 has pure strategies s21 and s22 and plays a mixed strategy (q, 1-q).
Player 1’s expected payoff of playing s11: EU1(s11, (q, 1-q))=q×u1(s11, s21)+(1-q)×u1(s11, s22)
Player 1’s expected payoff of playing s12: EU1(s12, (q, 1-q))= q×u1(s12, s21)+(1-q)×u1(s12, s22)
Player 1’s expected payoff from her mixed strategy: v1((r, 1-r), (q, 1-q))=r×EU1(s11, (q, 1-q))+(1-r)×EU1(s12, (q, 1-q))
Player 2’s expected payoff of playing s21: EU2(s21, (r, 1-r))=r×u2(s11, s21)+(1-r)×u2(s12, s21)
Player 2’s expected payoff of playing s22: EU2(s22, (r, 1-r))= r×u2(s11, s22)+(1-r)×u2(s12, s22)
Player 2’s expected payoff from her mixed strategy: v2((r, 1-r),(q, 1-q))=q×EU2(s21, (r, 1-r))+(1-q)×EU2(s22, (r, 1-r))
Mixed strategy Nash equilibrium:
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if (r*,1-r*) is a best response to (q*, 1-q*), and (q*, 1-q*) is a best response to (r*,1-r*). That is,
v1((r*, 1-r*), (q*, 1-q*)) ≥ v1((r, 1-r), (q*, 1-q*)), for all 0≤ r ≤1
v2((r*, 1-r*), (q*, 1-q*)) ≥ v2((r*, 1-r*), (q, 1-q)), for all 0≤ q ≤1
Theorem
Theorem 1
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if and only if
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s11, (q*, 1-q*))
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s12, (q*, 1-q*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s21, (r*, 1-r*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s22, (r*, 1-r*))
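Theorem 1 reduces the equilibrium test to four comparisons against pure strategies. The sketch below illustrates this for a 2×2 game. The payoff-matrix layout (U1[i][j] and U2[i][j] for Player 1 playing her i-th strategy and Player 2 her j-th), the function names, and the tolerance are assumptions made for illustration.

```python
def expected_payoffs(U1, U2, r, q):
    """Return (EU1 per pure strategy, EU2 per pure strategy, v1, v2) for a 2x2 game."""
    p1 = (r, 1 - r)
    p2 = (q, 1 - q)
    # Expected payoff of each pure strategy against the opponent's mix.
    eu1 = [sum(p2[j] * U1[i][j] for j in range(2)) for i in range(2)]
    eu2 = [sum(p1[i] * U2[i][j] for i in range(2)) for j in range(2)]
    # Expected payoff of each player's own mixed strategy.
    v1 = sum(p1[i] * eu1[i] for i in range(2))
    v2 = sum(p2[j] * eu2[j] for j in range(2))
    return eu1, eu2, v1, v2

def is_nash(U1, U2, r, q, tol=1e-9):
    """Theorem 1 test: no pure strategy beats the player's mixed strategy."""
    eu1, eu2, v1, v2 = expected_payoffs(U1, U2, r, q)
    return all(v1 >= e - tol for e in eu1) and all(v2 >= e - tol for e in eu2)

# Matching pennies as solved earlier (rows: Player 1 Head/Tail, columns: Player 2 Head/Tail).
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]
print(is_nash(U1, U2, 0.5, 0.5))  # True
print(is_nash(U1, U2, 1.0, 0.0))  # False
```

Only pure deviations need to be checked, because the payoff from any mixed deviation is a weighted average of the pure-strategy payoffs.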
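When the opponent plays a mixed strategy, a player's equilibrium mixed strategy yields an expected payoff at least as high as any single pure strategy. A pure strategy is the special case of a mixed strategy that puts probability 1 on one strategy, so optimizing over the larger set of mixed strategies can never do worse than committing to one pure strategy.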
Theorem 2
Let ((r*, 1-r*), (q*, 1-q*)) be a pair of mixed strategies, where 0 <r*<1, 0<q*<1. Then ((r*, 1-r*), (q*, 1-q*)) is a mixed strategy Nash equilibrium if and only if
EU1(s11, (q*, 1-q*)) = EU1(s12, (q*, 1-q*))
EU2(s21, (r*, 1-r*)) = EU2(s22, (r*, 1-r*))
That is, each player is indifferent between her two strategies.
Significance: it gives conditions for a mixed strategy NE in terms of each player’s expected payoffs only to her pure strategies.
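Because the two indifference conditions are linear in q* and r*, they can be solved in closed form for a 2×2 game. The sketch below is an illustration under stated assumptions: integer payoffs, U1[i][j] and U2[i][j] indexed by Player 1's i-th and Player 2's j-th pure strategy, and a hypothetical function name. It returns None when no solution lies strictly between 0 and 1, and it does not treat the degenerate case in which a denominator is zero.

```python
from fractions import Fraction

def interior_equilibrium(U1, U2):
    """Solve the Theorem 2 indifference conditions for a 2x2 game.

    Returns (r*, q*) if both lie strictly between 0 and 1, otherwise None.
    """
    # EU1(s11) = EU1(s12)  <=>  q * d1 = U1[1][1] - U1[0][1]
    d1 = U1[0][0] - U1[0][1] - U1[1][0] + U1[1][1]
    # EU2(s21) = EU2(s22)  <=>  r * d2 = U2[1][1] - U2[1][0]
    d2 = U2[0][0] - U2[0][1] - U2[1][0] + U2[1][1]
    if d1 == 0 or d2 == 0:
        return None  # degenerate case, not handled in this sketch
    q = Fraction(U1[1][1] - U1[0][1], d1)
    r = Fraction(U2[1][1] - U2[1][0], d2)
    if 0 < r < 1 and 0 < q < 1:
        return r, q
    return None

# Matching pennies (rows: Player 1 Head/Tail, columns: Player 2 Head/Tail).
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]
print(interior_equilibrium(U1, U2))  # (Fraction(1, 2), Fraction(1, 2))
```

Note that q* is determined by Player 1's payoffs and r* by Player 2's: each player mixes so as to make the other indifferent.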
Mixed Strategy Nash Equilibrium
Mixed Strategy:
A mixed strategy of a player is a probability distribution over the player’s strategies.
Mixed strategy Nash equilibrium:
A probability distribution for each player
The distributions are mutual best responses to one another in the sense of expected payoffs
Employee Monitoring
Let r be the probability that the employee works and q the probability that the manager monitors.
Employee’s expected payoff of playing “Work”
EU1(Work, (q, 1–q)) = q×50 + (1–q)×50=50
Employee’s expected payoff of playing “shirk”
EU1(Shirk, (q, 1–q)) = q×0 + (1–q)×100=100(1–q)
In a mixed strategy equilibrium the employee must be indifferent between Work and Shirk:
50 = 100(1–q), which implies q = 1/2.
Manager’s expected payoff of playing “Monitor”
EU2(Monitor, (r, 1–r)) = r×90+(1–r)×(-10) =100r–10
Manager’s expected payoff of playing “Not”
EU2(Not, (r, 1–r)) = r×100+(1–r)×(-100) =200r–100
The manager must likewise be indifferent between Monitor and Not:
100r–10 = 200r–100, which implies r = 0.9.
Hence, ((0.9, 0.1), (0.5, 0.5)) is a mixed strategy Nash equilibrium by Theorem 2.
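One way to see what indifference means here is to check that each player's expected payoff does not change with her own mix once the opponent plays the equilibrium probabilities. The sketch below does this with exact fractions. The two payoff matrices are read off from the coefficients in the expected-payoff formulas above (rows: Work, Shirk; columns: Monitor, Not), and the helper name `v` is chosen only for illustration.

```python
from fractions import Fraction as F

# Payoffs inferred from the formulas above (rows: Work, Shirk; columns: Monitor, Not).
EMPLOYEE = [[50, 50], [0, 100]]
MANAGER = [[90, 100], [-10, -100]]

def v(payoff, r, q):
    """Expected payoff when the employee works w.p. r and the manager monitors w.p. q."""
    p1, p2 = (r, 1 - r), (q, 1 - q)
    return sum(p1[i] * p2[j] * payoff[i][j] for i in range(2) for j in range(2))

r_star, q_star = F(9, 10), F(1, 2)
grid = [F(k, 10) for k in range(11)]  # own-mix probabilities 0, 0.1, ..., 1

# Distinct payoff values each player can reach by changing only her own mix.
employee_values = {v(EMPLOYEE, r, q_star) for r in grid}
manager_values = {v(MANAGER, r_star, q) for q in grid}
print(employee_values)  # {Fraction(50, 1)}
print(manager_values)   # {Fraction(80, 1)}
```

Each set contains a single value, so neither player can gain by deviating, which is the Nash condition.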
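The idea is to keep the opponent guessing as much as possible: if the opponent cannot infer your preference, she has no single response that is guaranteed to be best.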
Prisoners’ Dilemma
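Here we treat the Prisoners’ Dilemma as a mixed strategy game: each prisoner chooses between mum (m) and confess (c) with some probability. Let q be the probability that prisoner 2 stays mum and r the probability that prisoner 1 stays mum.
Prisoner 1:
U1(m, q*) = U1(c, q*)
By Theorem 2, in a mixed equilibrium prisoner 1 must get the same expected payoff from m alone and from c alone (prisoner 2 sets q so that prisoner 1 cannot guess her preference).
U1(m, q*) = q*×(-1)+(1-q*)×(-9) = 8q*-9
U1(c, q*) = q*×0+(1-q*)×(-6) = 6q*-6
=> 8q*-9 = 6q*-6
=> q* = 3/2
Similarly, r* = 3/2.
Because probabilities must satisfy 0≤ q ≤1 and 0≤ r ≤1, the Prisoners’ Dilemma has no Nash equilibrium in which the players strictly mix. Confess gives a strictly higher expected payoff for every admissible q (6q-6 > 8q-9 whenever q < 3/2), so the only equilibrium is the pure one in which both confess.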
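The failure of the indifference condition can also be seen numerically: the payoff gap between confess and mum stays positive over the whole range of q. A small sketch follows, with prisoner 1's payoff matrix taken from the coefficients in the formulas above (rows: own action, columns: prisoner 2's action); the names are illustrative.

```python
from fractions import Fraction

MUM, CONFESS = 0, 1
# Prisoner 1's payoffs: U1[own action][prisoner 2's action].
U1 = [[-1, -9], [0, -6]]

def expected_payoff(action, q):
    """Prisoner 1's expected payoff when prisoner 2 stays mum with probability q."""
    return q * U1[action][MUM] + (1 - q) * U1[action][CONFESS]

grid = [Fraction(k, 20) for k in range(21)]  # q = 0, 0.05, ..., 1
gaps = [expected_payoff(CONFESS, q) - expected_payoff(MUM, q) for q in grid]
print(min(gaps))  # 1
```

The gap equals 3 - 2q, which reaches zero only at q = 3/2, outside the range of a probability.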
Existence of NE
Any finite game has a (mixed-strategy) NE.
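A strategy profile x* ∈ X is an NE if and only if any one of the following equivalent conditions holds:
1. Inequality constraints
Ui(xi*, x-i*) ≥ Ui(xi, x-i*) for all xi ∈ Xi and all i ∈ N
No player has an incentive to change her strategy unilaterally.
2. Solution to a maximization problem
Ui(xi*, x-i*) = max Ui(xi, x-i*) over xi ∈ Xi, for all i ∈ N
Each player's strategy yields her best attainable payoff.
3. Fixed point of the best response function
xi* ∈ BRi(x-i*) for all i ∈ N, where BRi(x-i*) = argmax Ui(xi, x-i*) over xi ∈ Xi
This is where the fixed-point theorem comes in.
Fixed-point theorem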
Brouwer fixed-point theorem: Let S ⊂ R^n be convex and compact. If T: S -> S is continuous, then there exists a fixed point, that is, there exists x* ∈ S such that x* = T(x*).
S is convex if x ∈ S, y ∈ S, 0≤α≤1 => αx + (1-α)y ∈ S; S is compact if it is closed and bounded.
So a fixed point of the best response functions means xa* = BRa(BRb(xa*)).
Player a picks, within her own strategy space, a best response to b's strategy, and b does the same against a.
Proof
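We define a function f on the space Δ of mixed strategy profiles. We will argue that Δ is compact and convex and that f is continuous. Because Δ is compact, the sequence δ^0, δ^1, … with δ^n = f(δ^(n-1)) has an accumulation point, and because f is a continuous map of a compact convex set into itself, Brouwer's theorem gives a fixed point δ = f(δ). We will also argue that every fixed point of f must be an NE.
Δ is clearly compact and convex, since it is a product of simplices: Δ = ∏ Δi over i ∈ N, where Δi = {δi: δi(si) ≥ 0 for all si ∈ Si, ∑ δi(si) = 1}.
δ = f(δ) => NE
The expected utility of player i if she were to play a particular pure strategy si ∈ Si instead of her mixed strategy δi would be
Ui(si, δ-i) = ∑ over s-i ∈ S-i of [∏ over j≠i of δj(sj)] × ui(si, s-i);
Given a mixed strategy profile δ = (δ1, …, δn) ∈ Δ, the expected utility of player i is
Ui(δ) = ∑ over si ∈ Si of δi(si) × Ui(si, δ-i);
Define Pi(si, δ) = Ui(si, δ-i) - Ui(δ), the gain to player i from switching to the pure strategy si;
We define f componentwise by f(δ)i(si) = (δi(si) + max(Pi(si, δ), 0)) / (1 + ∑ over s ∈ Si of max(Pi(s, δ), 0)). (If some pure strategy earns more than the current average payoff, f raises its probability in order to raise the average payoff.) f is continuous because Pi and max are continuous and the denominator is at least 1.
At a fixed point, δi(si) = (δi(si) + max(Pi(si, δ), 0)) / (1 + ∑ over s ∈ Si of max(Pi(s, δ), 0)) for all si ∈ Si
=> max(Pi(si, δ), 0) = δi(si) × ∑ over s ∈ Si of max(Pi(s, δ), 0)
Since Ui(δ) is the δi-weighted average of the Ui(si, δ-i), some si with δi(si) > 0 has Pi(si, δ) ≤ 0. For that si the left-hand side is 0, so ∑ over s ∈ Si of max(Pi(s, δ), 0) = 0.
So at any fixed point δ of f, max(Pi(si, δ), 0) = 0 for all si ∈ Si and all i ∈ N
=> Ui(si, δ-i) - Ui(δ) ≤ 0, that is, Ui(si, δ-i) ≤ Ui(δ), which is the definition of an NE.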
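To make the map f concrete, here is a minimal two-player sketch of one application of it. The function name `nash_map`, the payoff-matrix layout (U1[i][j], U2[i][j] for player 1 playing i and player 2 playing j), and the use of exact fractions are assumptions made for illustration. The proof only needs f to have a fixed point; it does not claim that iterating f converges to one.

```python
from fractions import Fraction as F

def nash_map(U1, U2, x, y):
    """Apply the map f from the proof once to the mixed profile (x, y)."""
    m, n = len(x), len(y)
    # Expected payoff of each pure strategy against the opponent's mix.
    eu1 = [sum(y[j] * U1[i][j] for j in range(n)) for i in range(m)]
    eu2 = [sum(x[i] * U2[i][j] for i in range(m)) for j in range(n)]
    # Expected payoff of the current mixed profile.
    v1 = sum(x[i] * eu1[i] for i in range(m))
    v2 = sum(y[j] * eu2[j] for j in range(n))
    # Positive part of the gain P from switching to each pure strategy.
    g1 = [max(e - v1, 0) for e in eu1]
    g2 = [max(e - v2, 0) for e in eu2]
    # Shift weight toward profitable strategies, then renormalize.
    new_x = [(x[i] + g1[i]) / (1 + sum(g1)) for i in range(m)]
    new_y = [(y[j] + g2[j]) / (1 + sum(g2)) for j in range(n)]
    return new_x, new_y

# Matching pennies (rows: Player 1 Head/Tail, columns: Player 2 Head/Tail).
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]

mix = [F(1, 2), F(1, 2)]
print(nash_map(U1, U2, mix, mix) == (mix, mix))  # True: the equilibrium is a fixed point

head = [F(1), F(0)]
moved_x, moved_y = nash_map(U1, U2, head, head)
print(moved_x == [F(1, 3), F(2, 3)])  # True: Player 1 shifts weight toward Tail
print(moved_y == head)                # True: Player 2 has no profitable switch
```

At the equilibrium every gain term is zero, so f leaves the profile unchanged; away from it, at least one player has a positive gain and her mix moves.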