CS188 Notes


Uninformed Search

BFS

  • breadth-first search
  • time complexity: $O(b^s)$
  • space complexity: $O(b^s)$
  • complete: yes
  • optimal: yes (only when all step costs are equal, e.g. cost = 1)
  • $b$ is the branching factor, $s$ is the depth of the shallowest solution (a minimal sketch follows this list)
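
A minimal BFS sketch, assuming a hypothetical `successors(state)` function and hashable states:

```python
from collections import deque

def bfs(start, is_goal, successors):
    """Breadth-first graph search: returns a start-to-goal path as a list of states."""
    frontier = deque([start])
    parent = {start: None}          # doubles as the visited set
    while frontier:
        state = frontier.popleft()  # FIFO queue -> shallowest nodes expanded first
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None                     # no solution found
```

Swapping the FIFO queue for a LIFO stack turns this into DFS, which is why the two share the same structure but differ in completeness and optimality.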

DFS

  • depth-first search
  • time complexity: $O(b^m)$
  • space complexity: $O(bm)$
  • complete: no (it may get stuck in a loop)
  • optimal: no (it may return a deeper, more costly solution)
  • $b$ is the branching factor, $m$ is the maximum depth of the search tree

Uniform-cost search

  • UCS

  • expand the node with the cheapest path cost from the start (a minimal sketch follows this list)

  • time complexity: $O(b^{C^*/\varepsilon})$

  • space complexity: $O(b^{C^*/\varepsilon})$

  • complete:yes

  • optimal:yes

  • $b$ is the branching factor, $\varepsilon$ is the smallest edge cost, and $C^*$ is the cost of the optimal solution
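
A minimal UCS sketch with a priority queue keyed on path cost; `successors(state)` yielding `(next_state, step_cost)` pairs is an assumed interface:

```python
import heapq
from itertools import count

def ucs(start, is_goal, successors):
    """Uniform-cost search: always expands the frontier node with the lowest path cost g(n)."""
    tie = count()                       # tie-breaker so heapq never compares states directly
    frontier = [(0, next(tie), start)]  # (path cost, tie, state)
    best_cost = {start: 0}
    while frontier:
        g, _, state = heapq.heappop(frontier)
        if g > best_cost.get(state, float("inf")):
            continue                    # stale queue entry
        if is_goal(state):
            return g                    # cost of the cheapest solution
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(frontier, (new_g, next(tie), nxt))
    return None
```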

Informed Search

A*

  • heuristic: estimates how close a state is to a goal
  • greedy search: expands the node that looks closest to the goal (according to the heuristic alone)
  • $A^*$ is greedy (forward estimate) + UCS (backward cost): $f(n) = g(n)$ (cost so far) $+\ h(n)$ (estimated cost to go)
  • conditions for optimality:
    • admissible: $h(n) \le$ the actual cost from $n$ to the goal
    • consistent: $h(a) - h(g) \le \mathrm{cost}(a \to g)$ for every edge from $a$ to $g$
  • nodes are expanded from a priority queue ordered by $f(n)$ (a minimal sketch follows this list)
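
A minimal $A^*$ sketch; `successors` and the heuristic `h` are assumed interfaces, matching the UCS sketch above:

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """A* search: expands nodes in increasing order of f(n) = g(n) + h(n)."""
    tie = count()
    frontier = [(h(start), next(tie), 0, start)]   # (f, tie, g, state)
    best_g = {start: 0}
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                               # stale entry
        if is_goal(state):
            return g                               # optimal if h is admissible / consistent
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), next(tie), new_g, nxt))
    return None
```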

CSP

(constraint satisfaction problems)

  • components:

    • variables
    • domains
    • constraints
  • varieties:

    • discrete

    • continuous

  • solution methods

    • backtracking:

      • assign one variable at a time
      • check constraints as you go
      • backtracking = DFS + the two ideas above (a minimal sketch follows this list)
    • filtering: forward checking (constraint propagation)

      • consistency of a single arc: an arc $X \rightarrow Y$ is consistent iff for every $x$ in the tail there is some $y$ in the head that could be assigned without violating a constraint

        (for every value of $X$, $Y$ can still provide a compatible value)

      • if the arcs are enforced in order, then after the backward pass the remaining domains all satisfy the constraints, and we only need to go forward and assign values to find a solution (the idea behind solving tree-structured CSPs)

    • ordering:

      • MRV (minimum remaining values: pick the variable with the fewest legal values)
      • LCV (least constraining value: pick the value that leaves the most options open)
    • min-conflicts:

      • start with a complete assignment, ignoring the constraints
      • select a conflicted variable
      • reassign it to the value with the fewest conflicts (hill climbing)
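
A minimal backtracking sketch with MRV ordering; `variables`, `domains`, and the pairwise `consistent(var1, val1, var2, val2)` predicate are illustrative assumptions:

```python
def backtracking_search(variables, domains, consistent):
    """DFS over partial assignments: one variable at a time, checking constraints as we go."""

    def select_unassigned(assignment):
        # MRV: pick the unassigned variable with the fewest legal values.
        unassigned = [v for v in variables if v not in assignment]
        return min(unassigned, key=lambda v: len(domains[v]))

    def is_consistent(var, val, assignment):
        return all(consistent(var, val, other, assignment[other]) for other in assignment)

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                      # every variable assigned
        var = select_unassigned(assignment)
        for val in domains[var]:
            if is_consistent(var, val, assignment):
                assignment[var] = val
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]                # undo and try the next value
        return None                                # no value worked: backtrack

    return backtrack({})
```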

Game Trees (Adversarial Search)

Minimax / Expectimax (with pruning)

  • complexity: the same as DFS
  • process: max layers (our agent) alternate with min layers (the opponent); with alpha-beta pruning, branches that cannot affect the final value are skipped (a minimal sketch follows)

  • non-terminal states: evaluated as a linear sum of features.
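
A minimal depth-limited minimax sketch with alpha-beta pruning; `successors`, `evaluate` (the linear feature sum), and `is_terminal` are assumed callbacks:

```python
def alphabeta(state, depth, alpha, beta, is_max, successors, evaluate, is_terminal):
    """Minimax with alpha-beta pruning, cutting off at a fixed depth with an evaluation function."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if is_max:
        value = float("-inf")
        for child in successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         successors, evaluate, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # prune: MIN will never let play reach this branch
        return value
    else:
        value = float("inf")
        for child in successors(state):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                         successors, evaluate, is_terminal))
            beta = min(beta, value)
            if alpha >= beta:
                break              # prune: MAX will never let play reach this branch
        return value
```

For expectimax, the min layer is replaced by an expectation over children (and pruning no longer applies in general).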

MDP

value iteration

  • iteration: $V_{k+1}(s) \leftarrow \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V_k(s')]$
  • optimal: $V^*(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^*(s')]$
  • the sum is over every outcome $s'$ of the chosen action (if the transition is deterministic, the sum has only one term)
  • the max is taken over all available actions
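
A minimal value-iteration sketch; the `T(s, a) -> [(s_next, prob), ...]` and `R(s, a, s_next)` interfaces are assumptions for illustration:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Value iteration: repeatedly apply V(s) <- max_a sum_s' T(s,a,s') * [R(s,a,s') + gamma * V(s')]."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V_new = {}
        for s in states:
            q_values = [sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))
                        for a in actions(s)]
            V_new[s] = max(q_values) if q_values else 0.0   # terminal states keep value 0
        V = V_new
    return V
```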

policy iteration

  • steps:

    • policy evaluation: $V^{\pi_i}_{k+1}(s) \leftarrow \sum_{s'} T(s,\pi_i(s),s')\,[R(s,\pi_i(s),s') + \gamma V_k^{\pi_i}(s')]$

    • policy improvement (one step):

      $\pi_{i+1}(s) = \arg\max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^{\pi_i}(s')]$

  • difference:

    • value iteration ignores the policy and just iterates on the values
    • policy iteration first fixes a policy, evaluates it by iterating, then improves (changes) the policy, and repeats
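
A minimal policy-iteration sketch using the same assumed `T`/`R` interfaces as the value-iteration sketch above:

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eval_iters=50):
    """Policy iteration: evaluate a fixed policy, then greedily improve it, until it stops changing."""
    policy = {s: (actions(s)[0] if actions(s) else None) for s in states}   # arbitrary initial policy

    def q_value(s, a, V):
        return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))

    while True:
        # Policy evaluation: iterate the Bellman update for the fixed policy pi_i.
        V = {s: 0.0 for s in states}
        for _ in range(eval_iters):
            V = {s: (q_value(s, policy[s], V) if policy[s] is not None else 0.0) for s in states}
        # Policy improvement: one greedy step with respect to the evaluated values.
        stable = True
        for s in states:
            if not actions(s):
                continue
            best = max(actions(s), key=lambda a: q_value(s, a, V))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```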

Reinforcement learning

  • relation to MDPs:
    • states
    • actions (for each state)
    • model (transition: what results from taking an action in a state)
    • reward function
    • objective: find the best policy
  • difference:
    • we do not know T or R
  • model-based (just count):
    • learn an MDP model from the outcomes we have observed: count the outcomes and normalize to get the transition probabilities T (a minimal sketch follows this list)
    • then solve it like an MDP
  • model-free:
    • direct evaluation: average the total rewards observed from each state
    • Q-learning
    • SARSA
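
A minimal model-based estimation sketch (count and normalize); the `(s, a, r, s_next)` transition format is an assumption:

```python
from collections import defaultdict

def estimate_model(transitions):
    """Model-based RL: count observed outcomes and normalize to estimate T; average rewards for R.

    transitions is assumed to be a list of (s, a, r, s_next) tuples gathered from episodes.
    """
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        rewards[(s, a, s2)].append(r)
    T = {sa: {s2: c / sum(outcomes.values()) for s2, c in outcomes.items()}
         for sa, outcomes in counts.items()}
    R = {sas: sum(rs) / len(rs) for sas, rs in rewards.items()}
    return T, R   # then solve the estimated MDP with value or policy iteration
```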

Q-learning (off-policy): converges to the optimal Q-values (a minimal sketch follows below)

$$\text{sample} = R(s,a,s') + \gamma \max_{a'} Q(s',a') \\ Q_{k+1}(s,a) \leftarrow (1-\alpha)\,Q_k(s,a) + \alpha\,[\text{sample}]$$

  • conditions for convergence:
    • explore enough
    • a learning rate that is small enough (but not decreased too quickly)
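
A minimal Q-learning sketch with an epsilon-greedy behavior policy; the `env` interface (`reset`, `actions`, `step`) is a placeholder assumption:

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Run one episode of Q-learning, updating Q in place with the off-policy max target."""
    s = env.reset()
    done = False
    while not done:
        acts = env.actions(s)
        if random.random() < epsilon:                 # explore
            a = random.choice(acts)
        else:                                         # exploit the current estimates
            a = max(acts, key=lambda act: Q[(s, act)])
        r, s2, done = env.step(s, a)
        # Off-policy target: best next action, regardless of what the behavior policy will do.
        sample = r if done else r + gamma * max(Q[(s2, a2)] for a2 in env.actions(s2))
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
        s = s2
    return Q

# Usage sketch: Q = defaultdict(float); repeat q_learning_episode(env, Q) over many episodes.
```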

exploration function and epsilon greedy

  • $\epsilon$-greedy: with probability $\epsilon$ pick a random action, otherwise pick the best action
  • exploration function: prefer actions that have not been visited often (replace $Q$ with $f(u,n)$, e.g. $f(u,n) = u + k/n$ where $n$ is the visit count)

SARSA (on-policy)

$$\text{sample} = R(s,a,s') + \gamma\, Q(s',a') \\ Q_{k+1}(s,a) \leftarrow (1-\alpha)\,Q_k(s,a) + \alpha\,[\text{sample}]$$

  • the update uses the action $a'$ that the current policy actually takes in $s'$, rather than the max over all actions

BN

Representation: Independence (D-Separation)

D-separation: a condition for answering independence queries:

(figure omitted: the active and inactive triples)

two variables are guaranteed independent given the evidence if every path between them contains an inactive triple

Inference (Variable Elimination and Sampling)

probability

  • joint distribution: a distribution over more than one variable
  • marginal distribution: obtained from a joint by summing out variables (the number of variables decreases)
  • chain rule (plus the Bayes-net independence assumption):
    $$P(x_1,x_2,x_3,\dots) = \prod_i P(x_i \mid x_1,x_2,\dots,x_{i-1}) \\ P(x_i \mid x_1,\dots,x_{i-1}) = P(x_i \mid \mathrm{parents}(x_i))$$

inference

  • select a hidden variable H
  • join all factors that mention H
  • eliminate (sum out) H
  • repeat until only the query variables remain, then normalize

sampling

  • steps:
    • draw a sample from uniform [0, 1)
    • convert the sample into an outcome according to the given probabilities
  • in a Bayes net:
    • prior sampling: sample the variables one by one (parents before children); afterwards just count the samples and normalize (a minimal sketch follows this list)
    • rejection sampling: like prior sampling, but discard any sample that is not consistent with the evidence
    • likelihood weighting: for each sample, fix the evidence variables and multiply the weight w by the CPT probability of the observed evidence value given its parents; sample every other variable from its parents as usual, then weight each sample by w when counting
    • Gibbs sampling: fix the evidence, initialize the other variables randomly, then repeatedly resample one variable at a time conditioned on all the others
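
A minimal prior-sampling sketch; the `cpts` representation (a function from the partial sample to a distribution) and the toy Rain/WetGrass net are illustrative assumptions:

```python
import random

def sample_from(dist):
    """Turn a uniform [0, 1) draw into an outcome according to dist = {value: probability}."""
    r = random.random()
    total = 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value                      # guard against floating-point rounding

def prior_sample(topo_order, cpts):
    """Prior sampling: sample each variable, parents before children, given the values so far."""
    sample = {}
    for var in topo_order:
        sample[var] = sample_from(cpts[var](sample))
    return sample

# Usage sketch on a hypothetical two-node net Rain -> WetGrass:
cpts = {
    "Rain": lambda s: {True: 0.2, False: 0.8},
    "WetGrass": lambda s: {True: 0.9, False: 0.1} if s["Rain"] else {True: 0.1, False: 0.9},
}
samples = [prior_sample(["Rain", "WetGrass"], cpts) for _ in range(10000)]
print(sum(s["Rain"] for s in samples) / len(samples))   # count and normalize: roughly 0.2
```

Rejection sampling adds a single check that discards samples inconsistent with the evidence before counting.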

Decision Networks / VPI

  • MEU: choose the action that maximizes the expected utility given the evidence

  • value of information: the value of acquiring a new piece of evidence, compared with deciding without it

$$VPI(E' \mid e) = \Big(\sum_{e'} P(e' \mid e)\,MEU(e,e')\Big) - MEU(e)$$

  • properties:
    • non-negative: $VPI \ge 0$
    • non-additive: in general $VPI(E_j,E_k \mid e) \neq VPI(E_j \mid e) + VPI(E_k \mid e)$
    • order-independent: $VPI(E_j,E_k \mid e) = VPI(E_j \mid e) + VPI(E_k \mid e,E_j) = VPI(E_k \mid e) + VPI(E_j \mid e,E_k)$

HMM Particle Filtering

  • model:
    • state: $X_t$
    • parameters: transition probabilities
  • stationary distribution: assume the distribution no longer changes between steps and solve for the probabilities
  • hidden Markov models:
    • defined by:
      • initial distribution: $P(X_1)$
      • transitions: $P(X_t \mid X_{t-1})$
      • emissions: $P(E_t \mid X_t)$
    • conditional independence:
      • the future depends on the past only through the present
      • the current observation is independent of everything else given the current state

filtering

  • passage of time

$$B_t(X) = P(X_t \mid e_1,e_2,e_3,\dots,e_t) \\ P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\,P(x_t \mid e_{1:t}) \\ B'(X_{t+1}) = \sum_{x_t} P(X_{t+1} \mid x_t)\,B(x_t)$$

  • observation

$$B(X) \propto P(e \mid X)\,B'(X)$$

($\propto$: multiply in the evidence, then normalize)

  • steps (summary)

    • update for time
      $$P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_t \mid x_{t-1})\,P(x_{t-1} \mid e_{1:t-1})$$

    • update for evidence

      $$P(x_t \mid e_{1:t}) \propto P(e_t \mid x_t)\,P(x_t \mid e_{1:t-1})$$
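
A minimal exact-filtering sketch (the forward algorithm) implementing the two updates above; the dict-based `transition`/`emission` tables and the initial `prior` are assumed representations:

```python
def normalize(dist):
    """Scale a {state: weight} dict so its values sum to 1."""
    total = sum(dist.values())
    return {x: w / total for x, w in dist.items()}

def forward_filter(prior, transition, emission, observations):
    """HMM filtering: alternate the time update and the evidence update for each observation.

    prior[x] is the belief before the first transition, transition[x_prev][x] = P(x | x_prev),
    and emission[x][e] = P(e | x).
    """
    belief = dict(prior)
    for e in observations:
        # Time update: B'(x) = sum_{x_prev} P(x | x_prev) * B(x_prev)
        predicted = {x: sum(transition[x_prev][x] * belief[x_prev] for x_prev in belief)
                     for x in belief}
        # Evidence update: B(x) ∝ P(e | x) * B'(x), then normalize
        belief = normalize({x: emission[x][e] * predicted[x] for x in predicted})
    return belief
```

Particle filtering approximates the same two updates with a set of samples (particles) instead of an exact belief table.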

Naive Bayes

model: $P(y \mid x_1,x_2,\dots) \propto P(y)\prod_i P(x_i \mid y)$

$\hat y = \arg\max_y P(y)\prod_i P(x_i \mid y)$, where $x_i$ is the $i$-th feature of the sample

**Laplace smoothing**

$$P(x) = \frac{c(x)+k}{N+k|X|} \qquad P(x \mid y) = \frac{c(x,y)+k}{c(y)+k|X|}$$

$c(x)$ is the number of samples with value $x$, $N$ is the total number of samples, and $|X|$ is the number of possible values (categories). A toy sketch follows.
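
A toy Naive Bayes sketch with Laplace smoothing; the spam/ham data, labels, and words are all made up for illustration:

```python
from collections import Counter

# Hypothetical toy data: (label, list of word features).
data = [("spam", ["win", "money"]), ("spam", ["win", "now"]), ("ham", ["meeting", "now"])]
k = 1                                                     # Laplace smoothing strength
labels = Counter(y for y, _ in data)
vocab = {w for _, words in data for w in words}
word_counts = {y: Counter(w for y2, words in data if y2 == y for w in words) for y in labels}

def predict(words):
    """Naive Bayes: argmax_y P(y) * prod_i P(x_i | y), with Laplace-smoothed conditionals."""
    best, best_score = None, float("-inf")
    for y in labels:
        score = labels[y] / len(data)                     # prior P(y)
        total_y = sum(word_counts[y].values())            # c(y)
        for w in words:
            score *= (word_counts[y][w] + k) / (total_y + k * len(vocab))   # (c(x,y)+k)/(c(y)+k|X|)
        if score > best_score:
            best, best_score = y, score
    return best

print(predict(["win", "now"]))   # "spam" on this toy data
```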

Perceptrons and Logistic Regression

  • how to update (whenever the prediction $y$ differs from the true label $y^*$; a minimal sketch follows this list):
    • binary classification: $w \leftarrow w + y^* \cdot f(x)$, with $y^* \in \{+1, -1\}$
    • multiclass classification:
      • $w_y \leftarrow w_y - f(x)$ (the wrongly predicted class)
      • $w_{y^*} \leftarrow w_{y^*} + f(x)$ (the correct class)
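
A minimal multiclass-perceptron sketch implementing the updates above; the `(feature_vector, true_label)` data format is an assumption:

```python
def multiclass_perceptron(data, classes, num_features, epochs=10):
    """Multiclass perceptron: on each mistake, lower the predicted class's score and raise the true class's."""
    w = {y: [0.0] * num_features for y in classes}

    def score(y, f):
        return sum(wi * fi for wi, fi in zip(w[y], f))

    for _ in range(epochs):
        for f, y_true in data:
            y_pred = max(classes, key=lambda y: score(y, f))
            if y_pred != y_true:
                w[y_pred] = [wi - fi for wi, fi in zip(w[y_pred], f)]   # w_y    <- w_y    - f(x)
                w[y_true] = [wi + fi for wi, fi in zip(w[y_true], f)]   # w_{y*} <- w_{y*} + f(x)
    return w
```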