CS188 Notes


Uninformed Search

BFS

  • breadth-first search
  • time complexity: $O(b^s)$
  • space complexity: $O(b^s)$
  • complete: yes
  • optimal: yes (only when all step costs are equal, e.g. cost = 1)
  • $b$ is the branching factor, $s$ is the depth of the shallowest solution (a minimal sketch follows this list)
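
A minimal BFS sketch, assuming a hypothetical `successors(state)` function and hashable states:

```python
from collections import deque

def bfs(start, is_goal, successors):
    """Breadth-first graph search: returns a start-to-goal path as a list of states."""
    frontier = deque([start])
    parent = {start: None}          # doubles as the visited set
    while frontier:
        state = frontier.popleft()  # FIFO queue -> shallowest nodes expanded first
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None                     # no solution found
```

Swapping the FIFO queue for a LIFO stack turns this into DFS, which is why the two share the same structure but differ in completeness and optimality.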

DFS

  • depth-first search
  • time complexity: $O(b^m)$
  • space complexity: $O(bm)$
  • complete: no (it may get stuck in a loop)
  • optimal: no (it may return a deeper, more costly solution)
  • $b$ is the branching factor, $m$ is the maximum depth of the search tree

Uniform-cost search

  • UCS

  • expand the node with the cheapest path cost from the start (a minimal sketch follows this list)

  • time complexity: $O(b^{C^*/\varepsilon})$

  • space complexity: $O(b^{C^*/\varepsilon})$

  • complete:yes

  • optimal:yes

  • $b$ is the branching factor, $\varepsilon$ is the smallest edge cost, and $C^*$ is the cost of the optimal solution
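
A minimal UCS sketch with a priority queue keyed on path cost; `successors(state)` yielding `(next_state, step_cost)` pairs is an assumed interface:

```python
import heapq
from itertools import count

def ucs(start, is_goal, successors):
    """Uniform-cost search: always expands the frontier node with the lowest path cost g(n)."""
    tie = count()                       # tie-breaker so heapq never compares states directly
    frontier = [(0, next(tie), start)]  # (path cost, tie, state)
    best_cost = {start: 0}
    while frontier:
        g, _, state = heapq.heappop(frontier)
        if g > best_cost.get(state, float("inf")):
            continue                    # stale queue entry
        if is_goal(state):
            return g                    # cost of the cheapest solution
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(frontier, (new_g, next(tie), nxt))
    return None
```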

Informed Search

A*

  • heuristic: estimates how close a state is to a goal
  • greedy search: expands the node that looks closest to the goal (according to the heuristic alone)
  • $A^*$ is greedy (forward estimate) + UCS (backward cost): $f(n) = g(n)$ (cost so far) $+\ h(n)$ (estimated cost to go)
  • conditions for optimality:
    • admissible: $h(n) \le$ the actual cost from $n$ to the goal
    • consistent: $h(a) - h(g) \le \mathrm{cost}(a \to g)$ for every edge from $a$ to $g$
  • nodes are expanded from a priority queue ordered by $f(n)$ (a minimal sketch follows this list)
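
A minimal $A^*$ sketch; `successors` and the heuristic `h` are assumed interfaces, matching the UCS sketch above:

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """A* search: expands nodes in increasing order of f(n) = g(n) + h(n)."""
    tie = count()
    frontier = [(h(start), next(tie), 0, start)]   # (f, tie, g, state)
    best_g = {start: 0}
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                               # stale entry
        if is_goal(state):
            return g                               # optimal if h is admissible / consistent
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), next(tie), new_g, nxt))
    return None
```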

CSP

(constraint satisfaction problems)

  • components:

    • variables
    • domains
    • constraints
  • varieties:

    • discrete

    • continuous

  • solution methods

    • backtracking:

      • assign one variable at a time
      • check constraints as you go
      • backtracking = DFS + the two ideas above (a minimal sketch follows this list)
    • filtering: forward checking (constraint propagation)

      • consistency of a single arc: an arc $X \rightarrow Y$ is consistent iff for every $x$ in the tail there is some $y$ in the head that could be assigned without violating a constraint

        (for every value of $X$, $Y$ can still provide a compatible value)

      • if the arcs are enforced in order, then after the backward pass the remaining domains all satisfy the constraints, and we only need to go forward and assign values to find a solution (the idea behind solving tree-structured CSPs)

    • ordering:

      • MRV (minimum remaining values: pick the variable with the fewest legal values)
      • LCV (least constraining value: pick the value that leaves the most options open)
    • min-conflicts:

      • start with a complete assignment, ignoring the constraints
      • select a conflicted variable
      • reassign it to the value with the fewest conflicts (hill climbing)
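
A minimal backtracking sketch with MRV ordering; `variables`, `domains`, and the pairwise `consistent(var1, val1, var2, val2)` predicate are illustrative assumptions:

```python
def backtracking_search(variables, domains, consistent):
    """DFS over partial assignments: one variable at a time, checking constraints as we go."""

    def select_unassigned(assignment):
        # MRV: pick the unassigned variable with the fewest legal values.
        unassigned = [v for v in variables if v not in assignment]
        return min(unassigned, key=lambda v: len(domains[v]))

    def is_consistent(var, val, assignment):
        return all(consistent(var, val, other, assignment[other]) for other in assignment)

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                      # every variable assigned
        var = select_unassigned(assignment)
        for val in domains[var]:
            if is_consistent(var, val, assignment):
                assignment[var] = val
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]                # undo and try the next value
        return None                                # no value worked: backtrack

    return backtrack({})
```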

Game Trees (Adversarial Search)

Minimax / Expectimax (with pruning)

  • complexity: the same as DFS
  • process: max layers (our agent) alternate with min layers (the opponent); with alpha-beta pruning, branches that cannot affect the final value are skipped (a minimal sketch follows)

  • non-terminal states: evaluated as a linear sum of features.
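
A minimal depth-limited minimax sketch with alpha-beta pruning; `successors`, `evaluate` (the linear feature sum), and `is_terminal` are assumed callbacks:

```python
def alphabeta(state, depth, alpha, beta, is_max, successors, evaluate, is_terminal):
    """Minimax with alpha-beta pruning, cutting off at a fixed depth with an evaluation function."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if is_max:
        value = float("-inf")
        for child in successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         successors, evaluate, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # prune: MIN will never let play reach this branch
        return value
    else:
        value = float("inf")
        for child in successors(state):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                         successors, evaluate, is_terminal))
            beta = min(beta, value)
            if alpha >= beta:
                break              # prune: MAX will never let play reach this branch
        return value
```

For expectimax, the min layer is replaced by an expectation over children (and pruning no longer applies in general).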

MDP

value iteration

  • iteration: $V_{k+1}(s) \leftarrow \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V_k(s')]$
  • optimal: $V^*(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^*(s')]$
  • the sum is over every outcome $s'$ of the chosen action (if the transition is deterministic, the sum has only one term)
  • the max is taken over all available actions
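
A minimal value-iteration sketch; the `T(s, a) -> [(s_next, prob), ...]` and `R(s, a, s_next)` interfaces are assumptions for illustration:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Value iteration: repeatedly apply V(s) <- max_a sum_s' T(s,a,s') * [R(s,a,s') + gamma * V(s')]."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V_new = {}
        for s in states:
            q_values = [sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))
                        for a in actions(s)]
            V_new[s] = max(q_values) if q_values else 0.0   # terminal states keep value 0
        V = V_new
    return V
```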

policy iteration

  • steps:

    • policy evaluation: $V^{\pi_i}_{k+1}(s) \leftarrow \sum_{s'} T(s,\pi_i(s),s')\,[R(s,\pi_i(s),s') + \gamma V_k^{\pi_i}(s')]$

    • policy improvement (one step):

      $\pi_{i+1}(s) = \arg\max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^{\pi_i}(s')]$

  • difference:

    • value iteration ignores the policy and just iterates on the values
    • policy iteration first fixes a policy, evaluates it by iterating, then improves (changes) the policy, and repeats
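
A minimal policy-iteration sketch using the same assumed `T`/`R` interfaces as the value-iteration sketch above:

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eval_iters=50):
    """Policy iteration: evaluate a fixed policy, then greedily improve it, until it stops changing."""
    policy = {s: (actions(s)[0] if actions(s) else None) for s in states}   # arbitrary initial policy

    def q_value(s, a, V):
        return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))

    while True:
        # Policy evaluation: iterate the Bellman update for the fixed policy pi_i.
        V = {s: 0.0 for s in states}
        for _ in range(eval_iters):
            V = {s: (q_value(s, policy[s], V) if policy[s] is not None else 0.0) for s in states}
        # Policy improvement: one greedy step with respect to the evaluated values.
        stable = True
        for s in states:
            if not actions(s):
                continue
            best = max(actions(s), key=lambda a: q_value(s, a, V))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```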

Reinforcement learning

  • relation to MDPs:
    • states
    • actions (for each state)
    • model (transition: what results from taking an action in a state)
    • reward function
    • objective: find the best policy
  • difference:
    • we do not know T or R
  • model-based (just count):
    • learn an MDP model from the outcomes we have observed: count the outcomes and normalize to get the transition probabilities T (a minimal sketch follows this list)
    • then solve it like an MDP
  • model-free:
    • direct evaluation: average the total rewards observed from each state
    • Q-learning
    • SARSA
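
A minimal model-based estimation sketch (count and normalize); the `(s, a, r, s_next)` transition format is an assumption:

```python
from collections import defaultdict

def estimate_model(transitions):
    """Model-based RL: count observed outcomes and normalize to estimate T; average rewards for R.

    transitions is assumed to be a list of (s, a, r, s_next) tuples gathered from episodes.
    """
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        rewards[(s, a, s2)].append(r)
    T = {sa: {s2: c / sum(outcomes.values()) for s2, c in outcomes.items()}
         for sa, outcomes in counts.items()}
    R = {sas: sum(rs) / len(rs) for sas, rs in rewards.items()}
    return T, R   # then solve the estimated MDP with value or policy iteration
```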

Q-learning (off-policy): converges to the optimal Q-values (a minimal sketch follows below)

$$\text{sample} = R(s,a,s') + \gamma \max_{a'} Q(s',a') \\ Q_{k+1}(s,a) \leftarrow (1-\alpha)\,Q_k(s,a) + \alpha\,[\text{sample}]$$

  • conditions for convergence:
    • explore enough
    • a learning rate that is small enough (but not decreased too quickly)
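
A minimal Q-learning sketch with an epsilon-greedy behavior policy; the `env` interface (`reset`, `actions`, `step`) is a placeholder assumption:

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Run one episode of Q-learning, updating Q in place with the off-policy max target."""
    s = env.reset()
    done = False
    while not done:
        acts = env.actions(s)
        if random.random() < epsilon:                 # explore
            a = random.choice(acts)
        else:                                         # exploit the current estimates
            a = max(acts, key=lambda act: Q[(s, act)])
        r, s2, done = env.step(s, a)
        # Off-policy target: best next action, regardless of what the behavior policy will do.
        sample = r if done else r + gamma * max(Q[(s2, a2)] for a2 in env.actions(s2))
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
        s = s2
    return Q

# Usage sketch: Q = defaultdict(float); repeat q_learning_episode(env, Q) over many episodes.
```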

exploration function and epsilon greedy

  • $\epsilon$-greedy: with probability $\epsilon$ pick a random action, otherwise pick the best action
  • exploration function: prefer actions that have not been visited often (replace $Q$ with $f(u,n)$, e.g. $f(u,n) = u + k/n$ where $n$ is the visit count)

SARSA (on-policy)

$$\text{sample} = R(s,a,s') + \gamma\, Q(s',a') \\ Q_{k+1}(s,a) \leftarrow (1-\alpha)\,Q_k(s,a) + \alpha\,[\text{sample}]$$

  • the update uses the action $a'$ that the current policy actually takes in $s'$, rather than the max over all actions

BN

Representation: Independence (D-Separation)

D-separation: a condition for answering independence queries:

(figure omitted: the active and inactive triples)

two variables are guaranteed independent given the evidence if every path between them contains an inactive triple

Inference (Variable Elimination and Sampling)

probability

  • joint distribution: a distribution over more than one variable
  • marginal distribution: obtained from a joint by summing out variables (the number of variables decreases)
  • chain rule (plus the Bayes-net independence assumption):
    $$P(x_1,x_2,x_3,\dots) = \prod_i P(x_i \mid x_1,x_2,\dots,x_{i-1}) \\ P(x_i \mid x_1,\dots,x_{i-1}) = P(x_i \mid \mathrm{parents}(x_i))$$

inference

  • select a hidden variable H
  • join all factors that mention H
  • eliminate (sum out) H
  • repeat until only the query variables remain, then normalize

sampling

  • steps:
    • draw a sample from uniform [0, 1)
    • convert the sample into an outcome according to the given probabilities
  • in a Bayes net:
    • prior sampling: sample the variables one by one (parents before children); afterwards just count the samples and normalize (a minimal sketch follows this list)
    • rejection sampling: like prior sampling, but discard any sample that is not consistent with the evidence
    • likelihood weighting: for each sample, fix the evidence variables and multiply the weight w by the CPT probability of the observed evidence value given its parents; sample every other variable from its parents as usual, then weight each sample by w when counting
    • Gibbs sampling: fix the evidence, initialize the other variables randomly, then repeatedly resample one variable at a time conditioned on all the others
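
A minimal prior-sampling sketch; the `cpts` representation (a function from the partial sample to a distribution) and the toy Rain/WetGrass net are illustrative assumptions:

```python
import random

def sample_from(dist):
    """Turn a uniform [0, 1) draw into an outcome according to dist = {value: probability}."""
    r = random.random()
    total = 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value                      # guard against floating-point rounding

def prior_sample(topo_order, cpts):
    """Prior sampling: sample each variable, parents before children, given the values so far."""
    sample = {}
    for var in topo_order:
        sample[var] = sample_from(cpts[var](sample))
    return sample

# Usage sketch on a hypothetical two-node net Rain -> WetGrass:
cpts = {
    "Rain": lambda s: {True: 0.2, False: 0.8},
    "WetGrass": lambda s: {True: 0.9, False: 0.1} if s["Rain"] else {True: 0.1, False: 0.9},
}
samples = [prior_sample(["Rain", "WetGrass"], cpts) for _ in range(10000)]
print(sum(s["Rain"] for s in samples) / len(samples))   # count and normalize: roughly 0.2
```

Rejection sampling adds a single check that discards samples inconsistent with the evidence before counting.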

Decision Networks / VPI

  • MEU: choose the action that maximizes the expected utility given the evidence

  • value of information: the value of acquiring a new piece of evidence, compared with deciding without it

$$VPI(E' \mid e) = \Big(\sum_{e'} P(e' \mid e)\,MEU(e,e')\Big) - MEU(e)$$

  • properties:
    • non-negative: $VPI \ge 0$
    • non-additive: in general $VPI(E_j,E_k \mid e) \neq VPI(E_j \mid e) + VPI(E_k \mid e)$
    • order-independent: $VPI(E_j,E_k \mid e) = VPI(E_j \mid e) + VPI(E_k \mid e,E_j) = VPI(E_k \mid e) + VPI(E_j \mid e,E_k)$

HMM Particle Filtering

  • model:
    • state: $X_t$
    • parameters: transition probabilities
  • stationary distribution: assume the distribution no longer changes between steps and solve for the probabilities
  • hidden Markov models:
    • defined by:
      • initial distribution: $P(X_1)$
      • transitions: $P(X_t \mid X_{t-1})$
      • emissions: $P(E_t \mid X_t)$
    • conditional independence:
      • the future depends on the past only through the present
      • the current observation is independent of everything else given the current state

filtering

  • passage of time

$$B_t(X) = P(X_t \mid e_1,e_2,e_3,\dots,e_t) \\ P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\,P(x_t \mid e_{1:t}) \\ B'(X_{t+1}) = \sum_{x_t} P(X_{t+1} \mid x_t)\,B(x_t)$$

  • observation

$$B(X) \propto P(e \mid X)\,B'(X)$$

($\propto$: multiply in the evidence, then normalize)

  • steps (summary)

    • update for time
      $$P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_t \mid x_{t-1})\,P(x_{t-1} \mid e_{1:t-1})$$

    • update for evidence

      $$P(x_t \mid e_{1:t}) \propto P(e_t \mid x_t)\,P(x_t \mid e_{1:t-1})$$
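
A minimal exact-filtering sketch (the forward algorithm) implementing the two updates above; the dict-based `transition`/`emission` tables and the initial `prior` are assumed representations:

```python
def normalize(dist):
    """Scale a {state: weight} dict so its values sum to 1."""
    total = sum(dist.values())
    return {x: w / total for x, w in dist.items()}

def forward_filter(prior, transition, emission, observations):
    """HMM filtering: alternate the time update and the evidence update for each observation.

    prior[x] is the belief before the first transition, transition[x_prev][x] = P(x | x_prev),
    and emission[x][e] = P(e | x).
    """
    belief = dict(prior)
    for e in observations:
        # Time update: B'(x) = sum_{x_prev} P(x | x_prev) * B(x_prev)
        predicted = {x: sum(transition[x_prev][x] * belief[x_prev] for x_prev in belief)
                     for x in belief}
        # Evidence update: B(x) ∝ P(e | x) * B'(x), then normalize
        belief = normalize({x: emission[x][e] * predicted[x] for x in predicted})
    return belief
```

Particle filtering approximates the same two updates with a set of samples (particles) instead of an exact belief table.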

Naive Bayes

model: $P(y \mid x_1,x_2,\dots) \propto P(y)\prod_i P(x_i \mid y)$

$\hat y = \arg\max_y P(y)\prod_i P(x_i \mid y)$, where $x_i$ is the $i$-th feature of the sample

**Laplace smoothing**

$$P(x) = \frac{c(x)+k}{N+k|X|} \qquad P(x \mid y) = \frac{c(x,y)+k}{c(y)+k|X|}$$

$c(x)$ is the number of samples with value $x$, $N$ is the total number of samples, and $|X|$ is the number of possible values (categories). A toy sketch follows.
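
A toy Naive Bayes sketch with Laplace smoothing; the spam/ham data, labels, and words are all made up for illustration:

```python
from collections import Counter

# Hypothetical toy data: (label, list of word features).
data = [("spam", ["win", "money"]), ("spam", ["win", "now"]), ("ham", ["meeting", "now"])]
k = 1                                                     # Laplace smoothing strength
labels = Counter(y for y, _ in data)
vocab = {w for _, words in data for w in words}
word_counts = {y: Counter(w for y2, words in data if y2 == y for w in words) for y in labels}

def predict(words):
    """Naive Bayes: argmax_y P(y) * prod_i P(x_i | y), with Laplace-smoothed conditionals."""
    best, best_score = None, float("-inf")
    for y in labels:
        score = labels[y] / len(data)                     # prior P(y)
        total_y = sum(word_counts[y].values())            # c(y)
        for w in words:
            score *= (word_counts[y][w] + k) / (total_y + k * len(vocab))   # (c(x,y)+k)/(c(y)+k|X|)
        if score > best_score:
            best, best_score = y, score
    return best

print(predict(["win", "now"]))   # "spam" on this toy data
```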

Perceptrons and Logistic Regression

  • how to update (whenever the prediction $y$ differs from the true label $y^*$; a minimal sketch follows this list):
    • binary classification: $w \leftarrow w + y^* \cdot f(x)$, with $y^* \in \{+1, -1\}$
    • multiclass classification:
      • $w_y \leftarrow w_y - f(x)$ (the wrongly predicted class)
      • $w_{y^*} \leftarrow w_{y^*} + f(x)$ (the correct class)
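
A minimal multiclass-perceptron sketch implementing the updates above; the `(feature_vector, true_label)` data format is an assumption:

```python
def multiclass_perceptron(data, classes, num_features, epochs=10):
    """Multiclass perceptron: on each mistake, lower the predicted class's score and raise the true class's."""
    w = {y: [0.0] * num_features for y in classes}

    def score(y, f):
        return sum(wi * fi for wi, fi in zip(w[y], f))

    for _ in range(epochs):
        for f, y_true in data:
            y_pred = max(classes, key=lambda y: score(y, f))
            if y_pred != y_true:
                w[y_pred] = [wi - fi for wi, fi in zip(w[y_pred], f)]   # w_y    <- w_y    - f(x)
                w[y_true] = [wi + fi for wi, fi in zip(w[y_true], f)]   # w_{y*} <- w_{y*} + f(x)
    return w
```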