# CS188
uninformed search
BFS
- breadth-first
- time complexity: $O(b^s)$
- space complexity: $O(b^s)$
- complete: yes
- optimal: yes (only when every step cost = 1)
- $b$ is the number of nodes at each layer (branching factor), $s$ is the number of layers down to the shallowest solution
DFS
- depth-first
- time complexity: $O(b^m)$
- space complexity: $O(bm)$
- complete: no (may get stuck in a loop)
- optimal: no (it may return a deeper, more expensive solution first)
- $b$ is the number of nodes at each layer (branching factor), $m$ is the number of layers (the maximum depth)
Uniform-cost search (UCS)
- choose the cheapest solution (cheapest path cost from the start)
- time complexity: $O(b^{C^*/\varepsilon})$
- space complexity: $O(b^{C^*/\varepsilon})$
- complete: yes
- optimal: yes
- $b$ is the number of nodes at each layer (branching factor), $m$ is the number of layers (the maximum depth), $\varepsilon$ is the least cost of any action, $C^*$ is the optimal solution cost
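A minimal UCS sketch with a priority queue ordered by path cost, assuming the graph is given as a dict `{state: [(successor, step_cost), ...]}` (a hypothetical representation, not from the notes):

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) for the cheapest path, or None if no path exists.

    graph: dict mapping state -> list of (successor, step_cost) pairs.
    """
    frontier = [(0, start, [start])]   # priority queue ordered by path cost g(n)
    best_cost = {}                     # cheapest cost seen so far per expanded state

    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if state in best_cost and best_cost[state] <= cost:
            continue                   # already expanded at a cheaper cost
        best_cost[state] = cost
        for successor, step_cost in graph.get(state, []):
            heapq.heappush(frontier, (cost + step_cost, successor, path + [successor]))
    return None
```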
informed search
A*
- heuristic: estimates how close a state is to a goal
- greedy: expand the node that looks closest to the goal (according to the heuristic)
- A* is greedy (forward cost) + UCS (backward cost): $f(n) = g(n)$ (past) $+ h(n)$ (future)
- conditions for optimality:
- admissible: $h(n) \le$ the actual cost to the nearest goal (never overestimates)
- consistency: $h(A) - h(C) \le \mathrm{cost}(A \to C)$ for every arc from A to C
- priority: expand nodes from a priority queue ordered by $f(n)$
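A minimal A* sketch in the same style as the UCS code above, assuming a hypothetical `heuristic(state, goal)` function that returns an admissible estimate:

```python
import heapq

def a_star_search(graph, start, goal, heuristic):
    """A*: order the frontier by f(n) = g(n) + h(n).

    graph: dict mapping state -> list of (successor, step_cost) pairs.
    heuristic: function (state, goal) -> estimated remaining cost (admissible).
    """
    frontier = [(heuristic(start, goal), 0, start, [start])]  # (f, g, state, path)
    best_g = {}

    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in best_g and best_g[state] <= g:
            continue
        best_g[state] = g
        for successor, step_cost in graph.get(state, []):
            new_g = g + step_cost
            new_f = new_g + heuristic(successor, goal)
            heapq.heappush(frontier, (new_f, new_g, successor, path + [successor]))
    return None
```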
CSP
(constraint satisfaction problems)
- factors:
- variables
- domains
- constraints
- varieties:
- discrete
- continuous
- solution
- backtracking:
- (i) one variable at a time
- (ii) check constraints as you go
- backtracking = DFS + (i) + (ii) (see the sketch after this list)
- filtering: forward checking (constraint propagation)
- consistency of a single arc: an arc X → Y is consistent iff for every x in the tail there is some y in the head which could be assigned without violating a constraint (for every value in X, Y can supply a compatible value)
- after filtering during backtracking, the remaining domains all meet the constraints, so what we need to do is move forward and find a solution
- ordering:
- MRV (minimum remaining values: pick the variable with the fewest legal values)
- LCV (least constraining value: pick the value that leaves the most options for the other variables)
- min-conflicts:
- assign values ignoring the constraints
- select a conflicted variable
- reassign it to the value with the fewest conflicts (hill climbing)
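A minimal backtracking-search sketch (forward checking omitted for brevity), assuming a CSP given as `variables`, `domains` (dict of lists), and a `consistent(assignment)` predicate; these names are hypothetical, not from the notes:

```python
def backtracking_search(variables, domains, consistent):
    """Depth-first assignment of one variable at a time, checking constraints as we go.

    variables: list of variable names.
    domains: dict mapping variable -> list of candidate values.
    consistent: function(assignment dict) -> True if no constraint is violated so far.
    """
    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                      # every variable assigned: solution found
        var = next(v for v in variables if v not in assignment)  # (MRV could be used here)
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):             # check constraints as we go
                result = backtrack(assignment)
                if result is not None:
                    return result
            del assignment[var]                    # undo and try the next value
        return None                                # no value works: backtrack

    return backtrack({})
```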
Game Trees: (Adversarial Search)
Minimax / Expectimax (with pruning)
- complexity: just like DFS ($O(b^m)$ time)
- process: compute values bottom-up; MAX nodes take the max of their children, MIN nodes the min (expectimax: chance nodes take the expected value)
- non-terminal states (at the depth limit): evaluate with a weighted linear sum of features.
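A minimal depth-limited minimax sketch (no pruning), assuming hypothetical `successors(state)`, `is_terminal(state)`, and `evaluate(state)` helpers:

```python
def minimax_value(state, depth, maximizing, successors, is_terminal, evaluate):
    """Depth-limited minimax: MAX layers take the max, MIN layers take the min.

    For expectimax, the MIN branch would instead return the average of the values.
    """
    if is_terminal(state) or depth == 0:
        return evaluate(state)          # terminal utility or evaluation function
    values = [minimax_value(s, depth - 1, not maximizing,
                            successors, is_terminal, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
```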
MDP
value iteration
- iteration: $V_{k+1}(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V_k(s')]$
- optimal: the values converge to $V^*$, and the optimal policy takes the $\arg\max$ action in each state
- the sum is over every outcome $s'$ of an action (for a transition that happens with probability 1, no sum is needed)
- the max is over all actions
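A minimal value-iteration sketch, assuming the MDP is given as `states`, `actions(s)`, and `transitions(s, a)` returning `[(s', T, R), ...]` (hypothetical names):

```python
def value_iteration(states, actions, transitions, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman update V(s) = max_a sum_s' T * (R + gamma * V(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_V = {}
        for s in states:
            action_values = [
                sum(T * (R + gamma * V[s2]) for s2, T, R in transitions(s, a))
                for a in actions(s)
            ]
            new_V[s] = max(action_values) if action_values else 0.0  # terminal states stay at 0
        V = new_V
    return V
```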
policy iteration
- steps:
- policy evaluation: compute the values of the current fixed policy
- policy improvement: update the policy with a one-step look-ahead using those values
- difference:
- value iteration just ignores the policy and iterates the values directly
- policy iteration first fixes a policy, evaluates it by iteration, then changes the policy
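A minimal policy-iteration sketch under the same hypothetical MDP interface as the value-iteration code above:

```python
def policy_iteration(states, actions, transitions, gamma=0.9, eval_iterations=50):
    """Alternate policy evaluation (fixed policy) and one-step policy improvement."""
    policy = {s: (actions(s)[0] if actions(s) else None) for s in states}
    while True:
        # Policy evaluation: iterate V under the fixed policy.
        V = {s: 0.0 for s in states}
        for _ in range(eval_iterations):
            V = {s: (sum(T * (R + gamma * V[s2]) for s2, T, R in transitions(s, policy[s]))
                     if policy[s] is not None else 0.0)
                 for s in states}
        # Policy improvement: one-step look-ahead with the evaluated values.
        stable = True
        for s in states:
            if not actions(s):
                continue
            best = max(actions(s),
                       key=lambda a: sum(T * (R + gamma * V[s2])
                                         for s2, T, R in transitions(s, a)))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```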
Reinforcement learning
- relation to MDPs:
- states
- actions (for each state)
- model (transition: what results from taking an action in a state)
- reward function
- objective:find the best policy
- difference:
- don’t know T or R
- model-based (just count):
- learn an MDP model from the episodes we have seen (count outcomes and normalize to get the transition probabilities T)
- solve it like an ordinary MDP
- model free:
- direct evaluation: add up the rewards observed after each visit to a state and average over visits
- Q-learning
- SARSA
Q-learning (off-policy): converges to the optimal values
- conditions to converge:
- explore enough
- learning rate eventually small enough (but not decreasing too quickly)
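A minimal sketch of the Q-learning update, i.e. a running average toward the sample $r + \gamma \max_{a'} Q(s', a')$ (names are hypothetical):

```python
def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """One off-policy Q-learning update from a single (s, a, r, s') sample.

    Q: dict mapping (state, action) -> value (missing keys treated as 0).
    next_actions: list of actions available in s_next.
    """
    sample = r + gamma * max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample  # blend old estimate with sample
    return Q
```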
exploration function and epsilon-greedy
- epsilon-greedy: with probability $\epsilon$ act randomly, otherwise choose the best action
- exploration function: prefer actions that have not been visited often (in the update, replace Q with $f(u, n) = u + k/n$, where $u$ is the value estimate and $n$ is the visit count)
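A minimal epsilon-greedy action-selection sketch (hypothetical names):

```python
import random

def epsilon_greedy_action(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore randomly; otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.choice(actions)                         # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0)) # exploit
```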
SARSA (on-policy)
- update Q using the next action the policy actually takes, instead of the best (max) action
BN
Representation: Independence (D-Separation)
D-separation: condition for answering independence queries:
X and Y are independent given evidence Z iff every path between them is blocked, i.e. contains at least one inactive triple
Inference (Variable Elimination and Sampling)
probability
- joint distribution: a distribution over more than one variable, $P(X_1, \dots, X_n)$
- marginal distribution: a sub-table obtained by summing out variables from a joint (the number of variables decreases)
- chain rule: $P(x_1, \dots, x_n) = \prod_i P(x_i \mid x_1, \dots, x_{i-1})$
inference (variable elimination)
- select a hidden variable H
- join all factors mentioning H
- eliminate (sum out) H
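A toy sketch of one join-and-eliminate step on a two-variable network $A \to B$, with factors stored as plain dicts (a hypothetical representation, not from the notes):

```python
# Factors for a tiny network A -> B: P(A) and P(B | A).
P_A = {('+a',): 0.3, ('-a',): 0.7}
P_B_given_A = {('+a', '+b'): 0.9, ('+a', '-b'): 0.1,
               ('-a', '+b'): 0.2, ('-a', '-b'): 0.8}

# Join: multiply the factors that mention A to get P(A, B).
P_AB = {(a, b): P_A[(a,)] * P_B_given_A[(a, b)] for (a, b) in P_B_given_A}

# Eliminate: sum out A to get the marginal P(B).
P_B = {}
for (a, b), p in P_AB.items():
    P_B[b] = P_B.get(b, 0.0) + p

print(P_B)   # {'+b': 0.41, '-b': 0.59}
```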
sampling
- steps:
- draw a sample $u$ from uniform $[0, 1)$
- convert the sample to an outcome with the given probability
- in a Bayes net:
- prior sampling: sample the variables one by one in topological order; afterwards just count the samples and normalize
- rejection sampling: just ignore (reject) any sample not consistent with the evidence
- likelihood weighting: fix the evidence variables to their observed values; for each evidence variable, multiply the weight $w$ by the probability of its observed value given its parents (read from the CPT); sample every other variable from its parents as usual; then count each sample with weight $w$
- Gibbs sampling: fix the evidence, initialize the other variables randomly, then repeat: resample one non-evidence variable at a time conditioned on all the others
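A minimal prior-sampling sketch for the same toy $A \to B$ network used above (hypothetical representation):

```python
import random
from collections import Counter

def sample_from(distribution):
    """Convert a uniform [0, 1) draw into an outcome with the given probabilities."""
    u, cumulative = random.random(), 0.0
    for outcome, p in distribution.items():
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome   # guard against floating-point round-off

def prior_sample():
    """Sample the variables one by one in topological order (A, then B)."""
    a = sample_from({'+a': 0.3, '-a': 0.7})
    b = sample_from({'+a': {'+b': 0.9, '-b': 0.1},
                     '-a': {'+b': 0.2, '-b': 0.8}}[a])
    return a, b

# Count the samples and normalize to estimate P(B).
counts = Counter(b for _, b in (prior_sample() for _ in range(10000)))
total = sum(counts.values())
print({b: c / total for b, c in counts.items()})   # roughly {'+b': 0.41, '-b': 0.59}
```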
Decision Networks / VPI
- MEU: choose the action that maximizes the expected utility given the evidence
- value of information: compute the value of acquiring a piece of evidence (compared with acting on no new information)
- properties:
- nonnegative: $VPI \ge 0$
- nonadditive: in general $VPI(E_j, E_k \mid e) \ne VPI(E_j \mid e) + VPI(E_k \mid e)$
- order-independent: $VPI(E_j, E_k \mid e) = VPI(E_j \mid e) + VPI(E_k \mid e, E_j) = VPI(E_k \mid e) + VPI(E_j \mid e, E_k)$
HMM Particle Filtering
- model:
- state
- parameters: transition probabilities
- stationary distribution: assume the distribution no longer changes over time and solve for the probabilities
- hidden Markov models:
- defined by
- initial distribution: $P(X_1)$
- transitions: $P(X_t \mid X_{t-1})$
- emissions: $P(E_t \mid X_t)$
- conditional independence:
- future depends on past via the present
- current observation independent of all else given current state
filtering
- passage of time: $B'(X_{t+1}) = \sum_{x_t} P(X_{t+1} \mid x_t)\, B(x_t)$
- observation: $B(X_{t+1}) \propto P(e_{t+1} \mid X_{t+1})\, B'(X_{t+1})$
- steps (summary):
- update for time: push the belief through the transition model
- update for evidence: weight by the emission probability, then normalize
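A minimal exact-filtering (forward algorithm) sketch for a two-state HMM, with hypothetical transition and emission tables:

```python
def filter_step(belief, evidence, transition, emission):
    """One elapse-time + observe update of the belief distribution.

    belief: dict state -> P(x_t | e_1:t).
    transition: dict (x_t, x_next) -> P(x_next | x_t).
    emission: dict (x_next, e) -> P(e | x_next).
    """
    states = list(belief)
    # Passage of time: B'(x') = sum_x P(x' | x) B(x)
    predicted = {x2: sum(transition[(x1, x2)] * belief[x1] for x1 in states)
                 for x2 in states}
    # Observation: B(x') proportional to P(e | x') B'(x'), then normalize.
    unnormalized = {x2: emission[(x2, evidence)] * predicted[x2] for x2 in states}
    total = sum(unnormalized.values())
    return {x2: p / total for x2, p in unnormalized.items()}

# Toy example: rain/sun weather with an umbrella observation.
belief = {'rain': 0.5, 'sun': 0.5}
transition = {('rain', 'rain'): 0.7, ('rain', 'sun'): 0.3,
              ('sun', 'rain'): 0.3, ('sun', 'sun'): 0.7}
emission = {('rain', 'umbrella'): 0.9, ('rain', 'none'): 0.1,
            ('sun', 'umbrella'): 0.2, ('sun', 'none'): 0.8}
print(filter_step(belief, 'umbrella', transition, emission))
```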
Naive Bayes
model: $P(Y, F_1, \dots, F_n) = P(Y) \prod_i P(F_i \mid Y)$
$F_i$ is a sample component (feature), $Y$ is the label
**Laplace smoothing**
$P_{LAP}(x) = \frac{c(x)+1}{N + |X|}$, where $c(x)$ is the number of samples in category $x$, $N$ is the total number of samples, and $|X|$ is the number of categories
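A minimal Laplace-smoothing sketch for estimating $P(x)$ from counts (the example data is hypothetical; with `k=1` it matches the formula above):

```python
from collections import Counter

def laplace_estimate(samples, categories, k=1):
    """P_LAP(x) = (count(x) + k) / (N + k * |X|), the add-k smoothed estimate."""
    counts = Counter(samples)
    N = len(samples)
    return {x: (counts[x] + k) / (N + k * len(categories)) for x in categories}

# Example: 'other' was never observed but still gets nonzero probability mass.
print(laplace_estimate(['spam', 'spam', 'spam', 'ham'], ['spam', 'ham', 'other']))
```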
Perceptrons and Logistic Regression
- how to update:
- binary classification: if $y \cdot (w \cdot f(x)) \le 0$ (misclassified), then $w \leftarrow w + y \cdot f(x)$
- multi-class: predict $y' = \arg\max_y w_y \cdot f(x)$
- if $y' \ne y^*$ (misclassified), then $w_{y^*} \leftarrow w_{y^*} + f(x)$ and $w_{y'} \leftarrow w_{y'} - f(x)$
- if correctly classified, leave the weights unchanged
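A minimal binary-perceptron sketch implementing the update above (the data and names are hypothetical):

```python
def perceptron_train(data, n_features, epochs=10):
    """Binary perceptron: data is a list of (feature_vector, label) with label in {+1, -1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for f, y in data:
            activation = sum(wi * fi for wi, fi in zip(w, f))
            if y * activation <= 0:                        # misclassified (or on the boundary)
                w = [wi + y * fi for wi, fi in zip(w, f)]  # w <- w + y * f(x)
    return w

# Example: two linearly separable points.
print(perceptron_train([([1.0, 1.0], 1), ([-1.0, -1.0], -1)], n_features=2))
```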