http://pan.baidu.com/s/1bnyCFIB
Question 1
You suspect that the SVM is underfitting your dataset (i.e., it has high bias). Should you try increasing or decreasing C? Increasing or decreasing $\sigma^2$?
Your Answer | Result | Score | Explanation
---|---|---|---
It would be reasonable to try decreasing C. It would also be reasonable to try decreasing $\sigma^2$. | | |
It would be reasonable to try increasing C. It would also be reasonable to try decreasing $\sigma^2$. | Correct | 1.00 | The figure shows a decision boundary that is underfit to the training set, so we'd like to lower the bias / increase the variance of the SVM. We can do so by either increasing the parameter C or decreasing $\sigma^2$.
It would be reasonable to try increasing C. It would also be reasonable to try increasing $\sigma^2$. | | |
It would be reasonable to try decreasing C. It would also be reasonable to try increasing $\sigma^2$. | | |
Total | | 1.00 / 1.00 |
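Many libraries parameterize the Gaussian (RBF) kernel by `gamma` rather than $\sigma^2$. For example, scikit-learn's `sklearn.svm.SVC(kernel="rbf", C=..., gamma=...)` uses the kernel $\exp(-\gamma\lVert x - l\rVert^2)$, so $\gamma = 1/(2\sigma^2)$ and "decrease $\sigma^2$" becomes "increase `gamma`". The minimal sketch below checks that correspondence in plain Python; the helper names are hypothetical and not part of any library.

```python
import math

def gaussian_kernel(x, l, sigma2):
    """Course form: exp(-||x - l||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-sq_dist / (2.0 * sigma2))

def rbf_kernel_gamma(x, l, gamma):
    """Library form (as in scikit-learn's RBF kernel): exp(-gamma * ||x - l||^2)."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-gamma * sq_dist)

def sigma2_to_gamma(sigma2):
    """Hypothetical helper: convert the course's sigma^2 into a gamma value."""
    return 1.0 / (2.0 * sigma2)

# To reduce bias (underfitting): raise C, and lower sigma^2, i.e. raise gamma.
for sigma2 in (1.0, 0.25):
    gamma = sigma2_to_gamma(sigma2)
    x, l = (1.0, 2.0), (0.0, 0.5)
    # Both parameterizations give the same kernel value.
    assert math.isclose(gaussian_kernel(x, l, sigma2), rbf_kernel_gamma(x, l, gamma))
    print(f"sigma^2 = {sigma2} -> gamma = {gamma}")
```

Halving or quartering $\sigma^2$ therefore doubles or quadruples `gamma`; C has the same meaning in both conventions.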
Question 2
Which of the following is a plot of $f_1 = \mathrm{sim}(x, l^{(1)})$ when $\sigma^2 = 0.25$?
Your Answer | Result | Score | Explanation
---|---|---|---
(plot not preserved) | Correct | 1.00 | This figure shows a "narrower" Gaussian kernel centered at the same location, which is the effect of decreasing $\sigma^2$.
Total | | 1.00 / 1.00 |
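To see why a smaller $\sigma^2$ gives a narrower bump, evaluate $f_1 = \exp(-\lVert x - l^{(1)}\rVert^2 / (2\sigma^2))$ at a few distances from the landmark. This one-dimensional sketch compares $\sigma^2 = 1$ with $\sigma^2 = 0.25$; the sample distances and the $\sigma^2 = 1$ baseline are illustrative choices.

```python
import math

def f1(distance, sigma2):
    """Gaussian-kernel feature for a point at `distance` from landmark l(1)."""
    return math.exp(-distance ** 2 / (2.0 * sigma2))

# Illustrative distances from the landmark; the peak value is 1 for any sigma^2.
for d in (0.0, 0.5, 1.0, 2.0):
    wide = f1(d, sigma2=1.0)
    narrow = f1(d, sigma2=0.25)
    print(f"d = {d}: sigma^2 = 1 -> {wide:.3f}, sigma^2 = 0.25 -> {narrow:.3f}")
```

At distance 1, for instance, $f_1$ is about 0.61 with $\sigma^2 = 1$ but only about 0.14 with $\sigma^2 = 0.25$: same center and same peak, faster fall-off.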
Question 3
The first term in the objective is $C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\right]$. This first term will be zero if two of the following four conditions hold true. Which are the two conditions that would guarantee that this term equals zero?
Your Answer | Result | Score | Explanation
---|---|---|---
For every example with $y^{(i)}=0$, we have that $\theta^T x^{(i)} \le 0$. | Correct | 0.25 | $\mathrm{cost}_0(\theta^T x^{(i)})$ is still non-zero for inputs between -1 and 0, so being less than or equal to 0 is insufficient.
For every example with $y^{(i)}=0$, we have that $\theta^T x^{(i)} \le -1$. | Correct | 0.25 | For examples with $y^{(i)}=0$, only the $\mathrm{cost}_0(\theta^T x^{(i)})$ term is present. As you can see in the graph, this will be zero for all inputs less than or equal to -1.
For every example with $y^{(i)}=1$, we have that $\theta^T x^{(i)} \ge 0$. | Correct | 0.25 | $\mathrm{cost}_1(\theta^T x^{(i)})$ is still non-zero for inputs between 0 and 1, so being greater than or equal to 0 is insufficient.
For every example with $y^{(i)}=1$, we have that $\theta^T x^{(i)} \ge 1$. | Correct | 0.25 | For examples with $y^{(i)}=1$, only the $\mathrm{cost}_1(\theta^T x^{(i)})$ term is present. As you can see in the graph, this will be zero for all inputs greater than or equal to 1.
Total | | 1.00 / 1.00 |
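The course draws $\mathrm{cost}_1$ and $\mathrm{cost}_0$ as piecewise-linear curves that are flat at zero beyond $\pm 1$. A common concrete choice, assumed here, is the hinge form $\mathrm{cost}_1(z) = \max(0, 1 - z)$ and $\mathrm{cost}_0(z) = \max(0, 1 + z)$. This sketch evaluates the first term for a few made-up values of $z = \theta^T x^{(i)}$ to check that it vanishes when every positive example has $z \ge 1$ and every negative example has $z \le -1$, and not merely when $z \ge 0$ or $z \le 0$.

```python
def cost1(z):
    """Assumed hinge form of the y = 1 cost: zero for z >= 1."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Assumed hinge form of the y = 0 cost: zero for z <= -1."""
    return max(0.0, 1.0 + z)

def first_term(C, scores, labels):
    """C * sum_i [ y_i * cost1(z_i) + (1 - y_i) * cost0(z_i) ], with z_i = theta^T x_i."""
    return C * sum(y * cost1(z) + (1 - y) * cost0(z) for z, y in zip(scores, labels))

labels = [1, 1, 0, 0]  # made-up labels
print(first_term(10.0, [1.0, 2.5, -1.0, -3.0], labels))  # all margins satisfied -> 0.0
print(first_term(10.0, [0.5, 2.5, -1.0, -0.2], labels))  # z in (0, 1) and (-1, 0) -> positive
```

A positive example with $z = 0$ still costs $\mathrm{cost}_1(0) = 1$, which is why the conditions $\theta^T x^{(i)} \ge 0$ and $\theta^T x^{(i)} \le 0$ are insufficient.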
Question 4
Your Answer | Result | Score | Explanation
---|---|---|---
Use a different optimization method since using gradient descent to train logistic regression might result in a local minimum. | Correct | 0.25 | The logistic regression cost function is convex, so gradient descent will always find the global minimum.
Try using a neural network with a large number of hidden units. | Correct | 0.25 | A neural network with many hidden units is a more complex (higher variance) model than logistic regression, so it is less likely to underfit the data.
Reduce the number of examples in the training set. | Correct | 0.25 | While you can improve accuracy on the training set by removing examples, doing so results in a worse model that will not generalize as well.
Use an SVM with a Gaussian kernel. | Correct | 0.25 | By using a Gaussian kernel, your model will have greater complexity and can avoid underfitting the data.
Total | | 1.00 / 1.00 |
Summary: when there are many features and a large training set and the model underfits, use a neural network or an SVM with a Gaussian kernel.
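One way to see why a Gaussian-kernel SVM is a more complex model than plain logistic regression: with landmarks placed at the m training examples, each input x is mapped to m + 1 features $f = [1, \mathrm{sim}(x, l^{(1)}), \ldots, \mathrm{sim}(x, l^{(m)})]$, so $\theta^T f$ can bend around individual examples. The sketch below builds that feature vector in plain Python; the toy training points and the function name `kernel_features` are hypothetical.

```python
import math

def kernel_features(x, landmarks, sigma2):
    """Hypothetical helper: map x to [1, sim(x, l1), ..., sim(x, lm)] with a Gaussian kernel."""
    feats = [1.0]  # intercept feature f0
    for l in landmarks:
        sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
        feats.append(math.exp(-sq_dist / (2.0 * sigma2)))
    return feats

# Hypothetical training set with m = 3 examples, each with n = 2 raw features.
X_train = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5)]
f = kernel_features(X_train[1], landmarks=X_train, sigma2=0.5)
print(len(f))  # m + 1 = 4 features, set by m rather than n
print(f[2])    # similarity of x to its own landmark is exp(0) = 1.0
```

The number of kernel features grows with the training set, which is where the extra capacity against underfitting comes from.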
Question 5
Your Answer | Result | Score | Explanation
---|---|---|---
If the data are linearly separable, an SVM using a linear kernel will return the same parameters θ regardless of the chosen value of C (i.e., the resulting value of θ does not depend on C). | Correct | 0.25 | A linearly separable dataset can usually be separated by many different lines. Varying the parameter C will cause the SVM's decision boundary to vary among these possibilities. For example, for a very large value of C, it might learn larger values of θ in order to increase the margin on certain examples.
Suppose you are using SVMs to do multi-class classification and would like to use the one-vs-all approach. If you have K different classes, you will train K - 1 different SVMs. | Correct | 0.25 | The one-vs-all method requires that we have a separate classifier for every class, so you will train K different SVMs.
The maximum value of the Gaussian kernel (i.e., $\mathrm{sim}(x, l^{(1)})$) is 1. | Correct | 0.25 | When $x = l^{(1)}$, the Gaussian kernel has value $\exp(0) = 1$, and it is less than 1 otherwise.
It is important to perform feature normalization before using the Gaussian kernel. | Correct | 0.25 | The similarity measure used by the Gaussian kernel expects that the data lie in approximately the same range.
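Two of the statements above can be checked numerically: the Gaussian kernel peaks at $\exp(0) = 1$ when $x = l^{(1)}$, and without normalization the squared distance (and hence the similarity) is dominated by whichever feature has the largest range. The sketch uses made-up house data (size in square feet, number of bedrooms) and z-score normalization; both the data and the choice of z-scoring are illustrative assumptions.

```python
import math
from statistics import mean, pstdev

def gaussian_kernel(x, l, sigma2=1.0):
    """exp(-||x - l||^2 / (2 sigma^2)); equals 1 exactly when x == l."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-sq_dist / (2.0 * sigma2))

def zscore(rows):
    """Scale each column to zero mean and unit population standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def bedroom_share(a, b):
    """Fraction of the squared distance between a and b contributed by feature 2 (bedrooms)."""
    parts = [(ai - bi) ** 2 for ai, bi in zip(a, b)]
    return parts[1] / sum(parts)

# Hypothetical houses: (size in square feet, number of bedrooms).
houses = [(1000.0, 1.0), (1200.0, 5.0), (2000.0, 3.0)]
scaled = zscore(houses)

print(gaussian_kernel(houses[0], houses[0]))  # kernel maximum: exp(0) = 1.0
print(bedroom_share(houses[0], houses[1]))    # raw: size dominates (~0.0004)
print(bedroom_share(scaled[0], scaled[1]))    # normalized: bedrooms matter (~0.97)
```

In raw units a 1-to-5 bedroom gap is almost invisible next to a 200 square-foot gap; after normalization it dominates the distance, as it arguably should.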