Andrew Ng's Machine Learning Study Notes, Week 1: Introduction

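I worked through Andrew Ng's Machine Learning course once on NetEase Cloud Classroom and Coursera and benefited greatly from it. These are my study notes, written up as a review.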

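Why use both NetEase Cloud Classroom and Coursera? Mainly to combine the strengths of each: NetEase Cloud Classroom (https://study.163.com/course/introduction/1210076550.htm) hosts the lecture videos with bilingual subtitles, while Coursera (https://www.coursera.org/learn/machine-learning/home/welcome) provides the in-class exercises, unit quizzes, and programming assignments.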

First, let me express my sincere thanks to Andrew Ng, Coursera, and NetEase Cloud Classroom!

 

I. Slides and In-Class Exercises

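1. Machine learning application examples: data mining, applications that cannot be programmed explicitly, and self-customizing (recommender) systems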

 

2. Definition of machine learning

 

3. Machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning

 

4. Supervised learning: Example 1, house price prediction (regression)

 

5. Supervised learning: Example 2, disease diagnosis (classification)

 

6. Unsupervised learning: the data set contains no labels

 

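7. Unsupervised learning application examples: automatic news grouping (e.g. Google News), genomics research, organizing computing clusters, social network analysis, market segmentation, astronomical data analysis, and speech recognition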

 

II. Content Summary

1. Machine learning: improving performance P through experience E in order to accomplish task T better

What is Machine Learning?

Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.
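The E/T/P framing can be made concrete with a toy learner. The sketch below is not from the course: the data, the hidden rule "value >= 50", and the function names `learn_threshold` and `accuracy` are all assumptions chosen for illustration. It shows performance P (accuracy) on task T (classifying integers) improving as the learner sees more experience E (labeled examples).

```python
def learn_threshold(examples):
    """Estimate a decision threshold from labeled (value, label) examples."""
    largest_negative = max(x for x, label in examples if not label)
    smallest_positive = min(x for x, label in examples if label)
    # Place the threshold halfway between the two classes seen so far.
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """P: the fraction of test values classified correctly."""
    correct = sum((x >= threshold) == label for x, label in test_set)
    return correct / len(test_set)


# E: hypothetical labeled examples; the hidden rule is "value >= 50".
experience = [(10, False), (60, True), (30, False),
              (80, True), (45, False), (52, True)]
# T: classify every integer from 0 to 99.
test_set = [(x, x >= 50) for x in range(100)]

for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    print(n, accuracy(threshold, test_set))
```

For this particular data the accuracy rises from 0.85 to 0.95 to 0.99 as the learner sees 2, 4, and 6 examples. That monotone improvement is a property of the chosen examples, not a guarantee of the midpoint rule in general.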

In general, any machine learning problem can be assigned to one of two broad classifications:

Supervised learning and Unsupervised learning.

 

2. Supervised learning: regression and classification; the data set is labeled

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
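As a rough illustration of Example 1, the sketch below fits a straight line to made-up house data by ordinary least squares and then turns the continuous prediction into a two-way decision. The numbers, the function names, and the choice of thresholding a regression output are assumptions for illustration only; a real classifier would normally be trained directly on labeled outcomes.

```python
# Hypothetical toy data: house sizes (square feet) and sale prices (in $1000s).
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [200.0, 300.0, 400.0, 500.0, 600.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Regression: the output is a continuous value."""
    return slope * size + intercept


def sells_above_asking(size, asking_price, slope, intercept):
    """Classification: the output is one of two discrete categories."""
    return predict_price(size, slope, intercept) > asking_price


slope, intercept = fit_line(sizes, prices)
print(predict_price(1800.0, slope, intercept))
print(sells_above_asking(1800.0, 350.0, slope, intercept))
```

The same input (house size) feeds both functions; only the type of output changes, which is the distinction between regression and classification drawn above.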

 

3. Unsupervised learning: clustering, and speech recognition in noisy environments (i.e. the cocktail party problem); the data set is unlabeled

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
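The clustering idea can be sketched with k-means (Lloyd's algorithm) on one-dimensional data. The course excerpt above does not name a specific algorithm, so k-means is an assumption here, as are the sample values, the initial centers, and the function name `k_means_1d`. Note that the algorithm receives no labels, only the points themselves.

```python
def k_means_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given initial centers (Lloyd's algorithm)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled measurements that form two loose groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

With these values the points split into one group near 1 and another near 5. Real data such as the gene example would have many features per item, so distances would be computed over vectors rather than single numbers.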

 

III. Unit Quiz

(1). A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?

  1. The weather prediction task.
  2. The process of the algorithm examining a large amount of historical weather data.
  3. None of these.
  4. The probability of it correctly predicting a future date's weather.

(2). The amount of rain that falls in a day is usually measured in either millimeters (mm) or inches. Suppose you use a learning algorithm to predict how much rain will fall tomorrow.

Would you treat this as a classification or a regression problem?

  1. Classification
  2. Regression

(3). Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy). Would you treat this as a classification or a regression problem?

  1. Regression
  2. Classification

(4). Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.

  1. Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or "types" of patients in terms of how they respond to the drug, and if so what these categories are.
  2. Examine a web page, and classify whether the content on the web page should be considered "child friendly" (e.g., non-pornographic, etc.) or "adult."
  3. Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
  4. In farming, given data on crop yields over the last 50 years, learn to predict next year's crop yields.

(5). Which of these is a reasonable definition of machine learning?

  1. Machine learning is the field of allowing robots to act intelligently.
  2. Machine learning learns from labeled data.
  3. Machine learning is the science of programming computers.
  4. Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.

 
