Algorithm: KNN for Regression Problem

The KNN algorithm can be used not only for classification but also for regression, that is, for predicting a continuous value.

Example: predicting a salary.

Suppose K = 3.

Calculate the distance from the new sample to every training sample, choose the K = 3 nearest samples, and take the mean of their target values as the prediction.
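The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function name `knn_regress`, the use of Euclidean distance, and the experience/salary numbers are assumptions made up for this example, not part of the original data.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a value as the mean target of the k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from the query point to every training sample
    distances = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # The prediction is the mean of the neighbours' target values
    return y_train[nearest].mean()

# Made-up data: years of experience -> salary (arbitrary units)
X_train = [[1.0], [2.0], [3.0], [8.0], [9.0]]
y_train = [30.0, 35.0, 40.0, 80.0, 90.0]

prediction = knn_regress(X_train, y_train, [2.5], k=3)
print(prediction)  # mean of 30, 35 and 40 -> 35.0
```

The three nearest neighbours of 2.5 are the samples at 1, 2 and 3 years, so the prediction is the mean of their salaries.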

Example:

Feature meanings:

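Here is a brief description of the data.
Ask Price is the value we want to predict, that is, the asking price of the used car.

Brand is the make of the car.

Type is the engine type.

Color is the exterior color of the car.

Construction Year is the year the car was built.

Odometer is the mileage shown on the odometer.

Days Until MOT is described in the original as the time elapsed since the last service, although the column name suggests the number of days remaining until the next MOT inspection.

HP is the horsepower.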

Using KNN for regression on this data:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# read data
df = pd.read_csv('data.csv')
df # data frame


# feature processing
# one-hot encoding for 'Color'
df_colors = df['Color'].str.get_dummies().add_prefix('Color: ')
# one-hot encoding for 'Type'
df_type = df['Type'].apply(str).str.get_dummies().add_prefix('Type: ')
# append the one-hot encoded columns
df = pd.concat([df, df_colors, df_type], axis = 1)
# drop the original categorical columns ('Brand' is dropped without being encoded)
df = df.drop(['Brand', 'Type', 'Color'], axis = 1)

df
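To make the effect of the encoding step concrete, here is a small sketch on a made-up three-row frame. The `toy` DataFrame and its values are hypothetical; only the `str.get_dummies().add_prefix(...)` pattern is taken from the code above.

```python
import pandas as pd

# Hypothetical toy data, not taken from data.csv
toy = pd.DataFrame({'Color': ['red', 'blue', 'red'],
                    'Price': [700, 900, 1200]})

# One indicator column per distinct color, prefixed like in the article
dummies = toy['Color'].str.get_dummies().add_prefix('Color: ')

# Append the indicator columns and drop the original text column
encoded = pd.concat([toy, dummies], axis=1).drop(['Color'], axis=1)
print(encoded)
```

Each row now has a 1 in the column matching its color and 0 elsewhere, which gives the distance computation numeric inputs instead of strings.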

 

# correlation matrix of all columns, shown as a heatmap
matrix = df.corr()
f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(matrix, square=True)
plt.title('Car Price Variables')

 

from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# get input data X and labels y
X = df[['Construction Year', 'Days Until MOT', 'Odometer']]
y = df['Ask Price'].values.reshape(-1, 1)
# split train, test data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 41)

# data normalization
X_normalizer = StandardScaler() # N(0, 1)
X_train = X_normalizer.fit_transform(X_train)
X_test = X_normalizer.transform(X_test)

y_normalizer = StandardScaler()
y_train = y_normalizer.fit_transform(y_train)
y_test = y_normalizer.transform(y_test)

knn = KNeighborsRegressor(n_neighbors = 2)
knn.fit(X_train, y_train.ravel())

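# Now we can predict prices and map them back to the original scale.
# predict() returns a 1-D array, while recent scikit-learn scalers expect 2-D input.
y_pred = knn.predict(X_test)
y_pred_inv = y_normalizer.inverse_transform(y_pred.reshape(-1, 1)).ravel()
y_test_inv = y_normalizer.inverse_transform(y_test)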

# Scatter plot of predicted vs. real ask price
plt.scatter(y_pred_inv, y_test_inv)

# Now add the perfect prediction line (y = x)
diagonal = np.linspace(500, 1500, 100)
plt.plot(diagonal, diagonal, '-r')
plt.xlabel('Predicted ask price')
plt.ylabel('Ask price')
plt.show()

print(y_pred_inv)
knn

 

 

[1199. 1199.  700.  899.]
KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
                    metric_params=None, n_jobs=None, n_neighbors=2, p=2,
                    weights='uniform')
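The scatter plot gives only a visual impression of the fit, and `n_neighbors = 2` is fixed by hand. One way to put a number on the error and compare a few values of K is sketched below. Since `data.csv` is not available here, the sketch uses synthetic data. The generated features, the target formula, the candidate K values, and the choice of mean absolute error are all assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the car data: 3 numeric features, noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 1000 + 50 * X[:, 0] - 20 * X[:, 1] + rng.normal(0, 10, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=41)

errors = {}
for k in (1, 2, 5, 10):
    # The pipeline fits the scaler on the training data only, then fits KNN
    model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k))
    model.fit(X_train, y_train)
    errors[k] = mean_absolute_error(y_test, model.predict(X_test))
    print(k, round(errors[k], 2))
```

Here the target is left unscaled: KNN averages the neighbours' targets, so scaling `y` does not change which neighbours are chosen, and the error is reported directly in price units.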