Kaggle PetFinder Competition Diary, First Submission: Quick and Dirty, a Bare-Bones Run First

Original

超级大笨狼

2019-01-25 17:26

Competition page:

https://www.kaggle.com/c/petfinder-adoption-prediction

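First submission: keep it simple and crude and just run a bare model. A plain random forest scores about 0.4 on my local hold-out split, so I submitted a result to see what it would get on the leaderboard: 0.287, ha, ranked 635th. There is no feature engineering at all, but this got my first Kaggle workflow running end to end. Time for a break; I'll keep working on it later.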
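A note on the two numbers: if the local 0.4 is the value printed by `rfc.score()` in the script, it is plain accuracy, whereas the PetFinder leaderboard scores submissions with quadratic weighted kappa (QWK), so the two are not directly comparable. The sketch below computes QWK by hand for integer labels 0 to 4 and cross-checks it against scikit-learn's `cohen_kappa_score(..., weights="quadratic")`. The helper name `quadratic_weighted_kappa` and the toy label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score


def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Hypothetical helper: QWK for integer labels 0..n_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed matrix O[i, j]: number of samples with true label i and prediction j.
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1

    # Expected matrix E: outer product of the two label histograms,
    # scaled so that E sums to the same total as O.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic penalty: predicting 3 for a true 1 costs 4x as much as predicting 2.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Toy labels (made up): every mistake is off by exactly one class.
y_true = [0, 1, 2, 3, 4, 2, 1, 3]
y_pred = [0, 2, 2, 3, 3, 1, 1, 4]

acc = accuracy_score(y_true, y_pred)
qwk = quadratic_weighted_kappa(y_true, y_pred)
qwk_sklearn = cohen_kappa_score(y_true, y_pred, labels=[0, 1, 2, 3, 4], weights="quadratic")

print(f"accuracy = {acc:.3f}")          # accuracy = 0.500
print(f"QWK      = {qwk:.3f}")          # QWK      = 0.833
print(f"sklearn  = {qwk_sklearn:.3f}")  # sklearn  = 0.833
```

Because QWK gives partial credit for near misses and heavily penalizes distant ones, it can land well above or below accuracy on the same predictions. To compare local results with the leaderboard, it is probably more informative to also print `cohen_kappa_score(y_test, rfc_y_predict, weights="quadratic")` on the hold-out split.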

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import warnings

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore")

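train = pd.read_csv("../input/train/train.csv")

# Fill missing values in the label column with the median.
# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
# (missing_values defaults to np.nan).
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="median")
# fit_transform returns a 2-D array of shape (n, 1); flatten it before assigning to one column
train["AdoptionSpeed"] = imp.fit_transform(train[["AdoptionSpeed"]]).ravel()
train["AdoptionSpeed"] = train["AdoptionSpeed"].astype(int)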

# Select some raw columns as features for the tree splits (no feature engineering yet)
x = train[['Type', 'Age', 'Breed1', 'Breed2', 'Gender', 'Color1', 'Color2', 'Color3',
           'MaturitySize', 'FurLength', 'Vaccinated', 'Dewormed', 'Sterilized', 'Health',
           'Quantity', 'Fee', 'State', 'VideoAmt', 'PhotoAmt']]
y = train['AdoptionSpeed']

# grouped = x['Type'].groupby(x["Type"])
# print(grouped.count())
# print( y.value_counts())

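# Hold out 25% of the training data for local evaluation (fixed seed for reproducibility)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)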

# Decision tree
dtc = DecisionTreeClassifier(random_state=42)
dtc.fit(x_train, y_train)
dt_predict = dtc.predict(x_test)
print("Decision tree accuracy:", dtc.score(x_test, y_test))
print(classification_report(y_test, dt_predict, target_names=["0", "1", "2", "3", "4"]))

# Random forest
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(x_train, y_train)
rfc_y_predict = rfc.predict(x_test)
print("Random forest accuracy:", rfc.score(x_test, y_test))

# Random forest trained on the full training set, then used to predict the Kaggle test set
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(x, y)
test = pd.read_csv("../input/test/test.csv")
# Use the same columns, in the same order, as in training
# (named x_submit so it does not overwrite the hold-out x_test above)
x_submit = test[x.columns]
final_result = rfc.predict(x_submit)
submission_df = pd.DataFrame(data={'PetID': test['PetID'].tolist(), 'AdoptionSpeed': final_result})
submission_df.to_csv('submission.csv', index=False)
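Before uploading, a few cheap checks on `submission_df` can catch formatting mistakes. The sketch below is a hypothetical helper, `check_submission`, which assumes the file needs exactly the columns `PetID` and `AdoptionSpeed`, one row per test `PetID`, and integer labels 0 to 4 (the five classes used above); the competition's own submission instructions remain the authoritative format.

```python
import pandas as pd


def check_submission(submission, test_ids, n_classes=5):
    """Hypothetical helper: return a list of problems (an empty list means OK)."""
    expected_cols = {"PetID", "AdoptionSpeed"}
    if set(submission.columns) != expected_cols:
        # Stop early: the remaining checks need both columns.
        return [f"columns are {list(submission.columns)}, expected {sorted(expected_cols)}"]

    problems = []
    ids = submission["PetID"]
    if ids.duplicated().any():
        problems.append("duplicate PetID values")
    if set(ids) != set(test_ids):
        problems.append("PetID values do not match the test set")

    labels = submission["AdoptionSpeed"]
    if not pd.api.types.is_integer_dtype(labels):
        problems.append(f"AdoptionSpeed has dtype {labels.dtype}, expected integers")
    elif not labels.between(0, n_classes - 1).all():
        problems.append(f"AdoptionSpeed values outside 0..{n_classes - 1}")
    return problems


# Usage with toy data (made-up PetIDs):
ok = pd.DataFrame({"PetID": ["a1", "b2", "c3"], "AdoptionSpeed": [0, 4, 2]})
bad = pd.DataFrame({"PetID": ["a1", "a1", "c3"], "AdoptionSpeed": [0, 5, 2]})
print(check_submission(ok, ["a1", "b2", "c3"]))   # []
print(check_submission(bad, ["a1", "b2", "c3"]))  # lists three problems
```

In the script above this would be called as `check_submission(submission_df, test['PetID'])` just before writing the CSV. Returning a list instead of raising on the first failure shows every problem at once.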
