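Kaggle: San Francisco crime classification

This Kaggle competition asks us to classify crime incidents in San Francisco. It is a multi-class classification problem, and the provided features include the time, location, and a description of each incident.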

Import the data and packages

#imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time as systime
import datetime as dt
import string
import seaborn as sns
import matplotlib.colors as colors
%matplotlib inline

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 878049 entries, 0 to 878048
Data columns (total 9 columns):
Dates         878049 non-null object
Category      878049 non-null object
Descript      878049 non-null object
DayOfWeek     878049 non-null object
PdDistrict    878049 non-null object
Resolution    878049 non-null object
Address       878049 non-null object
X             878049 non-null float64
Y             878049 non-null float64
dtypes: float64(2), object(7)
memory usage: 60.3+ MB

train.shape

(878049, 9)

train.head(3)


	Dates	Category	Descript	DayOfWeek	PdDistrict	Resolution	Address	X	Y
0	2015-05-13 23:53:00	WARRANTS	WARRANT ARREST	Wednesday	NORTHERN	ARREST, BOOKED	OAK ST / LAGUNA ST	-122.425892	37.774599
1	2015-05-13 23:53:00	OTHER OFFENSES	TRAFFIC VIOLATION ARREST	Wednesday	NORTHERN	ARREST, BOOKED	OAK ST / LAGUNA ST	-122.425892	37.774599
2	2015-05-13 23:33:00	OTHER OFFENSES	TRAFFIC VIOLATION ARREST	Wednesday	NORTHERN	ARREST, BOOKED	VANNESS AV / GREENWICH ST	-122.424363	37.800414

test.shape

(884262, 7)

test.head(3)


	Id	Dates	DayOfWeek	PdDistrict	Address	X	Y
0	0	2015-05-10 23:59:00	Sunday	BAYVIEW	2000 Block of THOMAS AV	-122.399588	37.735051
1	1	2015-05-10 23:51:00	Sunday	BAYVIEW	3RD ST / REVERE AV	-122.391523	37.732432
2	2	2015-05-10 23:50:00	Sunday	NORTHERN	2000 Block of GOUGH ST	-122.426002	37.792212

Data analysis

train.isnull().sum()

Dates         0
Category      0
Descript      0
DayOfWeek     0
PdDistrict    0
Resolution    0
Address       0
X             0
Y             0
dtype: int64

PdDistrict

dis_group = train.groupby(by='PdDistrict').size()
print(len(dis_group))
dis_group

PdDistrict
BAYVIEW 89431
CENTRAL 85460
INGLESIDE 78845
MISSION 119908
NORTHERN 105296
PARK 49313
RICHMOND 45209
SOUTHERN 157182
TARAVAL 65596
TENDERLOIN 81809
dtype: int64

dis_group = dis_group/sum(dis_group)

dis_group.index = dis_group.index.map(string.capwords)
dis_group.sort_values(ascending=True,inplace=True)
dis_group.plot(kind='barh',figsize=(15,10),fontsize=10,color=sns.color_palette('coolwarm',10))
plt.title('Frequency of crimes by district',fontsize=20)
plt.show()

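The districts differ considerably: Southern has the largest share of recorded crimes, and Richmond has the smallest, making it the safest district by this measure.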
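The plotted shares are each district's count divided by the total. A plain-Python sketch of `dis_group/sum(dis_group)`, using the counts printed above; they add up to 878049, the number of training rows. Note that these are shares of recorded incidents, not per-capita crime rates (`to_shares` is a hypothetical helper name).

```python
# Per-district incident counts, copied from the groupby output above
district_counts = {
    "BAYVIEW": 89431, "CENTRAL": 85460, "INGLESIDE": 78845,
    "MISSION": 119908, "NORTHERN": 105296, "PARK": 49313,
    "RICHMOND": 45209, "SOUTHERN": 157182, "TARAVAL": 65596,
    "TENDERLOIN": 81809,
}

def to_shares(counts):
    """Divide every count by the total so the values sum to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

shares = to_shares(district_counts)
# Print districts from largest to smallest share
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} {share:.3f}")
```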

year/month/day

# convert the Dates column from object (string) to datetime
train['date'] = pd.to_datetime(train['Dates'])

train.head(1)


	Dates	Category	Descript	DayOfWeek	PdDistrict	Resolution	Address	X	Y	date
0	2015-05-13 23:53:00	WARRANTS	WARRANT ARREST	Wednesday	NORTHERN	ARREST, BOOKED	OAK ST / LAGUNA ST	-122.425892	37.774599	2015-05-13 23:53:00

train['year'] = train.date.dt.year
train['month'] = train.date.dt.month
train['day'] = train.date.dt.day
train['hour'] = train.date.dt.hour

plt.figure(figsize=(8,19))

year_group = train.groupby('year').size()
plt.subplot(311)
plt.plot(year_group.index[:-1],year_group[:-1],'ks-')  # leave out 2015, which is only partly covered (the data end in May 2015)
plt.xlabel('year')

month_group = train.groupby('month').size()
plt.subplot(312)
plt.plot(month_group,'ks-')
plt.xlabel('month')

day_group = train.groupby('day').size()
plt.subplot(313)
plt.plot(day_group,'ks-')
plt.xlabel('day')

plt.show()

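The plots show that SF crime counts declined roughly steadily until 2010 and then rose sharply. Within a year, crime peaks in May and October, and within a month there is a slight increase at the beginning and the end.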
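The year, month, day, and hour columns are simply fields of the parsed timestamp. A minimal plain-Python sketch of what `train.date.dt.year`, `.month`, `.day`, and `.hour` return for one `Dates` string (`time_fields` is a hypothetical helper):

```python
from datetime import datetime

def time_fields(stamp):
    """Parse a Dates string and return (year, month, day, hour),
    mirroring train.date.dt.year / .month / .day / .hour."""
    t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return t.year, t.month, t.day, t.hour

print(time_fields("2015-05-13 23:53:00"))  # (2015, 5, 13, 23)
```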

Day of week

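week_group = train.groupby(['DayOfWeek','hour']).size()  # group by two keys
week_group = week_group.unstack()  # pivot the MultiIndex: rows = day of week, columns = hour

week_group.T.plot(figsize=(12,8))  # transpose so each day of week becomes one line over the hours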
plt.xlabel('hour of day',size=15)
plt.ylabel('Number of crimes',size=15)
plt.show()

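Crime peaks around 12:00 and 18:00 and drops markedly after midnight. On Friday and Saturday, the number of incidents after 20:00 is higher than on other days.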
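The two-key `groupby` followed by `unstack()` turns (day, hour) pairs into a day-by-hour table. A plain-Python sketch of the same counting with `collections.Counter`; the records are toy values for illustration, and unlike `unstack()`, which leaves missing cells as NaN, this version fills them with 0:

```python
from collections import Counter

def count_by_day_and_hour(records):
    """records: iterable of (day_of_week, hour) pairs.
    Returns {day: {hour: count}} for hours 0-23: one row per day and one
    column per hour, like groupby(['DayOfWeek','hour']).size().unstack().
    Empty cells are 0 here, whereas unstack() would leave them as NaN."""
    counts = Counter(records)
    days = sorted({day for day, _ in counts})
    return {day: {h: counts.get((day, h), 0) for h in range(24)} for day in days}

# Toy records, for illustration only
table = count_by_day_and_hour([("Friday", 20), ("Friday", 20), ("Monday", 12)])
print(table["Friday"][20], table["Monday"][20])  # 2 0
```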

Time and place of the most frequent crime categories

Below we look at the six most frequent crime categories.
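Selecting the top six categories means counting labels and keeping the most common ones. The filter in the next step compares `string.capwords` versions of the names, so the selected names must be normalized the same way; note that `capwords` only splits on whitespace, so `LARCENY/THEFT` becomes `Larceny/theft`. A plain-Python sketch (`top_categories` and the toy labels are hypothetical):

```python
import string
from collections import Counter

def top_categories(categories, n=6):
    """Return the n most frequent categories, converted with string.capwords
    so they can be compared against train['Category'].map(string.capwords)."""
    return [string.capwords(c) for c, _ in Counter(categories).most_common(n)]

# capwords splits on whitespace only, so the part after '/' is lower-cased
print(string.capwords("LARCENY/THEFT"))   # Larceny/theft
print(string.capwords("OTHER OFFENSES"))  # Other Offenses

# Toy labels, for illustration only
print(top_categories(["ASSAULT", "WARRANTS", "ASSAULT", "VEHICLE THEFT",
                      "ASSAULT", "WARRANTS"], n=2))  # ['Assault', 'Warrants']
```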

hour

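# top6: the six most frequent categories, in capwords form to match the filter below
top6 = train['Category'].value_counts().index[:6].map(string.capwords)
tmp = train[train['Category'].map(string.capwords).isin(top6)]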
tmp_group = tmp.groupby(['Category','hour']).size()
tmp_group = tmp_group.unstack()
tmp_group.T.plot(figsize=(12,6),style='o-')
plt.show()

The timing is consistent with the analysis above: larceny/theft peaks at 12:00 and 18:00, while assault does not decline after 18:00.

PdDistrict

tmp2 = tmp.groupby(['Category','PdDistrict']).size()
tmp2.unstack()


PdDistrict	BAYVIEW	CENTRAL	INGLESIDE	MISSION	NORTHERN	PARK	RICHMOND	SOUTHERN	TARAVAL	TENDERLOIN
Category
ASSAULT	9857	6977	8533	11149	8318	3515	3202	12183	5463	7679
DRUG/NARCOTIC	4498	1805	2373	8757	4511	2573	999	9228	1531	17696
LARCENY/THEFT	10119	25060	10236	18223	28630	9146	9893	41845	11845	9903
NON-CRIMINAL	6099	10940	6853	12372	10240	5925	5744	19745	6919	7467
OTHER OFFENSES	17053	8901	13203	19330	12233	6184	5632	21308	8614	13724
VEHICLE THEFT	7219	4210	8960	7148	6291	3963	4117	4725	6142	1006

tmp2.unstack().T.plot(kind='bar',figsize=(12,6),rot=45)
plt.show()

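Southern, the district with the most crime, has the most larceny/theft and assault cases but relatively few vehicle thefts; my guess is that it may be a poorer area. In the low-crime districts Park and Richmond, drug and assault cases are noticeably less common.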
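Raw counts mix district size with the kind of crime. To compare the crime mix itself, each district's counts can be divided by that district's total. A plain-Python sketch using four columns copied from the Category × PdDistrict table above (top-6 categories only; `category_shares` is a hypothetical helper):

```python
# Counts from the Category x PdDistrict table above, in CATEGORIES order
CATEGORIES = ["ASSAULT", "DRUG/NARCOTIC", "LARCENY/THEFT",
              "NON-CRIMINAL", "OTHER OFFENSES", "VEHICLE THEFT"]
table = {
    "PARK":       [3515, 2573, 9146, 5925, 6184, 3963],
    "RICHMOND":   [3202, 999, 9893, 5744, 5632, 4117],
    "SOUTHERN":   [12183, 9228, 41845, 19745, 21308, 4725],
    "TENDERLOIN": [7679, 17696, 9903, 7467, 13724, 1006],
}

def category_shares(counts):
    """Share of each top-6 category within one district."""
    total = sum(counts)
    return {cat: n / total for cat, n in zip(CATEGORIES, counts)}

for district, counts in table.items():
    s = category_shares(counts)
    print(f"{district:<10} drug {s['DRUG/NARCOTIC']:.3f}  assault {s['ASSAULT']:.3f}")
```

For example, drug/narcotic cases make up about 3% of Richmond's top-6 incidents but about 31% of Tenderloin's.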

DayOfWeek

tmp3 = tmp.groupby(['Category','DayOfWeek']).size()
tmp3 = tmp3.unstack()

tmp3.sum(axis=1).iloc[0]

76876

tmp3.iloc[0]

DayOfWeek
Friday 11160
Monday 10560
Saturday 11995
Sunday 12082
Thursday 10246
Tuesday 10280
Wednesday 10553
Name: ASSAULT, dtype: int64

# normalize each row so a category's counts across the week sum to 1
tmp3 = tmp3.div(tmp3.sum(axis=1), axis=0)
tmp3


DayOfWeek	Friday	Monday	Saturday	Sunday	Thursday	Tuesday	Wednesday
Category
ASSAULT	0.145169	0.137364	0.156030	0.157162	0.133280	0.133722	0.137273
DRUG/NARCOTIC	0.137481	0.144948	0.118397	0.113820	0.156640	0.157010	0.171703
LARCENY/THEFT	0.154969	0.134763	0.155615	0.138079	0.139594	0.136975	0.140006
NON-CRIMINAL	0.151499	0.139268	0.151749	0.140546	0.138878	0.138001	0.140059
OTHER OFFENSES	0.147311	0.140963	0.135748	0.122498	0.146312	0.149062	0.158105
VEHICLE THEFT	0.160149	0.137818	0.150964	0.139529	0.138636	0.135048	0.137855

wkm = {
    'Monday':0,
    'Tuesday':1,
    'Wednesday':2,
    'Thursday':3,
    'Friday':4,
    'Saturday':5,
    'Sunday':6
}
tmp3.columns = tmp3.columns.map(wkm)

tmp3 = tmp3.loc[:, list(range(7))]  # order the columns Monday (0) to Sunday (6)
tmp3


DayOfWeek	0	1	2	3	4	5	6
Category
ASSAULT	0.137364	0.133722	0.137273	0.133280	0.145169	0.156030	0.157162
DRUG/NARCOTIC	0.144948	0.157010	0.171703	0.156640	0.137481	0.118397	0.113820
LARCENY/THEFT	0.134763	0.136975	0.140006	0.139594	0.154969	0.155615	0.138079
NON-CRIMINAL	0.139268	0.138001	0.140059	0.138878	0.151499	0.151749	0.140546
OTHER OFFENSES	0.140963	0.149062	0.158105	0.146312	0.147311	0.135748	0.122498
VEHICLE THEFT	0.137818	0.135048	0.137855	0.138636	0.160149	0.150964	0.139529

tmp3.T.plot(figsize=(12,6),style='o-')
plt.xlabel("weekday",size=20)
plt.xticks([0,1,2,3,4,5,6],['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])
plt.show()

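Drug/narcotic cases follow a different pattern: they peak on Wednesday and drop sharply on the weekend. Most other categories, except other offenses, rise on Friday and Saturday.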
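The normalization and Monday-first reordering can be checked by hand on the ASSAULT row printed earlier (`tmp3.iloc[0]`). A plain-Python sketch of what the per-row normalization does; the Friday share comes out as 0.145169, matching the table:

```python
# ASSAULT counts by weekday, copied from tmp3.iloc[0] above
assault = {"Friday": 11160, "Monday": 10560, "Saturday": 11995, "Sunday": 12082,
           "Thursday": 10246, "Tuesday": 10280, "Wednesday": 10553}

# Same order as the wkm mapping (Monday = 0 ... Sunday = 6)
WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def normalize_row(counts):
    """Divide each weekday count by the row total so the shares sum to 1."""
    total = sum(counts.values())
    return {day: n / total for day, n in counts.items()}

shares = normalize_row(assault)
ordered = [round(shares[d], 6) for d in WEEKDAY_ORDER]
print(ordered)
```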

month

mon_g = tmp.groupby(['Category','month']).size()
mon_g = mon_g.unstack()
# normalize each row so a category's monthly counts sum to 1
mon_g = mon_g.div(mon_g.sum(axis=1), axis=0)
mon_g.T.plot(figsize=(12,6),style='o-')
plt.show()

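The per-category trends broadly match the overall trend: February to June and August to December are the high periods, and drug/narcotic and other offenses have relatively high shares in January and February.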

Time trends of the most frequent crime categories

ddf = tmp.groupby(['Category',pd.Grouper('date')]).size()
ddf = ddf.unstack().fillna(0)

ddf = ddf.T  # transpose so the timestamps become the index, which resample() below needs
ddf.index

DatetimeIndex(['2015-05-13 23:53:00', '2015-05-13 23:33:00',
               '2015-05-13 23:30:00', '2015-05-13 23:00:00',
               '2015-05-13 22:58:00', '2015-05-13 22:30:00',
               '2015-05-13 22:06:00', '2015-05-13 22:00:00',
               '2015-05-13 21:55:00', '2015-05-13 21:40:00',
               …
               '2003-01-06 02:00:00', '2003-01-06 01:54:00',
               '2003-01-06 01:50:00', '2003-01-06 01:36:00',
               '2003-01-06 00:55:00', '2003-01-06 00:40:00',
               '2003-01-06 00:33:00', '2003-01-06 00:31:00',
               '2003-01-06 00:20:00', '2003-01-06 00:01:00'],
              dtype='datetime64[ns]', name='date', length=306742, freq=None)

df2 = ddf.resample('M').sum()  # monthly totals (pandas >= 2.2 prefers the alias 'ME')

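plt.style.use('ggplot')
moav = df2.rolling(12).mean()  # 12-month rolling mean, i.e. a moving-average window
plt.figure(figsize=(12,15))  # one figure with one subplot per category
for i, cat in enumerate(df2.columns, start=1):
    plt.subplot(6,1,i)
    plt.plot(df2.index,df2[cat])
    plt.plot(df2.index,moav[cat])
    plt.title(cat)
plt.tight_layout()
plt.show()

df2.plot()
plt.show()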

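Different categories evolve differently over time. For example, vehicle theft dropped sharply after 2005, possibly because of a targeted enforcement campaign, while larceny/theft trends upward after 2012.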
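The monthly totals and the 12-month moving average used above can be sketched in plain Python. This is an illustration of the idea, not pandas' implementation: `monthly_totals` skips months with no events (`resample('M').sum()` keeps them as 0), and `rolling_mean` returns None where `rolling(12).mean()` returns NaN. Both helper names are hypothetical.

```python
from collections import defaultdict

def monthly_totals(stamps):
    """Count events per (year, month) from 'YYYY-MM-DD HH:MM:SS' strings.
    Months without any event are simply absent from the result."""
    totals = defaultdict(int)
    for s in stamps:
        totals[(int(s[:4]), int(s[5:7]))] += 1
    return dict(sorted(totals.items()))

def rolling_mean(values, window=12):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

print(monthly_totals(["2015-05-13 23:53:00", "2015-05-01 00:00:00",
                      "2003-01-06 00:01:00"]))  # {(2003, 1): 1, (2015, 5): 2}
```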

Map visualization

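The last two columns of both the training and test data are the longitude and latitude (X, Y) of each incident. The analysis above showed that some districts have much more crime and some have a higher share of certain categories, so location is clearly related to the crime category. Below we show on a map where certain crime types concentrate.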

train[['X','Y']].describe()


	X	Y
count	878049.000000	878049.000000
mean	-122.422616	37.771020
std	0.030354	0.456893
min	-122.513642	37.707879
25%	-122.432952	37.752427
50%	-122.416420	37.775421
75%	-122.406959	37.784369
max	-120.500000	90.000000
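`describe()` shows a maximum X of -120.5 and a maximum Y of 90.0. Latitude 90 is the North Pole, so these rows cannot be San Francisco incidents and look like placeholder values. A minimal bounding-box check, reusing the `lon_lat_box` extent defined for the map plots (`in_box` is a hypothetical helper):

```python
# lon_lat_box from the map code: (min lon, max lon, min lat, max lat)
LON_LAT_BOX = (-122.5247, -122.3366, 37.699, 37.8299)

def in_box(x, y, box=LON_LAT_BOX):
    """True if (x, y) = (longitude, latitude) lies inside the map extent."""
    xmin, xmax, ymin, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

print(in_box(-122.425892, 37.774599))  # True: first training row
print(in_box(-120.5, 90.0))            # False: the describe() maxima
```

In pandas the same filter could be written with `Series.between`, e.g. `train[train.X.between(-122.5247, -122.3366) & train.Y.between(37.699, 37.8299)]`.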

#show SF map
mapdata = np.loadtxt('sf_map_copyright_openstreetmap_contributors.txt')
plt.figure(figsize=(8,8))
plt.imshow(mapdata,cmap=plt.get_cmap('gray'))
plt.show()

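# take the most frequent category, larceny/theft
theft = train[train['Category']=='LARCENY/THEFT']

# plotting every row takes a very long time on my machine, so drop implausible
# coordinates (e.g. the X=-120.5, Y=90 rows seen in describe()) and plot a
# random sample (the sample size is an arbitrary choice)
theft = theft[(theft.X < -121) & (theft.Y < 40)]
theft = theft.sample(n=min(len(theft), 100000), random_state=0)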

asp = mapdata.shape[0]*1.0/mapdata.shape[1]
lon_lat_box = (-122.5247, -122.3366, 37.699, 37.8299)
clipsize = [[-122.5247, -122.3366],[ 37.699, 37.8299]]

plt.figure(figsize=(8,8*asp))
ax = sns.kdeplot(x=theft.X, y=theft.Y, clip=clipsize)
#ax = sns.regplot(x='X', y='Y', data=theft, fit_reg=False)
ax.imshow(mapdata,cmap=plt.get_cmap('gray'),extent=lon_lat_box,aspect=asp)
plt.show()

im = plt.imread('SanFranMap.png')
plt.figure(figsize=(8,8))
ax = sns.kdeplot(x=theft.X, y=theft.Y, clip=clipsize)
#ax = sns.regplot(x='X', y='Y', data=theft, fit_reg=False)
ax.imshow(im,cmap=plt.get_cmap('gray'),extent=lon_lat_box,aspect=asp)
plt.show()

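Because all incidents come from a single city, the X/Y (longitude/latitude) values span a very small range. Once standardized, these numbers no longer clearly identify an area, so they add little to the classification. Location still matters for the crime category, so for now we use PdDistrict to represent it.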

Data processing

Categorical features: Dates, Descript, DayOfWeek, PdDistrict, Resolution, Address
Numeric features: X, Y, year, month, day, hour
Time feature: date

from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
#from sklearn.feature_selection import SelectKBest
#from sklearn.feature_selection import chi2

# apply the same Dates processing to the test set
test['date'] = pd.to_datetime(test['Dates'])
test['year'] = test.date.dt.year
test['month'] = test.date.dt.month
test['day'] = test.date.dt.day
test['hour'] = test.date.dt.hour
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 884262 entries, 0 to 884261
Data columns (total 12 columns):
Id            884262 non-null int64
Dates         884262 non-null object
DayOfWeek     884262 non-null object
PdDistrict    884262 non-null object
Address       884262 non-null object
X             884262 non-null float64
Y             884262 non-null float64
date          884262 non-null datetime64[ns]
year          884262 non-null int64
month         884262 non-null int64
day           884262 non-null int64
hour          884262 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(5), object(4)
memory usage: 81.0+ MB

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 878049 entries, 0 to 878048
Data columns (total 14 columns):
Dates         878049 non-null object
Category      878049 non-null object
Descript      878049 non-null object
DayOfWeek     878049 non-null object
PdDistrict    878049 non-null object
Resolution    878049 non-null object
Address       878049 non-null object
X             878049 non-null float64
Y             878049 non-null float64
date          878049 non-null datetime64[ns]
year          878049 non-null int64
month         878049 non-null int64
day           878049 non-null int64
hour          878049 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(4), object(7)
memory usage: 93.8+ MB

Encoding the prediction labels

# label-encode the classification target

label = preprocessing.LabelEncoder()
target = label.fit_transform(train.Category)
target

array([37, 21, 21, ..., 16, 35, 12], dtype=int64)
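`LabelEncoder` sorts the unique labels and replaces each label by its position in that sorted list; with all 39 categories this gives WARRANTS = 37 and OTHER OFFENSES = 21, as in the output above. `label.classes_` keeps the same sorted order, which is why it can later be used as the submission's column names. A plain-Python sketch with a few toy labels (`fit_label_encoder` is a hypothetical helper):

```python
def fit_label_encoder(labels):
    """Classes are the sorted unique labels; each label becomes its index
    in that sorted list."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[c] for c in labels]

classes, codes = fit_label_encoder(["WARRANTS", "OTHER OFFENSES", "OTHER OFFENSES", "ASSAULT"])
print(classes)  # ['ASSAULT', 'OTHER OFFENSES', 'WARRANTS']
print(codes)    # [2, 1, 1, 0]
```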

# set aside the columns that train and test do not share
Id = test['Id']
des = train['Descript']
res = train['Resolution']
train.drop(['Category','Descript','Resolution'],axis=1,inplace=True)
test.drop('Id',axis=1,inplace=True)

# combine train and test so both are processed the same way
full = pd.concat([train,test],keys=['train','test'])

full.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1762311 entries, (train, 0) to (test, 884261)
Data columns (total 11 columns):
Dates         object
DayOfWeek     object
PdDistrict    object
Address       object
X             float64
Y             float64
date          datetime64[ns]
year          int64
month         int64
day           int64
hour          int64
dtypes: datetime64[ns](1), float64(2), int64(4), object(4)
memory usage: 163.0+ MB

Feature selection

# one-hot encode DayOfWeek into numeric columns
week = pd.get_dummies(full.DayOfWeek)

# PdDistrict and Address both describe the location
# keep PdDistrict and one-hot encode it
full.drop('Address',axis=1,inplace=True)
dist = pd.get_dummies(full.PdDistrict)

# time features
# drop the redundant Dates and date columns
full.drop(['Dates','date'],axis=1,inplace=True)

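Among the numeric time features year, month, day, and hour: the yearly trend differs by crime category, the monthly pattern is different at the start of the year, and the hourly pattern changes after 18:00. So we add two new features: newy (January or February) and dark (18:00 or later).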
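The two indicator rules just described, written as plain functions that mirror the lambdas in the code below (the function names simply reuse the column names). Since `hour` runs from 0 to 23, the `x<=24` part of the original condition never excludes anything.

```python
def newy(month):
    """1 for January or February, else 0 (the rule behind the newy column)."""
    return 1 if month in (1, 2) else 0

def dark(hour):
    """1 from 18:00 onward, else 0 (the rule behind the dark column)."""
    return 1 if hour >= 18 else 0

print([newy(m) for m in (1, 2, 3, 12)])  # [1, 1, 0, 0]
print([dark(h) for h in (0, 17, 18, 23)])  # [0, 0, 1, 1]
```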


full['newy'] = full['month'].apply(lambda x:1 if x==1 or x==2 else 0)
full['dark'] = full['hour'].apply(lambda x:1 if x>=18 and x<=24 else 0)

hour_dum = pd.get_dummies(full.hour)

year_dum = pd.get_dummies(full.year)

month_dum = pd.get_dummies(full.month)


# drop the original columns and concatenate the encoded features
full.drop(['month','hour','day','year','DayOfWeek','PdDistrict'],axis=1,inplace=True)

full = pd.concat([full,week,dist,year_dum,month_dum,hour_dum],axis=1)

full.isnull().sum()

newy          0
dark          0
Friday        0
Monday        0
Saturday      0
Sunday        0
Thursday      0
Tuesday       0
Wednesday     0
BAYVIEW       0
CENTRAL       0
INGLESIDE     0
MISSION       0
NORTHERN      0
PARK          0
RICHMOND      0
SOUTHERN      0
TARAVAL       0
TENDERLOIN    0
2003          0
2004          0
2005          0
2006          0
2007          0
2008          0
2009          0
2010          0
2011          0
             ..
7             0
8             0
9             0
10            0
11            0
12            0
0             0
1             0
2             0
3             0
4             0
5             0
6             0
7             0
8             0
9             0
10            0
11            0
12            0
13            0
14            0
15            0
16            0
17            0
18            0
19            0
20            0
21            0
22            0
23            0
Length: 70, dtype: int64
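The listing shows the labels 7 to 12 twice: `month_dum` and `hour_dum` both create integer columns named 1 to 12, so `full` has duplicate column names, and recent scikit-learn versions may also reject column names that mix strings and integers. Giving each dummy group a prefix keeps the names unique; in pandas that is the `prefix` argument, e.g. `pd.get_dummies(full.hour, prefix='hour')`. A plain-Python sketch of prefixed one-hot encoding (`one_hot` is a hypothetical helper):

```python
def one_hot(values, prefix):
    """Return (column_names, rows) for a one-hot encoding of values.
    Column names carry a prefix, so month 1 and hour 1 stay distinct."""
    levels = sorted(set(values))
    names = [f"{prefix}_{v}" for v in levels]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return names, rows

month_names, month_rows = one_hot([5, 1, 12, 5], "month")
hour_names, _ = one_hot([23, 1, 0], "hour")
print(month_names)  # ['month_1', 'month_5', 'month_12']
print(month_rows)   # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```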

Generating the training and validation sets

# use all features
training,valid,y_train,y_valid = train_test_split(full[:train.shape[0]],target,train_size=0.7,random_state=0)

training.shape

(614634, 68)
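With `train_size=0.7`, `train_test_split` keeps floor(0.7 × n) rows for training, which gives floor(0.7 × 878049) = 614634, the row count above. A plain-Python sketch of a seeded shuffle-and-cut split; it uses Python's `random` module, so it does not reproduce scikit-learn's exact row assignment for `random_state=0` (`split_indices` is a hypothetical helper):

```python
import math
import random

def split_indices(n, train_size=0.7, seed=0):
    """Shuffle row indices with a fixed seed and cut them into
    train/validation parts, with n_train = floor(train_size * n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = math.floor(train_size * n)
    return idx[:n_train], idx[n_train:]

print(math.floor(0.7 * 878049))  # 614634, matching training.shape[0]
tr, va = split_indices(1000)
print(len(tr), len(va))
```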

model

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.naive_bayes import BernoulliNB
import time


Logistic regression

LR = LogisticRegression(C=0.1)
lrstart = time.time()
LR.fit(training, y_train)
lrcost_time = time.time()-lrstart
predicted = np.array(LR.predict_proba(valid))
print("Logistic regression log loss: %f" %(log_loss(y_valid, predicted)))
print('Logistic regression training time: %f s' %(lrcost_time))

Logistic regression log loss: 2.596991
Logistic regression training time: 130.701451 s
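Log loss is the mean negative log-probability that the model assigns to each sample's true class. For scale, a uniform guess over the 39 categories (the submission has 39 probability columns) scores log(39), about 3.66, so 2.597 is a clear improvement. A plain-Python sketch of the metric; it clips probabilities to avoid log(0), as scikit-learn's `log_loss` also does, but the `eps` value here is an assumption and, unlike scikit-learn, it does not renormalize rows:

```python
import math

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean of -log(p[true class]) over samples.
    y_true: list of class indices; proba: list of probability rows.
    Each probability is clipped to [eps, 1 - eps] before taking the log."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Baseline: a uniform guess over 39 categories
uniform = [[1 / 39] * 39 for _ in range(3)]
print(multiclass_log_loss([0, 5, 38], uniform))  # equals math.log(39), about 3.66
```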

Naive Bayes

NB = BernoulliNB()
nbstart = time.time()
NB.fit(training,y_train)
nbcost_time = time.time()-nbstart
predicted = np.array(NB.predict_proba(valid))
print("Naive Bayes log loss: %f" %(log_loss(y_valid, predicted)))
print("Naive Bayes training time: %f s" %(nbcost_time))

Naive Bayes log loss: 2.607965
Naive Bayes training time: 1.765910 s
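Naive Bayes trains in under 2 seconds versus about 131 seconds for logistic regression, because fitting `BernoulliNB` on 0/1 features is essentially counting how often each feature is on within each class. A minimal plain-Python sketch of that computation with Laplace smoothing `alpha=1` (scikit-learn's default); the helper names and toy data are hypothetical:

```python
import math
from collections import defaultdict

def fit_bernoulli_nb(X, y, alpha=1.0):
    """X: list of 0/1 feature rows; y: class labels.
    Returns {class: (log prior, [P(feature j = 1 | class)])} using
    (count of ones + alpha) / (class size + 2 * alpha)."""
    n_features = len(X[0])
    class_size = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for row, label in zip(X, y):
        class_size[label] += 1
        for j, v in enumerate(row):
            ones[label][j] += v
    model = {}
    for c, nc in class_size.items():
        p1 = [(ones[c][j] + alpha) / (nc + 2 * alpha) for j in range(n_features)]
        model[c] = (math.log(nc / len(y)), p1)
    return model

def predict_proba(model, row):
    """Posterior over classes: a feature equal to 1 adds log p, a feature
    equal to 0 adds log(1 - p); scores are normalized with a stable softmax."""
    scores = {}
    for c, (log_prior, p1) in model.items():
        s = log_prior
        for v, p in zip(row, p1):
            s += math.log(p) if v else math.log(1 - p)
        scores[c] = s
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Toy data, for illustration only: feature 0 is on mostly for class 'a'
model = fit_bernoulli_nb([[1, 0], [1, 1], [0, 1], [0, 0]], ["a", "a", "b", "b"])
print(predict_proba(model, [1, 0]))
```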

# append the labels as the last column and export them for the planned BP neural network
train_all = np.c_[training,y_train]
train_all.shape

(614634, 69)

np.savetxt('/forBP/train.csv',train_all,fmt='%d',delimiter=',')

Random forest

from sklearn.ensemble import RandomForestClassifier

params = [12,13,14,15,16]  # candidate max_depth values
for par in params:
    clf = RandomForestClassifier(n_estimators=30, max_depth=par)
    clf.fit(training,y_train)
    predicted = np.array(clf.predict_proba(valid))
    print("Random forest log loss: %f" %(log_loss(y_valid, predicted)))

Random forest log loss: 2.575974
Random forest log loss: 2.568528
Random forest log loss: 2.563786
Random forest log loss: 2.559156
Random forest log loss: 2.555832

#write the result
result = NB.predict_proba(full[train.shape[0]:])
# use the test-set Id as the index so it is written as the first column
submission = pd.DataFrame(result,columns=label.classes_,index=Id.values)
submission.to_csv('SFresult_v1.csv',index_label='Id')

submission.shape

(884262, 39)
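The submission file is an `Id` column followed by one probability column per category, one row per test incident. A plain-Python sketch of that layout with the standard `csv` module; the class names and probabilities are toy values, `write_submission` is a hypothetical helper, and the 6-decimal formatting is an arbitrary choice:

```python
import csv
import io

def write_submission(out, ids, class_names, proba_rows):
    """Write a header 'Id,<class1>,<class2>,...' and one row per test Id."""
    writer = csv.writer(out)
    writer.writerow(["Id"] + list(class_names))
    for i, row in zip(ids, proba_rows):
        writer.writerow([i] + [f"{p:.6f}" for p in row])

buf = io.StringIO()
write_submission(buf, [0, 1], ["ARSON", "ASSAULT"], [[0.2, 0.8], [0.5, 0.5]])
print(buf.getvalue())
```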

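Here we used logistic regression, naive Bayes, and a random forest, evaluated with log loss. Naive Bayes trains fastest (about 1.8 s versus 131 s for logistic regression) with a log loss close to logistic regression's, and the random forest, an ensemble learner, lowers the log loss further, to about 2.556 at max_depth=16. Next steps could be trying other ensemble learners or tuning the random forest's parameters.
My feature selection is fairly simple: time and location, all treated as categorical and simply one-hot encoded. Next, PCA could be added for dimensionality reduction, or new features could be selected and constructed. The text feature Descript is not used here; it could support a text-classification model, or keyword analysis could give a clearer picture of the other offenses category. Note, however, that Descript appears only in the training set, so it cannot be used directly as an input for the test predictions.
Although there are more samples than in the previous competition, there are not many features, so next I plan to use TensorFlow to train a BP (back-propagation) neural network on the processed data.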

I'm still learning, so feedback is very welcome!

