Chapter 3: Grouping

import numpy as np
import pandas as pd
df = pd.read_csv('data/table.csv',index_col='ID')
df.head()
School Class Gender Address Height Weight Math Physics
ID
1101 S_1 C_1 M street_1 173 63 34.0 A+
1102 S_1 C_1 F street_2 192 73 32.5 B+
1103 S_1 C_1 M street_2 186 82 87.2 B+
1104 S_1 C_1 F street_2 167 81 80.4 B-
1105 S_1 C_1 F street_4 159 64 84.8 B+

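I. The SAC process

1. What it means

SAC stands for the split-apply-combine process behind every grouping operation.

split divides the data into groups according to some rule, apply runs a function on each group independently, and combine assembles the per-group results into a single data structure.

2. The apply step

In practice, the apply step covers four kinds of problems:

Aggregation: compute a statistic per group (for example the mean, or the number of elements in each group)

Transformation: operate on every element within its group (for example standardizing the values)

Filtration: select whole groups according to some rule (for example the groups in which some indicator is below 50)

Combined problems: a mixture of the three above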
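To make the three steps concrete, here is a minimal sketch that performs split, apply and combine by hand and compares the result with groupby. The toy frame is made up for illustration and is not the table.csv used in this chapter.

```python
import pandas as pd

# Toy data (hypothetical, not the chapter's table.csv)
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2', 'S_2', 'S_2'],
    'Math': [34.0, 87.2, 83.3, 50.6, 52.5],
})

# split: one sub-table per distinct School value
pieces = {key: toy[toy['School'] == key] for key in toy['School'].unique()}

# apply: compute a statistic on each piece independently
means = {key: piece['Math'].mean() for key, piece in pieces.items()}

# combine: gather the per-group results into one Series
manual = pd.Series(means, name='Math').rename_axis('School')

# groupby performs the same three steps in a single call
builtin = toy.groupby('School')['Math'].mean()
print(manual)
print(builtin)
```

Both Series hold one mean per school; groupby simply hides the bookkeeping.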

II. The groupby function

1. Basics of grouping

(a) Grouping by a single column

grouped_single = df.groupby('School')
display(grouped_single)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026FFBC23DA0>

groupby returns a GroupBy object. The object itself computes nothing; work is done only when one of its methods is called.

For example, to pull out a single group:

grouped_single.get_group('S_1').head()
grouped_single.get_group('S_1')
School Class Gender Address Height Weight Math Physics
ID
1101 S_1 C_1 M street_1 173 63 34.0 A+
1102 S_1 C_1 F street_2 192 73 32.5 B+
1103 S_1 C_1 M street_2 186 82 87.2 B+
1104 S_1 C_1 F street_2 167 81 80.4 B-
1105 S_1 C_1 F street_4 159 64 84.8 B+
1201 S_1 C_2 M street_5 188 68 97.0 A-
1202 S_1 C_2 F street_4 176 94 63.5 B-
1203 S_1 C_2 M street_6 160 53 58.8 A+
1204 S_1 C_2 F street_5 162 63 33.8 B
1205 S_1 C_2 F street_6 167 63 68.4 B-
1301 S_1 C_3 M street_4 161 68 31.5 B+
1302 S_1 C_3 F street_1 175 57 87.7 A-
1303 S_1 C_3 M street_7 188 82 49.7 B
1304 S_1 C_3 M street_2 195 70 85.2 A
1305 S_1 C_3 F street_5 187 69 61.7 B-

(b) Grouping by several columns

grouped_mul = df.groupby(['School','Class'])
grouped_mul.get_group(('S_2','C_4'))
grouped_mul=df.groupby(['School','Class'])
grouped_mul.get_group(('S_2','C_1'))
School Class Gender Address Height Weight Math Physics
ID
2101 S_2 C_1 M street_7 174 84 83.3 C
2102 S_2 C_1 F street_6 161 61 50.6 B+
2103 S_2 C_1 M street_4 157 61 52.5 B-
2104 S_2 C_1 F street_5 159 97 72.2 B+
2105 S_2 C_1 M street_4 170 81 34.2 A

(c) Group sizes and number of groups

grouped_single.size()
School
S_1    15
S_2    20
dtype: int64
grouped_mul.size()
School  Class
S_1     C_1      5
        C_2      5
        C_3      5
S_2     C_1      5
        C_2      5
        C_3      5
        C_4      5
dtype: int64
grouped_single.ngroups
2
grouped_mul.ngroups
7

(d) Iterating over groups

for name,group in grouped_single:
    print(name)
    display(group.head())
S_1
School Class Gender Address Height Weight Math Physics
ID
1101 S_1 C_1 M street_1 173 63 34.0 A+
1102 S_1 C_1 F street_2 192 73 32.5 B+
1103 S_1 C_1 M street_2 186 82 87.2 B+
1104 S_1 C_1 F street_2 167 81 80.4 B-
1105 S_1 C_1 F street_4 159 64 84.8 B+
S_2
School Class Gender Address Height Weight Math Physics
ID
2101 S_2 C_1 M street_7 174 84 83.3 C
2102 S_2 C_1 F street_6 161 61 50.6 B+
2103 S_2 C_1 M street_4 157 61 52.5 B-
2104 S_2 C_1 F street_5 159 97 72.2 B+
2105 S_2 C_1 M street_4 170 81 34.2 A

(e) The level parameter (for a MultiIndex) and the axis parameter

df.set_index(['Gender','School']).groupby(level=1,axis=0).get_group('S_1').head()
df.set_index(['Gender','School']).groupby(level=0).get_group('M')#.head()

Class Address Height Weight Math Physics
Gender School
M S_1 C_1 street_1 173 63 34.0 A+
S_1 C_1 street_2 186 82 87.2 B+
S_1 C_2 street_5 188 68 97.0 A-
S_1 C_2 street_6 160 53 58.8 A+
S_1 C_3 street_4 161 68 31.5 B+
S_1 C_3 street_7 188 82 49.7 B
S_1 C_3 street_2 195 70 85.2 A
S_2 C_1 street_7 174 84 83.3 C
S_2 C_1 street_4 157 61 52.5 B-
S_2 C_1 street_4 170 81 34.2 A
S_2 C_2 street_5 193 100 39.1 B
S_2 C_2 street_4 155 91 73.8 A+
S_2 C_2 street_1 175 74 47.2 B-
S_2 C_3 street_5 171 88 32.7 A
S_2 C_3 street_4 187 73 48.9 B
S_2 C_4 street_7 166 82 48.7 B

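2. Characteristics of the groupby object

(a) Listing every callable method

The listing below shows that a groupby object exposes a large number of methods, which makes it very flexible.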

print([attr for attr in dir(grouped_single) if not attr.startswith('_')])
['Address', 'Class', 'Gender', 'Height', 'Math', 'Physics', 'School', 'Weight', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'pipe', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']

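(b) head and first on a grouped object

Calling head on a grouped object returns the first few rows of each group, not the first few rows of the whole data set.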

grouped_single.head(2)
grouped_single.head(1)
School Class Gender Address Height Weight Math Physics
ID
1101 S_1 C_1 M street_1 173 63 34.0 A+
2101 S_2 C_1 M street_7 174 84 83.3 C

first returns, for each group, the first non-null value in every column, with the group key as the index.

grouped_single.first()
Class Gender Address Height Weight Math Physics
School
S_1 C_1 M street_1 173 63 34.0 A+
S_2 C_1 M street_7 174 84 83.3 C

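(c) What to group by

groupby is very liberal about the grouping key: any list-like with the same length as the data frame will do, and a function can also be used as the key.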

df.groupby(np.random.choice(['a','b','c'],df.shape[0])).get_group('a')#.head()
#equivalent to adding np.random.choice(['a','b','c'],df.shape[0]) as a new column and grouping by it
print(np.random.choice(['a','b','c'],df.shape[0]))
a=df.groupby(np.random.choice(['a','b','c'],df.shape[0]))

for name ,group in a:
    print(name )
    display(group)

a.size()
['a' 'b' 'b' 'a' 'c' 'b' 'c' 'b' 'b' 'b' 'b' 'c' 'c' 'a' 'a' 'b' 'b' 'a'
 'c' 'b' 'b' 'c' 'c' 'a' 'b' 'a' 'a' 'a' 'a' 'a' 'c' 'a' 'a' 'a' 'a']
a
School Class Gender Address Height Weight Math Physics
ID
1103 S_1 C_1 M street_2 186 82 87.2 B+
1104 S_1 C_1 F street_2 167 81 80.4 B-
1105 S_1 C_1 F street_4 159 64 84.8 B+
1203 S_1 C_2 M street_6 160 53 58.8 A+
2101 S_2 C_1 M street_7 174 84 83.3 C
2105 S_2 C_1 M street_4 170 81 34.2 A
2301 S_2 C_3 F street_4 157 78 72.3 B+
2304 S_2 C_3 F street_6 164 81 95.5 A-
2402 S_2 C_4 M street_7 166 82 48.7 B
2403 S_2 C_4 F street_6 158 60 59.7 B+
2404 S_2 C_4 F street_2 160 84 67.7 B
2405 S_2 C_4 F street_6 193 54 47.6 B
b
School Class Gender Address Height Weight Math Physics
ID
1102 S_1 C_1 F street_2 192 73 32.5 B+
1204 S_1 C_2 F street_5 162 63 33.8 B
1303 S_1 C_3 M street_7 188 82 49.7 B
1304 S_1 C_3 M street_2 195 70 85.2 A
1305 S_1 C_3 F street_5 187 69 61.7 B-
2201 S_2 C_2 M street_5 193 100 39.1 B
2204 S_2 C_2 M street_1 175 74 47.2 B-
2303 S_2 C_3 F street_7 190 99 65.9 C
c
School Class Gender Address Height Weight Math Physics
ID
1101 S_1 C_1 M street_1 173 63 34.0 A+
1201 S_1 C_2 M street_5 188 68 97.0 A-
1202 S_1 C_2 F street_4 176 94 63.5 B-
1205 S_1 C_2 F street_6 167 63 68.4 B-
1301 S_1 C_3 M street_4 161 68 31.5 B+
1302 S_1 C_3 F street_1 175 57 87.7 A-
2102 S_2 C_1 F street_6 161 61 50.6 B+
2103 S_2 C_1 M street_4 157 61 52.5 B-
2104 S_2 C_1 F street_5 159 97 72.2 B+
2202 S_2 C_2 F street_7 194 77 68.5 B+
2203 S_2 C_2 M street_4 155 91 73.8 A+
2205 S_2 C_2 F street_7 183 76 85.4 B
2302 S_2 C_3 M street_5 171 88 32.7 A
2305 S_2 C_3 M street_4 187 73 48.9 B
2401 S_2 C_4 F street_2 192 62 45.3 A
a    12
b     8
c    15
dtype: int64

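When a function is passed, it is called with each index label in turn, and the returned values become the group keys. This makes some fairly elaborate groupings possible.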

df[:5].groupby(lambda x:print(x)).head(5)
a=df[:5].groupby(pd.Series([2,1,1,4,5],index=[1105,1104,1103,1102,1101]))
# a.size()
for name ,group in a:
    print(name )
    display(group)
# df[:5].groupby(lambda x:x*2).head(5)
1101
1102
1103
1104
1105
display(df.iloc[0:5,0:5])
b=df.iloc[0:5,0:5].groupby([1,1,1,2,1],axis=1)
for name ,group in b:
    print(name )
    display(group)
School Class Gender Address Height
ID
1101 S_1 C_1 M street_1 173
1102 S_1 C_1 F street_2 192
1103 S_1 C_1 M street_2 186
1104 S_1 C_1 F street_2 167
1105 S_1 C_1 F street_4 159
1
School Class Gender Height
ID
1101 S_1 C_1 M 173
1102 S_1 C_1 F 192
1103 S_1 C_1 M 186
1104 S_1 C_1 F 167
1105 S_1 C_1 F 159
2
Address
ID
1101 street_1
1102 street_2
1103 street_2
1104 street_2
1105 street_4

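Grouping by odd and even rows (in the code, '奇數行' means odd row and '偶數行' means even row; the first call groups by row position, the second by the parity of the ID)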

# df.groupby(lambda x :print(x))
df.index.get_loc(1102)
1
display(df.groupby(lambda x:'奇數行'  if not df.index.get_loc(x)%2==1 else '偶數行').groups)
df.groupby(lambda x:'奇數ID行' if  x%2==1 else '偶數ID行').groups
{'偶數行': Int64Index([1102, 1104, 1201, 1203, 1205, 1302, 1304, 2101, 2103, 2105, 2202,
             2204, 2301, 2303, 2305, 2402, 2404],
            dtype='int64', name='ID'),
 '奇數行': Int64Index([1101, 1103, 1105, 1202, 1204, 1301, 1303, 1305, 2102, 2104, 2201,
             2203, 2205, 2302, 2304, 2401, 2403, 2405],
            dtype='int64', name='ID')}





{'偶數ID行': Int64Index([1102, 1104, 1202, 1204, 1302, 1304, 2102, 2104, 2202, 2204, 2302,
             2304, 2402, 2404],
            dtype='int64', name='ID'),
 '奇數ID行': Int64Index([1101, 1103, 1105, 1201, 1203, 1205, 1301, 1303, 1305, 2101, 2103,
             2105, 2201, 2203, 2205, 2301, 2303, 2305, 2401, 2403, 2405],
            dtype='int64', name='ID')}

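With a MultiIndex, the lambda receives a tuple. The code below checks whether the mean Math score of boys and girls in each of the two schools reaches the passing mark of 60 ('均分及格' means the mean passes, '均分不及格' means it does not).

Note: this only demonstrates how groupby works; you would not write it this way in practice.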

df.set_index(['Gender','School']).head()

Class Address Height Weight Math Physics
Gender School
M S_1 C_1 street_1 173 63 34.0 A+
F S_1 C_1 street_2 192 73 32.5 B+
M S_1 C_1 street_2 186 82 87.2 B+
F S_1 C_1 street_2 167 81 80.4 B-
S_1 C_1 street_4 159 64 84.8 B+
df.set_index(['Gender','School']).sort_index().groupby(lambda x:print(x))
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_1')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('F', 'S_2')
('M', 'S_1')
('M', 'S_1')
('M', 'S_1')
('M', 'S_1')
('M', 'S_1')
('M', 'S_1')
('M', 'S_1')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')
('M', 'S_2')





<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026FFBE152E8>
math_score = df.set_index(['Gender','School'])['Math'].sort_index()
grouped_score = df.set_index(['Gender','School']).sort_index().\
            groupby(lambda x:(x,'均分及格' if math_score[x].mean()>=60 else '均分不及格'))
for name,_ in grouped_score:print(name)
for name ,group in grouped_score:
    print(name)
    display(group)
# math_score =df.set_index(['Gender','School'])['Math'].sort_index().groupby(lambda x:(x,'均分及格' if math_socre[x].mean()>60 else '均分不及格'))

(('F', 'S_1'), '均分及格')
(('F', 'S_2'), '均分及格')
(('M', 'S_1'), '均分及格')
(('M', 'S_2'), '均分不及格')
(('F', 'S_1'), '均分及格')
Class Address Height Weight Math Physics
Gender School
F S_1 C_1 street_2 192 73 32.5 B+
S_1 C_1 street_2 167 81 80.4 B-
S_1 C_1 street_4 159 64 84.8 B+
S_1 C_2 street_4 176 94 63.5 B-
S_1 C_2 street_5 162 63 33.8 B
S_1 C_2 street_6 167 63 68.4 B-
S_1 C_3 street_1 175 57 87.7 A-
S_1 C_3 street_5 187 69 61.7 B-
(('F', 'S_2'), '均分及格')
Class Address Height Weight Math Physics
Gender School
F S_2 C_1 street_6 161 61 50.6 B+
S_2 C_1 street_5 159 97 72.2 B+
S_2 C_2 street_7 194 77 68.5 B+
S_2 C_2 street_7 183 76 85.4 B
S_2 C_3 street_4 157 78 72.3 B+
S_2 C_3 street_7 190 99 65.9 C
S_2 C_3 street_6 164 81 95.5 A-
S_2 C_4 street_2 192 62 45.3 A
S_2 C_4 street_6 158 60 59.7 B+
S_2 C_4 street_2 160 84 67.7 B
S_2 C_4 street_6 193 54 47.6 B
(('M', 'S_1'), '均分及格')
Class Address Height Weight Math Physics
Gender School
M S_1 C_1 street_1 173 63 34.0 A+
S_1 C_1 street_2 186 82 87.2 B+
S_1 C_2 street_5 188 68 97.0 A-
S_1 C_2 street_6 160 53 58.8 A+
S_1 C_3 street_4 161 68 31.5 B+
S_1 C_3 street_7 188 82 49.7 B
S_1 C_3 street_2 195 70 85.2 A
(('M', 'S_2'), '均分不及格')
Class Address Height Weight Math Physics
Gender School
M S_2 C_1 street_7 174 84 83.3 C
S_2 C_1 street_4 157 61 52.5 B-
S_2 C_1 street_4 170 81 34.2 A
S_2 C_2 street_5 193 100 39.1 B
S_2 C_2 street_4 155 91 73.8 A+
S_2 C_2 street_1 175 74 47.2 B-
S_2 C_3 street_5 171 88 32.7 A
S_2 C_3 street_4 187 73 48.9 B
S_2 C_4 street_7 166 82 48.7 B
math_score.tail()
print(math_score[('M','S_2')])
math_score[('M', 'S_2')].mean()
(M, S_2)    83.3
(M, S_2)    52.5
(M, S_2)    34.2
(M, S_2)    39.1
(M, S_2)    73.8
(M, S_2)    47.2
(M, S_2)    32.7
(M, S_2)    48.9
(M, S_2)    48.7
Name: Math, dtype: float64





51.155555555555544

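(d) The [] operation on a groupby object

[] selects one or several columns from a groupby object, so the mean-score comparison above can be written concisely as: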

df.groupby(['Gender','School'])['Math'].mean()>=60
Gender  School
F       S_1        True
        S_2        True
M       S_1        True
        S_2       False
Name: Math, dtype: bool

A list selects several columns:

df.groupby(['Gender','School'])[['Math','Height']].mean()
# df.groupby(['Gender','School'])[['Math']].mean()
# a=df.set_index(['Gender','School']).sort_index()[['Math']]
# a.query('(School=="S_1")and (Gender=="F")').mean()
Math Height
Gender School
F S_1 64.100000 173.125000
S_2 66.427273 173.727273
M S_1 63.342857 178.714286
S_2 51.155556 172.000000

(e) Grouping by a continuous variable

For example, binning the Math scores with cut:

bins = [0,40,60,80,90,100]
cuts = pd.cut(df['Math'],bins=bins) #optionally pass labels= to set custom labels
df.groupby(cuts)['Math'].count()
# bins=[0,40,60,80,90,100]
# cuts=pd.cut(df['Math'],bins=bins)
# df.groupby(cuts).size()
Math
(0, 40]       7
(40, 60]     10
(60, 80]      9
(80, 90]      7
(90, 100]     2
dtype: int64
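As a small illustration of the optional labels argument mentioned in the comment, the sketch below bins a handful of made-up scores and names each bin. The label strings and the toy frame are assumptions chosen for the example.

```python
import pandas as pd

# Made-up scores, not the chapter's data
scores = pd.DataFrame({'Math': [34.0, 32.5, 87.2, 80.4, 97.0, 58.8, 63.5]})
bins = [0, 40, 60, 80, 90, 100]
labels = ['0-40', '40-60', '60-80', '80-90', '90-100']  # hypothetical labels

# Each score is assigned the label of the right-closed interval it falls in
cuts = pd.cut(scores['Math'], bins=bins, labels=labels)

# observed=False keeps bins that happen to be empty in the result
counts = scores.groupby(cuts, observed=False)['Math'].count()
print(counts)
```

Passing observed explicitly avoids depending on the default, which differs between pandas versions.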

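III. Aggregation, filtration and transformation

1. Aggregation

(a) Common aggregation functions

Aggregation reduces a set of values to a single scalar, so mean/sum/size/count/std/var/sem/first/last/nth/min/max are all aggregation functions; describe belongs to the same family but returns several statistics per group.

As a warm-up, verify the standard error function sem, which is defined as $\frac{\text{within-group standard deviation}}{\sqrt{\text{group size}}}$: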

group_m = grouped_single['Math']
display(group_m.std().values/np.sqrt(group_m.count().values)== group_m.sem().values)

group_m=grouped_single['Math']
display(group_m.std().values)
# np.sqrt()
display(np.sqrt(group_m.count().values))
# group_m.head()
group_m.std().values/np.sqrt(group_m.count().values)==group_m.sem().values
array([ True,  True])



array([23.07747407, 17.58930521])



array([3.87298335, 4.47213595])





array([ True,  True])
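Exact == comparison of floating-point results happens to succeed here, but it can fail when two mathematically equal values differ in the last bits. A more tolerant check, sketched on made-up data with np.allclose:

```python
import numpy as np
import pandas as pd

# Made-up data, not the chapter's table.csv
toy = pd.DataFrame({
    'School': ['S_1'] * 3 + ['S_2'] * 4,
    'Math': [34.0, 32.5, 87.2, 83.3, 50.6, 52.5, 72.2],
})
group_m = toy.groupby('School')['Math']

# standard error = sample standard deviation / sqrt(group size)
manual_sem = group_m.std() / np.sqrt(group_m.count())

# allclose compares within a small tolerance instead of bit-for-bit
matches = np.allclose(manual_sem, group_m.sem())
print(matches)
```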

(b) Using several aggregation functions at once

group_m.agg(['sum','mean','std'])
group_m.agg(['sum','mean','std','sem','count'])
sum mean std sem count
School
S_1 956.2 63.746667 23.077474 5.958578 15
S_2 1191.1 59.555000 17.589305 3.933088 20

Renaming with tuples

group_m.agg([('rename_sum','sum'),('rename_mean','mean')])

rename_sum rename_mean
School
S_1 956.2 63.746667
S_2 1191.1 59.555000

Choosing which functions apply to which columns

grouped_mul.agg({'Math':['mean','max'],'Height':'var'})

Math Height
mean max var
School Class
S_1 C_1 63.78 87.2 183.3
C_2 64.30 97.0 132.8
C_3 63.16 87.7 179.2
S_2 C_1 58.56 83.3 54.7
C_2 62.80 85.4 256.0
C_3 63.06 95.5 205.7
C_4 53.80 67.7 300.2

(c) Custom functions

grouped_single['Math'].agg(lambda x:print(x.head(),x.count(),'間隔'))
#agg passes each column of each group to the function, which opens up many possibilities
1101    34.0
1102    32.5
1103    87.2
1104    80.4
1105    84.8
Name: Math, dtype: float64 15 間隔
2101    83.3
2102    50.6
2103    52.5
2104    72.2
2105    34.2
Name: Math, dtype: float64 20 間隔





School
S_1    None
S_2    None
Name: Math, dtype: object

pandas provides no dedicated groupby method for the range (max minus min), but agg makes the within-group range easy to compute

grouped_single['Math'].agg(lambda x:x.max()-x.min())

School
S_1    65.5
S_2    62.8
Name: Math, dtype: float64

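(d) Multiple aggregations with NamedAgg

Note: at the time of writing, lambda functions are not supported here, but functions defined separately with def work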

def R1(x):
    return x.max()-x.min()
def R2(x):
    return x.max()-x.median()
grouped_single['Math'].agg(min_score1=pd.NamedAgg(column='col1', aggfunc=R1),
                           max_score1=pd.NamedAgg(column='col2', aggfunc='max'),
                           range_score2=pd.NamedAgg(column='col3', aggfunc=R2)).head()



display(grouped_single['Math'].head())

grouped_single[['Math','Height']].agg(min_score1=pd.NamedAgg(column='Math',aggfunc=R1),
                          max_score1=pd.NamedAgg(column='Math',aggfunc='max'),
                          range_score2=pd.NamedAgg(column='Height',aggfunc=R2)).head()
ID
1101    34.0
1102    32.5
1103    87.2
1104    80.4
1105    84.8
2101    83.3
2102    50.6
2103    52.5
2104    72.2
2105    34.2
Name: Math, dtype: float64


min_score1 max_score1 range_score2
School
S_1 65.5 97.0 20.0
S_2 62.8 95.5 23.5
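For named aggregation over more than one column, selecting the columns with a list, as in [['Math','Height']], is the safe spelling. A self-contained sketch on made-up data follows; the toy frame and the helper names value_range and max_minus_median are illustrative only.

```python
import pandas as pd

# Made-up data, not the chapter's table.csv
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_1', 'S_2', 'S_2'],
    'Math': [34.0, 87.2, 80.4, 83.3, 50.6],
    'Height': [173, 192, 186, 174, 161],
})

def value_range(x):
    # max minus min within one group
    return x.max() - x.min()

def max_minus_median(x):
    # distance from the group median to the group maximum
    return x.max() - x.median()

# Each keyword becomes an output column; NamedAgg says which input
# column feeds it and which function is applied.
summary = toy.groupby('School')[['Math', 'Height']].agg(
    math_range=pd.NamedAgg(column='Math', aggfunc=value_range),
    math_max=pd.NamedAgg(column='Math', aggfunc='max'),
    height_spread=pd.NamedAgg(column='Height', aggfunc=max_minus_median),
)
print(summary)
```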

(e) Aggregation functions with parameters

Check whether at least one Math score in each group lies between 50 and 52:

def f(s,low,high):
    return s.between(low,high).max()
grouped_single['Math'].agg(f,50,52)

grouped_single['Math'].agg(lambda x:print(x.between(50,52)))
grouped_single['Math'].agg(lambda x:print(x.between(50,52).max()))
def f(s,low,high):
    return s.between(low,high).any()
grouped_single['Math'].agg(f,50,52)

1101    False
1102    False
1103    False
1104    False
1105    False
1201    False
1202    False
1203    False
1204    False
1205    False
1301    False
1302    False
1303    False
1304    False
1305    False
Name: Math, dtype: bool
2101    False
2102     True
2103    False
2104    False
2105    False
2201    False
2202    False
2203    False
2204    False
2205    False
2301    False
2302    False
2303    False
2304    False
2305    False
2401    False
2402    False
2403    False
2404    False
2405    False
Name: Math, dtype: bool
False
True





School
S_1    False
S_2     True
Name: Math, dtype: bool

To use several functions at once when at least one of them takes parameters, wrap the parameterized function:

def f_test(s,low,high):
    return s.between(low,high).max()
def agg_f(f_mul,name,*args):#,**kwargs
    def wrapper(x):
        return f_mul(x,*args)#,**kwargs
    wrapper.__name__ = name
    return wrapper
new_f = agg_f(f_test,'at_least_one_in_50_52',50,52)
grouped_single['Math'].agg([new_f,'mean']).head()



# def f_test(s,low,high):
#     return s.between(low,high).max()
# def agg_f(f_mul,*args):
#     def wrapper(x):
#         return f_mul(x,*args)
#     return wrapper
# grouped_single['Math'].agg([agg_f(f_test,50,52),'mean'])

at_least_one_in_50_52 mean
School
S_1 False 63.746667
S_2 True 59.555000

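The goal is to pass a function that takes parameters inside the list given to agg. agg calls every function in the list with a single argument x, the column of one group. agg_f(f_test,'at_least_one_in_50_52',50,52) returns the inner function wrapper, and wrapper(x) returns f_mul(x,50,52). The closure therefore carries the outer arguments in args down to f_mul, while x still comes from agg. Setting wrapper.__name__ gives the result column a readable name.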

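2. Filtration

filter selects whole groups (remember that the result contains every row of each selected group), so the function passed to it must return a single boolean per group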

grouped_single[['Math','Physics']].filter(lambda x:print((x['Math']>32).all())).head()
# grouped_single[['Math','Physics']].filter(lambda x:(x['Math']>32)).head()
# grouped_single[['Math','Physics']].agg(lambda x:print(x['Math']>32))
# grouped_single[['Math','Physics']].agg(lambda x:print(x.head(),x.count(),'間隔'))
grouped_single[['Math','Physics']].filter(lambda x:(x['Math']>32).all()).head()
False
True
Math Physics
ID
2101 83.3 C
2102 50.6 B+
2103 52.5 B-
2104 72.2 B+
2105 34.2 A
grouped_single[['Math','Physics']].filter(lambda x:(x['Math']>34).all()).head()
Math Physics
ID

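filter selects groups: the function returns True when every student in the group scores above the threshold (32 or 34) and False otherwise. With a threshold of 32 only one of the two groups qualifies, so only that group is returned; with 34 neither qualifies, so the result is empty.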

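3. Transformation

(a) What is passed in

transform passes each column of each group to the function, and the return value must either have exactly the same length as that column or be a scalar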

grouped_single[['Math','Height']].agg(lambda x:print(x-x.min())).head()
1101     2.5
1102     1.0
1103    55.7
1104    48.9
1105    53.3
1201    65.5
1202    32.0
1203    27.3
1204     2.3
1205    36.9
1301     0.0
1302    56.2
1303    18.2
1304    53.7
1305    30.2
Name: Math, dtype: float64
2101    50.6
2102    17.9
2103    19.8
2104    39.5
2105     1.5
2201     6.4
2202    35.8
2203    41.1
2204    14.5
2205    52.7
2301    39.6
2302     0.0
2303    33.2
2304    62.8
2305    16.2
2401    12.6
2402    16.0
2403    27.0
2404    35.0
2405    14.9
Name: Math, dtype: float64
1101    14
1102    33
1103    27
1104     8
1105     0
1201    29
1202    17
1203     1
1204     3
1205     8
1301     2
1302    16
1303    29
1304    36
1305    28
Name: Height, dtype: int64
2101    19
2102     6
2103     2
2104     4
2105    15
2201    38
2202    39
2203     0
2204    20
2205    28
2301     2
2302    16
2303    35
2304     9
2305    32
2401    37
2402    11
2403     3
2404     5
2405    38
Name: Height, dtype: int64
Math Height
School
S_1 None None
S_2 None None
grouped_single[['Math','Height']].transform(lambda x:x-x.min()).head()
Math Height
ID
1101 2.5 14
1102 1.0 33
1103 55.7 27
1104 48.9 8
1105 53.3 0

If the function returns a scalar, that value is broadcast to every element of the group

grouped_single[['Math','Height']].transform(lambda x:x.mean()).head()
Math Height
ID
1101 63.746667 175.733333
1102 63.746667 175.733333
1103 63.746667 175.733333
1104 63.746667 175.733333
1105 63.746667 175.733333

(b) Standardizing within groups with transform

grouped_single[['Math','Height']].transform(lambda x:(x-x.mean())/x.std()).head()
Math Height
ID
1101 -1.288991 -0.214991
1102 -1.353990 1.279460
1103 1.016287 0.807528
1104 0.721627 -0.686923
1105 0.912289 -1.316166

(c) Filling missing values with the group mean using transform

df_nan = df[['Math','School']].copy().reset_index()
df_nan.loc[np.random.randint(0,df.shape[0],25),['Math']]=np.nan
df_nan.head()

ID Math School
0 1101 34.0 S_1
1 1102 NaN S_1
2 1103 NaN S_1
3 1104 80.4 S_1
4 1105 84.8 S_1
df_nan.groupby('School').transform(lambda x: x.fillna(x.mean())).join(df.reset_index()['School']).head()
ID Math School
0 1101 68.214286 S_1
1 1102 68.214286 S_1
2 1103 87.200000 S_1
3 1104 80.400000 S_1
4 1105 68.214286 S_1
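Because the cells above place NaN at random positions, their outputs differ from run to run. The deterministic sketch below shows the same group-mean fill on made-up data with fixed missing positions; selecting the Math column before transform and writing it back with assign is one possible alternative to the join used above.

```python
import numpy as np
import pandas as pd

# Made-up data with missing values at fixed positions
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_1', 'S_2', 'S_2', 'S_2'],
    'Math': [34.0, np.nan, 80.0, np.nan, 50.0, 70.0],
})

# transform returns a result aligned with the original rows,
# so each NaN is replaced by the mean of its own school
filled = toy.groupby('School')['Math'].transform(lambda x: x.fillna(x.mean()))

# write the filled column back without touching the other columns
toy_filled = toy.assign(Math=filled)
print(toy_filled)
```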

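IV. The apply function

1. The flexibility of apply

Among all the grouping functions, apply is probably the most widely used, thanks to its flexibility:

As for the input, the printed output below shows that apply receives each group as a table: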

df.groupby('School').apply(lambda x:print(x.head(5)))
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
2101    S_2   C_1      M  street_7     174      84  83.3       C
2102    S_2   C_1      F  street_6     161      61  50.6      B+
2103    S_2   C_1      M  street_4     157      61  52.5      B-
2104    S_2   C_1      F  street_5     159      97  72.2      B+
2105    S_2   C_1      M  street_4     170      81  34.2       A

Much of apply's flexibility comes from the variety of return values it accepts:

① Scalar return values

df[['School','Math','Height']].groupby('School').apply(lambda x:x.max())
df[['School','Math','Height']].groupby('School').apply(lambda x:print(x,x.max()))
display(df[['School','Math','Height']].groupby('School').agg(lambda x:x.max()))
df[['School','Math','Height']].groupby('School').apply(lambda x:x.max())
     School  Math  Height
ID                       
1101    S_1  34.0     173
1102    S_1  32.5     192
1103    S_1  87.2     186
1104    S_1  80.4     167
1105    S_1  84.8     159
1201    S_1  97.0     188
1202    S_1  63.5     176
1203    S_1  58.8     160
1204    S_1  33.8     162
1205    S_1  68.4     167
1301    S_1  31.5     161
1302    S_1  87.7     175
1303    S_1  49.7     188
1304    S_1  85.2     195
1305    S_1  61.7     187 School    S_1
Math       97
Height    195
dtype: object
     School  Math  Height
ID                       
2101    S_2  83.3     174
2102    S_2  50.6     161
2103    S_2  52.5     157
2104    S_2  72.2     159
2105    S_2  34.2     170
2201    S_2  39.1     193
2202    S_2  68.5     194
2203    S_2  73.8     155
2204    S_2  47.2     175
2205    S_2  85.4     183
2301    S_2  72.3     157
2302    S_2  32.7     171
2303    S_2  65.9     190
2304    S_2  95.5     164
2305    S_2  48.9     187
2401    S_2  45.3     192
2402    S_2  48.7     166
2403    S_2  59.7     158
2404    S_2  67.7     160
2405    S_2  47.6     193 School     S_2
Math      95.5
Height     194
dtype: object
Math Height
School
S_1 97.0 195
S_2 95.5 194
School Math Height
School
S_1 S_1 97.0 195
S_2 S_2 95.5 194

② List-like return values

display(df[['School','Math','Height']].groupby('School').apply(lambda x:x-x.min()).head())
df[['School','Math','Height']].groupby('School').transform(lambda x:x-x.min()).head()
Math Height
ID
1101 2.5 14.0
1102 1.0 33.0
1103 55.7 27.0
1104 48.9 8.0
1105 53.3 0.0
Math Height
ID
1101 2.5 14
1102 1.0 33
1103 55.7 27
1104 48.9 8
1105 53.3 0

③ DataFrame return values

df[['School','Math','Height']].groupby('School')\
    .apply(lambda x:pd.DataFrame({'col1':x['Math']-x['Math'].max(),
                                  'col2':x['Math']-x['Math'].min(),
                                  'col3':x['Height']-x['Height'].max(),
                                  'col4':x['Height']-x['Height'].min()})).head()




df[['School','Math','Height']].groupby('School').apply(lambda x:pd.DataFrame({
    'col1':x['Math']-x['Math'].max(),
    'col2':x['Math']-x['Math'].min(),
    'col3':x['Height']-x['Height'].max(),
    'col4':x['Height']-x['Height'].min()
})).head()
col1 col2 col3 col4
ID
1101 -63.0 2.5 -22 14
1102 -64.5 1.0 -3 33
1103 -9.8 55.7 -9 27
1104 -16.6 48.9 -28 8
1105 -12.2 53.3 -36 0

2. Computing several statistics at once with apply

An OrderedDict makes this quick to write:

from collections import OrderedDict
def f(df):
    data = OrderedDict()
    data['M_sum'] = df['Math'].sum()
    data['W_var'] = df['Weight'].var()
    data['H_mean'] = df['Height'].mean()
    print('data',data)
    print('series')
    print(pd.Series(data))
    return pd.Series(data)
grouped_single.apply(f)



data OrderedDict([('M_sum', 956.2000000000002), ('W_var', 117.42857142857143), ('H_mean', 175.73333333333332)])
series
M_sum     956.200000
W_var     117.428571
H_mean    175.733333
dtype: float64
data OrderedDict([('M_sum', 1191.1), ('W_var', 181.08157894736837), ('H_mean', 172.95)])
series
M_sum     1191.100000
W_var      181.081579
H_mean     172.950000
dtype: float64
M_sum W_var H_mean
School
S_1 956.2 117.428571 175.733333
S_2 1191.1 181.081579 172.950000

V. Questions and exercises

1. Questions

[Question 1] What are forward and backward filling in fillna, and how are they done?

df = pd.read_csv('data/table.csv',index_col='ID')
df.head(3)
df_nan = df[['Math','School']].copy().reset_index()
df_nan.loc[np.random.randint(0,df.shape[0],25),['Math']]=np.nan
df_nan.head()
df_nan.Math=df_nan.Math.fillna(method='bfill')
df_nan.head()
ID Math School
0 1101 34.0 S_1
1 1102 87.2 S_1
2 1103 87.2 S_1
3 1104 97.0 S_1
4 1105 97.0 S_1

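The method parameter of fillna controls how missing values are filled. Forward filling replaces each missing value with the last non-missing value before it in the column; backward filling replaces it with the next non-missing value after it.

method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None

pad / ffill: fill forward (values propagate downward)

backfill / bfill: fill backward (values propagate upward)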
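The two directions can be compared side by side on a tiny made-up Series. The sketch uses the ffill and bfill methods directly, which recent pandas releases steer users toward in place of fillna(method=...).

```python
import numpy as np
import pandas as pd

# Made-up values with gaps in the middle and at the end
s = pd.Series([34.0, np.nan, np.nan, 80.4, np.nan])

# forward fill: each gap takes the last valid value before it
forward = s.ffill()

# backward fill: each gap takes the next valid value after it;
# the trailing gap has no later value and stays NaN
backward = s.bfill()

print(pd.DataFrame({'original': s, 'ffill': forward, 'bfill': backward}))
```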

[Question 2] What does the code below do? Design a groupby version of it.

s = pd.Series ([0, 1, 1, 0, 1, 1, 1, 0])
s1 = s.cumsum()
result = s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1,fill_value =0)

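s1: cumulative sum of s, [0, 1, 2, 2, 3, 4, 5, 5]

s.mul(s1): element-wise product of s and s1, [0, 1, 2, 0, 3, 4, 5, 0]

.diff(): first-order difference, [nan, 1.0, 1.0, -2.0, 3.0, 1.0, 1.0, -5.0]

.where(lambda x: x < 0): keep the values below 0 and set the rest to NaN, [nan, nan, nan, -2.0, nan, nan, nan, -5.0]

.ffill(): forward fill, [nan, nan, nan, -2.0, -2.0, -2.0, -2.0, -5.0]

.add(s1,fill_value =0): add s1, treating missing values as 0, [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 3.0, 0.0]

The result counts, at each position, how many consecutive 1s have occurred so far, resetting to 0 at every 0.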

list(s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1,fill_value =0))
gp =df.groupby('School')

gp.apply(lambda x:x['Math'].mul(x['Math'].cumsum()).diff().where(lambda m: m < 0).ffill().add(x['Math'].cumsum(),fill_value =0))
School  ID  
S_1     1101       34.00
        1102       66.50
        1103      153.70
        1104      234.10
        1105      318.90
        1201      415.90
        1202    -9421.00
        1203    -9362.20
        1204   -11740.56
        1205   -11672.16
        1301   -21966.61
        1302   -21878.91
        1303   -25585.41
        1304   -25500.21
        1305   -16257.66
S_2     2101       83.30
        2102      -29.65
        2103       22.85
        2104       95.05
        2105    -8364.36
        2201    -8325.26
        2202    -8256.76
        2203    -8182.96
        2204    -9864.48
        2205    -9779.08
        2301    -2042.69
        2302   -25111.27
        2303   -25045.37
        2304   -24949.87
        2305   -37377.81
        2401     -300.07
        2402     -251.37
        2403     -191.67
        2404     -123.97
        2405   -19527.49
Name: Math, dtype: float64
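The chain is designed for a 0/1 series, so applying it to the Math column gives numbers that are hard to interpret. One possible groupby version for the original series is sketched below: every 0 starts a new block, and a cumulative sum inside each block counts the 1s seen since that 0. The name block_id is introduced here for illustration.

```python
import pandas as pd

s = pd.Series([0, 1, 1, 0, 1, 1, 1, 0])

# Original chain from the question, for comparison
s1 = s.cumsum()
original = s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1, fill_value=0)

# groupby version: a new block begins at every 0 ...
block_id = (s == 0).cumsum()
# ... and the running sum within a block is the current streak of 1s
grouped = s.groupby(block_id).cumsum()

print(grouped.tolist())
```

Unlike the original chain, the groupby version stays in integer dtype because no NaN is produced along the way.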

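[Question 3] How can the 0.25 and 0.75 quantiles be computed within each group? Show both in the same table.

[Question 4] Since indexing can already select subsets that satisfy a condition, what is the point of the filter function?

[Question 5] How do aggregation, transformation and filtration compare in input, output and purpose?

[Question 6] For multi-function aggregation with parameters, is there a way to get the same result without the wrap technique?

Question 3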

gp.apply(lambda x:pd.DataFrame({'q25':x.quantile(0.25),
                                  'q75':x.quantile(0.75)
                                       }))
q25 q75
School
S_1 Height 164.50 187.500
Weight 63.00 77.000
Math 41.85 85.000
S_2 Height 159.75 187.750
Weight 70.25 85.000
Math 47.50 72.225
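Another way to get both quantiles in one table is to pass a list to the groupby quantile method and unstack the result. This is a sketch on made-up data, assuming a pandas version whose groupby quantile accepts a list of quantiles.

```python
import pandas as pd

# Made-up data, not the chapter's table.csv
toy = pd.DataFrame({
    'School': ['S_1'] * 4 + ['S_2'] * 4,
    'Math': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    'Height': [160, 170, 180, 190, 150, 160, 170, 180],
})

quartiles = (
    toy.groupby('School')[['Math', 'Height']]  # numeric columns only
       .quantile([0.25, 0.75])                 # adds a quantile index level
       .unstack()                              # move that level into the columns
)
print(quartiles)
```

Selecting the numeric columns explicitly keeps string columns out of the quantile computation.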

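Question 4

filter selects groups, and its result is every row of the selected groups. Boolean indexing tests a condition row by row, whereas filter tests a condition on each group as a whole.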
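To see what filter saves, the sketch below selects the same rows twice on made-up data: once with filter, and once with plain boolean indexing, which first needs a group-level statistic broadcast back to the rows with transform.

```python
import pandas as pd

# Made-up data, not the chapter's table.csv
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2', 'S_2'],
    'Math': [31.5, 87.2, 32.7, 83.3],
})

# filter: the condition is evaluated once per group
kept_by_filter = toy.groupby('School').filter(lambda g: (g['Math'] > 32).all())

# boolean indexing: compute each group's minimum, align it to the rows,
# then index with the resulting row-level mask
group_min = toy.groupby('School')['Math'].transform('min')
kept_by_index = toy[group_min > 32]

print(kept_by_filter)
```

Both approaches keep only the rows of S_2; filter expresses the group-level condition directly.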

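Question 5

Aggregation computes statistics per group: the input is the data of each group and the output is one statistic per group, a scalar for each column.

Transformation operates on every element within its group (for example standardization): the input is the data of each group and the output is that data after the transformation, with the shape unchanged.

Filtration selects groups according to some rule: the input is the data of each group and the output is all the rows of the groups that satisfy the rule.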
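The difference in output shape can be checked directly. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data: 3 rows in S_1, 2 rows in S_2
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_1', 'S_2', 'S_2'],
    'Math': [31.5, 87.2, 61.7, 32.7, 83.3],
})
g = toy.groupby('School')['Math']

aggregated = g.agg('mean')                         # one value per group
transformed = g.transform(lambda x: x - x.mean())  # one value per original row
filtered = g.filter(lambda x: x.min() > 32)        # rows of the groups that pass

print(len(aggregated), len(transformed), len(filtered))
```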

Question 6
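One possible answer, offered as a sketch rather than the intended solution: keyword-style (named) aggregation lets a lambda bind the extra arguments while the keyword supplies the column name, so no wrapper with a patched __name__ is needed. The toy frame is made up, and this assumes a pandas version that supports named aggregation on a grouped column.

```python
import pandas as pd

# Made-up data, not the chapter's table.csv
toy = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2', 'S_2'],
    'Math': [34.0, 87.2, 50.6, 83.3],
})

def f_test(s, low, high):
    # True if at least one value in the group lies in [low, high]
    return s.between(low, high).any()

# The keyword names the output column; the lambda fixes low and high
result = toy.groupby('School')['Math'].agg(
    at_least_one_in_50_52=lambda x: f_test(x, 50, 52),
    mean='mean',
)
print(result)
```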

2. Exercises

[Exercise 1] A diamonds data set records carat weight, color, depth and price in its columns. Answer the following questions:

pd.read_csv('data/Diamonds.csv').head()
carat color depth price
0 0.23 E 61.5 326
1 0.21 E 59.8 326
2 0.23 E 56.9 327
3 0.29 I 62.4 334
4 0.31 J 63.3 335

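(a) Among all diamonds heavier than 1 carat, what is the range of the price?

(b) Using the 0.2/0.4/0.6/0.8 quantiles of depth as group boundaries, which color is the most common in each group? Is that color also the most expensive per unit weight, on average, within the group?

(c) Group by weight (0-0.5, 0.5-1, 1-1.5, 1.5-2, 2+), sort each group by increasing depth, and find the maximum length of a run of consecutive strictly increasing prices in each group.

(d) Group by color and compute the regression coefficient of price on carat for each group. (Simple linear regression with one variable, using only Pandas and Numpy.)

(a) Among all diamonds heavier than 1 carat, what is the range of the price?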

df=pd.read_csv('data/Diamonds.csv')
df.head()
carat color depth price
0 0.23 E 61.5 326
1 0.21 E 59.8 326
2 0.23 E 56.9 327
3 0.29 I 62.4 334
4 0.31 J 63.3 335
a=df[df['carat']>1]
a['price'].max()-a['price'].min()
17561
df_r=df.query('carat>1')['price']
df_r.max()-df_r.min()
17561

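(b) Using the 0.2/0.4/0.6/0.8 quantiles of depth as group boundaries, which color is the most common in each group? Is that color also the most expensive per unit weight, on average, within the group? (In the code, the column '均重價格' holds the price per carat.)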

np.linspace(0,1,6)
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
bins=df['depth'].quantile(np.linspace(0,1,6)).tolist()
df['cuts']=pd.cut(df['depth'],bins=bins)
color_result=df.groupby('cuts')['color'].describe()
color_result
count unique top freq
cuts
(43.0, 60.8] 11294 7 E 2259
(60.8, 61.6] 11831 7 G 2593
(61.6, 62.1] 10403 7 G 2247
(62.1, 62.7] 10137 7 G 2193
(62.7, 79.0] 10273 7 G 2000
# Price per carat for every diamond (the column name '均重價格' means "price per unit weight")
df['均重價格']=df['price']/df['carat']
# For each depth bin, the (cuts, color) pair with the highest mean price per carat
best=df.groupby(['cuts','color'])['均重價格'].mean().groupby(['cuts']).idxmax().values
for i in range(color_result['top'].count()):
    # Compare the most frequent color in the bin with the most expensive one
    temp=color_result['top'].iloc[i]==best[i][1]
    print(color_result.reset_index()['cuts'][i],temp)
(43.0, 60.8] False
(60.8, 61.6] False
(61.6, 62.1] False
(62.1, 62.7] True
(62.7, 79.0] True
df.groupby(['cuts','color'])['均重價格'].mean().head()
cuts          color
(43.0, 60.8]  D        4096.138305
              E        3929.625897
              F        4136.841550
              G        4295.283909
              H        4275.933161
Name: 均重價格, dtype: float64
# Same comparison without an explicit loop: take the color out of each (cuts, color) tuple returned by idxmax
color_result['top']==[i[1] for i in df.groupby(['cuts','color'])['均重價格'].mean().groupby(['cuts']).idxmax().values]
cuts
(43.0, 60.8]    False
(60.8, 61.6]    False
(61.6, 62.1]    False
(62.1, 62.7]     True
(62.7, 79.0]     True
Name: top, dtype: bool
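One detail of the binning step is worth checking: `pd.cut` builds left-open intervals by default, so a value equal to the lowest edge (here the minimum depth) falls outside every bin unless `include_lowest=True` is passed. The sketch below shows the effect on made-up depth values (the series `depth` is an assumption for illustration); `pd.qcut` is shown as an alternative that computes quantile bins directly.

```python
import numpy as np
import pandas as pd

# Made-up depth values for illustration only
depth = pd.Series([43.0, 55.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 70.0, 79.0])

# Quantile edges, built the same way as in the exercise
edges = depth.quantile(np.linspace(0, 1, 6)).tolist()

# Default pd.cut intervals are left-open, so the minimum value gets no bin
cut_default = pd.cut(depth, bins=edges)

# include_lowest=True closes the first interval on the left
cut_closed = pd.cut(depth, bins=edges, include_lowest=True)

# pd.qcut computes quantile bins directly and keeps the minimum
qcut_bins = pd.qcut(depth, q=5)

# Number of values that were left without a bin in each variant
print(cut_default.isna().sum(), cut_closed.isna().sum(), qcut_bins.isna().sum())
```

Whether this changes the answer for `Diamonds.csv` depends on how many rows sit exactly at the minimum depth; comparing the bin counts with `len(df)` is a quick way to check.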

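(c) Group by weight (0-0.5, 0.5-1, 1-1.5, 1.5-2, 2+), sort each group by increasing depth, and find the maximum length of a run of consecutive strictly increasing prices in each group.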

# Drop the helper column from part (b) and rebin by carat weight;
# this overwrites the depth-based 'cuts' column
df=df.drop(columns='均重價格')
cuts=pd.cut(df['carat'],bins=[0,0.5,1,1.5,2,np.inf]) # the optional labels argument adds custom bin names
df['cuts']=cuts
df.head()
carat color depth price cuts
0 0.23 E 61.5 326 (0.0, 0.5]
1 0.21 E 59.8 326 (0.0, 0.5]
2 0.23 E 56.9 327 (0.0, 0.5]
3 0.29 I 62.4 334 (0.0, 0.5]
4 0.31 J 63.3 335 (0.0, 0.5]
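def f(nums):
    # Length of the longest run of consecutive strictly increasing values in the list nums
    if not nums:
        return 0
    res = 1        # longest run seen so far
    cur_len = 1    # length of the current run
    for i in range(1, len(nums)):
        if nums[i-1] < nums[i]:
            cur_len += 1
            res = max(cur_len, res)
        else:
            cur_len = 1
    return res
def f2(nums):
    # Alternative implementation: close the current run whenever the value fails to increase
    if not nums:
        return 0
    res=1
    temp=1
    for i in range(1,len(nums)):  # start at 1 so that nums[i-1] never wraps around to nums[-1]
        if nums[i-1]>=nums[i]:
            res=max(res,temp)
            temp=1
        else:
            temp+=1
    return max(res,temp)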
df.groupby(['cuts']).head()
carat color depth price cuts
0 0.23 E 61.5 326 (0.0, 0.5]
1 0.21 E 59.8 326 (0.0, 0.5]
2 0.23 E 56.9 327 (0.0, 0.5]
3 0.29 I 62.4 334 (0.0, 0.5]
4 0.31 J 63.3 335 (0.0, 0.5]
90 0.70 E 62.5 2757 (0.5, 1.0]
91 0.86 E 55.1 2757 (0.5, 1.0]
92 0.70 G 61.6 2757 (0.5, 1.0]
93 0.71 E 62.4 2759 (0.5, 1.0]
94 0.78 G 63.8 2759 (0.5, 1.0]
172 1.17 J 60.2 2774 (1.0, 1.5]
215 1.01 F 61.8 2781 (1.0, 1.5]
241 1.01 E 64.5 2788 (1.0, 1.5]
242 1.01 H 62.7 2788 (1.0, 1.5]
247 1.05 J 63.2 2789 (1.0, 1.5]
2024 1.52 E 57.3 3105 (1.5, 2.0]
2025 1.52 E 57.3 3105 (1.5, 2.0]
3926 1.51 G 64.0 3497 (1.5, 2.0]
3955 1.52 H 64.9 3504 (1.5, 2.0]
4128 1.52 I 61.2 3541 (1.5, 2.0]
12246 2.06 J 61.2 5203 (2.0, inf]
13002 2.14 J 69.4 5405 (2.0, inf]
13118 2.15 J 65.5 5430 (2.0, inf]
13757 2.22 J 66.7 5607 (2.0, inf]
13991 2.01 I 67.4 5696 (2.0, inf]
for name,group in df.groupby('cuts'):
    # Sort each weight group by depth, then measure the longest strictly increasing price run
    group = group.sort_values(by='depth')
    s = group['price']
    print(name,f(s.tolist()))
(0.0, 0.5] 8
(0.5, 1.0] 8
(1.0, 1.5] 7
(1.5, 2.0] 11
(2.0, inf] 7
# Same computation with f2; grouping by the plain column name keeps name a single interval
for name,group in df.groupby('cuts'):
    group=group.sort_values(by='depth')
    s=group['price']
    print(name,f2(s.tolist()))
(0.0, 0.5] 8
(0.5, 1.0] 8
(1.0, 1.5] 7
(1.5, 2.0] 11
(2.0, inf] 7
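The explicit loop can also be written with vectorized pandas operations. The following is a minimal sketch, not the original solution: the helper name `longest_increasing_run` and the sample prices are assumptions for illustration. It marks where a value rises above its predecessor, starts a new run label at every position that does not rise, and returns the size of the largest run.

```python
import pandas as pd

def longest_increasing_run(s):
    """Length of the longest run of consecutive strictly increasing values in s."""
    if len(s) == 0:
        return 0
    # True where a value is strictly greater than its predecessor
    rising = s.diff() > 0
    # Every non-rising position starts a new run; cumsum labels the runs
    run_id = (~rising).cumsum()
    # The size of the largest run is the answer
    return int(run_id.value_counts().max())

# Made-up prices for illustration only
prices = pd.Series([5, 1, 2, 3, 3, 4, 2])
print(longest_increasing_run(prices))
```

Because groupby keeps the row order inside each group, something like `df.sort_values('depth').groupby('cuts')['price'].apply(longest_increasing_run)` should give the per-group values in one call.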

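(d) Group by color and compute the regression coefficients of price on carat for each group. (Simple univariate linear regression, using only Pandas and NumPy.)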

for name,group in df[['carat','price','color']].groupby('color'):
    # Design matrix of shape (2, n): a row of ones for the intercept and a row of carat values
    L1 = np.array([np.ones(group.shape[0]),group['carat']])
    L2 = group['price']
    # Normal equations: (L1 L1^T)^{-1} L1 y gives [intercept, slope]
    result = np.linalg.inv(L1.dot(L1.T)).dot(L1).dot(L2)
    print('When color is %s, the intercept is %f and the regression coefficient is %f'%(name,result[0],result[1]))
When color is D, the intercept is -2361.017152 and the regression coefficient is 8408.353126
When color is E, the intercept is -2381.049600 and the regression coefficient is 8296.212783
When color is F, the intercept is -2665.806191 and the regression coefficient is 8676.658344
When color is G, the intercept is -2575.527643 and the regression coefficient is 8525.345779
When color is H, the intercept is -2460.418046 and the regression coefficient is 7619.098320
When color is I, the intercept is -2878.150356 and the regression coefficient is 7761.041169
When color is J, the intercept is -2920.603337 and the regression coefficient is 7094.192092
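For a single predictor the normal equations reduce to a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus slope times mean(x). The sketch below applies that form per group on made-up data (the frame `toy` and the helper `simple_ols` are assumptions for illustration); it avoids the explicit matrix inverse while still using only pandas.

```python
import pandas as pd

# Made-up data for illustration only (price = 2*carat + 1 for A, price = 3*carat for B)
toy = pd.DataFrame({
    'color': ['A', 'A', 'A', 'B', 'B', 'B'],
    'carat': [1.0, 2.0, 3.0, 1.0, 2.0, 4.0],
    'price': [3.0, 5.0, 7.0, 3.0, 6.0, 12.0],
})

def simple_ols(group):
    x, y = group['carat'], group['price']
    # Slope = cov(x, y) / var(x); the intercept follows from the group means
    slope = x.cov(y) / x.var()
    intercept = y.mean() - slope * x.mean()
    return pd.Series({'intercept': intercept, 'slope': slope})

# One row of coefficients per color
coef = toy.groupby('color')[['carat', 'price']].apply(simple_ols)
print(coef)
```

On the toy data this recovers the lines used to generate it, which is a convenient sanity check before running the same function on the diamonds frame.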

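[Exercise 2]: Given a dataset on illegal drugs in the United States from 2010 to 2017 whose columns record the year, state (5 states), county, substance name, and number of reports, answer the following questions: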

pd.read_csv('data/Drugs.csv').head()
YYYY State COUNTY SubstanceName DrugReports
0 2010 VA ACCOMACK Propoxyphene 1
1 2010 OH ADAMS Morphine 9
2 2010 PA ADAMS Methadone 2
3 2010 VA ALEXANDRIA CITY Heroin 5
4 2010 PA ALLEGHENY Hydromorphone 5

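(a) For each year, which county has the most reports? Is the state that county belongs to also the state with the most reports in that year?

(b) From 2014 to 2015, which state saw the largest increase in Heroin reports? Within that state, is Heroin the substance with the largest increase among all substances? If not, find the substance that is.

(a) For each year, which county has the most reports? Is the state that county belongs to also the state with the most reports in that year?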

df = pd.read_csv('data/Drugs.csv')
df.head()
YYYY State COUNTY SubstanceName DrugReports
0 2010 VA ACCOMACK Propoxyphene 1
1 2010 OH ADAMS Morphine 9
2 2010 PA ADAMS Methadone 2
3 2010 VA ALEXANDRIA CITY Heroin 5
4 2010 PA ALLEGHENY Hydromorphone 5
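for name ,group in df.groupby('YYYY'):
    # County with the largest total number of reports in this year
    # (counties are matched by name only, so same-named counties in different states are summed together)
    temp=group.groupby('COUNTY')['DrugReports'].sum()
    county=temp.idxmax()
    # State of that county (column 1 is 'State')
    state=group[group['COUNTY']==county].iloc[0,1]
    # State with the largest total number of reports in this year
    temp2=group.groupby('State')['DrugReports'].sum()
    state_max=temp2.idxmax()
    print(name,county,state==state_max)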

2010 PHILADELPHIA True
2011 PHILADELPHIA False
2012 PHILADELPHIA False
2013 PHILADELPHIA False
2014 PHILADELPHIA False
2015 PHILADELPHIA False
2016 HAMILTON True
2017 HAMILTON True
idx=pd.IndexSlice
for i in range(2010,2018):
    # Total reports per (county, year); slice out year i and take the county with the maximum
    county = df.groupby(['COUNTY','YYYY'])[['DrugReports']].sum().loc[idx[:,i],:].idxmax()['DrugReports'][0]
    state = df.query('COUNTY == "%s"'%county)['State'].iloc[0]
    # State with the most reports in year i
    state_true = df.groupby(['State','YYYY'])[['DrugReports']].sum().loc[idx[:,i],:].idxmax()['DrugReports'][0]
    if state==state_true:
        print('In %d, %s county had the most reports, and its state %s also had the most reports'%(i,county,state))
    else:
        print('In %d, %s county had the most reports, but its state %s did not have the most reports; %s did'%(i,county,state,state_true))
In 2010, PHILADELPHIA county had the most reports, and its state PA also had the most reports
In 2011, PHILADELPHIA county had the most reports, but its state PA did not have the most reports; OH did
In 2012, PHILADELPHIA county had the most reports, but its state PA did not have the most reports; OH did
In 2013, PHILADELPHIA county had the most reports, but its state PA did not have the most reports; OH did
In 2014, PHILADELPHIA county had the most reports, but its state PA did not have the most reports; OH did
In 2015, PHILADELPHIA county had the most reports, but its state PA did not have the most reports; OH did
In 2016, HAMILTON county had the most reports, and its state OH also had the most reports
In 2017, HAMILTON county had the most reports, and its state OH also had the most reports
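The per-year loop can be replaced by two grouped sums followed by a grouped `idxmax`. The sketch below runs on made-up records (the frame `toy` and its county names are assumptions for illustration). It keys counties by (State, COUNTY) rather than by name alone, which is a deliberate change from the code above so that same-named counties in different states are not merged.

```python
import pandas as pd

# Made-up records for illustration only
toy = pd.DataFrame({
    'YYYY':        [2010, 2010, 2010, 2011, 2011, 2011],
    'State':       ['PA', 'OH', 'OH', 'PA', 'OH', 'OH'],
    'COUNTY':      ['P1', 'O1', 'O2', 'P1', 'O1', 'O2'],
    'DrugReports': [10, 6, 7, 5, 9, 2],
})

# Total reports per (year, state, county) and per (year, state)
county_tot = toy.groupby(['YYYY', 'State', 'COUNTY'])['DrugReports'].sum()
state_tot = toy.groupby(['YYYY', 'State'])['DrugReports'].sum()

# idxmax within each year returns the full index tuple of the largest entry
top_county = county_tot.groupby(level='YYYY').idxmax()
top_state = state_tot.groupby(level='YYYY').idxmax()

rows = []
for year in top_county.index:
    _, state, county = top_county.loc[year]
    # Compare the top county's state with the top state of the same year
    rows.append((int(year), county, state == top_state.loc[year][1]))
    print(*rows[-1])
```

In the toy data the top county of 2010 belongs to a state that is not the top state, while in 2011 the two coincide.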

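(b) From 2014 to 2015, which state saw the largest increase in Heroin reports? Within that state, is Heroin the substance with the largest increase among all substances? If not, find the substance that is.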

# Heroin reports in 2014 and 2015
df_b = df[(df['YYYY'].isin([2014,2015]))&(df['SubstanceName']=='Heroin')]
# Total reports per (year, state), restricted to the numeric column
df_add = df_b.groupby(['YYYY','State'])[['DrugReports']].sum()
# State with the largest increase from 2014 to 2015
(df_add.loc[2015]-df_add.loc[2014]).idxmax()
DrugReports    OH
dtype: object
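For the second half of the question, one possible approach is to put the yearly totals of every substance in the chosen state side by side and subtract. The sketch below uses made-up records (the frame `toy` and its numbers are assumptions for illustration) and measures the absolute increase; if "increase" is read as relative growth, the difference would be divided by the 2014 column instead.

```python
import pandas as pd

# Made-up records for illustration only
toy = pd.DataFrame({
    'YYYY':          [2014, 2015, 2014, 2015, 2015],
    'State':         ['OH', 'OH', 'OH', 'OH', 'OH'],
    'SubstanceName': ['Heroin', 'Heroin', 'Fentanyl', 'Fentanyl', 'Carfentanil'],
    'DrugReports':   [100, 130, 20, 90, 5],
})

state = 'OH'  # the state found in the previous step
sub = toy[(toy['State'] == state) & (toy['YYYY'].isin([2014, 2015]))]

# Rows: substance, columns: year; a substance missing in one year counts as 0
by_year = (sub.groupby(['SubstanceName', 'YYYY'])['DrugReports']
              .sum()
              .unstack('YYYY', fill_value=0))

# Absolute change in reports from 2014 to 2015 for every substance
increase = by_year[2015] - by_year[2014]
print(increase.idxmax(), increase.max())
```

In the toy data Heroin rises by 30 but another substance rises by more, which is the situation the question asks about.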
for name ,group in df.groupby(['SubstanceName']):
    print(name)
    display(group)
3,4-Methylenedioxy U-47700
YYYY State COUNTY SubstanceName DrugReports
20766 2017 PA ALLEGHENY 3,4-Methylenedioxy U-47700 3
3-Fluorofentanyl
YYYY State COUNTY SubstanceName DrugReports
17946 2016 VA FRANKLIN 3-Fluorofentanyl 1
3-Methylfentanyl
YYYY State COUNTY SubstanceName DrugReports
16761 2016 OH ALLEN 3-Methylfentanyl 5
16853 2016 PA BEAVER 3-Methylfentanyl 7
16858 2016 OH BELMONT 3-Methylfentanyl 1
17123 2016 WV BERKELEY 3-Methylfentanyl 1
17151 2016 PA CAMBRIA 3-Methylfentanyl 1
... ... ... ... ... ...
23661 2017 OH OTTAWA 3-Methylfentanyl 8
23717 2017 OH MAHONING 3-Methylfentanyl 4
23801 2017 OH VAN WERT 3-Methylfentanyl 7
23868 2017 OH ROSS 3-Methylfentanyl 1
24005 2017 OH STARK 3-Methylfentanyl 1

102 rows × 5 columns

4-Fluoroisobutyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
16784 2016 PA BERKS 4-Fluoroisobutyryl fentanyl 2
16787 2016 PA BLAIR 4-Fluoroisobutyryl fentanyl 2
16837 2016 PA ALLEGHENY 4-Fluoroisobutyryl fentanyl 4
16870 2016 PA BRADFORD 4-Fluoroisobutyryl fentanyl 1
17414 2016 VA ALEXANDRIA CITY 4-Fluoroisobutyryl fentanyl 1
... ... ... ... ... ...
23748 2017 KY MONTGOMERY 4-Fluoroisobutyryl fentanyl 1
23817 2017 VA WARREN 4-Fluoroisobutyryl fentanyl 1
23974 2017 VA ROANOKE 4-Fluoroisobutyryl fentanyl 1
24010 2017 OH SUMMIT 4-Fluoroisobutyryl fentanyl 7
24018 2017 OH TRUMBULL 4-Fluoroisobutyryl fentanyl 1

132 rows × 5 columns

4-Methylfentanyl
YYYY State COUNTY SubstanceName DrugReports
22232 2017 OH LUCAS 4-Methylfentanyl 5
ANPP
YYYY State COUNTY SubstanceName DrugReports
3546 2011 PA DAUPHIN ANPP 1
14060 2014 PA PHILADELPHIA ANPP 1
15584 2015 OH MONTGOMERY ANPP 1
16766 2016 OH ASHTABULA ANPP 1
17753 2016 OH CUYAHOGA ANPP 20
19415 2016 KY MONTGOMERY ANPP 1
19920 2016 PA PHILADELPHIA ANPP 1
20130 2016 OH SUMMIT ANPP 1
20905 2017 KY CLARK ANPP 2
20955 2017 KY FAYETTE ANPP 5
21007 2017 KY BOYD ANPP 3
21040 2017 PA CHESTER ANPP 1
21118 2017 PA DELAWARE ANPP 2
21308 2017 OH CUYAHOGA ANPP 169
21427 2017 OH GREENE ANPP 2
21484 2017 PA BUTLER ANPP 2
21572 2017 VA FAIRFAX ANPP 1
21680 2017 OH GUERNSEY ANPP 1
21724 2017 PA JUNIATA ANPP 2
21863 2017 OH LAWRENCE ANPP 2
21867 2017 PA LEBANON ANPP 1
21909 2017 KY JEFFERSON ANPP 4
21964 2017 PA LUZERNE ANPP 2
22136 2017 OH LAKE ANPP 57
22141 2017 PA LANCASTER ANPP 2
22487 2017 PA NORTHUMBERLAND ANPP 1
22753 2017 PA SCHUYLKILL ANPP 1
22859 2017 WV RALEIGH ANPP 1
22975 2017 VA PRINCE WILLIAM ANPP 1
23438 2017 VA STAFFORD ANPP 2
23529 2017 OH SUMMIT ANPP 22
23616 2017 OH MIAMI ANPP 2
23627 2017 KY MONTGOMERY ANPP 4
23673 2017 PA PHILADELPHIA ANPP 3
24060 2017 PA YORK ANPP 1
Acetyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
11567 2014 VA CHESTERFIELD Acetyl fentanyl 1
12067 2014 PA ELK Acetyl fentanyl 1
12324 2014 OH MARION Acetyl fentanyl 3
12346 2014 OH MONTGOMERY Acetyl fentanyl 1
12825 2014 PA GREENE Acetyl fentanyl 1
... ... ... ... ... ...
23904 2017 OH TRUMBULL Acetyl fentanyl 13
23989 2017 PA SCHUYLKILL Acetyl fentanyl 4
23994 2017 OH SENECA Acetyl fentanyl 4
24033 2017 PA WASHINGTON Acetyl fentanyl 2
24050 2017 WV WOOD Acetyl fentanyl 1

265 rows × 5 columns

Acetylcodeine
YYYY State COUNTY SubstanceName DrugReports
1761 2010 WV OHIO Acetylcodeine 5
1948 2010 PA PHILADELPHIA Acetylcodeine 1
3165 2011 VA FAIRFAX Acetylcodeine 1
10482 2013 PA PHILADELPHIA Acetylcodeine 1
13047 2014 PA PHILADELPHIA Acetylcodeine 2
19318 2016 PA PHILADELPHIA Acetylcodeine 3
Acetyldihydrocodeine
YYYY State COUNTY SubstanceName DrugReports
8480 2012 PA PHILADELPHIA Acetyldihydrocodeine 1
16220 2015 PA PHILADELPHIA Acetyldihydrocodeine 1
19452 2016 PA PHILADELPHIA Acetyldihydrocodeine 3
Acryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
16670 2016 OH ASHLAND Acryl fentanyl 3
17172 2016 OH CLARK Acryl fentanyl 1
17607 2016 OH ERIE Acryl fentanyl 1
18043 2016 OH CUYAHOGA Acryl fentanyl 5
18080 2016 OH FRANKLIN Acryl fentanyl 1
... ... ... ... ... ...
23983 2017 KY RUSSELL Acryl fentanyl 1
24017 2017 KY TRIMBLE Acryl fentanyl 1
24026 2017 OH VAN WERT Acryl fentanyl 5
24036 2017 OH WAYNE Acryl fentanyl 5
24041 2017 PA WESTMORELAND Acryl fentanyl 2

112 rows × 5 columns

Alphaprodine
YYYY State COUNTY SubstanceName DrugReports
1691 2010 PA PHILADELPHIA Alphaprodine 1
Benzylfentanyl
YYYY State COUNTY SubstanceName DrugReports
20042 2017 OH ATHENS Benzylfentanyl 1
20562 2017 OH BUTLER Benzylfentanyl 10
20767 2017 PA ALLEGHENY Benzylfentanyl 36
20906 2017 OH CLARK Benzylfentanyl 2
20950 2017 VA FAIRFAX Benzylfentanyl 1
21068 2017 OH CUYAHOGA Benzylfentanyl 72
21269 2017 OH FRANKLIN Benzylfentanyl 2
21877 2017 OH HAMILTON Benzylfentanyl 1
21927 2017 OH LAKE Benzylfentanyl 18
23750 2017 OH MONTGOMERY Benzylfentanyl 5
23810 2017 OH WARREN Benzylfentanyl 4
Buprenorphine
YYYY State COUNTY SubstanceName DrugReports
12 2010 OH ASHTABULA Buprenorphine 7
21 2010 KY BATH Buprenorphine 1
43 2010 OH BUTLER Buprenorphine 15
60 2010 VA CHARLOTTESVILLE CITY Buprenorphine 1
68 2010 PA CLEARFIELD Buprenorphine 11
... ... ... ... ... ...
24024 2017 PA UNION Buprenorphine 3
24027 2017 VA VIRGINIA BEACH CITY Buprenorphine 2
24042 2017 VA WESTMORELAND Buprenorphine 2
24044 2017 KY WHITLEY Buprenorphine 15
24051 2017 WV WOOD Buprenorphine 4

2524 rows × 5 columns

Butorphanol
YYYY State COUNTY SubstanceName DrugReports
2425 2010 VA VIRGINIA BEACH CITY Butorphanol 3
2979 2010 VA PETERSBURG CITY Butorphanol 1
8082 2012 VA WYTHE Butorphanol 1
8372 2012 VA SCOTT Butorphanol 1
9824 2013 PA MERCER Butorphanol 1
16365 2015 OH SCIOTO Butorphanol 1
Butyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
12928 2014 PA MONROE Butyryl fentanyl 1
13574 2015 VA ALBEMARLE Butyryl fentanyl 1
13840 2015 PA ALLEGHENY Butyryl fentanyl 1
15041 2015 OH HAMILTON Butyryl fentanyl 6
15462 2015 PA LANCASTER Butyryl fentanyl 1
... ... ... ... ... ...
23446 2017 OH SUMMIT Butyryl fentanyl 3
23453 2017 OH TUSCARAWAS Butyryl fentanyl 2
23475 2017 WV WOOD Butyryl fentanyl 3
23496 2017 OH ROSS Butyryl fentanyl 1
23920 2017 PA WASHINGTON Butyryl fentanyl 17

70 rows × 5 columns

Carfentanil
YYYY State COUNTY SubstanceName DrugReports
16659 2016 OH ALLEN Carfentanil 3
16681 2016 PA BEAVER Carfentanil 4
16844 2016 OH ASHLAND Carfentanil 11
16873 2016 OH BROWN Carfentanil 2
17102 2016 PA ALLEGHENY Carfentanil 4
... ... ... ... ... ...
23846 2017 PA YORK Carfentanil 4
23869 2017 OH ROSS Carfentanil 3
23991 2017 OH SCIOTO Carfentanil 23
24021 2017 OH TUSCARAWAS Carfentanil 7
24032 2017 OH WASHINGTON Carfentanil 5

216 rows × 5 columns

Codeine
YYYY State COUNTY SubstanceName DrugReports
85 2010 PA DAUPHIN Codeine 3
86 2010 KY DAVIESS Codeine 1
113 2010 OH ASHLAND Codeine 1
119 2010 PA BEAVER Codeine 2
124 2010 OH BELMONT Codeine 1
... ... ... ... ... ...
23924 2017 OH WAYNE Codeine 1
23935 2017 KY WHITLEY Codeine 1
23949 2017 PA YORK Codeine 3
23975 2017 VA ROANOKE Codeine 2
24057 2017 VA WYTHE Codeine 1

1034 rows × 5 columns

Crotonyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
20492 2017 OH BUTLER Crotonyl fentanyl 4
20633 2017 OH CLARK Crotonyl fentanyl 1
20657 2017 OH CUYAHOGA Crotonyl fentanyl 18
22429 2017 OH MONTGOMERY Crotonyl fentanyl 14
23212 2017 OH WARREN Crotonyl fentanyl 1
23742 2017 OH MIAMI Crotonyl fentanyl 1
Cyclopentyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
21990 2017 KY FAYETTE Cyclopentyl fentanyl 1
Cyclopropyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
19745 2017 OH ADAMS Cyclopropyl fentanyl 6
19765 2017 OH ATHENS Cyclopropyl fentanyl 1
19771 2017 KY BATH Cyclopropyl fentanyl 2
19781 2017 KY BOYD Cyclopropyl fentanyl 3
20267 2016 KY WARREN Cyclopropyl fentanyl 1
... ... ... ... ... ...
23892 2017 OH STARK Cyclopropyl fentanyl 1
23942 2017 KY WOODFORD Cyclopropyl fentanyl 1
23959 2017 OH PREBLE Cyclopropyl fentanyl 1
23992 2017 KY SCOTT Cyclopropyl fentanyl 1
24022 2017 OH UNION Cyclopropyl fentanyl 1

89 rows × 5 columns

Cyclopropyl/Crotonyl Fentanyl
YYYY State COUNTY SubstanceName DrugReports
21310 2017 OH CUYAHOGA Cyclopropyl/Crotonyl Fentanyl 21
23023 2017 OH SUMMIT Cyclopropyl/Crotonyl Fentanyl 5
Desmethylprodine 
YYYY State COUNTY SubstanceName DrugReports
4542 2011 KY JEFFERSON Desmethylprodine 5
4948 2011 OH WARREN Desmethylprodine 3
5042 2011 OH SHELBY Desmethylprodine 1
5224 2011 OH MONTGOMERY Desmethylprodine 1
5467 2011 VA SCOTT Desmethylprodine 2
7555 2012 OH MIAMI Desmethylprodine 1
7562 2012 OH MONTGOMERY Desmethylprodine 1
9834 2013 OH MONTGOMERY Desmethylprodine 1
Dextropropoxyphene
YYYY State COUNTY SubstanceName DrugReports
9 2010 PA ARMSTRONG Dextropropoxyphene 1
329 2010 WV BERKELEY Dextropropoxyphene 1
663 2010 OH CLERMONT Dextropropoxyphene 1
666 2010 OH CLINTON Dextropropoxyphene 1
1038 2010 WV HARDY Dextropropoxyphene 2
1149 2010 WV JACKSON Dextropropoxyphene 1
1154 2010 WV KANAWHA Dextropropoxyphene 3
1240 2010 OH GREENE Dextropropoxyphene 2
1248 2010 OH HAMILTON Dextropropoxyphene 7
1442 2010 WV HANCOCK Dextropropoxyphene 2
1733 2010 WV MERCER Dextropropoxyphene 1
1774 2010 WV POCAHONTAS Dextropropoxyphene 1
1784 2010 WV RALEIGH Dextropropoxyphene 2
1939 2010 WV OHIO Dextropropoxyphene 2
2025 2010 OH MIAMI Dextropropoxyphene 3
2034 2010 OH MONTGOMERY Dextropropoxyphene 6
2101 2010 OH SHELBY Dextropropoxyphene 2
2165 2010 OH ROSS Dextropropoxyphene 2
2348 2010 OH UNION Dextropropoxyphene 1
2636 2010 OH WARREN Dextropropoxyphene 2
3194 2011 OH HAMILTON Dextropropoxyphene 2
3415 2010 PA SCHUYLKILL Dextropropoxyphene 1
3465 2010 WV WAYNE Dextropropoxyphene 1
4096 2011 WV MERCER Dextropropoxyphene 1
4198 2011 OH MONTGOMERY Dextropropoxyphene 1
4428 2011 WV KANAWHA Dextropropoxyphene 5
4439 2011 PA LAWRENCE Dextropropoxyphene 1
4699 2011 WV GILMER Dextropropoxyphene 2
4796 2011 WV UPSHUR Dextropropoxyphene 1
4899 2011 WV WOOD Dextropropoxyphene 1
5198 2011 WV MARION Dextropropoxyphene 1
5469 2011 OH SHELBY Dextropropoxyphene 1
5603 2011 PA WESTMORELAND Dextropropoxyphene 1
6252 2012 WV HARRISON Dextropropoxyphene 1
6587 2012 OH HAMILTON Dextropropoxyphene 1
6728 2012 WV MARION Dextropropoxyphene 1
7049 2012 WV KANAWHA Dextropropoxyphene 1
7567 2012 WV MORGAN Dextropropoxyphene 1
9048 2013 WV KANAWHA Dextropropoxyphene 1
9432 2013 WV MARION Dextropropoxyphene 1
11282 2014 PA BEAVER Dextropropoxyphene 1
12501 2014 OH HAMILTON Dextropropoxyphene 1
16489 2015 PA YORK Dextropropoxyphene 1
17672 2016 OH HAMILTON Dextropropoxyphene 3
18737 2016 WV KANAWHA Dextropropoxyphene 3
Dihydrocodeine
YYYY State COUNTY SubstanceName DrugReports
509 2010 PA ALLEGHENY Dihydrocodeine 1
1678 2010 WV OHIO Dihydrocodeine 1
2653 2010 PA YORK Dihydrocodeine 1
3344 2011 KY FRANKLIN Dihydrocodeine 1
6200 2012 VA CHESTERFIELD Dihydrocodeine 1
8258 2012 VA PRINCE WILLIAM Dihydrocodeine 1
11275 2014 OH ASHLAND Dihydrocodeine 1
12372 2014 VA HENRICO Dihydrocodeine 1
17772 2016 VA FAIRFAX Dihydrocodeine 1
19622 2016 OH LAKE Dihydrocodeine 1
21134 2017 VA FAIRFAX Dihydrocodeine 1
22371 2017 OH LICKING Dihydrocodeine 2
Dihydromorphone
YYYY State COUNTY SubstanceName DrugReports
77 2010 PA CRAWFORD Dihydromorphone 1
Fentanyl
YYYY State COUNTY SubstanceName DrugReports
31 2010 WV BOONE Fentanyl 1
36 2010 OH BROWN Fentanyl 1
72 2010 PA COLUMBIA Fentanyl 1
78 2010 PA CRAWFORD Fentanyl 2
98 2010 VA FAIRFAX Fentanyl 2
... ... ... ... ... ...
23987 2017 OH SANDUSKY Fentanyl 33
24025 2017 PA UNION Fentanyl 9
24039 2017 VA WAYNESBORO CITY Fentanyl 1
24045 2017 VA WINCHESTER CITY Fentanyl 4
24054 2017 OH WYANDOT Fentanyl 3

1484 rows × 5 columns

Fluorobutyryl fentanyl 
YYYY State COUNTY SubstanceName DrugReports
18711 2016 WV HARRISON Fluorobutyryl fentanyl 1
23018 2017 OH STARK Fluorobutyryl fentanyl 22
Fluorofentanyl
YYYY State COUNTY SubstanceName DrugReports
18136 2016 KY GRAYSON Fluorofentanyl 1
18973 2016 OH MONTGOMERY Fluorofentanyl 2
Fluoroisobutyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
18210 2016 OH HAMILTON Fluoroisobutyryl fentanyl 2
20064 2017 PA BLAIR Fluoroisobutyryl fentanyl 4
20658 2017 OH CUYAHOGA Fluoroisobutyryl fentanyl 55
20947 2017 PA ERIE Fluoroisobutyryl fentanyl 2
21011 2017 PA BRADFORD Fluoroisobutyryl fentanyl 1
21022 2017 OH BUTLER Fluoroisobutyryl fentanyl 5
21074 2017 PA DAUPHIN Fluoroisobutyryl fentanyl 7
21083 2017 PA DELAWARE Fluoroisobutyryl fentanyl 1
21209 2017 VA CHESTERFIELD Fluoroisobutyryl fentanyl 1
21249 2017 PA ELK Fluoroisobutyryl fentanyl 1
21297 2017 OH CLERMONT Fluoroisobutyryl fentanyl 2
21346 2017 PA FAYETTE Fluoroisobutyryl fentanyl 1
21383 2017 VA FAIRFAX Fluoroisobutyryl fentanyl 5
21489 2017 PA CAMBRIA Fluoroisobutyryl fentanyl 1
21589 2017 VA FREDERICK Fluoroisobutyryl fentanyl 3
21685 2017 OH HAMILTON Fluoroisobutyryl fentanyl 34
21766 2017 VA LOUDOUN Fluoroisobutyryl fentanyl 1
21855 2017 PA LACKAWANNA Fluoroisobutyryl fentanyl 2
21942 2017 PA LEBANON Fluoroisobutyryl fentanyl 1
21967 2017 PA LUZERNE Fluoroisobutyryl fentanyl 5
22094 2017 VA HENRICO Fluoroisobutyryl fentanyl 2
22156 2017 PA LEHIGH Fluoroisobutyryl fentanyl 1
22646 2017 PA MERCER Fluoroisobutyryl fentanyl 3
22874 2017 PA JUNIATA Fluoroisobutyryl fentanyl 1
22885 2017 PA LANCASTER Fluoroisobutyryl fentanyl 2
22997 2017 OH SCIOTO Fluoroisobutyryl fentanyl 1
23250 2017 PA WYOMING Fluoroisobutyryl fentanyl 1
23275 2017 OH SUMMIT Fluoroisobutyryl fentanyl 3
23332 2017 KY MONTGOMERY Fluoroisobutyryl fentanyl 1
23563 2017 OH WAYNE Fluoroisobutyryl fentanyl 1
23626 2017 PA MONROE Fluoroisobutyryl fentanyl 1
23649 2017 PA NORTHAMPTON Fluoroisobutyryl fentanyl 1
23656 2017 KY OLDHAM Fluoroisobutyryl fentanyl 3
Furanyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
15305 2015 OH FRANKLIN Furanyl fentanyl 1
16128 2015 PA NORTHAMPTON Furanyl fentanyl 1
16499 2016 VA ALEXANDRIA CITY Furanyl fentanyl 1
16677 2016 OH AUGLAIZE Furanyl fentanyl 1
16797 2016 PA BRADFORD Furanyl fentanyl 4
... ... ... ... ... ...
23950 2017 PA YORK Furanyl fentanyl 29
23957 2017 VA PORTSMOUTH CITY Furanyl fentanyl 1
24009 2017 VA SUFFOLK CITY Furanyl fentanyl 2
24023 2017 OH UNION Furanyl fentanyl 1
24034 2017 PA WASHINGTON Furanyl fentanyl 12

341 rows × 5 columns

Furanyl/3-Furanyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
19054 2016 WV RALEIGH Furanyl/3-Furanyl fentanyl 1
22665 2017 WV MORGAN Furanyl/3-Furanyl fentanyl 2
23077 2017 WV NICHOLAS Furanyl/3-Furanyl fentanyl 3
23246 2017 WV WOOD Furanyl/3-Furanyl fentanyl 1
Heroin
YYYY State COUNTY SubstanceName DrugReports
3 2010 VA ALEXANDRIA CITY Heroin 5
7 2010 VA AMELIA Heroin 1
8 2010 VA ARLINGTON Heroin 41
14 2010 OH ATHENS Heroin 72
16 2010 OH AUGLAIZE Heroin 35
... ... ... ... ... ...
23981 2017 OH ROSS Heroin 67
24008 2017 VA STAUNTON CITY Heroin 1
24030 2017 VA WARREN Heroin 73
24056 2017 PA WYOMING Heroin 20
24061 2017 VA YORK Heroin 48

2727 rows × 5 columns

Hydrocodeinone
YYYY State COUNTY SubstanceName DrugReports
9953 2013 PA NORTHUMBERLAND Hydrocodeinone 1
9971 2013 PA LEHIGH Hydrocodeinone 1
14995 2015 PA ERIE Hydrocodeinone 1
Hydrocodone
YYYY State COUNTY SubstanceName DrugReports
13 2010 OH ASHTABULA Hydrocodone 21
22 2010 KY BATH Hydrocodone 7
23 2010 PA BEDFORD Hydrocodone 3
27 2010 WV BERKELEY Hydrocodone 6
33 2010 WV BRAXTON Hydrocodone 35
... ... ... ... ... ...
23979 2017 KY ROCKCASTLE Hydrocodone 8
23993 2017 KY SCOTT Hydrocodone 1
24028 2017 KY WARREN Hydrocodone 11
24055 2017 OH WYANDOT Hydrocodone 6
24058 2017 VA WYTHE Hydrocodone 19

2979 rows × 5 columns

Hydromorphone
YYYY State COUNTY SubstanceName DrugReports
4 2010 PA ALLEGHENY Hydromorphone 5
40 2010 PA BUCKS Hydromorphone 5
42 2010 KY BULLITT Hydromorphone 2
63 2010 VA CHESTERFIELD Hydromorphone 5
65 2010 KY CHRISTIAN Hydromorphone 7
... ... ... ... ... ...
23797 2017 OH TUSCARAWAS Hydromorphone 2
23938 2017 VA WISE Hydromorphone 1
23972 2017 VA RICHMOND Hydromorphone 2
24000 2017 VA SMYTH Hydromorphone 1
24046 2017 VA WINCHESTER CITY Hydromorphone 1

1191 rows × 5 columns

Isobutyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
21879 2017 OH HAMILTON Isobutyryl fentanyl 1
23323 2017 WV WOOD Isobutyryl fentanyl 3
Levorphanol
YYYY State COUNTY SubstanceName DrugReports
10849 2014 PA ALLEGHENY Levorphanol 1
17362 2016 PA ALLEGHENY Levorphanol 1
MT-45
YYYY State COUNTY SubstanceName DrugReports
9986 2013 PA LYCOMING MT-45 1
19038 2016 PA PHILADELPHIA MT-45 1
Meperidine
YYYY State COUNTY SubstanceName DrugReports
24 2010 VA BEDFORD Meperidine 1
324 2010 KY BARREN Meperidine 1
602 2010 OH ADAMS Meperidine 1
617 2010 OH ATHENS Meperidine 2
746 2010 PA BUCKS Meperidine 4
... ... ... ... ... ...
20880 2017 PA DELAWARE Meperidine 1
21810 2017 VA HENRY Meperidine 1
22175 2017 KY JEFFERSON Meperidine 1
23636 2017 VA MONTGOMERY Meperidine 1
23955 2017 VA PITTSYLVANIA Meperidine 1

79 rows × 5 columns

Metazocine
YYYY State COUNTY SubstanceName DrugReports
23780 2017 PA PHILADELPHIA Metazocine 1
Methadone
YYYY State COUNTY SubstanceName DrugReports
2 2010 PA ADAMS Methadone 2
17 2010 OH AUGLAIZE Methadone 1
34 2010 KY BREATHITT Methadone 13
39 2010 VA BUCKINGHAM Methadone 2
41 2010 PA BUCKS Methadone 32
... ... ... ... ... ...
23944 2017 KY WOODFORD Methadone 2
23951 2017 PA YORK Methadone 6
23982 2017 OH ROSS Methadone 2
24031 2017 VA WARREN Methadone 4
24038 2017 PA WAYNE Methadone 1

1795 rows × 5 columns

Methorphan
YYYY State COUNTY SubstanceName DrugReports
81 2010 OH CUYAHOGA Methorphan 2
196 2010 VA FAIRFAX Methorphan 1
198 2010 VA FAIRFAX CITY Methorphan 2
290 2010 PA DELAWARE Methorphan 6
510 2010 PA ALLEGHENY Methorphan 2
... ... ... ... ... ...
22490 2017 KY OLDHAM Methorphan 1
22676 2017 KY MCCRACKEN Methorphan 1
22722 2017 VA POWHATAN Methorphan 1
22840 2017 PA PHILADELPHIA Methorphan 2
23222 2017 PA WASHINGTON Methorphan 1

134 rows × 5 columns

Methoxyacetyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
19748 2017 PA ADAMS Methoxyacetyl fentanyl 2
19850 2017 PA ALLEGHENY Methoxyacetyl fentanyl 58
20084 2017 OH BUTLER Methoxyacetyl fentanyl 5
20305 2017 VA ALBEMARLE Methoxyacetyl fentanyl 1
20345 2017 KY BOONE Methoxyacetyl fentanyl 2
20514 2017 PA ARMSTRONG Methoxyacetyl fentanyl 1
20519 2017 OH ASHLAND Methoxyacetyl fentanyl 1
20605 2017 KY CAMPBELL Methoxyacetyl fentanyl 1
20635 2017 OH CLARK Methoxyacetyl fentanyl 5
20834 2017 OH CARROLL Methoxyacetyl fentanyl 1
20865 2017 OH CUYAHOGA Methoxyacetyl fentanyl 70
20940 2017 PA DELAWARE Methoxyacetyl fentanyl 1
21047 2017 KY CLARK Methoxyacetyl fentanyl 1
21236 2017 PA DAUPHIN Methoxyacetyl fentanyl 1
21299 2017 OH CLERMONT Methoxyacetyl fentanyl 1
21332 2017 PA ERIE Methoxyacetyl fentanyl 1
21421 2017 VA GOOCHLAND Methoxyacetyl fentanyl 1
21428 2017 OH GREENE Methoxyacetyl fentanyl 8
21567 2017 OH ERIE Methoxyacetyl fentanyl 2
21658 2017 KY JESSAMINE Methoxyacetyl fentanyl 2
21824 2017 PA HUNTINGDON Methoxyacetyl fentanyl 1
21836 2017 KY JEFFERSON Methoxyacetyl fentanyl 2
21931 2017 OH LAKE Methoxyacetyl fentanyl 12
21951 2017 OH LICKING Methoxyacetyl fentanyl 3
21969 2017 PA LUZERNE Methoxyacetyl fentanyl 1
21991 2017 KY FAYETTE Methoxyacetyl fentanyl 5
22009 2017 OH FRANKLIN Methoxyacetyl fentanyl 2
22045 2017 OH HANCOCK Methoxyacetyl fentanyl 3
22046 2017 VA HANOVER Methoxyacetyl fentanyl 3
22079 2017 OH HAMILTON Methoxyacetyl fentanyl 73
22272 2017 OH FAIRFIELD Methoxyacetyl fentanyl 1
22347 2017 VA HENRY Methoxyacetyl fentanyl 2
22497 2017 KY OWEN Methoxyacetyl fentanyl 1
22601 2017 OH LORAIN Methoxyacetyl fentanyl 2
22786 2017 OH MEDINA Methoxyacetyl fentanyl 2
22941 2017 KY MASON Methoxyacetyl fentanyl 2
23032 2017 OH TRUMBULL Methoxyacetyl fentanyl 2
23047 2017 PA WASHINGTON Methoxyacetyl fentanyl 10
23240 2017 KY WOLFE Methoxyacetyl fentanyl 1
23304 2017 OH WASHINGTON Methoxyacetyl fentanyl 1
23337 2017 OH MONTGOMERY Methoxyacetyl fentanyl 8
23402 2017 OH ROSS Methoxyacetyl fentanyl 1
23435 2017 PA SNYDER Methoxyacetyl fentanyl 2
23483 2017 PA YORK Methoxyacetyl fentanyl 1
23522 2017 VA SPOTSYLVANIA Methoxyacetyl fentanyl 2
23553 2017 OH WARREN Methoxyacetyl fentanyl 1
23956 2017 OH PORTAGE Methoxyacetyl fentanyl 3
23967 2017 WV RALEIGH Methoxyacetyl fentanyl 2
23968 2017 OH RICHLAND Methoxyacetyl fentanyl 2
24011 2017 OH SUMMIT Methoxyacetyl fentanyl 9
Mitragynine
YYYY State COUNTY SubstanceName DrugReports
5159 2011 KY KENTON Mitragynine 1
5976 2012 OH CUYAHOGA Mitragynine 1
6405 2012 KY DAVIESS Mitragynine 1
6456 2012 PA BERKS Mitragynine 3
6703 2012 PA LEHIGH Mitragynine 1
7320 2012 PA SCHUYLKILL Mitragynine 1
7451 2012 OH MONTGOMERY Mitragynine 9
7647 2012 KY MCCRACKEN Mitragynine 1
8018 2013 KY BULLITT Mitragynine 1
8237 2013 OH BUTLER Mitragynine 1
9354 2013 OH GREENE Mitragynine 3
9611 2013 KY KENTON Mitragynine 5
9680 2013 KY DAVIESS Mitragynine 2
10022 2013 OH MONTGOMERY Mitragynine 35
10632 2013 OH WARREN Mitragynine 2
11963 2014 OH CUYAHOGA Mitragynine 2
12579 2014 KY DAVIESS Mitragynine 2
12663 2014 KY KENTON Mitragynine 2
12737 2014 WV MONONGALIA Mitragynine 2
12757 2014 VA NORFOLK CITY Mitragynine 3
12932 2014 OH MONTGOMERY Mitragynine 1
12987 2014 KY MADISON Mitragynine 1
14027 2014 OH MIAMI Mitragynine 2
14616 2015 PA COLUMBIA Mitragynine 1
14837 2015 OH CUYAHOGA Mitragynine 1
15345 2015 OH HOCKING Mitragynine 1
15449 2015 KY JEFFERSON Mitragynine 1
15585 2015 OH MONTGOMERY Mitragynine 1
15729 2015 KY MCCRACKEN Mitragynine 1
15947 2015 OH LAKE Mitragynine 1
15973 2015 KY LYON Mitragynine 1
17003 2015 KY WARREN Mitragynine 1
17163 2016 KY CARROLL Mitragynine 1
17568 2016 KY FAYETTE Mitragynine 2
17989 2016 KY HOPKINS Mitragynine 1
18046 2016 OH CUYAHOGA Mitragynine 8
18302 2016 OH GREENE Mitragynine 1
18536 2016 KY KENTON Mitragynine 1
18597 2016 OH MIAMI Mitragynine 1
18734 2016 KY JEFFERSON Mitragynine 29
19209 2016 KY NELSON Mitragynine 1
19329 2016 VA PORTSMOUTH CITY Mitragynine 1
19420 2016 OH MONTGOMERY Mitragynine 5
19571 2016 KY WARREN Mitragynine 1
19573 2016 OH WARREN Mitragynine 3
19623 2016 OH LAKE Mitragynine 9
21139 2017 KY FAYETTE Mitragynine 1
21538 2017 OH CUYAHOGA Mitragynine 29
21837 2017 KY JEFFERSON Mitragynine 1
22024 2017 OH GEAUGA Mitragynine 1
22138 2017 OH LAKE Mitragynine 2
22334 2017 WV HANCOCK Mitragynine 1
23813 2017 OH WARREN Mitragynine 3
23894 2017 OH SUMMIT Mitragynine 3
Morphine
YYYY State COUNTY SubstanceName DrugReports
1 2010 OH ADAMS Morphine 9
20 2010 KY BARREN Morphine 3
28 2010 PA BERKS Morphine 5
35 2010 KY BRECKINRIDGE Morphine 1
59 2010 KY CASEY Morphine 1
... ... ... ... ... ...
24003 2017 VA STAFFORD Morphine 6
24007 2017 OH STARK Morphine 16
24016 2017 PA SUSQUEHANNA Morphine 1
24043 2017 VA WESTMORELAND Morphine 3
24053 2017 KY WOODFORD Morphine 1

2102 rows × 5 columns

Nalbuphine
YYYY State COUNTY SubstanceName DrugReports
2796 2011 VA CHESTERFIELD Nalbuphine 1
Opiates
YYYY State COUNTY SubstanceName DrugReports
589 2010 OH DELAWARE Opiates 2
596 2010 OH FAIRFIELD Opiates 1
1300 2010 OH LICKING Opiates 14
2136 2010 OH PERRY Opiates 1
4565 2011 OH LICKING Opiates 10
7476 2012 OH PERRY Opiates 1
7759 2012 OH LICKING Opiates 11
8966 2013 OH COSHOCTON Opiates 1
9557 2013 OH FAIRFIELD Opiates 1
10183 2013 OH LICKING Opiates 5
10664 2013 OH PERRY Opiates 1
15668 2015 PA MONTGOMERY Opiates 1
17847 2016 OH CUYAHOGA Opiates 9
20106 2016 PA SCHUYLKILL Opiates 1
20147 2016 PA WASHINGTON Opiates 1
Opium
YYYY State COUNTY SubstanceName DrugReports
693 2010 VA FAIRFAX Opium 1
4013 2011 PA PHILADELPHIA Opium 1
4334 2011 VA LOUDOUN Opium 1
7296 2012 VA PRINCE WILLIAM Opium 1
8682 2013 VA FAIRFAX Opium 14
10303 2013 VA LOUDOUN Opium 1
10387 2013 OH PORTAGE Opium 1
11602 2014 VA FAIRFAX Opium 2
13992 2014 VA LOUDOUN Opium 1
16629 2015 VA WASHINGTON Opium 1
18278 2016 VA LOUDOUN Opium 2
21256 2017 VA FAIRFAX Opium 2
21662 2017 KY KENTON Opium 1
22243 2017 VA MANASSAS CITY Opium 1
23016 2017 VA STAFFORD Opium 1
Oxycodone
YYYY State COUNTY SubstanceName DrugReports
5 2010 KY ALLEN Oxycodone 15
10 2010 OH ASHLAND Oxycodone 45
19 2010 WV BARBOUR Oxycodone 1
25 2010 KY BELL Oxycodone 148
30 2010 KY BOONE Oxycodone 56
... ... ... ... ... ...
23997 2017 VA SHENANDOAH Oxycodone 14
24014 2017 VA SURRY Oxycodone 2
24019 2017 OH TRUMBULL Oxycodone 83
24040 2017 WV WEBSTER Oxycodone 3
24047 2017 VA WISE Oxycodone 23

3124 rows × 5 columns

Oxymorphone
YYYY State COUNTY SubstanceName DrugReports
6 2010 KY ALLEN Oxymorphone 1
11 2010 OH ASHLAND Oxymorphone 2
26 2010 KY BELL Oxymorphone 3
46 2010 PA BUTLER Oxymorphone 2
83 2010 OH CUYAHOGA Oxymorphone 4
... ... ... ... ... ...
23996 2017 OH SHELBY Oxymorphone 1
23998 2017 VA SHENANDOAH Oxymorphone 1
24004 2017 VA STAFFORD Oxymorphone 22
24012 2017 OH SUMMIT Oxymorphone 4
24048 2017 VA WISE Oxymorphone 2

1182 rows × 5 columns

Pentazocine
YYYY State COUNTY SubstanceName DrugReports
699 2010 VA FAUQUIER Pentazocine 2
1564 2010 KY JEFFERSON Pentazocine 3
1854 2010 WV NICHOLAS Pentazocine 1
2030 2010 WV MINGO Pentazocine 1
2771 2011 WV BRAXTON Pentazocine 1
2862 2011 KY BULLITT Pentazocine 1
2954 2010 KY MUHLENBERG Pentazocine 1
3073 2010 VA WYTHE Pentazocine 1
3347 2011 OH FRANKLIN Pentazocine 1
3592 2011 VA GREENSVILLE Pentazocine 1
3723 2011 PA DELAWARE Pentazocine 1
3789 2011 KY JEFFERSON Pentazocine 2
4292 2011 WV RALEIGH Pentazocine 1
5053 2011 OH SUMMIT Pentazocine 1
5468 2011 VA SCOTT Pentazocine 1
5498 2011 KY WARREN Pentazocine 1
6886 2012 KY JEFFERSON Pentazocine 1
7389 2012 OH OTTAWA Pentazocine 1
7538 2012 VA TAZEWELL Pentazocine 2
7580 2012 KY OHIO Pentazocine 2
7610 2012 VA ROANOKE Pentazocine 2
7805 2012 OH MORROW Pentazocine 1
8208 2013 OH ASHLAND Pentazocine 1
8344 2013 VA BEDFORD Pentazocine 1
8759 2012 PA WARREN Pentazocine 1
8898 2013 PA CRAWFORD Pentazocine 1
10001 2013 OH MEDINA Pentazocine 1
10300 2013 WV LOGAN Pentazocine 1
10470 2013 KY OHIO Pentazocine 1
10611 2013 VA SUFFOLK CITY Pentazocine 1
10736 2013 WV WEBSTER Pentazocine 1
11775 2014 OH CRAWFORD Pentazocine 1
11871 2014 KY BARREN Pentazocine 1
12527 2014 KY JEFFERSON Pentazocine 2
12662 2014 WV KANAWHA Pentazocine 1
12686 2014 OH LAWRENCE Pentazocine 1
13392 2014 OH PICKAWAY Pentazocine 1
13524 2014 VA TAZEWELL Pentazocine 1
13621 2014 OH WARREN Pentazocine 1
13635 2014 OH WOOD Pentazocine 1
13863 2015 PA BEAVER Pentazocine 1
14510 2015 OH BUTLER Pentazocine 1
18011 2016 VA CARROLL Pentazocine 1
18982 2016 OH MUSKINGUM Pentazocine 1
21668 2017 KY KNOTT Pentazocine 1
22242 2017 OH MAHONING Pentazocine 1
23214 2017 OH WARREN Pentazocine 1
Pethidine
YYYY State COUNTY SubstanceName DrugReports
400 2010 PA DELAWARE Pethidine 1
1058 2010 KY KENTON Pethidine 1
1416 2010 PA FAYETTE Pethidine 2
1466 2010 KY JEFFERSON Pethidine 2
2418 2010 KY TRIGG Pethidine 1
4337 2011 VA LOUDOUN Pethidine 1
4506 2011 OH NOBLE Pethidine 1
5388 2011 KY SIMPSON Pethidine 1
6540 2012 PA DAUPHIN Pethidine 1
11283 2014 PA BEAVER Pethidine 1
15255 2015 KY LIVINGSTON Pethidine 1
22205 2017 PA LANCASTER Pethidine 1
Phenyl fentanyl
YYYY State COUNTY SubstanceName DrugReports
20030 2017 PA ALLEGHENY Phenyl fentanyl 9
20868 2017 OH CUYAHOGA Phenyl fentanyl 4
21093 2017 OH COLUMBIANA Phenyl fentanyl 1
21117 2017 OH DELAWARE Phenyl fentanyl 1
21569 2017 OH ERIE Phenyl fentanyl 5
22201 2017 OH LAKE Phenyl fentanyl 26
22289 2017 OH FRANKLIN Phenyl fentanyl 1
22911 2017 OH LOGAN Phenyl fentanyl 1
23089 2017 OH OTTAWA Phenyl fentanyl 1
23198 2017 OH TRUMBULL Phenyl fentanyl 3
23230 2017 OH WAYNE Phenyl fentanyl 1
23612 2017 OH MEDINA Phenyl fentanyl 3
23655 2017 WV OHIO Phenyl fentanyl 1
23915 2017 OH WARREN Phenyl fentanyl 1
24013 2017 OH SUMMIT Phenyl fentanyl 6
Propoxyphene
YYYY State COUNTY SubstanceName DrugReports
0 2010 VA ACCOMACK Propoxyphene 1
15 2010 OH ATHENS Propoxyphene 1
18 2010 OH AUGLAIZE Propoxyphene 2
29 2010 VA BLAND Propoxyphene 1
53 2010 PA CARBON Propoxyphene 1
... ... ... ... ... ...
22212 2017 KY LAUREL Propoxyphene 1
22552 2017 VA RUSSELL Propoxyphene 1
22700 2017 OH MONTGOMERY Propoxyphene 1
22887 2017 PA LANCASTER Propoxyphene 1
23596 2017 OH MAHONING Propoxyphene 1

335 rows × 5 columns

Remifentanil
YYYY State COUNTY SubstanceName DrugReports
5099 2012 PA ALLEGHENY Remifentanil 1
Tetrahydrofuran fentanyl
YYYY State COUNTY SubstanceName DrugReports
17511 2016 OH CLARK Tetrahydrofuran fentanyl 2
19660 2016 OH MADISON Tetrahydrofuran fentanyl 1
21839 2017 OH JEFFERSON Tetrahydrofuran fentanyl 1
22906 2017 OH LICKING Tetrahydrofuran fentanyl 1
23255 2017 PA YORK Tetrahydrofuran fentanyl 1
Thebaine
YYYY State COUNTY SubstanceName DrugReports
2006 2010 WV MARION Thebaine 1
2355 2010 KY WARREN Thebaine 1
5359 2011 PA PIKE Thebaine 2
14304 2015 OH BUTLER Thebaine 2
23271 2017 OH STARK Thebaine 5
Tramadol
YYYY State COUNTY SubstanceName DrugReports
106 2010 VA ALLEGHANY Tramadol 1
150 2010 KY CARTER Tramadol 2
158 2010 VA CLARKE Tramadol 1
173 2010 PA CRAWFORD Tramadol 1
185 2010 PA DELAWARE Tramadol 1
... ... ... ... ... ...
24029 2017 KY WARREN Tramadol 1
24035 2017 VA WASHINGTON Tramadol 1
24049 2017 OH WOOD Tramadol 1
24052 2017 WV WOOD Tramadol 3
24059 2017 VA WYTHE Tramadol 5

1384 rows × 5 columns

U-47700
YYYY State COUNTY SubstanceName DrugReports
16760 2016 PA ALLEGHENY U-47700 19
16847 2016 OH ASHTABULA U-47700 5
16883 2016 OH BUTLER U-47700 2
17042 2016 VA ARLINGTON U-47700 2
17184 2016 PA CLINTON U-47700 1
... ... ... ... ... ...
23895 2017 OH SUMMIT U-47700 41
23916 2017 PA WARREN U-47700 4
23917 2017 VA WARREN U-47700 1
24002 2017 PA SNYDER U-47700 1
24037 2017 OH WAYNE U-47700 7

195 rows × 5 columns

U-48800
YYYY State COUNTY SubstanceName DrugReports
20175 2017 OH ALLEN U-48800 9
20505 2017 PA ALLEGHENY U-48800 10
20785 2017 OH AUGLAIZE U-48800 2
20935 2017 OH DEFIANCE U-48800 1
21293 2017 OH CLARK U-48800 1
21627 2017 OH HARDIN U-48800 1
21933 2017 OH LAKE U-48800 5
22663 2017 OH MORGAN U-48800 1
22833 2017 OH OTTAWA U-48800 1
23584 2017 PA LUZERNE U-48800 2
23633 2017 OH MONTGOMERY U-48800 2
U-49900
YYYY State COUNTY SubstanceName DrugReports
20325 2017 OH ASHTABULA U-49900 1
21151 2017 OH FRANKLIN U-49900 5
21447 2017 OH HANCOCK U-49900 1
21645 2017 OH HURON U-49900 2
21773 2017 OH LUCAS U-49900 4
23243 2017 OH WOOD U-49900 3
23896 2017 OH SUMMIT U-49900 1
U-51754
YYYY State COUNTY SubstanceName DrugReports
21107 2017 OH CUYAHOGA U-51754 7
Valeryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
16656 2016 PA ALLEGHENY Valeryl fentanyl 4
16684 2016 PA BERKS Valeryl fentanyl 1
17126 2016 PA BLAIR Valeryl fentanyl 1
18174 2016 OH HURON Valeryl fentanyl 2
18448 2016 PA FAYETTE Valeryl fentanyl 1
19234 2016 PA PHILADELPHIA Valeryl fentanyl 1
19734 2016 PA WESTMORELAND Valeryl fentanyl 2
19991 2016 PA VENANGO Valeryl fentanyl 1
20107 2016 PA SCHUYLKILL Valeryl fentanyl 3
20181 2017 PA ARMSTRONG Valeryl fentanyl 1
20429 2016 PA YORK Valeryl fentanyl 3
20803 2017 PA BLAIR Valeryl fentanyl 1
cis-3-methylfentanyl
YYYY State COUNTY SubstanceName DrugReports
16655 2016 PA ALLEGHENY cis-3-methylfentanyl 14
20503 2017 PA ALLEGHENY cis-3-methylfentanyl 9
20614 2017 KY CARTER cis-3-methylfentanyl 2
o-Fluorofentanyl
YYYY State COUNTY SubstanceName DrugReports
17893 2016 OH GREENE o-Fluorofentanyl 1
18227 2016 OH HIGHLAND o-Fluorofentanyl 1
18880 2016 OH MARION o-Fluorofentanyl 1
19003 2016 OH MONTGOMERY o-Fluorofentanyl 1
19297 2016 VA NELSON o-Fluorofentanyl 1
19975 2016 VA STAFFORD o-Fluorofentanyl 2
p-Fluorobutyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
16481 2015 OH WOOD p-Fluorobutyryl fentanyl 2
17157 2016 PA CARBON p-Fluorobutyryl fentanyl 1
17944 2016 OH FRANKLIN p-Fluorobutyryl fentanyl 1
18287 2016 PA LUZERNE p-Fluorobutyryl fentanyl 1
18391 2016 KY MADISON p-Fluorobutyryl fentanyl 1
18741 2016 PA LACKAWANNA p-Fluorobutyryl fentanyl 2
21212 2017 VA CHESTERFIELD p-Fluorobutyryl fentanyl 2
21774 2017 PA LUZERNE p-Fluorobutyryl fentanyl 4
23707 2017 PA LYCOMING p-Fluorobutyryl fentanyl 2
p-Fluorofentanyl
YYYY State COUNTY SubstanceName DrugReports
17754 2016 OH CUYAHOGA p-Fluorofentanyl 6
18328 2016 WV HARRISON p-Fluorofentanyl 1
p-methoxybutyryl fentanyl
YYYY State COUNTY SubstanceName DrugReports
17419 2016 PA ALLEGHENY p-methoxybutyryl fentanyl 1
trans-3-Methylfentanyl
YYYY State COUNTY SubstanceName DrugReports
16840 2016 PA ALLEGHENY trans-3-Methylfentanyl 14
20441 2017 PA ALLEGHENY trans-3-Methylfentanyl 7
21198 2017 KY CARTER trans-3-Methylfentanyl 2
# for name ,group in df[df["YYYY"].isin([2014,2015])].groupby(['SubstanceName']):
#     df_b = group
#     df_add = df_b.groupby(['YYYY','State']).sum()
#     display(df_add)
#     a=(df_add.loc[2015,'DrugReports']-df_add.loc[2014,'DrugReports']).idxmax()
#     display(a)

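# Keep only the Ohio rows from 2014 and 2015
df_b = df[(df['YYYY'].isin([2014, 2015])) & (df['State'] == 'OH')]
# Sum only the numeric DrugReports column, so the string columns (State, COUNTY) are not aggregated
df_add = df_b.groupby(['YYYY', 'SubstanceName'])[['DrugReports']].sum()
display((df_add.loc[2015] - df_add.loc[2014]).idxmax())  # relies on index alignment by SubstanceName
display((df_add.loc[2015] / df_add.loc[2014]).idxmax())  # largest relative (ratio) increase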
DrugReports    Heroin
dtype: object



DrugReports    Acetyl fentanyl
dtype: object
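
The index alignment used above can be seen more easily on a small example. The sketch below uses a made-up toy table (the numbers are hypothetical and are not taken from the real dataset) with the same YYYY, SubstanceName and DrugReports columns. After summing by year and substance, selecting one year with .loc leaves SubstanceName as the index, so subtracting or dividing two years matches rows by substance name rather than by position. A substance that appears in only one of the two years gets NaN, and idxmax skips NaN by default.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from the real dataset
toy = pd.DataFrame({
    'YYYY': [2014, 2014, 2014, 2015, 2015, 2015],
    'SubstanceName': ['Heroin', 'Heroin', 'Morphine',
                      'Heroin', 'Morphine', 'Fentanyl'],
    'DrugReports': [10, 5, 2, 40, 10, 3],
})

# Total reports per (year, substance)
totals = toy.groupby(['YYYY', 'SubstanceName'])[['DrugReports']].sum()

# .loc[year] drops the outer index level; the remaining index is SubstanceName,
# so the arithmetic below aligns rows by substance name
diff = totals.loc[2015] - totals.loc[2014]
ratio = totals.loc[2015] / totals.loc[2014]

print(diff)            # Fentanyl has no 2014 row, so its difference is NaN
print(diff.idxmax())   # substance with the largest absolute increase
print(ratio.idxmax())  # substance with the largest relative increase
```

In this toy table the largest absolute increase and the largest relative increase belong to different substances, which is the same kind of split seen in the real output above (Heroin versus Acetyl fentanyl). Note that a substance reported only in the later year is silently left out of both comparisons, because its difference and ratio are NaN.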
