Selecting columns by dtype in pandas
The select_dtypes() method selects a subset of columns based on their dtype.
import numpy as np
import pandas as pd
df = pd.DataFrame({'string': list('abc'),
                   'int64': list(range(1, 4)),
                   'uint8': np.arange(3, 6).astype('u1'),
                   'float64': np.arange(4.0, 7.0),
                   'bool1': [True, False, True],
                   'bool2': [False, True, False],
                   'dates': pd.date_range('now', periods=3),
                   'category': pd.Series(list("ABC")).astype('category')})
df['tdeltas'] = df.dates.diff()
df['uint64'] = np.arange(3, 6).astype('u8')
df['other_dates'] = pd.date_range('20130101', periods=3)
df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
df
| | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | tdeltas | uint64 | other_dates | tz_aware_dates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2019-12-01 22:00:58.958571 | A | NaT | 3 | 2013-01-01 | 2013-01-01 00:00:00-05:00 |
| 1 | b | 2 | 4 | 5.0 | False | True | 2019-12-02 22:00:58.958571 | B | 1 days | 4 | 2013-01-02 | 2013-01-02 00:00:00-05:00 |
| 2 | c | 3 | 5 | 6.0 | True | False | 2019-12-03 22:00:58.958571 | C | 1 days | 5 | 2013-01-03 | 2013-01-03 00:00:00-05:00 |
df.dtypes
string                                object
int64                                  int64
uint8                                  uint8
float64                              float64
bool1                                   bool
bool2                                   bool
dates                         datetime64[ns]
category                            category
tdeltas                      timedelta64[ns]
uint64                                uint64
other_dates                   datetime64[ns]
tz_aware_dates    datetime64[ns, US/Eastern]
dtype: object
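select_dtypes() takes two parameters, include and exclude: include lists the dtypes whose columns you want, and exclude lists the dtypes whose columns you want left out.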
df.select_dtypes(include=[bool])
| | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
df.select_dtypes(include=['bool'])
| | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
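The two calls above differ only in how the dtype is spelled: the Python type bool versus the string 'bool'. A minimal sketch, using a small stand-in frame (the name `small` and its columns are made up for illustration, not taken from the df built earlier), suggests that these spellings, and the NumPy scalar type np.bool_, all select the same columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; not the df built earlier.
small = pd.DataFrame({
    "flag": [True, False, True],
    "count": [1, 2, 3],
    "ratio": [0.5, 1.5, 2.5],
})

by_type = small.select_dtypes(include=[bool])        # Python type
by_name = small.select_dtypes(include=["bool"])      # dtype name as a string
by_numpy = small.select_dtypes(include=[np.bool_])   # NumPy scalar type

print(list(by_type.columns))    # ['flag']
print(by_type.equals(by_name))  # True
print(by_type.equals(by_numpy)) # True
```

Which spelling to use is mostly a matter of taste; strings are handy when the dtype list comes from configuration, while types are checked by the interpreter as soon as the line runs.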
df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])
| | int64 | float64 | bool1 | bool2 | tdeltas |
|---|---|---|---|---|---|
| 0 | 1 | 4.0 | True | False | NaT |
| 1 | 2 | 5.0 | False | True | 1 days |
| 2 | 3 | 6.0 | True | False | 1 days |
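As a rough illustration of how include and exclude interact, the sketch below uses a hypothetical frame `demo` and looks only at the selected column names. It also shows that calling select_dtypes() with neither parameter raises a ValueError, since at least one of the two has to be given.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column per kind of dtype.
demo = pd.DataFrame({
    "i": np.array([1, 2, 3], dtype="int64"),
    "u": np.array([1, 2, 3], dtype="uint8"),
    "f": np.array([1.0, 2.0, 3.0], dtype="float64"),
    "b": [True, False, True],
})

# include only: keep every numeric column (bool is not a number here).
only_numbers = list(demo.select_dtypes(include=["number"]).columns)

# exclude only: keep everything except floating-point columns.
no_floats = list(demo.select_dtypes(exclude=[np.floating]).columns)

# both: numeric or bool columns, minus the unsigned integers.
signed_or_bool = list(
    demo.select_dtypes(include=["number", "bool"],
                       exclude=["unsignedinteger"]).columns
)

print(only_numbers)    # ['i', 'u', 'f']
print(no_floats)       # ['i', 'u', 'b']
print(signed_or_bool)  # ['i', 'f', 'b']

# Passing neither include nor exclude is an error.
raised_without_args = False
try:
    demo.select_dtypes()
except ValueError:
    raised_without_args = True
print(raised_without_args)  # True
```

Note that the selected columns keep their original order in the frame, regardless of the order of the dtypes in include.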
To select string columns, you must use the object dtype:
df.select_dtypes(include=['object'])
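One caveat with object: the dtype describes the container rather than the values, so an object column may hold arbitrary Python objects and not only strings. The sketch below builds its columns with dtype=object explicitly (an assumption made so the example does not depend on how a given pandas version infers string columns) and uses a hypothetical helper, `string_columns`, to keep only the object columns whose non-missing values are all str.

```python
import pandas as pd

# Hypothetical frame: two object columns, only one of which holds strings.
mixed = pd.DataFrame({
    "name": pd.Series(["a", "b", None], dtype=object),
    "payload": pd.Series([{"k": 1}, [1, 2], "x"], dtype=object),
    "n": [1, 2, 3],
})

def string_columns(frame):
    """Return names of object columns whose non-missing values are all str."""
    obj = frame.select_dtypes(include=["object"])
    return [
        col for col in obj.columns
        if obj[col].dropna().map(lambda v: isinstance(v, str)).all()
    ]

object_cols = list(mixed.select_dtypes(include=["object"]).columns)
print(object_cols)            # ['name', 'payload']
print(string_columns(mixed))  # ['name']
```

The extra check scans every value, so on large frames it is considerably slower than select_dtypes() alone.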
To see all the child dtypes of a generic dtype such as numpy.number, you can define a function that returns a tree of child dtypes:
def subdtypes(dtype):
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]
subdtypes(np.generic)
[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int32,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint32,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
subdtypes(np.number)
[numpy.number,
 [[numpy.integer,
   [[numpy.signedinteger,
     [numpy.int8,
      numpy.int16,
      numpy.int32,
      numpy.int32,
      numpy.int64,
      numpy.timedelta64]],
    [numpy.unsignedinteger,
     [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint32, numpy.uint64]]]],
  [numpy.inexact,
   [[numpy.floating,
     [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
    [numpy.complexfloating,
     [numpy.complex64, numpy.complex128, numpy.complex128]]]]]]
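The nested list is easy to read but awkward to query. As a complement, here is a small sketch of a hypothetical helper, `all_subdtypes`, that walks the same `__subclasses__()` tree and returns every descendant as a flat set, which makes membership checks straightforward. For a single check, the built-in issubclass (or np.issubdtype) answers the same question directly.

```python
import numpy as np

def all_subdtypes(dtype):
    """Collect every descendant of a NumPy scalar type into a flat set."""
    found = set()
    for sub in dtype.__subclasses__():
        found.add(sub)
        found |= all_subdtypes(sub)  # recurse into the child's own subclasses
    return found

unsigned = all_subdtypes(np.unsignedinteger)
print(np.uint8 in unsigned)  # True
print(np.int8 in unsigned)   # False

# One-off checks without building the set:
print(issubclass(np.float64, np.number))    # True
print(np.issubdtype(np.bool_, np.number))   # False
```

The exact members of the set depend on the platform and NumPy version, which is also why the printed trees above may list some names twice.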