Selecting pandas columns by dtype

The select_dtypes() method returns the subset of a DataFrame's columns whose dtypes match the ones you ask for.

import numpy as np
import pandas as pd

# One column per dtype of interest
df = pd.DataFrame({'string': list('abc'),
                    'int64': list(range(1, 4)),
                    'uint8': np.arange(3, 6).astype('u1'),
                    'float64': np.arange(4.0, 7.0),
                    'bool1': [True, False, True],
                    'bool2': [False, True, False],
                    'dates': pd.date_range('now', periods=3),
                    'category': pd.Series(list("ABC")).astype('category')})
df['tdeltas'] = df.dates.diff()  # timedelta64 column (first value is NaT)
df['uint64'] = np.arange(3, 6).astype('u8')
df['other_dates'] = pd.date_range('20130101', periods=3)
df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
df
   string  int64  uint8  float64  bool1  bool2                       dates  category  tdeltas  uint64  other_dates             tz_aware_dates
0       a      1      3      4.0   True  False  2019-12-01 22:00:58.958571         A      NaT       3   2013-01-01  2013-01-01 00:00:00-05:00
1       b      2      4      5.0  False   True  2019-12-02 22:00:58.958571         B   1 days       4   2013-01-02  2013-01-02 00:00:00-05:00
2       c      3      5      6.0   True  False  2019-12-03 22:00:58.958571         C   1 days       5   2013-01-03  2013-01-03 00:00:00-05:00
df.dtypes
string                                object
int64                                  int64
uint8                                  uint8
float64                              float64
bool1                                   bool
bool2                                   bool
dates                         datetime64[ns]
category                            category
tdeltas                      timedelta64[ns]
uint64                                uint64
other_dates                   datetime64[ns]
tz_aware_dates    datetime64[ns, US/Eastern]
dtype: object

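select_dtypes() takes two parameters, include and exclude. Columns whose dtype matches something in include are kept, and columns whose dtype matches something in exclude are dropped. Each accepts a single dtype or a list of them, given either as a type (such as bool) or as a string (such as 'bool'):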

df.select_dtypes(include=[bool])
   bool1  bool2
0   True  False
1  False   True
2   True  False
df.select_dtypes(include=['bool'])
   bool1  bool2
0   True  False
1  False   True
2   True  False
df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])
   int64  float64  bool1  bool2  tdeltas
0      1      4.0   True  False      NaT
1      2      5.0  False   True   1 days
2      3      6.0   True  False   1 days
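A few rules about include and exclude are not visible in the examples above: exclude can be used on its own, at least one of the two must be non-empty, and the same dtype may not appear in both. The sketch below uses a small hypothetical frame named frame; in the last two cases pandas raises a ValueError (the exact message may differ between pandas versions).

```python
import numpy as np
import pandas as pd

# Small hypothetical frame used only for this sketch
frame = pd.DataFrame({
    "i": np.array([1, 2], dtype="int64"),
    "u": np.array([1, 2], dtype="uint8"),
    "f": np.array([1.0, 2.0]),
    "b": [True, False],
})

# exclude on its own: keep every column that is NOT a float
no_floats = frame.select_dtypes(exclude=["floating"])
print(list(no_floats.columns))  # ['i', 'u', 'b']

# Neither include nor exclude given -> ValueError
empty_error = None
try:
    frame.select_dtypes()
except ValueError as err:
    empty_error = err
print("empty:", empty_error)

# The same dtype in include and exclude -> ValueError
overlap_error = None
try:
    frame.select_dtypes(include=["number"], exclude=["number"])
except ValueError as err:
    overlap_error = err
print("overlap:", overlap_error)
```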

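Note that tdeltas appears in the last result: 'number' also matches timedelta64 columns, because numpy.timedelta64 is a subclass of numpy.signedinteger in NumPy's type hierarchy (see the subtype tree below).

To select string columns, you must use the object dtype, which is how the string column is stored here (see the dtypes listing above):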

df.select_dtypes(include=['object'])
  string
0      a
1      b
2      c
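Keep in mind that object is a catch-all dtype: any column holding arbitrary Python objects has it, so include=['object'] can return more than string columns. Categorical columns, by contrast, have their own dtype and are selected with 'category'. A minimal sketch with a hypothetical frame; dtype=object is set explicitly so the result does not depend on how a given pandas version stores strings by default:

```python
import pandas as pd

frame = pd.DataFrame({
    # explicit object dtype so the sketch does not depend on default string storage
    "text": pd.Series(["a", "b", "c"], dtype=object),
    # a mix of ints, strings and floats is also stored as object
    "mixed": pd.Series([1, "x", 2.5], dtype=object),
    "label": pd.Series(["A", "B", "A"], dtype="category"),
    "n": [1, 2, 3],
})

objects = frame.select_dtypes(include=["object"])
print(list(objects.columns))      # ['text', 'mixed']

categories = frame.select_dtypes(include=["category"])
print(list(categories.columns))   # ['label']
```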

To see all the sub-dtypes of a generic dtype such as numpy.number, you can define a function that returns the tree of its subtypes:

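def subdtypes(dtype):
    # Direct subclasses of this NumPy scalar type
    subs = dtype.__subclasses__()
    if not subs:
        # A leaf of the hierarchy: return the type itself
        return dtype
    # Otherwise pair the type with the subtrees of its subclasses
    return [dtype, [subdtypes(dt) for dt in subs]]
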
subdtypes(np.generic)
[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int32,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint32,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
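The nested lists can be hard to scan. If all you need is the set of types a generic dtype covers, a small helper can walk the same __subclasses__() links and flatten the result; all_subdtypes below is a hypothetical name, not part of NumPy. For a single pair of types, np.issubdtype answers the same question. The exact contents of the set depend on the NumPy version and platform (the repeated numpy.int32 and numpy.float64 entries above are likely distinct C-level types that happen to share a size).

```python
import numpy as np

def all_subdtypes(dtype):
    """Return every class that directly or indirectly subclasses dtype.

    Hypothetical helper: follows the same __subclasses__() links as
    subdtypes(), but collects the classes into a flat set.
    """
    found = set()
    stack = [dtype]
    while stack:
        current = stack.pop()
        for sub in current.__subclasses__():
            if sub not in found:
                found.add(sub)
                stack.append(sub)
    return found

numeric = all_subdtypes(np.number)
print(sorted(t.__name__ for t in numeric))

# issubdtype checks one pair of types against the same hierarchy
print(np.issubdtype(np.uint8, np.number))   # True
print(np.issubdtype(np.bool_, np.number))   # False
```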
subdtypes(np.number)
[numpy.number,
 [[numpy.integer,
   [[numpy.signedinteger,
     [numpy.int8,
      numpy.int16,
      numpy.int32,
      numpy.int32,
      numpy.int64,
      numpy.timedelta64]],
    [numpy.unsignedinteger,
     [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint32, numpy.uint64]]]],
  [numpy.inexact,
   [[numpy.floating,
     [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
    [numpy.complexfloating,
     [numpy.complex64, numpy.complex128, numpy.complex128]]]]]]
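The tree is useful for select_dtypes() because, roughly speaking, a column matches when its scalar type is a subclass of one of the requested types, so any node of the tree can be passed to include or exclude. A minimal sketch with a hypothetical frame: np.integer picks up both the signed and the unsigned column, while narrower nodes select fewer columns.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "signed": np.array([1, 2, 3], dtype="int64"),
    "unsigned": np.array([1, 2, 3], dtype="uint8"),
    "real": np.array([1.0, 2.0, 3.0]),
})

# np.integer covers both signed and unsigned integer columns
integers = frame.select_dtypes(include=[np.integer])
print(list(integers.columns))   # ['signed', 'unsigned']

# A narrower node of the tree selects fewer columns
unsigned = frame.select_dtypes(include=[np.unsignedinteger])
print(list(unsigned.columns))   # ['unsigned']

floats = frame.select_dtypes(include=[np.floating])
print(list(floats.columns))     # ['real']
```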
