Python code tips (continuously updated...)

pandas

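1. In data analysis we usually use pandas to convert data into a DataFrame.

But when the data is a single row, i.e. a dict whose values are all scalars, pd.DataFrame raises:

ValueError: If using all scalar values, you must pass an index
There are two ways to handle this: pass an explicit index, e.g. pd.DataFrame(d, index=[0]), or wrap the dict in a list, e.g. pd.DataFrame([d]).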
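A minimal sketch of the two usual fixes for this error, assuming the single row is stored in a dict called row (the data is made up for illustration):

```python
import pandas as pd

# A single record: every value is a scalar (hypothetical example data)
row = {"name": "alice", "age": 30}

# pd.DataFrame(row) would raise:
# ValueError: If using all scalar values, you must pass an index

# Fix 1: pass an explicit index for the single row
df1 = pd.DataFrame(row, index=[0])

# Fix 2: wrap the dict in a list, so pandas treats it as a list of records
df2 = pd.DataFrame([row])

print(df1)
print(df1.equals(df2))  # True: both hold one row with columns name and age
```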

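2. We usually create an empty list and an empty dict with lst = [] and dict1 = {} (or dict()). In the same way, pd.DataFrame() creates an empty DataFrame, and you can also pass column names: out_var = pd.DataFrame(columns=var_name)  # create an empty DataFrame with these columns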

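3. DataFrame.append does not modify the DataFrame in place; it returns a new DataFrame. So out_var.append(pd_lt) on its own does nothing: you must write out_var = out_var.append(pd_lt). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so use out_var = pd.concat([out_var, pd_lt]) instead.
Also, a for loop does not create a new scope in Python. To use the accumulated variable after the loop, it is enough to initialize it before the loop; a global statement is only needed inside a function that rebinds a module-level variable.
out_lst = pd.DataFrame()  # initialize before the loop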
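Concatenating inside the loop copies the accumulated data on every iteration, so a common pattern is to collect the pieces in a plain list and call pd.concat once at the end. A sketch, with made-up per-iteration data standing in for pd_lt:

```python
import pandas as pd

pieces = []
for i in range(3):
    # Hypothetical per-iteration result
    pd_lt = pd.DataFrame({"i": [i], "square": [i * i]})
    pieces.append(pd_lt)  # list.append mutates the list in place

# Concatenate once after the loop; ignore_index=True renumbers the rows 0..n-1
out_var = pd.concat(pieces, ignore_index=True)
print(out_var)
```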

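4. Changing data types
import pandas as pd
df = pd.DataFrame([{'col1': 'a', 'col2': '1'}, {'col1': 'b', 'col2': '2'}])
df.dtypes  # col2 is object (it holds strings)

df['col2'] = df['col2'].astype('int')      # convert the strings to integers
df['col2'] = df['col2'].astype('float64')  # convert to 64-bit floats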

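5. How to find the positions (index labels and locations) of NaN, missing values or other elements
np.where(np.isnan(df))[0]  # row positions of NaN values; np.isnan only works if every column is numeric
Whether each column contains missing values: np.isnan(df).any()
For DataFrames with non-numeric columns, use df.isna() in place of np.isnan(df).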
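A small sketch on a made-up DataFrame with a string column, where np.isnan(df) would raise a TypeError but df.isna() works; np.where on the 2-D boolean mask returns one array of row positions and one of column positions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one object (string) column mixed with numeric ones
df = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                   "y": ["a", None, "c"],
                   "z": [np.nan, 5.0, 6.0]})

rows, cols = np.where(df.isna())
# Pair each row position with the column name
positions = [(int(r), df.columns[c]) for r, c in zip(rows, cols)]
print(positions)        # [(0, 'z'), (1, 'x'), (1, 'y')]
print(df.isna().any())  # per column: does it contain missing values?
```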
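6. Missing values (NaN)
You can use pd.isnull(x), np.isnan(x), math.isnan(x) or pd.isna(x). pd.isna() (pd.isnull() is an alias) is the most robust: it recognizes math.nan, np.nan, None and pd.NaT, while np.isnan() and math.isnan() raise a TypeError on None or strings. Avoid x is np.nan: it tests object identity rather than value, so it is False for NaN floats that are not that exact object, such as math.nan or float('nan').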
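A quick comparison of the checks on several missing-value markers (the list of values is just for illustration):

```python
import math

import numpy as np
import pandas as pd

values = [np.nan, math.nan, float("nan"), None, pd.NaT, 1.0]

# pd.isna recognizes every missing-value marker, including None and NaT
isna_flags = [bool(pd.isna(v)) for v in values]
print(isna_flags)      # [True, True, True, True, True, False]

# The identity check only matches the exact np.nan object
identity_flags = [v is np.nan for v in values]
print(identity_flags)  # [True, False, False, False, False, False]

# math.isnan rejects None with a TypeError
try:
    math.isnan(None)
except TypeError as exc:
    print("math.isnan(None) ->", type(exc).__name__)
```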

7. pd.read_csv(path, nrows=n)
With large files, first read only a few rows to check the columns and dtypes.
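A sketch using an in-memory CSV as a stand-in for a large file (hypothetical data); nrows counts data rows, not the header line:

```python
import io

import pandas as pd

# 1000 data rows plus a header, kept in memory instead of on disk
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1000))

# Parse only the first 5 data rows
head = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(head.shape)   # (5, 2)
print(head.dtypes)
```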

numpy

1. np.take: select elements by index
import numpy as np
a = [4, 3, 5, 7, 6, 8]
indices = [0, 1, 4]
np.take(a, indices)
# array([4, 3, 6])
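2. np.corrcoef(a) treats each row of a as a variable and returns the correlation coefficients between rows; np.corrcoef(a, rowvar=0) (equivalently rowvar=False) treats each column as a variable and returns the correlation coefficients between columns.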
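A small sketch with made-up data where the row correlations are known in advance: row 1 is twice row 0 (correlation 1) and row 2 falls as row 0 rises (correlation -1):

```python
import numpy as np

# 3 rows (variables) x 4 columns (observations)
a = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # 2 * row 0
    [4.0, 3.0, 2.0, 1.0],   # 5 - row 0
])

r_rows = np.corrcoef(a)                # (3, 3): correlations between rows
r_cols = np.corrcoef(a, rowvar=False)  # (4, 4): correlations between columns
print(r_rows.round(3))
print(r_rows.shape, r_cols.shape)
```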

3. np.absolute(a) and np.abs(a) (an alias of np.absolute) return the element-wise absolute values of a.

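4. argsort, argmax, argmin, argpartition
argsort returns the indices that would sort the array; argmax returns the index of the maximum value; argmin returns the index of the minimum value; argpartition(x, -N)[-N:] returns the indices of the N largest values, but not in sorted order.

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
index_val = np.argpartition(x, -4)[-4:]  # indices of the 4 largest values, unordered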
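A sketch on the same array: argpartition only guarantees that the last 4 positions hold the 4 largest values, so a final argsort over just those 4 puts them in descending order:

```python
import numpy as np

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])

print(np.argmax(x))  # 8: index of the maximum (16)
print(np.argmin(x))  # 3: with ties, the first occurrence of the minimum (0)

top4 = np.argpartition(x, -4)[-4:]
# Order those 4 indices by their values, largest first
top4_sorted = top4[np.argsort(x[top4])[::-1]]
print(x[top4_sorted])  # [16 12 12 10]
```

Sorting only the N selected values costs far less than a full argsort when N is small relative to the array.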

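5. np.ravel() flattens an array into 1-D. It returns a view of the original data when possible and a copy otherwise, whereas a.flatten() always returns a copy.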
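A sketch showing the view-versus-copy difference on a contiguous 2-D array, plus column-major flattening with order="F":

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]

flat_view = np.ravel(a)   # contiguous input: a view of a
flat_copy = a.flatten()   # always an independent copy

flat_view[0] = 100        # writes through: a[0, 0] becomes 100
flat_copy[1] = 200        # does not touch a: a[0, 1] stays 2
print(a)

# Column-major order: 100, 4, 2, 5, 3, 6
print(np.ravel(a, order="F"))
```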

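6. np.allclose(a, b, 0.1) checks whether arrays a and b are element-wise equal within a tolerance. The third positional argument is rtol, a relative tolerance: each element must satisfy |a - b| <= atol + rtol * |b|, where atol defaults to 1e-08.

import numpy as np
array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])
# with a relative tolerance of 0.1, it returns False (|0.17 - 0.19| = 0.02 > 0.1 * 0.19 = 0.019):
np.allclose(array1, array2, 0.1)
# False
# with a relative tolerance of 0.2, it returns True:
np.allclose(array1, array2, 0.2)
# True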
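To make the tolerance rule concrete, here is a hypothetical helper, manual_allclose, that applies the same element-wise inequality by hand (it ignores the NaN and infinity handling of the real function), followed by an absolute-tolerance comparison using rtol=0:

```python
import numpy as np

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])

def manual_allclose(a, b, rtol=1e-05, atol=1e-08):
    # Element-wise test: |a - b| <= atol + rtol * |b|
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))

for rtol in (0.1, 0.2):
    print(rtol, manual_allclose(array1, array2, rtol), np.allclose(array1, array2, rtol))

# b acts as the reference, so allclose(a, b) and allclose(b, a) can differ.
# For a purely absolute tolerance, set rtol=0 and pass atol:
print(np.allclose(array1, array2, rtol=0, atol=0.025))  # True: all gaps are <= 0.02
```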

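7. np.random: generating random numbers
np.random.randn(10) generates 10 samples from the standard normal distribution (mean 0, standard deviation 1).
np.random.randint(0, 5, 10) generates 10 random integers in [0, 5), i.e. from 0 to 4; the upper bound is excluded.

np.random.uniform(0, 0.1, size=(10, 20)) generates a 10 x 20 array of numbers uniformly distributed in [0, 0.1).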
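A sketch that fixes the seed for reproducibility and checks shapes and ranges; the last lines show the equivalent calls in NumPy's newer Generator API (np.random.default_rng), which is an addition not mentioned in the original tip:

```python
import numpy as np

np.random.seed(42)  # fix the seed so the numbers are reproducible

normal = np.random.randn(10)                     # standard normal samples
ints = np.random.randint(0, 5, 10)               # integers 0..4; 5 is excluded
unif = np.random.uniform(0, 0.1, size=(10, 20))  # uniform on [0, 0.1)

print(normal.shape, ints.min() >= 0, ints.max() <= 4, unif.shape)

# Generator API: same distributions, independent seeded state
rng = np.random.default_rng(42)
print(rng.standard_normal(10).shape, rng.integers(0, 5, 10).max() <= 4)
```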

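8. For a 2-D array, np.ones_like(array, dtype=bool) is equivalent to np.ones(shape=(array.shape[0], array.shape[1]), dtype=bool), or more generally np.ones(array.shape, dtype=bool); the analogous pair is np.zeros_like and np.zeros.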
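A quick check that the two spellings produce the same boolean mask, on a small example array:

```python
import numpy as np

array = np.arange(6).reshape(2, 3)

mask1 = np.ones_like(array, dtype=bool)         # same shape as array, all True
mask2 = np.ones(shape=array.shape, dtype=bool)  # shape passed explicitly
zeros = np.zeros_like(array, dtype=float)       # same shape, all 0.0

print(np.array_equal(mask1, mask2), zeros.shape)  # True (2, 3)
```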

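9. np.where
np.where(condition) returns the indices of the elements that satisfy the condition, as a tuple with one index array per dimension; np.where(condition, x, y) builds a new array that takes elements from x where the condition is True and from y elsewhere. It is similar to a WHERE condition in SQL, as the following example shows:

import numpy as np
y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
# Where y is greater than 5, return the index positions
np.where(y > 5)
# (array([2, 3, 5, 7, 8], dtype=int64),)
# The first value argument replaces the elements that match the condition,
# the second replaces the elements that do not
np.where(y > 5, "Hit", "Miss")
# array(['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Hit', 'Miss', 'Hit', 'Hit'], dtype='<U4')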
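On a 2-D array the one-argument form returns a row-index array and a column-index array, which can be zipped into coordinates. A sketch with a small made-up matrix:

```python
import numpy as np

m = np.array([[1, 7, 3],
              [8, 2, 9]])

rows, cols = np.where(m > 5)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 1), (1, 0), (1, 2)]

# Three-argument form: keep values greater than 5, replace the rest with 0
print(np.where(m > 5, m, 0))
```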

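10. np.percentile
np.percentile(a, q, axis=...) computes the q-th percentile of the array elements along the given axis.

import numpy as np
a = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
np.percentile(a, 50, axis=0)
# 6.0 (the 50th percentile is the median)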
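A sketch of how axis changes the result on a 2-D array, and how several percentiles can be requested at once (default linear interpolation between data points):

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(np.percentile(m, 50, axis=0))  # per column: [3. 4.]
print(np.percentile(m, 50, axis=1))  # per row: [1.5 3.5 5.5]
print(np.percentile(m, [25, 75]))    # over the flattened array: 2.25 and 4.75
```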

Basic syntax

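1. The difference between copy() and deepcopy(): after copy.deepcopy(), the copy is fully independent, so later changes to the original, at any nesting level, do not affect it. copy.copy() is a shallow copy: it creates a new outer object but shares the nested objects, so when a nested mutable object (such as an inner list) of the original changes, the shallow copy changes too. See this blog post for details.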
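A minimal demonstration with a list that contains a nested list:

```python
import copy

original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[1].append(4)  # mutate the nested list
original[0] = 100      # rebind a top-level element

print(shallow)  # [1, [2, 3, 4]]: shares the inner list, but not the outer one
print(deep)     # [1, [2, 3]]: fully independent
```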

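def rever(a):
    # a[0], a[1] = a[1], a[0]  # item assignment: also changes the caller's list
    # a = a[::-1]              # builds a new list and rebinds only the local name
    a += [1, 2]                # list += extends the same list object in place
    return a

nums = [4, 5, 6, 7, 8]
nums_1 = rever(nums)
print(nums)

Result: [4, 5, 6, 7, 8, 1, 2]
When a mutable object such as a list is passed as an argument, in-place operations inside the function also change the caller's list. By contrast, a = a[::-1] leaves the caller's list unchanged: slicing creates a new reversed list, and the assignment only rebinds the local name a. To reverse the caller's list in place, use a.reverse() or a[:] = a[::-1].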
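A side-by-side sketch of rebinding versus in-place mutation; rebind and in_place are hypothetical helper names:

```python
def rebind(a):
    a = a[::-1]     # new list; only the local name changes
    return a

def in_place(a):
    a[:] = a[::-1]  # slice assignment overwrites the caller's list contents
    return a

nums = [1, 2, 3]
print(rebind(nums), nums)    # [3, 2, 1] [1, 2, 3]
print(in_place(nums), nums)  # [3, 2, 1] [3, 2, 1]
```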
2. assert statements
assert 1 > 2, 'incorrect number'  # the condition is False, so this raises AssertionError: incorrect number
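A sketch of a typical use, guarding a precondition inside a hypothetical mean function. Keep in mind that assert statements are removed when Python runs with the -O flag, so they suit internal sanity checks rather than validating user input:

```python
def mean(values):
    # Fail fast with a readable message if the precondition does not hold
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)

print(mean([1, 2, 3]))  # 2.0

try:
    mean([])
except AssertionError as exc:
    print("AssertionError:", exc)  # AssertionError: values must not be empty
```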

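3. extend, append and += on lists
a = [1, 2, 3]; b = [7, 8, 9]
a.append(b)  # a becomes [1, 2, 3, [7, 8, 9]]: b is added as one element
a.extend(b)  # starting again from a = [1, 2, 3], a becomes [1, 2, 3, 7, 8, 9]
append and extend modify a in place and return None, so never write a = a.append(b). a += b has the same effect as a.extend(b).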
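The in-place nature of += becomes visible when a second name (alias) refers to the same list; a = a + b, by contrast, builds a new list:

```python
b = [7, 8, 9]

a = [1, 2, 3]
a.append(b)
print(a)        # [1, 2, 3, [7, 8, 9]]

a = [1, 2, 3]
alias = a
a += b          # in place, like extend: alias sees the change
print(alias)    # [1, 2, 3, 7, 8, 9]

a = [1, 2, 3]
alias = a
a = a + b       # new list; alias still points to the old one
print(alias)    # [1, 2, 3]
```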

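4. Reversing a string
Here are six methods for comparison:

Slicing with a step of -1 (the most common);
Swapping characters from the front and the back;
Recursion, emitting one character at a time;
A deque, using its extendleft() method;
A for loop over the indices from last to first;
Converting to a list and using its reverse() method.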

string = 'abcdef'

def string_reverse1(string):
    return string[::-1]

def string_reverse2(string):
    t = list(string)
    l = len(t)
    for i,j in zip(range(l-1, 0, -1), range(l//2)):
        t[i], t[j] = t[j], t[i]
    return "".join(t)

def string_reverse3(string):
    if len(string) <= 1:
        return string
    return string_reverse3(string[1:]) + string[0]

from collections import deque
def string_reverse4(string):
    d = deque()
    d.extendleft(string)
    return ''.join(d)

def string_reverse5(string):
    #return ''.join(string[len(string) - i] for i in range(1, len(string)+1))
    return ''.join(string[i] for i in range(len(string)-1, -1, -1))
    
def string_reverse6(string):
    chars = list(string)
    chars.reverse()  # list.reverse() works in place and returns None
    return ''.join(chars)
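A rough timing sketch, assuming three of the methods above re-declared under hypothetical names (by_slice, by_deque, by_recursion) so the block runs on its own. Absolute numbers depend on the machine, though slicing is usually the fastest, and the recursive version is limited by Python's recursion depth on long strings:

```python
import timeit
from collections import deque

def by_slice(s):
    return s[::-1]

def by_deque(s):
    d = deque()
    d.extendleft(s)  # each character is pushed onto the left end
    return ''.join(d)

def by_recursion(s):
    return s if len(s) <= 1 else by_recursion(s[1:]) + s[0]

s = 'abcdef' * 100  # 600 characters, below the default recursion limit

for fn in (by_slice, by_deque, by_recursion):
    assert fn(s) == s[::-1]  # all methods must agree
    seconds = timeit.timeit(lambda: fn(s), number=1000)
    print(f'{fn.__name__:>12}: {seconds:.4f} s for 1000 calls')
```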

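5. Looping in reverse order: for i in range(len(x) - 1, -1, -1) visits the indices from len(x) - 1 down to 0; the stop value -1 is excluded.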
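A sketch on a small example list x; when only the elements (not their indices) are needed, the built-in reversed() avoids the index arithmetic:

```python
x = ['a', 'b', 'c']

# Indices from the last element down to 0
for i in range(len(x) - 1, -1, -1):
    print(i, x[i])  # 2 c, then 1 b, then 0 a

print(list(reversed(x)))  # ['c', 'b', 'a']
```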

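6. 7 // 2 = 3 (floor division), 7 % 2 = 1 (remainder), 7 / 2 = 3.5 (true division, which always returns a float in Python 3)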
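With negative operands the results can surprise: // rounds toward negative infinity and % takes the sign of the divisor, so that a == (a // b) * b + a % b always holds:

```python
print(7 // 2, 7 % 2, 7 / 2)      # 3 1 3.5
print(-7 // 2, -7 % 2)           # -4 1
print(divmod(-7, 2))             # (-4, 1): quotient and remainder together
print((-7 // 2) * 2 + (-7 % 2))  # -7
```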

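7. An if statement tests a truth value. The condition does not have to be a bool (True or False): any other object is converted to bool implicitly.

In Python, None, the empty list [], the empty dict {}, the empty tuple (), 0, and other objects that represent "empty" or "nothing" (such as '' and 0.0) are treated as False. All other objects are treated as True.

In if not 1, the 1 is converted to True, and not (logical negation) turns it into False. So not 1 is always False, and the body of if not 1 never runs.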
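Calling bool() explicitly shows the conversion that if applies implicitly; the objects below are just examples:

```python
for obj in (None, [], {}, (), 0, 0.0, '', [0], 'False', -1):
    print(repr(obj), bool(obj))  # the first seven are False, the last three True

if not 1:
    print('never printed')  # not 1 is always False
```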

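8. s.split(' ') and s.split() behave differently: s.split(' ') splits on every single space, so consecutive spaces produce empty strings in the result; s.split() with no argument splits on runs of any whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace.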
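A short example string with doubled spaces, a tab and leading/trailing spaces makes the difference visible:

```python
s = '  a  b\tc '

print(s.split(' '))  # ['', '', 'a', '', 'b\tc', '']
print(s.split())     # ['a', 'b', 'c']
```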

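9. Collapse consecutive spaces into one

s = "abc  def   ghi       xy"
print(' '.join(filter(lambda x: x, s.split(' '))))  # abc def ghi xy
# filter(function, iterable) keeps the items for which function returns a truthy value. Example:
#
# def is_odd(n):
#     return n % 2 == 1
#
# newlist = filter(is_odd, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# print(list(newlist))  # [1, 3, 5, 7, 9]; in Python 3 filter returns an iterator, hence list()
#
# Here the predicate lambda x: x drops the empty strings produced by consecutive spaces

print(' '.join(s.split()))  # abc def ghi xy; split() already ignores runs of spaces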

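10. *args and **kwargs

def foo(a, b, name=None, *args, **kwargs):
    print(name)
    print(args)
    print(kwargs)

A = (1, 2, 3)
B = {"k1": "v1", "k2": "v2"}
foo(1, 2, C=6, *A, **B)  # foo(1, 2, C=6, 1, 2, 3, **B) would be a SyntaxError: positional argument follows keyword argument
# Output (dict order as in Python 3.7+):
# 1                                 -> name (first element of A, after a=1 and b=2)
# (2, 3)                            -> args
# {'C': 6, 'k1': 'v1', 'k2': 'v2'}  -> kwargs; C=6 also goes into kwargs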

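Also:

def foo(a, b, k1, k2):
    print(k1, k2)

B = {"k1": "v1", "k2": "v2"}
foo(1, 2, **B)  # prints: v1 v2
Here **B unpacks the dict into the keyword arguments k1="v1" and k2="v2", so a single dict supplies several named parameters at once.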

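11. Decorators

import functools
import time

def timer(func):
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwds):
        t0 = time.perf_counter()
        result = func(*args, **kwds)
        t1 = time.perf_counter()
        print('Elapsed: %0.3f s' % (t1 - t0,))
        return result  # pass the wrapped function's return value through

    return wrapper

@timer
def do_something(delay):
    print('do_something started')
    time.sleep(delay)
    print('do_something finished')

do_something(3)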
