Advanced Python 12: Concurrency, Part 8: Multithreading and Data Synchronization

Original blog post: Advanced Python 12: Concurrency, Part 8: Multithreading and Data Synchronization

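For concurrency in Python, processes are the first choice, but occasionally processes cannot handle a scenario. For example, some variables cannot be serialized, so they cannot be shared through the helper classes of the Manager() toolkit. Implementing a new sharing mechanism yourself can take a lot of development effort, and its quality is hard to guarantee. In that case, consider threads, which sidestep the problem of sharing variables between processes. In practice IO is very likely the bottleneck anyway, so threads do have some real advantages. Personally, when choosing between processes and threads I care most about safety: processes isolate data well and do not interfere with one another. The second factor is the proportion of shared data: if most of the data has to be shared, threads are a better fit than processes, because they avoid a large amount of inter-process data sharing.
With threads, the hard part is data consistency.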

What is shared and what is not

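With threads, the most likely mistake is believing that something is not shared when it actually is. Because you (think you) share nothing, you add no synchronization, and the data ends up a mess.
Reference code: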

# coding=utf-8
############## No shared variable is locked here: this only demonstrates sharing, synchronization is ignored ###########
############# Variable sharing between threads #############
import threading
import time

gnum = 1


class MyThread(threading.Thread):
    # Override the constructor
    def __init__(self, num, num_list, sleepTime):
        threading.Thread.__init__(self)
        self.num = num
        self.sleepTime = sleepTime
        self.num_list = num_list

    def run(self):
        time.sleep(self.sleepTime)
        global gnum
        gnum += self.num
        self.num_list.append(self.num)
        self.num += 1
        print('(global)\tgnum thread(%s) id:%s num=%d' % (self.name, id(gnum), gnum))
        print('(self)\t\tnum thread(%s) id:%s num=%d' % (self.name, id(self.num), self.num))
        print('(self.list)\tnum_list thread(%s) id:%s num=%s' % (self.name, id(self.num_list), self.num_list))


if __name__ == '__main__':
    mutex = threading.Lock()  # created but deliberately unused in this demo
    num_list = list(range(5))
    t1 = MyThread(100, num_list, 1)
    t1.start()
    t2 = MyThread(200, num_list, 5)
    t2.start()

Output:

/home/john/anaconda3/bin/python3 /home/john/PYTHON/scripts/concurrent/threadShare.py
(global)	gnum thread(Thread-1) id:93930593956000 num=101
(self)		num thread(Thread-1) id:93930593956000 num=101
(self.list)	num_list thread(Thread-1) id:140598419056328 num=[0, 1, 2, 3, 4, 100]
(global)	gnum thread(Thread-2) id:140598420111056 num=301
(self)		num thread(Thread-2) id:93930593959200 num=201
(self.list)	num_list thread(Thread-2) id:140598419056328 num=[0, 1, 2, 3, 4, 100, 200]

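Analysis of the result:

The global gnum is shared: it accumulates the contributions of both threads (1 + 100 = 101, then 101 + 200 = 301). self.num belongs to each thread object, so the two threads see 101 and 201 independently. num_list is shared: both threads report the same id, and the list ends up holding the values appended by both. The fact that gnum and self.num print the same id in Thread-1 does not mean they are the same variable. Both happen to equal 101, and CPython caches small integer objects, so two names bound to 101 point at the same object.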

Synchronizing shared data (see the post: Advanced Python 06: Concurrency, Part 2: Key Technical Terms)

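The simplest approach is to lock every shared object (variable) that is modified in more than one thread. You may end up with a few unnecessary locks, but that is far better than missing one: extra locks merely cost efficiency (and may also cause deadlock), whereas a missing lock leads to corrupted data, and it is clear which is worse. I recommend reading up on "atomic operations". If you are not familiar with them, you can lock first and remove locks later, dropping only the locks around operations that are atomic. (During business-logic development it is hard to think of everything that will be modified from multiple threads. So the usual order is: implement the business logic first, then identify the shared variables, shrink the critical sections as much as possible, and only then add the locks.) This is safer, and it also avoids the inefficiency of too many locks.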
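As a minimal sketch of the "lock every shared variable that several threads modify" rule, the example below guards a shared counter with a threading.Lock and keeps the critical section down to the single read-modify-write. The names counter, counter_lock, add_many and run_workers are made up for this illustration and do not come from the code above.

```python
import threading

counter = 0                      # shared by every worker thread
counter_lock = threading.Lock()  # guards every update of counter


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # Keep the critical section small: only the read-modify-write is locked.
        with counter_lock:
            counter += 1


def run_workers(n_threads=4, times=10_000):
    """Reset the counter, run all workers to completion, return the final value."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


if __name__ == '__main__':
    print(run_workers())  # 4 threads * 10000 increments each
```

With the lock in place the result is always n_threads * times. Without it, `counter += 1` is a separate read, add and write, so updates from different threads can overwrite each other.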

The relationship between the full form and the simple form of Thread

class Thread:
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, *, daemon=None):
        if kwargs is None:
            kwargs = {}
        self._target = target
        self._name = str(name or _newname())
        self._args = args
        self._kwargs = kwargs

    def run(self):
        try:
            if self._target:
                self._target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs
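The excerpt above suggests that the simple form is just the full form with the default run(): the constructor stores target and args, and run() calls them. The sketch below writes the same job both ways to show that they behave alike. FullWorker, simple_worker, record and run_both are hypothetical names used only for this illustration.

```python
import threading

results = []                      # shared list collecting what each thread produced
results_lock = threading.Lock()   # guards results


def record(tag, value):
    """Append one result under the lock."""
    with results_lock:
        results.append((tag, value))


class FullWorker(threading.Thread):
    """Full form: subclass Thread and override run()."""

    def __init__(self, value):
        super().__init__()
        self.value = value

    def run(self):
        record('full', self.value * 2)


def simple_worker(value):
    """Simple form: a plain function handed to Thread as target."""
    record('simple', value * 2)


def run_both(value):
    """Run the same job once in each form and return the sorted results."""
    results.clear()
    threads = [FullWorker(value),
               threading.Thread(target=simple_worker, args=(value,))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == '__main__':
    print(run_both(21))
```

The practical difference is where state lives: the full form keeps it in self.xx, while the simple form only has the arguments and local variables of the target function.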

Threads already have local variables, so why is ThreadLocal needed?

ThreadLocal example

import threading

# Create a global ThreadLocal object:
local_school = threading.local()


def process_student():
    print('Hello, %s (in %s)' % (local_school.student, threading.current_thread().name))


def process_thread(name):
    # Bind the student attribute of the ThreadLocal for the current thread:
    local_school.student = name
    process_student()


t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

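I could not find useful material online, so here is my own understanding.
First, if your thread is written in the full form (a class that inherits from Thread), you indeed do not need ThreadLocal. Its __init__ can define private variables on the object itself (reference-type arguments such as a list can be copied with deepcopy to obtain a private copy).
If you want the short form, threading.Thread(target=process_thread, args=xx), there is really no place to define private variables. (That is not entirely true: immutable arguments such as int or str behave like private values, because rebinding the parameter only affects the local name, while a list argument is shared.)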
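A minimal sketch of the deepcopy idea for the full form: each thread object copies the incoming list in __init__, so appends inside run() never touch the caller's list. PrivateListWorker and demo are hypothetical names for this illustration.

```python
import copy
import threading


class PrivateListWorker(threading.Thread):
    """Full-form thread that keeps a private copy of a list argument."""

    def __init__(self, num, num_list):
        super().__init__()
        self.num = num
        # Deep copy: this thread gets its own list, not a reference to the caller's.
        self.num_list = copy.deepcopy(num_list)

    def run(self):
        self.num_list.append(self.num)


def demo():
    """Return the caller's list and each worker's private list after the run."""
    shared = list(range(3))
    workers = [PrivateListWorker(100, shared), PrivateListWorker(200, shared)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared, [w.num_list for w in workers]


if __name__ == '__main__':
    print(demo())
```

The caller's list stays [0, 1, 2], and each worker ends up with its own list holding only its own number. Passing the list without the copy would give the shared behaviour shown in the first example.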

Example:

############# Variable sharing between threads (short mode) #############
import threading
import time

gnum = 1


def process(num, num_list, sleepTime):
    time.sleep(sleepTime)
    global gnum
    gnum += num
    num_list.append(num)
    num += 1  # rebinds the local name only; the caller's value is untouched
    print('(global)\tgnum thread(%s) id:%s num=%d' % (threading.current_thread().name, id(gnum), gnum))
    print('(self)\t\tnum thread(%s) id:%s num=%d' % (threading.current_thread().name, id(num), num))
    print('(self.list)\tnum_list thread(%s) id:%s num=%s' % (threading.current_thread().name, id(num_list), num_list))


if __name__ == '__main__':
    mutex = threading.Lock()  # created but deliberately unused in this demo
    num_list = list(range(5))
    t1 = threading.Thread(target=process, args=(100, num_list, 1,))
    t1.start()
    t2 = threading.Thread(target=process, args=(200, num_list, 5,))
    t2.start()

Result (same as before):

(global)	gnum thread(Thread-1) id:94051294298272 num=101
(self)		num thread(Thread-1) id:94051294298272 num=101
(self.list)	num_list thread(Thread-1) id:140412783240456 num=[0, 1, 2, 3, 4, 100]
(global)	gnum thread(Thread-2) id:140412784295536 num=301
(self)		num thread(Thread-2) id:94051294301472 num=201
(self.list)	num_list thread(Thread-2) id:140412783240456 num=[0, 1, 2, 3, 4, 100, 200]

As you can see, for a thread that runs a single function there is really no need for threadLocal.

So in which situation is it needed?

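import threading

global_dict = {}

# Student is assumed to be defined elsewhere.
def std_thread(name):
    std = Student(name)
    # Put std into the global variable global_dict:
    global_dict[threading.current_thread()] = std
    do_task_1()
    do_task_2()

def do_task_1():
    # Do not pass std in; look it up by the current thread instead:
    std = global_dict[threading.current_thread()]
    ...

def do_task_2():
    # Any function can look up the std variable of the current thread:
    std = global_dict[threading.current_thread()]
    ...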

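threadLocal is only needed when the thread calls sub-functions and a value has to be passed between those functions.
Likewise, if the thread is already written in the full form, self.xx is already local to the thread object, so threadLocal is not needed as an intermediate store either.
To sum up, the use cases for threadLocal are fairly limited: it is only worth using with the short form of Thread, when there are function calls that would otherwise need the value passed as a parameter.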
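The dictionary version above can be sketched with threading.local instead. Each thread sets an attribute once, and any function it calls reads that attribute back without taking it as a parameter. The names local_ctx, seen, do_task and run_demo are hypothetical, and the seen dictionary exists only so the result can be inspected afterwards.

```python
import threading

local_ctx = threading.local()   # one attribute namespace per thread
seen = {}                       # thread name -> value that do_task() found
seen_lock = threading.Lock()    # guards seen


def do_task():
    # No parameter: the value comes from the current thread's own namespace.
    with seen_lock:
        seen[threading.current_thread().name] = local_ctx.student


def std_thread(name):
    local_ctx.student = name    # visible only to the thread running this line
    do_task()


def run_demo():
    """Run two threads and report what each saw, plus the calling thread's view."""
    seen.clear()
    threads = [threading.Thread(target=std_thread, args=(n,), name='Thread-' + n)
               for n in ('Alice', 'Bob')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never set the attribute, so it has none.
    return dict(seen), hasattr(local_ctx, 'student')


if __name__ == '__main__':
    print(run_demo())
```

Compared with the global dictionary, there is no key to manage and nothing to clean up when a thread ends. The isolation comes from threading.local itself.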

Class lock or instance lock?

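A lock corresponds to a critical section (it acts as the bodyguard of the critical variable or section). If the critical variable (section) is class-level information, for example a count of class instances, use a class lock; otherwise use an instance lock.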
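A minimal sketch of the distinction, under the assumption of a hypothetical Account class: the instance counter is class-level data and gets a lock stored on the class, while each object's balance gets a lock stored on the instance.

```python
import threading


class Account:
    instance_count = 0                  # class-level data shared by all instances
    _class_lock = threading.Lock()      # class lock: guards instance_count

    def __init__(self):
        self.balance = 0                # instance-level data
        self._lock = threading.Lock()   # instance lock: guards this object's balance
        with Account._class_lock:
            Account.instance_count += 1

    def deposit(self, amount):
        with self._lock:
            self.balance += amount


def demo(n_threads=8, deposits=1000):
    """Each thread creates one Account and deposits into one shared Account."""
    with Account._class_lock:
        Account.instance_count = 0
    shared_account = Account()          # counted as the first instance

    def worker():
        Account()                       # bumps the class-level counter
        for _ in range(deposits):
            shared_account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return Account.instance_count, shared_account.balance


if __name__ == '__main__':
    print(demo())
```

Using one class lock for everything would also be correct, but deposits into unrelated accounts would then wait on each other for no reason.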

