[Python] A Snowflake ID generator for unique primary keys without auto-increment

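 In this setup TiDB could not be configured with an auto-increment primary key, so the ids are generated by an algorithm instead, which keeps the primary key unique.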

 

Source code:

 

import time 

# Bit allocation within the 64-bit ID
WORKER_ID_BITS = 5
DATACENTER_ID_BITS = 5
SEQUENCE_BITS = 12

# Maximum values for each field
MAX_WORKER_ID = -1 ^ (-1 << WORKER_ID_BITS) # 2**5 - 1 = 0b11111 = 31
MAX_DATACENTER_ID = -1 ^ (-1 << DATACENTER_ID_BITS)

# Left-shift offsets for each field
WOKER_ID_SHIFT = SEQUENCE_BITS
DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS
TIMESTAMP_LEFT_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS

# Mask that makes the sequence number wrap around
SEQUENCE_MASK = -1 ^ (-1 << SEQUENCE_BITS)

# Twitter epoch timestamp, in milliseconds
TWEPOCH = 1288834974657

#--------------------------------

class IdWorker(object):
    """
    Generates IDs.
    """
    def __init__(self, datacenter_id, worker_id, sequence=0):
        """
        Initialise the worker.
        :param datacenter_id: data center (machine region) ID
        :param worker_id: machine ID
        :param sequence: starting sequence number
        """
        # sanity check
        if worker_id > MAX_WORKER_ID or worker_id < 0:
            raise ValueError('worker_id out of range')

        if datacenter_id > MAX_DATACENTER_ID or datacenter_id < 0:
            raise ValueError('datacenter_id out of range')

        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self.sequence = sequence
        self.last_timestamp = -1 # timestamp used for the previous ID
        
    def _gen_timestamp(self):
        """
        Return the current time as an integer millisecond timestamp.
        :return: int timestamp
        """
        return int(time.time() * 1000)
    
    def get_id(self):
        """
        Get a new ID.
        :return: int ID
        """
        timestamp = self._gen_timestamp()

        # The clock has moved backwards: refuse to generate an ID
        if timestamp < self.last_timestamp:
            raise RuntimeError('clock moved backwards, refusing to generate id')

        if timestamp == self.last_timestamp:
            # Same millisecond: advance the sequence, and wait if it wraps to 0
            self.sequence = (self.sequence + 1) & SEQUENCE_MASK
            if self.sequence == 0:
                timestamp = self._til_next_millis(self.last_timestamp)
        else:
            # New millisecond: restart the sequence
            self.sequence = 0

        self.last_timestamp = timestamp

        new_id = (((timestamp - TWEPOCH) << TIMESTAMP_LEFT_SHIFT)
                  | (self.datacenter_id << DATACENTER_ID_SHIFT)
                  | (self.worker_id << WOKER_ID_SHIFT)
                  | self.sequence)
        return new_id
        
    def _til_next_millis(self, last_timestamp):
        """
        Busy-wait until the next millisecond and return its timestamp.
        """
        timestamp = self._gen_timestamp()
        while timestamp <= last_timestamp:
            timestamp = self._gen_timestamp()
        return timestamp
            
#================================

if __name__ == '__main__': 
    worker = IdWorker(1, 2, 0) 
    print(worker.get_id())
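
The mask constants in the source rely on the idiom `-1 ^ (-1 << n)`, which yields an integer whose lowest n bits are set. As a small check (the helper name `low_bits_mask` is made up for this illustration), it gives the same result as the more common spelling `(1 << n) - 1`:

```python
def low_bits_mask(n):
    """Return an int whose lowest n bits are 1, using the idiom from the source."""
    return -1 ^ (-1 << n)


for bits in (5, 12):
    # Both spellings produce the same mask.
    assert low_bits_mask(bits) == (1 << bits) - 1

print(low_bits_mask(5))   # 31, the largest worker_id or datacenter_id
print(low_bits_mask(12))  # 4095, the largest sequence number within one millisecond
```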

 

In practice:

worker = IdWorker(1, 1, 0)
print(worker.get_id())

This produces a 19-digit number: 1536169204380733440
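
To see what such a number contains, the shifts can be reversed. The sketch below is only an illustration: `decode_id` is a hypothetical helper that is not part of the original module, and the field widths and epoch are copied from the source code.

```python
# Field widths and epoch copied from the source code.
SEQUENCE_BITS = 12
WORKER_ID_BITS = 5
DATACENTER_ID_BITS = 5
TWEPOCH = 1288834974657


def decode_id(snowflake_id):
    """Split an ID back into its four fields (hypothetical helper)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << WORKER_ID_BITS) - 1)
    datacenter_id = ((snowflake_id >> (SEQUENCE_BITS + WORKER_ID_BITS))
                     & ((1 << DATACENTER_ID_BITS) - 1))
    # Everything above the low 22 bits is the millisecond offset from TWEPOCH.
    timestamp_ms = (snowflake_id
                    >> (SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS)) + TWEPOCH
    return {
        'timestamp_ms': timestamp_ms,
        'datacenter_id': datacenter_id,
        'worker_id': worker_id,
        'sequence': sequence,
    }


print(decode_id(1536169204380733440))
# {'timestamp_ms': 1655086253625, 'datacenter_id': 1, 'worker_id': 1, 'sequence': 0}
```

The decoded datacenter_id and worker_id are both 1, matching the `IdWorker(1, 1, 0)` call, and the sequence is 0 because it was the first ID produced by that worker.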

 

Batch usage:

from IdWorker import IdWorker
import time

 

ids = []
for i in df3.index:
    time.sleep(0.00001)
    ids.append(IdWorker(1, 1, 0).get_id())
df3['id'] = ids
df3.insert(0, 'id', df3.pop('id'))

 

 

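from sqlalchemy import text
from sqlalchemy.types import VARCHAR

table1 = 'tb_xxx'

dtype1 = {'id': VARCHAR(19)}

# Update the daily table (engine is an existing SQLAlchemy engine)
df3.to_sql(table1, engine, index=False, dtype=dtype1, if_exists='replace', chunksize=5000)

with engine.connect() as con:
    con.execute(text(f'ALTER TABLE `{table1}` ADD PRIMARY KEY (`id`);'))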
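
The id column is stored as VARCHAR(19). A quick piece of arithmetic suggests why 19 characters should be enough. It assumes the timestamp part is limited to 41 bits, as in the usual Snowflake layout; the source code does not enforce that limit, so `TIMESTAMP_BITS` is an assumption of this sketch.

```python
TIMESTAMP_BITS = 41   # assumption: usual Snowflake width, not enforced by the source
SHIFT = 12 + 5 + 5    # sequence + worker + datacenter bits, as in the source

# Largest ID the layout can produce: all 41 + 22 bits set.
max_id = (((1 << TIMESTAMP_BITS) - 1) << SHIFT) | ((1 << SHIFT) - 1)
assert max_id == 2**63 - 1

print(len(str(max_id)))  # 19

# How many years of millisecond timestamps fit in 41 bits (about 69.7).
years = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365.25)
print(round(years, 1))
```

Because the largest value equals 2**63 - 1, a signed 64-bit BIGINT column could hold the same ids as numbers, if a numeric key is preferred over a string.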

 

-----------------------------

 

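Current bug: generating 1,000 ids this way takes about 22 s, and duplicate ids appear once the data volume grows. A likely cause is that the batch loop builds a new IdWorker for every row, so the sequence always starts from 0 and two calls that land in the same millisecond return the same id. The time.sleep call inside the loop probably accounts for most of the run time.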
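
One way to avoid both symptoms is probably to create a single worker and reuse it for the whole batch, so that the sequence number can separate ids generated within the same millisecond and no sleep is needed. The following is a minimal sketch under that assumption. `MiniIdWorker` is a trimmed-down, hypothetical stand-in for IdWorker with the same bit layout, included only so the block runs on its own.

```python
import time

TWEPOCH = 1288834974657
SEQUENCE_BITS, WORKER_ID_BITS, DATACENTER_ID_BITS = 12, 5, 5
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class MiniIdWorker:
    """Trimmed-down stand-in for IdWorker (hypothetical, for illustration only)."""

    def __init__(self, datacenter_id, worker_id):
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1

    def get_id(self):
        timestamp = int(time.time() * 1000)
        if timestamp < self.last_timestamp:
            raise RuntimeError('clock moved backwards')
        if timestamp == self.last_timestamp:
            # Same millisecond: bump the sequence, and wait if it wraps to 0.
            self.sequence = (self.sequence + 1) & SEQUENCE_MASK
            if self.sequence == 0:
                while timestamp <= self.last_timestamp:
                    timestamp = int(time.time() * 1000)
        else:
            # New millisecond: restart the sequence.
            self.sequence = 0
        self.last_timestamp = timestamp
        return (((timestamp - TWEPOCH)
                 << (SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS))
                | (self.datacenter_id << (SEQUENCE_BITS + WORKER_ID_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence)


worker = MiniIdWorker(1, 1)                # created once, outside the loop
ids = [worker.get_id() for _ in range(10000)]

assert len(set(ids)) == len(ids)           # no duplicates
assert ids == sorted(ids)                  # ids increase in generation order
print(len(ids), 'unique ids')
```

With the DataFrame from the batch example, the same idea would read `df3['id'] = [worker.get_id() for _ in range(len(df3))]`, using one shared worker. If several processes or machines write to the same table, each would still need its own worker_id or datacenter_id.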
