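In everyday life, surnames occur with very different frequencies. Surnames such as 趙 (Zhao), 王 (Wang), and 李 (Li) are very common, while compound surnames such as 東方 (Dongfang) and 慕容 (Murong) are rarely seen. Today we will write a simple Python program that generates random surnames with this kind of uneven distribution.
Requirements: all surnames are stored line by line in CNames, and the further down a surname appears, the rarer it is. The lines are divided into five groups, lines 0-3, 4-8, 9-15, 16-20, and 21-25, and the groups are selected with a probability ratio of 20:15:5:2:1.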
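To make the ratio concrete, here is a minimal sketch that converts 20:15:5:2:1 into the probability of each group and of each individual line. It assumes, as the code later in this post does, that a line is picked uniformly within its group. The names `groups`, `weights`, `group_probability`, and `line_probability` are illustrative only.

```python
from fractions import Fraction

# Line groups and their relative weights, as stated in the requirements.
groups = [range(0, 4), range(4, 9), range(9, 16), range(16, 21), range(21, 26)]
weights = [20, 15, 5, 2, 1]
total = sum(weights)  # 20 + 15 + 5 + 2 + 1 = 43

# Probability of picking each group, then of each line inside that group.
group_probability = [Fraction(w, total) for w in weights]
line_probability = [p / len(g) for p, g in zip(group_probability, groups)]

for g, pg, pl in zip(groups, group_probability, line_probability):
    print(f"lines {g.start}-{g.stop - 1}: group {float(pg):.3f}, per line {float(pl):.3f}")
```

Because the groups contain different numbers of lines, the per-line probability does not follow the 20:15:5:2:1 ratio exactly; only the group probability does.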
The contents of the CNames.txt file are shown in the figure below:
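Loading the data
def load_file():
    # Each line of the file holds surnames separated by ';'.
    # The last piece after splitting (what follows the final ';',
    # normally just the newline) is dropped.
    # Note: the text calls this file CNames.txt; use the name your file actually has.
    text_data = []
    with open('CName.txt', 'r', encoding='utf-8') as f:
        for line in f:
            tmp = line.split(';')
            text_data.append(tmp[:-1])
    return text_data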
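The parsing step can be tried on a small made-up file. The sketch below assumes the layout implied by `split(';')` and `tmp[:-1]`: surnames separated by `;`, with every line ending in `;`. The helper `parse_surname_file` and the sample contents are hypothetical; the helper only repeats the logic of `load_file` with the path as a parameter.

```python
import os
import tempfile

def parse_surname_file(path):
    """Hypothetical helper: same parsing as load_file, with the path as a parameter."""
    rows = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # Drop the last piece, which is the newline left after the final ';'.
            rows.append(line.split(';')[:-1])
    return rows

# Made-up sample in the assumed layout (not the real file contents).
sample = "趙;王;李;\n東方;慕容;\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample_names.txt')
    with open(sample_path, 'w', encoding='utf-8') as f:
        f.write(sample)
    rows = parse_surname_file(sample_path)

print(rows)  # one list of surnames per line of the file
```

If a line in the real file does not end with `;`, this parsing drops the last surname on that line, so the file layout is worth checking before use.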
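Building the weighted row groups
probability_control = [
    [0, 1, 2, 3],
    [4, 5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25]
]
loop = [20, 15, 5, 2, 1]

def probability_line(data: list, ranges: list):
    # Repeat each row group so that group i appears ranges[i] times in total.
    # Picking uniformly from the result then selects groups in the ratio given by ranges.
    if len(data) != len(ranges):
        raise ValueError('Length mismatch between row groups and weights')
    data = data.copy()
    for count, loops in enumerate(ranges):
        # The group is already present once, so append loops - 1 more copies.
        for _ in range(loops - 1):
            data.append(data[count])
    return data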
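The effect of the duplication can be checked by counting how often each group appears in the expanded list. This is a sketch rather than the function above: it rebuilds the list with a comprehension, so the order of the entries differs, but the number of copies per group is the same. The names `expanded` and `counts` are illustrative.

```python
from collections import Counter

probability_control = [
    [0, 1, 2, 3],
    [4, 5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
loop = [20, 15, 5, 2, 1]

# Compact restatement of the expansion: group i appears loop[i] times.
expanded = [group for group, times in zip(probability_control, loop)
            for _ in range(times)]

# Identify each group by its first row number and count the copies.
counts = Counter(group[0] for group in expanded)

print(len(expanded))  # 43 entries in total
for first_row, copies in sorted(counts.items()):
    print(f"group starting at row {first_row}: {copies} copies")
```

A uniform pick from a list of 43 entries in which the first group occupies 20 slots selects that group with probability 20/43, which is how the ratio is realized without any explicit weights.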
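Generating a random surname
import random

def generator_line(data, name=None):
    # Load the surname table only when the caller has not supplied one,
    # so that repeated calls do not have to reread the file.
    if name is None:
        name = load_file()
    line_combine = random.choice(data)       # pick a row group (weighted by duplication)
    line = random.choice(line_combine)       # pick a row inside that group
    random_name = random.choice(name[line])  # pick a surname on that row
    return random_name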
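As an alternative to duplicating list entries, the standard library's `random.choices` accepts weights directly. The sketch below is a swapped-in technique, not the approach used in this post; `pick_line` is a hypothetical name, and it returns only the row number, leaving the surname lookup to the caller.

```python
import random

probability_control = [
    [0, 1, 2, 3],
    [4, 5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
loop = [20, 15, 5, 2, 1]

def pick_line():
    """Pick a row number: weighted choice of group, then uniform choice within it."""
    group = random.choices(probability_control, weights=loop, k=1)[0]
    return random.choice(group)

sample = [pick_line() for _ in range(5)]
print(sample)  # five row numbers between 0 and 25
```

The resulting distribution over rows is the same as with the expanded list; the weighted call simply avoids building the 43-entry list.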
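Full code:
import random

probability_control = [
    [0, 1, 2, 3],
    [4, 5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25]
]
loop = [20, 15, 5, 2, 1]

def load_file():
    # Each line holds surnames separated by ';'; the piece after the
    # final ';' (normally just the newline) is dropped.
    # Note: the text calls this file CNames.txt; use the name your file actually has.
    text_data = []
    with open('CName.txt', 'r', encoding='utf-8') as f:
        for line in f:
            tmp = line.split(';')
            text_data.append(tmp[:-1])
    return text_data

def probability_line(data: list, ranges: list):
    # Repeat each row group so that group i appears ranges[i] times in total.
    if len(data) != len(ranges):
        raise ValueError('Length mismatch between row groups and weights')
    data = data.copy()
    for count, loops in enumerate(ranges):
        # The group is already present once, so append loops - 1 more copies.
        for _ in range(loops - 1):
            data.append(data[count])
    return data

def generator_line(data, name=None):
    # Load the surname table only when the caller has not supplied one.
    if name is None:
        name = load_file()
    line_combine = random.choice(data)       # pick a row group (weighted by duplication)
    line = random.choice(line_combine)       # pick a row inside that group
    random_name = random.choice(name[line])  # pick a surname on that row
    return random_name

def main():
    data = probability_line(probability_control, loop)
    name = load_file()  # read the surname file once instead of on every draw
    # Write as UTF-8 so the Chinese characters do not depend on the platform default.
    with open('random_name.txt', 'w', encoding='utf-8') as f:
        for _ in range(10000):  # generate 10,000 random surnames
            f.write(generator_line(data, name) + '\n')

if __name__ == '__main__':
    main()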
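One way to sanity-check the result is to count how often each surname appears in the output file. The sketch below assumes random_name.txt holds one surname per line and was written as UTF-8; `surname_frequencies` is a hypothetical helper, not part of the program above.

```python
import os
from collections import Counter

def surname_frequencies(path):
    """Hypothetical helper: count surnames in a file with one surname per line."""
    with open(path, 'r', encoding='utf-8') as f:
        # Skip blank lines and strip the trailing newline from each surname.
        return Counter(line.strip() for line in f if line.strip())

# Only runs the report if the generator has already produced its output file.
if os.path.exists('random_name.txt'):
    for surname, count in surname_frequencies('random_name.txt').most_common(10):
        print(surname, count)
```

Surnames from the first rows of the file should dominate the top of this list, while those from rows 21-25 should appear only rarely.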