Using datafaker, a test-data generation tool

1. Tool use cases

Test data is frequently needed during software development and testing. Typical scenarios include:

1.1 Back-end development
After creating a new table, you need to populate it with test data so that the API can return data for the front end to consume.

1.2 Database performance testing
Generate large volumes of test data to benchmark database performance.

1.3 Streaming data testing
For Kafka streams, test data must be generated continuously on a schedule and written into Kafka.

2. Installation

  • Install Python
  • Install the tool: pip install datafaker
  • Upgrade to the latest version: pip install datafaker --upgrade
  • Uninstall: pip uninstall datafaker

On Python 2, install MySQLdb.

On Python 3, install pymysql instead, and add the following two lines to __init__.py in the datafaker package directory:

import pymysql
pymysql.install_as_MySQLdb()

3. Using datafaker

3.1 Create a meta.txt file containing the table's metadata

id||int||auto-increment id[:inc(id,1)]
name||varchar(20)||student name
school||varchar(20)||school name[:enum(file://names.txt)]
nickname||varchar(20)||student nickname[:enum(鬼泣, 高小王子, 歌神, 逗比)]
age||int||student age[:age]
class_num||int||class size[:int(10, 100)]
score||decimal(4,2)||score[:decimal(4,2,1)]
phone||bigint||phone number[:phone_number]
email||varchar(64)||home email address[:email]
ip||varchar(32)||IP address[:ipv4]
address||text||home address[:address]
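The metadata line format above can be sketched as a small parser. This is a minimal illustration of the `column||type||comment[:tag]` layout, not datafaker's actual parsing code:

```python
import re

# Each meta.txt line has the form: column||type||comment[:tag(args)]
# The [:tag] suffix is optional; without it, the generator is chosen
# from the column type.
META_LINE = re.compile(
    r"^(?P<name>[^|]+)\|\|(?P<type>[^|]+)\|\|(?P<comment>[^\[]*)"
    r"(?:\[:(?P<tag>[^\]]+)\])?$"
)

def parse_meta_line(line: str) -> dict:
    """Split one metadata line into its four parts."""
    m = META_LINE.match(line.strip())
    if not m:
        raise ValueError(f"bad meta line: {line!r}")
    return {k: (v.strip() if v else None) for k, v in m.groupdict().items()}

print(parse_meta_line("age||int||student age[:age]"))
# {'name': 'age', 'type': 'int', 'comment': 'student age', 'tag': 'age'}
```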

3.2 Run from the command line

datafaker rdb mysql+mysqldb://user:password@localhost:3306/dw?charset=utf8 user 500 --meta meta.txt
# user is the target table for the test data; it must be created beforehand
# 500 is the number of records to generate; configurable
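Conceptually, the rdb mode generates N fake rows matching the table schema and batch-inserts them. The sketch below illustrates this with the standard library's sqlite3 standing in for the MySQL connection (the real tool connects through SQLAlchemy); table and column names follow the example above:

```python
import random
import sqlite3

# Stand-in for the MySQL database from the command above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER, name TEXT, age INTEGER)")

# Generate 500 fake rows, mirroring `datafaker rdb ... user 500`.
rows = [(i, f"student_{i}", random.randint(6, 18)) for i in range(1, 501)]
conn.executemany("INSERT INTO user VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM user").fetchone()[0]
print(count)  # 500
```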

3.3 Read metadata from the local meta.txt file, build 10 records with the given separator, and print them to the screen instead of inserting them into the database

$ datafaker rdb mysql+mysqldb://root:root@localhost:3600/test?charset=utf8 stu 10 --outprint --meta meta.txt --outspliter ,,

3.4 Writing to Hive: generate 1000 records into table stu of Hive database test

Here yarn is the username. This requires a Hive version with ACID support; otherwise, generate a local file and upload it to HDFS instead.

datafaker hive hive://yarn@localhost:10000/test stu 1000 --meta data/hive_meta.txt

3.5 Writing to a file: generate 10 JSON records into out.txt under the /home directory


datafaker file /home out.txt 10 --meta meta.txt --format json
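The file mode's JSON output amounts to writing one JSON object per line. A minimal sketch of that behavior (the output path and field names here are illustrative, not datafaker's internals):

```python
import json
import random

def write_json_records(path: str, n: int) -> None:
    """Write n fake records to path, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for i in range(1, n + 1):
            record = {"id": i, "age": random.randint(3, 18)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

write_json_records("out.txt", 10)
```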

3.6 Writing to Kafka: read metadata from the local meta.txt and write records to the Kafka topic hello at 1-second intervals


$ datafaker kafka localhost:9092 hello 1 --meta meta.txt --interval 1
{"school": "\u4eba\u548c\u4e2d\u5fc3", "name": "\u5218\u91d1\u51e4", "ip": "192.20.103.235", "age": 9, "email": "[email protected]", "phone": "13256316424", "score": 3.45, "address": "\u5e7f\u4e1c\u7701\u5b81\u5fb7\u5e02\u6d54\u9633\u5468\u8defu\u5ea7 990262", "class_num": 24, "nickname": "\u9017\u6bd4", "id": 1}
{"school": "\u4eba\u548c\u4e2d\u5fc3", "name": "\u6768\u4e3d", "ip": "101.129.18.230", "age": 3, "email": "[email protected]", "phone": "18183286767", "score": 22.16, "address": "\u8fbd\u5b81\u7701\u592a\u539f\u5e02\u53cb\u597d\u6c55\u5c3e\u8defG\u5ea7 382777", "class_num": 30, "nickname": "\u6b4c\u795e", "id": 2}
{"school": "\u6e05\u534e\u4e2d\u5b66", "name": "\u8d75\u7ea2", "ip": "192.0.3.34", "age": 9, "email": "[email protected]", "phone": "18002235094", "score": 48.32, "address": "\u5e7f\u897f\u58ee\u65cf\u81ea\u6cbb\u533a\u65ed\u5e02\u6c88\u5317\u65b0\u6731\u8defc\u5ea7 684262", "class_num": 63, "nickname": "\u6b4c\u795e", "id": 3}
{"school": "\u6e05\u534e\u4e2d\u5b66", "name": "\u5f20\u7389\u6885", "ip": "198.20.50.222", "age": 3, "email": "[email protected]", "phone": "15518698519", "score": 85.96, "address": "\u5b81\u590f\u56de\u65cf\u81ea\u6cbb\u533a\u6d69\u53bf\u767d\u4e91\u4e4c\u9c81\u6728\u9f50\u8857s\u5ea7 184967", "class_num": 18, "nickname": "\u9017\u6bd4", "id": 4}
{"school": "\u732a\u573a", "name": "\u674e\u6842\u5170", "ip": "192.52.195.184", "age": 8, "email": "[email protected]", "phone": "18051928254", "score": 97.87, "address": "\u9ed1\u9f8d\u6c5f\u7701\u54c8\u5c14\u6ee8\u53bf\u6c38\u5ddd\u6d2a\u8857E\u5ea7 335135", "class_num": 46, "nickname": "\u9ad8\u5c0f\u738b\u5b50", "id": 5}
{"school": "\u4eba\u548c\u4e2d\u5fc3", "name": "\u5434\u60f3", "ip": "192.42.234.178", "age": 3, "email": "[email protected]", "phone": "14560810465", "score": 6.32, "address": "\u5b81\u590f\u56de\u65cf\u81ea\u6cbb\u533a\u516d\u76d8\u6c34\u5e02\u5357\u6eaa\u7f57\u8857M\u5ea7 852408", "class_num": 12, "nickname": "\u9b3c\u6ce3", "id": 6}
^Cgenerated records : 6
insert records : 6
time used: 6.285 s
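The timed production loop behind `--interval` can be sketched as follows. FakeProducer is a stand-in for a real Kafka producer client (e.g. kafka-python's KafkaProducer), used here so the loop is runnable without a broker:

```python
import json
import time

class FakeProducer:
    """Stand-in for a Kafka producer; records what would be sent."""
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

def produce(producer, topic, count, interval=0.0):
    """Emit `count` records to `topic`, sleeping `interval` seconds between."""
    for i in range(1, count + 1):
        record = {"id": i, "name": f"student_{i}"}
        producer.send(topic, json.dumps(record).encode("utf-8"))
        time.sleep(interval)

p = FakeProducer()
produce(p, "hello", 6)
print(len(p.sent))  # 6
```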

Nested JSON or arbitrary data structures (not necessarily JSON):

datafaker kafka localhost:9092 hello 10 --metaj meta.txt

Use --metaj to specify the metadata file meta.txt:

{
    "name": [:name],
    "age": [:age],
    "school": {
        "sch_name": [:enum(file://../data/names.txt)],
        "sch_address": [:address],
        "scores": [
            {
                "class": [:enum(Math, English)],
                "score": [:decimal(4,2,1)]
            },
            {
                "class": [:enum(Chinese, Computer)],
                "score": [:decimal(4,2,1)]
            }
        ]
    }
}

datafaker replaces the tagged strings in meta.txt while preserving the original formatting, including tabs and spaces, producing output such as:

{
    "name": 駟俊,
    "age": 95,
    "school": {
        "sch_name": 舊大院,
        "sch_address": 湖北省濟南市上街寧德路I座 557270,
        "scores": [
            {
                "class": Math,
                "score": 83.28
            },
            {
                "class": Computer,
                "score": 52.37
            }
        ]
    }
}

To produce well-formed JSON, compress the metadata file content onto a single line:

{"name":[:name],"age":[:age],"school":{"sch_name":[:enum(file://../data/names.txt)],"sch_address":[:address],"scores":[{"class":[:enum(Math,English)],"score":[:decimal(4,2,1)]},{"class":[:enum(Chinese,Computer)],"score":[:decimal(4,2,1)]}]}}
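The template substitution idea can be illustrated with a regex that replaces each `[:tag]` marker with a generated value, leaving everything else untouched. The generators below are toy stand-ins for datafaker's real ones, and this is a sketch of the mechanism, not the tool's actual code:

```python
import json
import random
import re

# Toy generators; each returns a JSON-ready fragment.
GENERATORS = {
    "name": lambda: '"Alice"',
    "age": lambda: str(random.randint(1, 100)),
}

def render(template: str) -> str:
    """Replace every [:tag] marker with a generated value."""
    return re.sub(
        r"\[:(\w+)\]",
        lambda m: GENERATORS[m.group(1)](),
        template,
    )

out = render('{"name": [:name], "age": [:age]}')
print(json.loads(out))  # valid JSON once markers are substituted
```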

3.7 Writing to HBase


datafaker hbase localhost:9090 test-table 10 --meta data/hbase.txt

The HBase Thrift service must be running (Thrift, not Thrift2).
In this example, a table test-table with column family Cf has been created.
The metadata file hbase.txt contains:

rowkey||varchar(20)||sdflll
Cf:name||varchar(20)||student name
Cf:age||int||student age[:age]

The first line must be rowkey. It can take parameters: rowkey(0,1,4) means the rowkey value is joined with the values of the first and fifth following columns using "_".

The remaining lines are columns within column families; multiple column families can be used.
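My reading of the rowkey(0,1,4) rule can be sketched as an index-based join over the generated row. This is an illustration under that assumption, not datafaker's actual implementation:

```python
def build_rowkey(row, indices):
    """Join the values at the given row positions with '_'.

    Assumption: index 0 is the rowkey column itself, and the remaining
    indices select among the columns that follow it.
    """
    return "_".join(str(row[i]) for i in indices)

row = ["rk001", "Alice", 9, "Cf-extra", "extra"]
print(build_rowkey(row, (0, 1, 4)))  # rk001_Alice_extra
```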

3.8 Writing to Elasticsearch


datafaker es localhost:9200 example1/tp1 100 --auth elastic:elastic --meta meta.txt

Here localhost:9200 is the ES connection address; separate multiple hosts with commas, e.g. host1:9200,host2:9200.

example1/tp1 is the index and type, separated by /.

elastic:elastic is the username and password; omit the --auth parameter if authentication is not enabled.
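The two argument formats described above (a comma-separated host list and an index/type pair split on "/") decompose straightforwardly. These helpers are illustrative, not datafaker's actual code:

```python
def parse_hosts(hosts: str):
    """Split 'host1:9200,host2:9200' into host/port dicts."""
    out = []
    for h in hosts.split(","):
        host, port = h.split(":")
        out.append({"host": host, "port": int(port)})
    return out

def parse_target(target: str):
    """Split 'index/type' into its two parts."""
    index, doc_type = target.split("/")
    return index, doc_type

print(parse_hosts("host1:9200,host2:9200"))
print(parse_target("example1/tp1"))  # ('example1', 'tp1')
```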

3.9 Writing to Oracle


datafaker rdb oracle://root:[email protected]:1521/helowin stu 10 --meta meta.txt

The SQLAlchemy connection string must use the oracle:// form.
