YCSB workload parameter settings

YCSB is a workload benchmarking tool, so its parameter settings matter a great deal: different read, update, and insert proportions produce different results against the same database.

The performance of a given database is tested mainly by loading a workload file such as workloada with the following command (the matching transaction phase is then started by replacing load with run):

bin/ycsb load DBname -s -P workloads/workloada

Below are the YCSB workload parameters and their meanings, as set in the YCSB/workloads/workloada file:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000 (number of records YCSB loads into the database during the load phase)
operationcount=1000 (number of operations YCSB performs during the run phase)
workload=com.yahoo.ycsb.workloads.CoreWorkload (workload class to use)
readproportion=0.5 (default 0.95; proportion of all operations that are reads)
updateproportion=0.5 (default 0.05; proportion of all operations that are updates)
insertproportion=0 (default 0; proportion of all operations that are inserts)
scanproportion=0 (default 0; proportion of all operations that are scans)
requestdistribution=zipfian (default uniform; distribution used to choose which record to operate on: uniform, zipfian, hotspot, sequential, exponential, or latest)
threadcount=2 (default 1; number of YCSB client threads)
readallfields=true (default true; whether to read all fields of a record (true) or only one (false))
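The operation proportions are expected to sum to 1.0 across the request mix. As an illustration, here is a minimal Python sketch (the parse_workload helper is mine, not part of YCSB) that parses a properties block like the one above and checks the ratios, plus the arithmetic behind the "1 KB records" claim in the header:

```python
def parse_workload(text):
    """Parse a YCSB-style properties file into a dict, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

workloada = """
recordcount=1000
operationcount=1000
readproportion=0.5
updateproportion=0.5
insertproportion=0
scanproportion=0
requestdistribution=zipfian
"""

props = parse_workload(workloada)
total = sum(float(props.get(k, 0)) for k in
            ("readproportion", "updateproportion",
             "insertproportion", "scanproportion"))
assert abs(total - 1.0) < 1e-9  # the operation mix covers all requests

# Default record size: 10 fields x 100 bytes each, plus the key
print(10 * 100)  # -> 1000 bytes, i.e. the "1 KB records" in the header
```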

Below are the parameters from the YCSB/workloads/workloadb file, a read-mostly workload (the main difference is the readproportion/updateproportion split).

# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; add a tag is an update, but most operations are to read tags
#                        
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
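requestdistribution=zipfian means a small set of "hot" records receives most of the requests. A rough illustration of that skew (a simplified Zipf sampler for intuition only, not YCSB's actual ZipfianGenerator):

```python
import random
from collections import Counter

def zipf_weights(n, s=0.99):
    """Zipf-like weights: record i is chosen proportionally to 1/(i+1)**s."""
    return [1.0 / (i + 1) ** s for i in range(n)]

rng = random.Random(42)
n = 100
samples = rng.choices(range(n), weights=zipf_weights(n), k=5000)
counts = Counter(samples)

# Record 0 is "hot": it receives a large share of all requests,
# while a record in the middle of the keyspace is rarely touched.
print(counts[0], counts[50])
```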

Below are the parameters from the YCSB/workloads/workloadc file, a read-only workload (the main difference is that readproportion is 1).

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian

Below are the parameters from the YCSB/workloads/workloadd file, a read-latest workload (the main differences are the readproportion/insertproportion split and requestdistribution=latest).

# Yahoo! Cloud System Benchmark
# Workload D: Read latest workload
#   Application example: user status updates; people want to read the latest
#                        
#   Read/update/insert ratio: 95/0/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: latest

# The insert order for this is hashed, not ordered. The "latest" items may be 
# scattered around the keyspace if they are keyed by userid.timestamp. A workload
# which orders items purely by time, and demands the latest, is very different than 
# workload here (which we believe is more typical of how people build systems.)
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05
requestdistribution=latest
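requestdistribution=latest skews requests toward the most recently inserted records, matching the "read latest" use case. A rough sketch of that behavior (simplified; YCSB's real generator composes a zipfian distribution over recency):

```python
import random

def latest_indices(insert_count, k, rng, s=0.99):
    """Sample k record indices skewed toward the newest insert:
    offset 0 = newest record, weights fall off as 1/(offset+1)**s."""
    weights = [1.0 / (off + 1) ** s for off in range(insert_count)]
    offsets = rng.choices(range(insert_count), weights=weights, k=k)
    return [insert_count - 1 - off for off in offsets]

rng = random.Random(7)
samples = latest_indices(1000, 5000, rng)
newest_share = sum(1 for i in samples if i >= 900) / len(samples)
print(round(newest_share, 2))  # the newest 10% of keys take most of the traffic
```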

Below are the parameters from the YCSB/workloads/workloade file, a short-range scan workload (the key parameters are scanproportion and insertproportion):

# Yahoo! Cloud System Benchmark
# Workload E: Short ranges
#   Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)                     
#   Scan/insert ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
# The insert order is hashed, not ordered. Although the scans are ordered, it does not necessarily
# follow that the data is inserted in order. For example, posts for thread 342 may not be inserted contiguously, but
# instead interspersed with posts from lots of other threads. The way the YCSB client works is that it will pick a start
# key, and then request a number of records; this works fine even for hashed insertion.
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100 (default 1000; maximum number of records to scan)
scanlengthdistribution=uniform (default uniform; distribution used to choose, for each scan, how many records to scan, between 1 and maxscanlength)
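For workload E, each scan picks a start key and then a scan length; with scanlengthdistribution=uniform the length is drawn uniformly between 1 and maxscanlength. A simplified sketch of that selection (illustrative only, not YCSB's client code):

```python
import random

def pick_scan(recordcount, maxscanlength, rng):
    """Choose a (start, length) pair for a scan: start anywhere in the
    keyspace, then scan between 1 and maxscanlength records."""
    start = rng.randrange(recordcount)
    length = rng.randint(1, maxscanlength)
    return start, length

rng = random.Random(1)
start, length = pick_scan(recordcount=1000, maxscanlength=100, rng=rng)
assert 0 <= start < 1000 and 1 <= length <= 100
print(start, length)
```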

Below are the parameters from the YCSB/workloads/workloadf file, a read-modify-write workload (the key parameters are readproportion and readmodifywriteproportion):

# Yahoo! Cloud System Benchmark
# Workload F: Read-modify-write workload
#   Application example: user database, where user records are read and modified by the user or to record user activity.
#                        
#   Read/read-modify-write ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5 (default 0; proportion of operations that read a record, modify it, and write it back)
requestdistribution=zipfian
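A read-modify-write operation counts the read, the change, and the write-back as one unit of work. A toy sketch against an in-memory table, just to make the three steps concrete:

```python
table = {"user1": {"field0": "a" * 100}}

def read_modify_write(table, key, field, new_value):
    """Read the whole record, change one field, write the record back."""
    record = dict(table[key])      # read
    record[field] = new_value      # modify
    table[key] = record            # write back
    return record

read_modify_write(table, "user1", "field0", "b" * 100)
print(table["user1"]["field0"][:1])  # -> b
```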

Below are the parameters from the YCSB/workloads/tsworkloada file, a time-series workload (the key parameters are readproportion and insertproportion, plus the time-series-specific settings explained below):

# Yahoo! Cloud System Benchmark
# Workload A: Small cardinality consistent data for 2 days
#   Application example: Typical monitoring of a single compute or small 
#   sensor station where 90% of the load is write and only 10% is read 
#   (it's usually much less). All writes are inserts. No sparsity so 
#   every series will have a value at every timestamp.
#
#   Read/insert ratio: 10/90
#   Cardinality: 16 per key (field), 64 fields for a total of 1,024 
#                time series.
workload=com.yahoo.ycsb.workloads.TimeSeriesWorkload
recordcount=1474560
operationcount=2949120
fieldlength=8 (default 100; size of each field)
fieldcount=64 (default 10; number of fields per record)
tagcount=4 (number of unique tag combinations per time series; if this is 4, each record carries a key plus 4 tag pairs, e.g. A=A, B=A, C=A, D=A)
tagcardinality=1,2,4,2 (cardinality, i.e. number of unique values, of each tag for each "metric" or field, as a comma-separated list; each value must be a number from 1 to Java's Integer.MAX_VALUE, and there must be 'tagcount' values; extra values are ignored and missing values are replaced by 1)
# A value from 0 to 0.999999 representing how sparse each time series
# should be. The higher this value, the greater the time interval between
# values in a single series. For example, if sparsity is 0 and there are
# 10 time series with a 'timestampinterval' of 60 seconds with a total
# time range of 10 intervals, you would see 100 values written, one per
# timestamp interval per time series. If the sparsity is 0.50 then there
# would be only about 50 values written so some time series would have
# missing values at each interval.
sparsity=0.0
# The percentage of time series that are "lagging" behind the current
# timestamp of the writer. This is used to mimic a common behavior where
# most sources (agents, sensors, etc) are writing data in sync (same timestamp)
# but a subset are running behind due to buffering, latency issues, etc.
delayedSeries=0.0
# The maximum amount of delay for delayed series in interval counts. The 
# actual delay is chosen based on a modulo of the series index.
delayedIntervals=0
timestampunits=SECONDS (units for timestamps and intervals)
# The amount of time between each value in every time series in
# the units of 'timestampunits'.
timestampinterval=60
# The fixed or maximum amount of time added to the start time of a 
# read or scan operation to generate a query over a range of time 
# instead of a single timestamp. Units are shared with 'timestampunits'.
# For example if the value is set to 3600 seconds (1 hour) then 
# each read would pick a random start timestamp based on the 
#'insertstart' value and number of intervals, then add 3600 seconds
# to create the end time of the query. If this value is 0 then reads
# will only provide a single timestamp. 
querytimespan=3600
readproportion=0.10
updateproportion=0.00
insertproportion=0.90
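The numbers in this file's header can be checked by hand: the series count per field is the product of the tagcardinality values, and (assuming one value per series per 60-second interval, as the header's "no sparsity" note implies) recordcount and operationcount correspond to one and two days of data respectively:

```python
from math import prod

tagcardinality = [1, 2, 4, 2]
fieldcount = 64

series_per_field = prod(tagcardinality)       # unique tag combinations
total_series = series_per_field * fieldcount  # one series per field per combo
print(series_per_field, total_series)         # -> 16 1024

intervals_per_day = 24 * 60 * 60 // 60        # timestampinterval=60 seconds
print(total_series * intervals_per_day)       # -> 1474560, the recordcount
print(total_series * 2 * intervals_per_day)   # -> 2949120, the operationcount
```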

There are also some important parameters that none of these workloads set, for example in workload_template:

insertstart=0 (offset of the first insert key)
writeallfields=false (whether updates write all fields or just one)
fieldlengthdistribution=constant (distribution of field lengths: constant, zipfian, or uniform)
insertorder=hashed (whether records are inserted in order or pseudo-randomly: hashed or ordered)
hotspotdatafraction=0.2 (fraction of the data items that make up the hot set)
hotspotopnfraction=0.8 (fraction of operations that access the hot set)
table=usertable (name of the database table to run queries against)
# When measurementtype is set to raw, measurements are output as raw data points in the CSV format "operation, timestamp of the measurement, latency in us". Raw data points are collected in memory while the test runs; each point consumes about 50 bytes (including Java object overhead). For a typical run of 1 to 10 million operations this usually fits in memory. If you plan to run hundreds of millions of operations per run, consider using a machine with more RAM when using the raw measurement type, or split the work into multiple runs.
# Optionally, an output file can be specified to save the raw data points; otherwise they are written to stdout. If the output file already exists it is appended to, otherwise a new file is created:
measurement.raw.output_file=/tmp/your_output_file_for_this_run
measurementtype=histogram (how latency measurements are presented: timeseries, histogram, or raw)
measurement.histogram.verbose=false (whether to emit the individual histogram buckets when measuring with a histogram)
histogram.buckets=1000 (range of latencies, in milliseconds, to track in the histogram)
timeseries.granularity=1000 (granularity of the time series, in milliseconds)
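With measurementtype=histogram and histogram.buckets=1000, latencies are tracked in 1 ms buckets up to the bucket count, with everything slower lumped into one overflow bucket. A simplified sketch of that bucketing (my illustration, not YCSB's measurement code):

```python
def bucket_latencies(latencies_ms, buckets=1000):
    """Count latencies into 1 ms buckets; the last slot is overflow."""
    hist = [0] * (buckets + 1)
    for lat in latencies_ms:
        idx = min(int(lat), buckets)
        hist[idx] += 1
    return hist

hist = bucket_latencies([0.3, 0.7, 1.2, 5.9, 2500.0], buckets=1000)
# Two latencies land in the 0 ms bucket, one each in the 1 ms and 5 ms
# buckets, and the 2500 ms outlier falls into the overflow slot.
print(hist[0], hist[1], hist[5], hist[1000])  # -> 2 1 1 1
```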

 
