YCSB workload parameter settings

YCSB is a workload benchmarking tool, so its parameter settings matter a great deal: different read, update, and insert proportions produce different results against the same database.

The performance of a given database is tested mainly by loading a workload file such as workloada with the following command (the matching transaction phase is then started by replacing load with run):

bin/ycsb load DBname -s -P workloads/workloada

Below are the YCSB workload parameters and their meanings, as set in the YCSB/workloads/workloada file:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000 (number of records YCSB loads into the database during the load phase)
operationcount=1000 (number of operations YCSB performs during the run phase)
workload=com.yahoo.ycsb.workloads.CoreWorkload (workload class to use)
readproportion=0.5 (default 0.95; proportion of all operations that are reads)
updateproportion=0.5 (default 0.05; proportion of all operations that are updates)
insertproportion=0 (default 0; proportion of all operations that are inserts)
scanproportion=0 (default 0; proportion of all operations that are scans)
requestdistribution=zipfian (default uniform; distribution used to choose which record to operate on: uniform, zipfian, hotspot, sequential, exponential, or latest)
threadcount=2 (default 1; number of YCSB client threads)
readallfields=true (default true; whether to read all fields of a record (true) or only one (false))
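The operation proportions are expected to sum to 1.0 across the request mix. As an illustration, here is a minimal Python sketch (the parse_workload helper is mine, not part of YCSB) that parses a properties block like the one above and checks the ratios, plus the arithmetic behind the "1 KB records" claim in the header:

```python
def parse_workload(text):
    """Parse a YCSB-style properties file into a dict, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

workloada = """
recordcount=1000
operationcount=1000
readproportion=0.5
updateproportion=0.5
insertproportion=0
scanproportion=0
requestdistribution=zipfian
"""

props = parse_workload(workloada)
total = sum(float(props.get(k, 0)) for k in
            ("readproportion", "updateproportion",
             "insertproportion", "scanproportion"))
assert abs(total - 1.0) < 1e-9  # the operation mix covers all requests

# Default record size: 10 fields x 100 bytes each, plus the key
print(10 * 100)  # -> 1000 bytes, i.e. the "1 KB records" in the header
```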

Below are the parameters from the YCSB/workloads/workloadb file, a read-mostly workload (the main difference is the readproportion/updateproportion split).

# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; add a tag is an update, but most operations are to read tags
#                        
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
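requestdistribution=zipfian means a small set of "hot" records receives most of the requests. A rough illustration of that skew (a simplified Zipf sampler for intuition only, not YCSB's actual ZipfianGenerator):

```python
import random
from collections import Counter

def zipf_weights(n, s=0.99):
    """Zipf-like weights: record i is chosen proportionally to 1/(i+1)**s."""
    return [1.0 / (i + 1) ** s for i in range(n)]

rng = random.Random(42)
n = 100
samples = rng.choices(range(n), weights=zipf_weights(n), k=5000)
counts = Counter(samples)

# Record 0 is "hot": it receives a large share of all requests,
# while a record in the middle of the keyspace is rarely touched.
print(counts[0], counts[50])
```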

Below are the parameters from the YCSB/workloads/workloadc file, a read-only workload (the main difference is that readproportion is 1).

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian

Below are the parameters from the YCSB/workloads/workloadd file, a read-latest workload (the main differences are the readproportion/insertproportion split and requestdistribution=latest).

# Yahoo! Cloud System Benchmark
# Workload D: Read latest workload
#   Application example: user status updates; people want to read the latest
#                        
#   Read/update/insert ratio: 95/0/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: latest

# The insert order for this is hashed, not ordered. The "latest" items may be 
# scattered around the keyspace if they are keyed by userid.timestamp. A workload
# which orders items purely by time, and demands the latest, is very different than 
# workload here (which we believe is more typical of how people build systems.)
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05
requestdistribution=latest
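requestdistribution=latest skews requests toward the most recently inserted records, matching the "read latest" use case. A rough sketch of that behavior (simplified; YCSB's real generator composes a zipfian distribution over recency):

```python
import random

def latest_indices(insert_count, k, rng, s=0.99):
    """Sample k record indices skewed toward the newest insert:
    offset 0 = newest record, weights fall off as 1/(offset+1)**s."""
    weights = [1.0 / (off + 1) ** s for off in range(insert_count)]
    offsets = rng.choices(range(insert_count), weights=weights, k=k)
    return [insert_count - 1 - off for off in offsets]

rng = random.Random(7)
samples = latest_indices(1000, 5000, rng)
newest_share = sum(1 for i in samples if i >= 900) / len(samples)
print(round(newest_share, 2))  # the newest 10% of keys take most of the traffic
```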

Below are the parameters from the YCSB/workloads/workloade file, a short-range scan workload (the key parameters are scanproportion and insertproportion):

# Yahoo! Cloud System Benchmark
# Workload E: Short ranges
#   Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)                     
#   Scan/insert ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
# The insert order is hashed, not ordered. Although the scans are ordered, it does not necessarily
# follow that the data is inserted in order. For example, posts for thread 342 may not be inserted contiguously, but
# instead interspersed with posts from lots of other threads. The way the YCSB client works is that it will pick a start
# key, and then request a number of records; this works fine even for hashed insertion.
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100 (default 1000; maximum number of records to scan)
scanlengthdistribution=uniform (default uniform; distribution used to choose, for each scan, how many records to scan, between 1 and maxscanlength)
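For workload E, each scan picks a start key and then a scan length; with scanlengthdistribution=uniform the length is drawn uniformly between 1 and maxscanlength. A simplified sketch of that selection (illustrative only, not YCSB's client code):

```python
import random

def pick_scan(recordcount, maxscanlength, rng):
    """Choose a (start, length) pair for a scan: start anywhere in the
    keyspace, then scan between 1 and maxscanlength records."""
    start = rng.randrange(recordcount)
    length = rng.randint(1, maxscanlength)
    return start, length

rng = random.Random(1)
start, length = pick_scan(recordcount=1000, maxscanlength=100, rng=rng)
assert 0 <= start < 1000 and 1 <= length <= 100
print(start, length)
```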

Below are the parameters from the YCSB/workloads/workloadf file, a read-modify-write workload (the key parameters are readproportion and readmodifywriteproportion):

# Yahoo! Cloud System Benchmark
# Workload F: Read-modify-write workload
#   Application example: user database, where user records are read and modified by the user or to record user activity.
#                        
#   Read/read-modify-write ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5 (default 0; proportion of operations that read a record, modify it, and write it back)
requestdistribution=zipfian
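A read-modify-write operation counts the read, the change, and the write-back as one unit of work. A toy sketch against an in-memory table, just to make the three steps concrete:

```python
table = {"user1": {"field0": "a" * 100}}

def read_modify_write(table, key, field, new_value):
    """Read the whole record, change one field, write the record back."""
    record = dict(table[key])      # read
    record[field] = new_value      # modify
    table[key] = record            # write back
    return record

read_modify_write(table, "user1", "field0", "b" * 100)
print(table["user1"]["field0"][:1])  # -> b
```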

Below are the parameters from the YCSB/workloads/tsworkloada file, a time-series workload (the key parameters are readproportion and insertproportion, plus the time-series-specific settings explained below):

# Yahoo! Cloud System Benchmark
# Workload A: Small cardinality consistent data for 2 days
#   Application example: Typical monitoring of a single compute or small 
#   sensor station where 90% of the load is write and only 10% is read 
#   (it's usually much less). All writes are inserts. No sparsity so 
#   every series will have a value at every timestamp.
#
#   Read/insert ratio: 10/90
#   Cardinality: 16 per key (field), 64 fields for a total of 1,024 
#                time series.
workload=com.yahoo.ycsb.workloads.TimeSeriesWorkload
recordcount=1474560
operationcount=2949120
fieldlength=8 (default 100; size of each field)
fieldcount=64 (default 10; number of fields per record)
tagcount=4 (number of unique tag combinations per time series; if this is 4, each record carries a key plus 4 tag pairs, e.g. A=A, B=A, C=A, D=A)
tagcardinality=1,2,4,2 (cardinality, i.e. number of unique values, of each tag for each "metric" or field, as a comma-separated list; each value must be a number from 1 to Java's Integer.MAX_VALUE, and there must be 'tagcount' values; extra values are ignored and missing values are replaced by 1)
# A value from 0 to 0.999999 representing how sparse each time series
# should be. The higher this value, the greater the time interval between
# values in a single series. For example, if sparsity is 0 and there are
# 10 time series with a 'timestampinterval' of 60 seconds with a total
# time range of 10 intervals, you would see 100 values written, one per
# timestamp interval per time series. If the sparsity is 0.50 then there
# would be only about 50 values written so some time series would have
# missing values at each interval.
sparsity=0.0
# The percentage of time series that are "lagging" behind the current
# timestamp of the writer. This is used to mimic a common behavior where
# most sources (agents, sensors, etc) are writing data in sync (same timestamp)
# but a subset are running behind due to buffering, latency issues, etc.
delayedSeries=0.0
# The maximum amount of delay for delayed series in interval counts. The 
# actual delay is chosen based on a modulo of the series index.
delayedIntervals=0
timestampunits=SECONDS (units for timestamps and intervals)
# The amount of time between each value in every time series in
# the units of 'timestampunits'.
timestampinterval=60
# The fixed or maximum amount of time added to the start time of a 
# read or scan operation to generate a query over a range of time 
# instead of a single timestamp. Units are shared with 'timestampunits'.
# For example if the value is set to 3600 seconds (1 hour) then 
# each read would pick a random start timestamp based on the 
#'insertstart' value and number of intervals, then add 3600 seconds
# to create the end time of the query. If this value is 0 then reads
# will only provide a single timestamp. 
querytimespan=3600
readproportion=0.10
updateproportion=0.00
insertproportion=0.90
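The numbers in this file's header can be checked by hand: the series count per field is the product of the tagcardinality values, and (assuming one value per series per 60-second interval, as the header's "no sparsity" note implies) recordcount and operationcount correspond to one and two days of data respectively:

```python
from math import prod

tagcardinality = [1, 2, 4, 2]
fieldcount = 64

series_per_field = prod(tagcardinality)       # unique tag combinations
total_series = series_per_field * fieldcount  # one series per field per combo
print(series_per_field, total_series)         # -> 16 1024

intervals_per_day = 24 * 60 * 60 // 60        # timestampinterval=60 seconds
print(total_series * intervals_per_day)       # -> 1474560, the recordcount
print(total_series * 2 * intervals_per_day)   # -> 2949120, the operationcount
```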

There are also some important parameters that none of these workloads set, for example in workload_template:

insertstart=0 (offset of the first insert key)
writeallfields=false (whether updates write all fields or just one)
fieldlengthdistribution=constant (distribution of field lengths: constant, zipfian, or uniform)
insertorder=hashed (whether records are inserted in order or pseudo-randomly: hashed or ordered)
hotspotdatafraction=0.2 (fraction of the data items that make up the hot set)
hotspotopnfraction=0.8 (fraction of operations that access the hot set)
table=usertable (name of the database table to run queries against)
# When measurementtype is set to raw, measurements are output as raw data points in the CSV format "operation, timestamp of the measurement, latency in us". Raw data points are collected in memory while the test runs; each point consumes about 50 bytes (including Java object overhead). For a typical run of 1 to 10 million operations this usually fits in memory. If you plan to run hundreds of millions of operations per run, consider using a machine with more RAM when using the raw measurement type, or split the work into multiple runs.
# Optionally, an output file can be specified to save the raw data points; otherwise they are written to stdout. If the output file already exists it is appended to, otherwise a new file is created:
measurement.raw.output_file=/tmp/your_output_file_for_this_run
measurementtype=histogram (how latency measurements are presented: timeseries, histogram, or raw)
measurement.histogram.verbose=false (whether to emit the individual histogram buckets when measuring with a histogram)
histogram.buckets=1000 (range of latencies, in milliseconds, to track in the histogram)
timeseries.granularity=1000 (granularity of the time series, in milliseconds)
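With measurementtype=histogram and histogram.buckets=1000, latencies are tracked in 1 ms buckets up to the bucket count, with everything slower lumped into one overflow bucket. A simplified sketch of that bucketing (my illustration, not YCSB's measurement code):

```python
def bucket_latencies(latencies_ms, buckets=1000):
    """Count latencies into 1 ms buckets; the last slot is overflow."""
    hist = [0] * (buckets + 1)
    for lat in latencies_ms:
        idx = min(int(lat), buckets)
        hist[idx] += 1
    return hist

hist = bucket_latencies([0.3, 0.7, 1.2, 5.9, 2500.0], buckets=1000)
# Two latencies land in the 0 ms bucket, one each in the 1 ms and 5 ms
# buckets, and the 2500 ms outlier falls into the overflow slot.
print(hist[0], hist[1], hist[5], hist[1000])  # -> 2 1 1 1
```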

 
