A record of the process
Overview
This is my personal summary; see the official website for details.
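- A column-oriented database developed by the Russian search engine company Yandex and open-sourced in 2016
- Mainly used for online OLAP; it does not support transactions, so it is not suitable for OLTP
- ClickHouse Chinese community
- ClickHouse Chinese official site
- Its strength is querying large wide tables; queries that join several large tables perform worse than in typical OLAP tools
- Its top performance comes from squeezing as much as possible out of the server hardware
- Queries over data sets of around ten billion rows can return within seconds
- Column-oriented storage makes aggregations such as count fast, and the compression ratio is much higher than with row-oriented storage
- When creating a table you must choose a suitable table engine to get good query performance
- Supports indexes, suitable for online queries
- Low concurrency: the official recommendation is around 100 (queries per second), but the chproxy proxy can help raise it
- Every SQL statement uses all available server resources so that it runs as fast as possible
- Near-standard SQL; only a small part differs from SQL:2003
- Replication keeps data safe
- Supports approximate computation
- Supports real-time data updates (data can be inserted continuously); inserting in large batches performs better
- Uses a vectorized execution engine to make better use of the CPU
- The sparse index makes ClickHouse unsuitable for point queries that retrieve single rows by key
- Data can only be deleted or modified in bulk
- Ubuntu and Debian are better supported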
Environment
- CentOS 7 (most production environments still run CentOS, so CentOS is used here as well)
Single-node installation
ClickHouse can run on Linux, FreeBSD or Mac OS X on x86_64, AArch64 or PowerPC64LE CPU architectures.
- Check whether the CPU is supported (the prebuilt x86_64 packages require SSE 4.2)
[root@bigdata01 module]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
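The one-liner above can be wrapped in a small function that takes the cpuinfo path as an argument, so it can also be run against a /proc/cpuinfo copy saved from another machine before installing there. This is a minimal sketch; the function name check_sse42 is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether a CPU advertises SSE 4.2, which the prebuilt x86_64
# ClickHouse packages require. check_sse42 is a hypothetical helper.
# Usage: check_sse42 [cpuinfo_file]   (defaults to /proc/cpuinfo)
check_sse42() {
  local cpuinfo_file=${1:-/proc/cpuinfo}
  # -w matches sse4_2 as a whole word among the "flags" entries
  if grep -qw sse4_2 "$cpuinfo_file"; then
    echo "SSE 4.2 supported"
  else
    echo "SSE 4.2 not supported"
    return 1
  fi
}
```

Because the function also returns a non-zero status when the flag is missing, a provisioning script can stop early with check_sse42 || exit 1.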
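- Raise the open-file and process limits on CentOS
Edit the following two files:
vim /etc/security/limits.conf
vim /etc/security/limits.d/20-nproc.conf
On some systems the second file has a different name; run ls /etc/security/limits.d first to see what it is called.
Add the following lines (the leading * is required):
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
The settings take effect after rebooting the server (a fresh login session also picks them up). Check the result with ulimit -n or ulimit -a.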
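Instead of editing the files by hand, the four lines can be appended by a short script that skips any line already present, so running it twice does not duplicate entries. A sketch; the helper name add_limits is hypothetical, and it has to be run as root against the two files listed above.

```shell
#!/usr/bin/env bash
# Sketch: append the open-file / process limits to a limits file, skipping lines
# that are already present so the script is safe to re-run.
# add_limits is a hypothetical helper; pass the target file as $1.
add_limits() {
  local target=$1
  touch "$target"
  for line in \
    '* soft nofile 65536' \
    '* hard nofile 65536' \
    '* soft nproc 131072' \
    '* hard nproc 131072'; do
    # -x: whole-line match, -F: treat "*" literally instead of as a regex
    grep -qxF "$line" "$target" || printf '%s\n' "$line" >> "$target"
  done
}
```

Usage: add_limits /etc/security/limits.conf, then add_limits /etc/security/limits.d/20-nproc.conf (or whatever name ls showed).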
- Install dependencies
yum install -y libtool
yum install -y '*unixODBC*'
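- Download notes (you don't actually need to download anything, since yum can install directly; this is for reference only)
Official site: https://clickhouse.yandex/
Downloads: http://repo.red-soft.biz/repos/clickhouse/stable/el7/
https://packagecloud.io/Altinity/clickhouse
I used a release from about half a year earlier. ClickHouse releases new versions frequently, so check what has changed between versions.
Installed version: *-19.15.5.18-1.el7.x86_64.rpm
Packages:
- clickhouse-test-19.15.5.18-1.el7.x86_64.rpm (test module, not required)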
- clickhouse-server-common-19.15.5.18-1.el7.x86_64.rpm
- clickhouse-server-19.15.5.18-1.el7.x86_64.rpm
- clickhouse-debuginfo-19.15.5.18-1.el7.x86_64.rpm
- clickhouse-common-static-19.15.5.18-1.el7.x86_64.rpm
- clickhouse-client-19.15.5.18-1.el7.x86_64.rpm
- Add the yum repository
curl -s https://packagecloud.io/install/repositories/Altinity/clickhouse/script.rpm.sh | sudo bash
- Install with yum
The commands below install a specific version; to install the latest version, simply run sudo yum install -y clickhouse-server clickhouse-client
sudo yum install clickhouse-server-common-19.15.5.18-1.el7.x86_64
sudo yum install clickhouse-server-19.15.5.18-1.el7.x86_64    # also pulls in clickhouse-common-static as a dependency
sudo yum install clickhouse-debuginfo-19.15.5.18-1.el7.x86_64
sudo yum install clickhouse-client-19.15.5.18-1.el7.x86_64
Check the installation:
sudo yum list installed 'clickhouse*'
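When several machines must run exactly the same release, the pinned package names can be generated from a single version variable instead of being typed out one by one. A sketch assuming the Altinity repository added above; the helper name pinned_packages and the CH_VERSION variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build "name-version" package specs for one ClickHouse release so every
# node installs the same build. pinned_packages and CH_VERSION are hypothetical.
CH_VERSION=${CH_VERSION:-19.15.5.18-1.el7.x86_64}

pinned_packages() {
  for pkg in clickhouse-server-common clickhouse-server \
             clickhouse-debuginfo clickhouse-client; do
    printf '%s-%s\n' "$pkg" "$CH_VERSION"
  done
}
```

Usage: sudo yum install -y $(pinned_packages). The substitution is left unquoted on purpose so each spec becomes a separate argument; clickhouse-common-static is still pulled in as a dependency of clickhouse-server.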
- Where the installed components put their files
On https://packagecloud.io/Altinity/clickhouse you can open the package for a given version and component to see its file list. The most relevant files and directories are:
/etc/clickhouse-client/config.xml
/usr/bin/clickhouse-client
/usr/bin/clickhouse-benchmark
/etc/clickhouse-server/users.xml
/etc/clickhouse-server/config.xml
/usr/bin/clickhouse-server
/etc/security/limits.d/clickhouse.conf
/etc/init.d/clickhouse-server
/etc/cron.d/clickhouse-server
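Common configuration
- Server configuration
Remember to restart the service after changing the server configuration.
The configuration files are in the /etc/clickhouse-server directory:
- users.xml: user settings. By default there is a user named default with no password.
To add a user, follow the way the default user is configured, i.e. add a similar block of tags for the new user.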
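Following the default user's pattern, a new user is one more tag block inside <users>. Instead of a plaintext password, users.xml also accepts a SHA-256 hash in <password_sha256_hex>. The function below prints such a fragment; it is only a sketch, the helper name user_fragment is hypothetical, the network range ::/0 simply copies the default user's (allow any address), and the output still has to be pasted inside the <users> section of /etc/clickhouse-server/users.xml by hand.

```shell
#!/usr/bin/env bash
# Sketch: print a users.xml fragment for an extra user whose password is stored
# as a SHA-256 hash instead of plain text.
# user_fragment is a hypothetical helper: user_fragment <name> <password>
user_fragment() {
  local name=$1 hash
  # printf '%s' adds no trailing newline, so only the password itself is hashed;
  # sha256sum prints "<hash>  -", and cut keeps just the hash
  hash=$(printf '%s' "$2" | sha256sum | cut -d' ' -f1)
  cat <<EOF
<$name>
    <password_sha256_hex>$hash</password_sha256_hex>
    <networks>
        <ip>::/0</ip>  <!-- any address, like the default user; narrow as needed -->
    </networks>
    <profile>default</profile>
    <quota>default</quota>
</$name>
EOF
}
```

For example, user_fragment analyst 'some-password' (both values hypothetical) prints the block for a user named analyst. Restart the server after editing users.xml, as noted above.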
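- config.xml: service settings, such as port numbers, the IP address to bind to, and security settings.
- Client configuration
- When you run clickhouse-client, it reads /etc/clickhouse-client/config.xml by default to start the client.
- You can point it at a different config.xml with the -c parameter, e.g. clickhouse-client -c /opt/software/config.xml
- /etc/clickhouse-client/config.xml holds the settings for connecting to the server.
- Start/check the service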
service clickhouse-server start
service clickhouse-server status
[root@bigdata01 ~]# service clickhouse-server start
Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
[root@bigdata01 ~]# service clickhouse-server status
clickhouse-server service is running
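In scripts (for example when provisioning several machines), it helps to wait until the server actually accepts queries before continuing. A minimal retry helper, assuming only that the check command exits with status 0 on success; the name wait_until is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: retry a check command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper: wait_until <attempts> <sleep_seconds> <command...>
wait_until() {
  local attempts=$1 pause=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # do not sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: wait_until 30 1 clickhouse-client --query "SELECT 1" waits up to about 30 seconds. Running a real query is a stricter check than the service status, because it only succeeds once the server accepts connections.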
Command-line client
[root@bigdata01 ~]# clickhouse-client
ClickHouse client version 19.15.5.18.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.15.5 revision 54426.
bigdata01 :) show databases;
SHOW DATABASES
┌─name────┐
│ default │
│ system │
└─────────┘
2 rows in set. Elapsed: 0.001 sec.
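To specify the port or server address, add parameters such as --port 8080 --host 127.0.0.1 (clickhouse-client uses the native TCP protocol, which listens on port 9000 by default, as the output above shows).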
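Distributed cluster installation
The steps below assume every machine has already been installed following the single-node steps above.
- On every machine, edit /etc/clickhouse-server/config.xml so that the server listens on all addresses:
<listen_host>::</listen_host>
<!-- <listen_host>::1</listen_host> -->
<!-- <listen_host>127.0.0.1</listen_host> -->
(:: requires IPv6 to be enabled on the host; otherwise use 0.0.0.0.) The cluster configuration below and the data-loading command later connect on port 19000, so this setup assumes tcp_port in config.xml has been changed from its default of 9000 to 19000.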
- On every machine, create the file /etc/metrika.xml
vim /etc/metrika.xml
Add the following content:
<yandex>
    <clickhouse_remote_servers>
        <!-- Cluster name (user-defined): 3 shards, 1 replica each -->
        <perftest_3shards_1replicas>
            <!-- One shard per machine -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>bigdata01</host>
                    <port>19000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>bigdata02</host>
                    <port>19000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>bigdata03</host>
                    <port>19000</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>
    <!-- ZooKeeper cluster -->
    <zookeeper-servers>
        <node index="1">
            <host>bigdata01</host>
            <port>32181</port>
        </node>
        <node index="2">
            <host>bigdata02</host>
            <port>32181</port>
        </node>
        <node index="3">
            <host>bigdata03</host>
            <port>32181</port>
        </node>
    </zookeeper-servers>
    <!-- macros: set replica to the current machine's hostname -->
    <macros>
        <replica>bigdata01</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
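Since metrika.xml is the same on every machine except for the replica value in <macros>, one approach is to keep a single template and substitute the hostname per machine. A sketch, assuming the macros entry is written as <replica>...</replica> on one line as above; the helper name render_metrika is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a copy of metrika.xml with the <macros> replica set to a given host.
# render_metrika is a hypothetical helper: render_metrika <template> <host>
# Only a <replica> element holding plain text (the macros entry) is rewritten;
# the shard <replica> elements contain child tags, so "[^<]*" never matches them.
render_metrika() {
  sed "s#<replica>[^<]*</replica>#<replica>$2</replica>#" "$1"
}
```

Usage, assuming password-less SSH as root (hypothetical): for h in bigdata01 bigdata02 bigdata03; do render_metrika metrika.xml "$h" | ssh "root@$h" 'cat > /etc/metrika.xml'; done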
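- Start the service on every machine
Start ZooKeeper first, then start clickhouse-server on each machine as in the single-node section.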
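The start order (ZooKeeper first, then ClickHouse on every node) can be scripted. This sketch assumes password-less SSH as root and the host names and ZooKeeper port from metrika.xml; the names CH_HOSTS, RUN, zk_ok and start_cluster are hypothetical. The ZooKeeper check uses the ruok four-letter command, which newer ZooKeeper releases only answer when it is listed in 4lw.commands.whitelist. Setting RUN=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: start ClickHouse on every node once ZooKeeper answers.
# Hypothetical assumptions: root SSH access, these host names, ZooKeeper on 32181.
CH_HOSTS=${CH_HOSTS:-"bigdata01 bigdata02 bigdata03"}
ZK_PORT=${ZK_PORT:-32181}
RUN=${RUN:-}   # set RUN=echo for a dry run that only prints the commands

zk_ok() {
  # ZooKeeper replies "imok" to the "ruok" four-letter command
  [ "$(echo ruok | nc "$1" "$ZK_PORT")" = "imok" ]
}

start_cluster() {
  local h
  for h in $CH_HOSTS; do
    # the ZooKeeper check is skipped in dry-run mode
    if [ -z "$RUN" ] && ! zk_ok "$h"; then
      echo "ZooKeeper on $h is not answering; start it first" >&2
      return 1
    fi
  done
  for h in $CH_HOSTS; do
    $RUN ssh "root@$h" 'service clickhouse-server start'
  done
}
```

Once all nodes are up, clickhouse-client --port 19000 --query "SELECT cluster, shard_num, host_name, port FROM system.clusters" should list the three shards of perftest_3shards_1replicas.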
Uninstalling
- List the installed packages
[root@bigdata01 ~]# yum list installed | grep clickhouse
clickhouse-client.x86_64          19.15.5.18-1.el7   @Altinity_clickhouse
clickhouse-common-static.x86_64   19.15.5.18-1.el7   @Altinity_clickhouse
clickhouse-debuginfo.x86_64       19.15.5.18-1.el7   @Altinity_clickhouse
clickhouse-server.x86_64          19.15.5.18-1.el7   @Altinity_clickhouse
clickhouse-server-common.x86_64   19.15.5.18-1.el7   @Altinity_clickhouse
- Remove the packages
yum remove -y clickhouse-client.x86_64 clickhouse-common-static.x86_64 clickhouse-debuginfo.x86_64 clickhouse-server.x86_64 clickhouse-server-common.x86_64
- Search the whole filesystem for remaining files, then delete them
find / -name '*clickhouse*'
Review the results, then delete them with rm -rf.
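Running rm -rf straight on find output across / is risky, so it is safer to list the matches, review them, and only then delete. This sketch takes the search root as an argument so it can be tried on a scratch directory first; list_leftovers is a hypothetical name, and /proc is pruned because it is a virtual filesystem.

```shell
#!/usr/bin/env bash
# Sketch: list files and directories whose names contain "clickhouse" under a root,
# so they can be reviewed before deletion. list_leftovers is hypothetical.
list_leftovers() {
  local root=${1:-/}
  # -prune stops find from descending into /proc; permission errors are hidden
  find "$root" -path /proc -prune -o -name '*clickhouse*' -print 2>/dev/null
}
```

Usage: list_leftovers / > leftovers.txt, edit leftovers.txt to drop anything you want to keep, then xargs -d '\n' rm -rf < leftovers.txt (GNU xargs).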
- If removal fails, force it
# Remove the rpm package without running its uninstall scripts
sudo rpm -e clickhouse-server.x86_64 --noscripts
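Performance testing
The tests use the airline on-time flight data provided on the official site, covering 1987 to 2017. Because storage space was limited, only the data from 2000 to 2017 was used (the query results below contain only the years 2000 and 2001).
Test machine: Baidu Cloud server, 2 cores / 4 GB RAM / 40 GB disk, compute-optimized c3 instance, 1 Mbps bandwidth.
In testing, this data set did not push the machine to its performance limits.
All of the tests below are described on the official website.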
- Reference for downloading the data: https://clickhouse.tech/docs/zh/getting_started/example_datasets/ontime/
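- Create the table (start the client with clickhouse-client -m; without -m, which enables multiline queries, the multi-line statement below fails)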
CREATE TABLE `ontime` (
`Year` UInt16,
`Quarter` UInt8,
`Month` UInt8,
`DayofMonth` UInt8,
`DayOfWeek` UInt8,
`FlightDate` Date,
`UniqueCarrier` FixedString(7),
`AirlineID` Int32,
`Carrier` FixedString(2),
`TailNum` String,
`FlightNum` String,
`OriginAirportID` Int32,
`OriginAirportSeqID` Int32,
`OriginCityMarketID` Int32,
`Origin` FixedString(5),
`OriginCityName` String,
`OriginState` FixedString(2),
`OriginStateFips` String,
`OriginStateName` String,
`OriginWac` Int32,
`DestAirportID` Int32,
`DestAirportSeqID` Int32,
`DestCityMarketID` Int32,
`Dest` FixedString(5),
`DestCityName` String,
`DestState` FixedString(2),
`DestStateFips` String,
`DestStateName` String,
`DestWac` Int32,
`CRSDepTime` Int32,
`DepTime` Int32,
`DepDelay` Int32,
`DepDelayMinutes` Int32,
`DepDel15` Int32,
`DepartureDelayGroups` String,
`DepTimeBlk` String,
`TaxiOut` Int32,
`WheelsOff` Int32,
`WheelsOn` Int32,
`TaxiIn` Int32,
`CRSArrTime` Int32,
`ArrTime` Int32,
`ArrDelay` Int32,
`ArrDelayMinutes` Int32,
`ArrDel15` Int32,
`ArrivalDelayGroups` Int32,
`ArrTimeBlk` String,
`Cancelled` UInt8,
`CancellationCode` FixedString(1),
`Diverted` UInt8,
`CRSElapsedTime` Int32,
`ActualElapsedTime` Int32,
`AirTime` Int32,
`Flights` Int32,
`Distance` Int32,
`DistanceGroup` UInt8,
`CarrierDelay` Int32,
`WeatherDelay` Int32,
`NASDelay` Int32,
`SecurityDelay` Int32,
`LateAircraftDelay` Int32,
`FirstDepTime` String,
`TotalAddGTime` String,
`LongestAddGTime` String,
`DivAirportLandings` String,
`DivReachedDest` String,
`DivActualElapsedTime` String,
`DivArrDelay` String,
`DivDistance` String,
`Div1Airport` String,
`Div1AirportID` Int32,
`Div1AirportSeqID` Int32,
`Div1WheelsOn` String,
`Div1TotalGTime` String,
`Div1LongestGTime` String,
`Div1WheelsOff` String,
`Div1TailNum` String,
`Div2Airport` String,
`Div2AirportID` Int32,
`Div2AirportSeqID` Int32,
`Div2WheelsOn` String,
`Div2TotalGTime` String,
`Div2LongestGTime` String,
`Div2WheelsOff` String,
`Div2TailNum` String,
`Div3Airport` String,
`Div3AirportID` Int32,
`Div3AirportSeqID` Int32,
`Div3WheelsOn` String,
`Div3TotalGTime` String,
`Div3LongestGTime` String,
`Div3WheelsOff` String,
`Div3TailNum` String,
`Div4Airport` String,
`Div4AirportID` Int32,
`Div4AirportSeqID` Int32,
`Div4WheelsOn` String,
`Div4TotalGTime` String,
`Div4LongestGTime` String,
`Div4WheelsOff` String,
`Div4TailNum` String,
`Div5Airport` String,
`Div5AirportID` Int32,
`Div5AirportSeqID` Int32,
`Div5WheelsOn` String,
`Div5TotalGTime` String,
`Div5LongestGTime` String,
`Div5WheelsOff` String,
`Div5TailNum` String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;
- Download the data (provided officially)
for s in `seq 1987 2017`
do
for m in `seq 1 12`
do
wget http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip
done
done
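Since only part of the data set was used here because of limited disk space, it may help to generate the URLs for just the years you need and download them with resume support. A sketch reusing the URL pattern from the loop above; ontime_urls is a hypothetical helper, and the BTS site may have renamed its files since.

```shell
#!/usr/bin/env bash
# Sketch: print the download URLs for a range of years, same pattern as above.
# ontime_urls is a hypothetical helper: ontime_urls <first_year> <last_year>
ontime_urls() {
  local s m
  for s in $(seq "$1" "$2"); do
    for m in $(seq 1 12); do
      echo "http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip"
    done
  done
}
```

Usage: ontime_urls 2000 2001 | xargs -n 1 wget -c, where wget -c resumes any partially downloaded file instead of starting over.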
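- Load the downloaded data (I load it into a database named per_test, so adjust the database name in the statement, as well as the host and port). The sed step strips ".00" so that values such as 5.00 fit the integer columns.
for i in *.zip; do echo "$i"; unzip -cq "$i" '*.csv' | sed 's/\.00//g' | clickhouse-client --host=127.0.0.1 --port=19000 --query="INSERT INTO per_test.ontime FORMAT CSVWithNames"; done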
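If the loop above fails halfway, re-running it inserts the already loaded months again. One way around that is to record each file after a successful insert and skip it next time, and to test each archive first so truncated downloads are not loaded. A sketch; the database per_test and port 19000 come from the command above, while CH_HOST, CH_PORT, DONE_LOG, the log file name loaded_files.txt and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: load each downloaded zip once, recording every file after a successful
# insert so that a re-run skips it.
# Hypothetical assumptions: log file loaded_files.txt, helper names below.
CH_HOST=${CH_HOST:-127.0.0.1}
CH_PORT=${CH_PORT:-19000}
DONE_LOG=${DONE_LOG:-loaded_files.txt}

# Same cleanup as the one-liner above: drop ".00" so values like "5.00" parse as integers
clean_csv() { sed 's/\.00//g'; }

insert_csv() {
  clickhouse-client --host="$CH_HOST" --port="$CH_PORT" \
    --query="INSERT INTO per_test.ontime FORMAT CSVWithNames"
}

load_all() {
  local f
  touch "$DONE_LOG"
  for f in "$@"; do
    if grep -qxF "$f" "$DONE_LOG"; then
      echo "skip $f"
      continue
    fi
    # a truncated download fails this integrity test and is not loaded
    if ! unzip -tq "$f" >/dev/null 2>&1; then
      echo "corrupt archive: $f" >&2
      continue
    fi
    echo "load $f"
    # the pipeline's status is that of insert_csv, the last command
    if unzip -cq "$f" '*.csv' | clean_csv | insert_csv; then
      echo "$f" >> "$DONE_LOG"
    else
      echo "insert failed: $f" >&2
    fi
  done
}
```

Usage: load_all *.zip. A file whose insert failed is not recorded; part of it may already be in the table, so check it before retrying.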
- Number of flights per day of the week, 2000 to 2008
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE (Year >= 2000) AND (Year <= 2008)
GROUP BY DayOfWeek
ORDER BY c DESC
┌─DayOfWeek─┬───────c─┐
│         1 │ 1024694 │
│         3 │ 1019282 │
│         2 │ 1015141 │
│         5 │ 1014324 │
│         4 │ 1013083 │
│         7 │  979170 │
│         6 │  908404 │
└───────────┴─────────┘
7 rows in set. Elapsed: 0.042 sec. Processed 6.97 million rows, 20.92 MB (167.64 million rows/s., 502.92 MB/s.)
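Single timings on a small cloud machine vary from run to run (caches, neighbouring tenants), so running each query several times and taking the median gives steadier numbers. This sketch relies on clickhouse-client's --time option, assumed here to print the elapsed seconds as one number on stderr in non-interactive mode, and --format Null to discard the result rows; bench_query, median, the default of 5 runs, and the port and database (taken from the loading step) are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: run a query several times and print the median elapsed seconds.
# bench_query and median are hypothetical helpers.
median() {
  # read one number per line from stdin and print the median
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) { print v[(NR + 1) / 2] }
      else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2; print m }
    }'
}

bench_query() {
  local runs=${2:-5} i=0
  while [ "$i" -lt "$runs" ]; do
    # --time writes the elapsed time to stderr, which goes into the pipe;
    # stdout (the result rows) is discarded
    clickhouse-client --port 19000 --database per_test --time --format Null \
      --query "$1" 2>&1 >/dev/null
    i=$((i + 1))
  done | median
}
```

Usage: bench_query "SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek" 5. The clickhouse-benchmark tool installed with the client is an alternative for repeated runs.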
- Number of flights delayed by more than 10 minutes per day of the week, 2000 to 2008
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE (DepDelay > 10) AND (Year >= 2000) AND (Year <= 2008)
GROUP BY DayOfWeek
ORDER BY c DESC
┌─DayOfWeek─┬──────c─┐
│         5 │ 274999 │
│         4 │ 254490 │
│         7 │ 238941 │
│         1 │ 209985 │
│         3 │ 201997 │
│         6 │ 183685 │
│         2 │ 178767 │
└───────────┴────────┘
7 rows in set. Elapsed: 0.156 sec. Processed 6.97 million rows, 48.82 MB (44.71 million rows/s., 313.00 MB/s.)
- Number of departures delayed by more than 10 minutes per origin airport, 2000 to 2008 (top 10)
SELECT Origin, count(*) AS c
FROM ontime
WHERE (DepDelay > 10) AND (Year >= 2000) AND (Year <= 2008)
GROUP BY Origin
ORDER BY c DESC
LIMIT 10
┌─Origin─┬──────c─┐
│ ORD    │ 105023 │
│ ATL    │  73496 │
│ DFW    │  67485 │
│ PHX    │  66968 │
│ LAX    │  66964 │
│ LAS    │  50462 │
│ STL    │  47812 │
│ DEN    │  46164 │
│ SFO    │  43537 │
│ DTW    │  43341 │
└────────┴────────┘
10 rows in set. Elapsed: 0.156 sec. Processed 6.97 million rows, 76.72 MB (44.59 million rows/s., 490.50 MB/s.)
- Percentage of flights delayed by more than 10 minutes per carrier, 2000 to 2008
SELECT Carrier, c, c2, (c * 100) / c2 AS c3
FROM
(
    SELECT Carrier, count(*) AS c
    FROM ontime
    WHERE (DepDelay > 10) AND (Year >= 2000) AND (Year <= 2008)
    GROUP BY Carrier
)
INNER JOIN
(
    SELECT Carrier, count(*) AS c2
    FROM ontime
    WHERE (Year >= 2000) AND (Year <= 2008)
    GROUP BY Carrier
) USING (Carrier)
ORDER BY c3 DESC
┌─Carrier─┬──────c─┬──────c2─┬─────────────────c3─┐
│ UA      │ 262451 │  915911 │ 28.654640025067938 │
│ AS      │  51977 │  188884 │  27.51794752334766 │
│ WN      │ 314159 │ 1148649 │ 27.350304575200955 │
│ HP      │  69859 │  264180 │  26.44371262018321 │
│ US      │ 185689 │  886115 │  20.95540646530078 │
│ AA      │ 181789 │  896349 │ 20.281051242317446 │
│ TW      │  64220 │  319764 │  20.08356162669969 │
│ DL      │ 199886 │ 1089116 │ 18.353049629240594 │
│ NW      │ 115102 │  667317 │  17.24847411350228 │
│ CO      │  78593 │  474145 │ 16.575731052737034 │
│ MQ      │  17229 │  108410 │  15.89244534637026 │
│ AQ      │   1910 │   15258 │ 12.518023332022546 │
└─────────┴────────┴─────────┴────────────────────┘
12 rows in set. Elapsed: 0.186 sec. Processed 13.95 million rows, 83.69 MB (75.06 million rows/s., 450.37 MB/s.)
A better version of the same query (it scans the table once instead of twice, processing 6.97 million rows instead of 13.95 million):
SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
FROM ontime
WHERE (Year >= 2000) AND (Year <= 2008)
GROUP BY Carrier
ORDER BY c3 DESC
┌─Carrier─┬─────────────────c3─┐
│ UA      │  28.65464002506794 │
│ AS      │ 27.517947523347665 │
│ WN      │  27.35030457520095 │
│ HP      │ 26.443712620183206 │
│ US      │  20.95540646530078 │
│ AA      │ 20.281051242317446 │
│ TW      │  20.08356162669969 │
│ DL      │ 18.353049629240594 │
│ NW      │  17.24847411350228 │
│ CO      │ 16.575731052737034 │
│ MQ      │ 15.892445346370259 │
│ AQ      │ 12.518023332022546 │
└─────────┴────────────────────┘
12 rows in set. Elapsed: 0.129 sec. Processed 6.97 million rows, 55.79 MB (53.97 million rows/s., 431.75 MB/s.)
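Why the shorter query returns the same numbers: DepDelay > 10 evaluates to 1 or 0 for each row, so its average is the share of delayed flights, and multiplying by 100 gives the percentage that the join computed as c * 100 / c2. The same arithmetic, sketched in awk over a hypothetical list of delay values; pct_over is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: percentage of values above a threshold, computed as the mean of a 0/1
# indicator, the same idea as avg(DepDelay > 10) * 100. pct_over is hypothetical.
# Usage: pct_over <threshold>   (reads one number per line from stdin)
pct_over() {
  # ($1 + 0 > t + 0) is 1 when the value exceeds the threshold, else 0
  awk -v t="$1" '{ n++; d += ($1 + 0 > t + 0) } END { if (n) printf "%.1f\n", d * 100 / n }'
}
```

For example, printf '5\n12\n30\n0\n11\n' | pct_over 10 averages the indicators 0, 1, 1, 0, 1. For UA in the tables above, 262451 * 100 / 915911 is about 28.6546, matching both queries.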
- Percentage of flights delayed by more than 10 minutes, per year
SELECT
Year,
avg(DepDelay > 10) * 100
FROM ontime
GROUP BY Year
ORDER BY Year ASC
┌─Year─┬─multiply(avg(greater(DepDelay, 10)), 100)─┐
│ 2000 │ 23.17167181619297 │
│ 2001 │ 17.505660117222323 │
└──────┴───────────────────────────────────────────┘
2 rows in set. Elapsed: 0.084 sec. Processed 6.97 million rows, 41.84 MB (83.21 million rows/s., 499.26 MB/s.)
- The most popular destinations from 2000 to 2010, ranked by the number of distinct origin cities with flights to them
SELECT DestCityName, uniqExact(OriginCityName) AS u
FROM ontime
WHERE (Year >= 2000) AND (Year <= 2010)
GROUP BY DestCityName
ORDER BY u DESC
LIMIT 10
┌─DestCityName──────────┬───u─┐
│ Chicago, IL           │ 117 │
│ Dallas/Fort Worth, TX │ 115 │
│ Atlanta, GA           │ 100 │
│ Minneapolis, MN       │  88 │
│ Houston, TX           │  81 │
│ Detroit, MI           │  81 │
│ St. Louis, MO         │  76 │
│ Charlotte, NC         │  70 │
│ Pittsburgh, PA        │  69 │
│ Newark, NJ            │  67 │
└───────────────────────┴─────┘
10 rows in set. Elapsed: 0.559 sec. Processed 6.97 million rows, 322.81 MB (12.47 million rows/s., 577.07 MB/s.)
- Q10: per carrier, the share of weekday flights arriving more than 30 minutes late (flights before 2010, excluding those to or from AK, HI, PR and VI, carriers with more than 100,000 flights)
SELECT min(Year), max(Year), Carrier, count(*) AS cnt,
    sum(ArrDelayMinutes > 30) AS flights_delayed,
    round(sum(ArrDelayMinutes > 30) / count(*), 2) AS rate
FROM ontime
WHERE (DayOfWeek NOT IN (6, 7)) AND (OriginState NOT IN ('AK', 'HI', 'PR', 'VI'))
    AND (DestState NOT IN ('AK', 'HI', 'PR', 'VI')) AND (FlightDate < '2010-01-01')
GROUP BY Carrier
HAVING (cnt > 100000) AND (max(Year) > 1990)
ORDER BY rate DESC
LIMIT 1000
┌─min(Year)─┬─max(Year)─┬─Carrier─┬────cnt─┬─flights_delayed─┬─rate─┐
│      2000 │      2001 │ UA      │ 649862 │          121889 │ 0.19 │
│      2000 │      2001 │ HP      │ 192068 │           27480 │ 0.14 │
│      2000 │      2001 │ AA      │ 615877 │           79539 │ 0.13 │
│      2000 │      2001 │ US      │ 638984 │           79708 │ 0.12 │
│      2000 │      2001 │ TW      │ 224711 │           25778 │ 0.11 │
│      2000 │      2001 │ WN      │ 855501 │           97260 │ 0.11 │
│      2000 │      2001 │ NW      │ 483807 │           52931 │ 0.11 │
│      2000 │      2001 │ CO      │ 349268 │           38560 │ 0.11 │
│      2000 │      2001 │ DL      │ 764713 │           79954 │  0.1 │
└───────────┴───────────┴─────────┴────────┴─────────────────┴──────┘
9 rows in set. Elapsed: 0.370 sec. Processed 6.97 million rows, 84.86 MB (18.86 million rows/s., 229.45 MB/s.)
- Multi-dimensional query 1: delay percentage grouped by year, origin city, departure delay group and cancellation code (first row only)
SELECT Year, OriginCityName, DepartureDelayGroups, CancellationCode, avg(DepDelay > 10) * 100
FROM ontime
GROUP BY Year, OriginCityName, DepartureDelayGroups, CancellationCode
ORDER BY Year ASC
LIMIT 1
┌─Year─┬─OriginCityName───┬─DepartureDelayGroups─┬─CancellationCode─┬─multiply(avg(greater(DepDelay, 10)), 100)─┐
│ 2000 │ Fayetteville, NC │ 7                    │                  │                                       100 │
└──────┴──────────────────┴──────────────────────┴──────────────────┴───────────────────────────────────────────┘
1 rows in set. Elapsed: 0.613 sec. Processed 6.97 million rows, 275.48 MB (11.38 million rows/s., 449.40 MB/s.)
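Based on the tests above, performance is excellent: on this 2-core, 4 GB machine every query finished in under 0.7 seconds while scanning about 7 million rows (about 14 million for the join). ClickHouse is a great tool for OLAP analysis.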