1. Hadoop
Hadoop is an open-source distributed computing platform under the Apache Software Foundation.
It consists of three core subsystems: HDFS, YARN, and MapReduce. HDFS is a distributed file system; YARN is the resource management system; MapReduce is an application framework that runs on YARN and handles distributed processing.
HDFS: a highly fault-tolerant distributed file system, designed to run on large numbers of inexpensive machines while providing high-throughput data access.
YARN (Yet Another Resource Negotiator): the resource manager, providing unified resource management and scheduling for upper-layer applications and supporting multiple computing frameworks.
MapReduce: a distributed programming model that distributes (Map) the processing of a large data set across many nodes, then collects the partial results and aggregates (Reduce) them.
The wider ecosystem includes HBase (column-oriented database), Cassandra (distributed database), Hive (SQL-like queries), Pig (data-flow scripting language), and ZooKeeper (distributed coordination service).
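The Map → shuffle → Reduce flow described above can be mimicked on a single machine with standard Unix tools; this is only a mental model of the word-count pattern, not Hadoop itself:

```shell
# Map: emit one word per line
# Shuffle: sort brings identical keys together
# Reduce: count each group of identical keys
printf 'hello world\nhello hadoop\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Each output line is a (count, word) pair, exactly what the classic MapReduce word-count job produces per key.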
1.1 Based on the official image
docker pull sequenceiq/hadoop-docker
docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
[root@kubernetes /data/docker/elasticsearch]# docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [9e6e76143d3d]
9e6e76143d3d: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-9e6e76143d3d.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-9e6e76143d3d.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-9e6e76143d3d.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-9e6e76143d3d.out
bash-4.1# cat /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
ulimit -a for user root
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 6945
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
cd $HADOOP_PREFIX
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*
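The example job above greps the input files for the pattern dfs[a-z.]+ and writes the match counts to output. The same regular expression can be tried locally to see what it matches (sample input invented for illustration):

```shell
# 'dfs' followed by one or more lowercase letters or dots,
# e.g. Hadoop configuration keys such as dfs.replication
printf 'dfs.replication\ndfs.block.size\nyarn.nodemanager\n' \
  | grep -Eo 'dfs[a-z.]+'
```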
2. Storm
Storm is a real-time computation framework. A Storm cluster contains two kinds of nodes: a master node and worker nodes. The master node runs a daemon called "Nimbus", similar to Hadoop's JobTracker; Nimbus is responsible for distributing code across the cluster, assigning tasks to machines, and monitoring for failures. Each worker node runs a "Supervisor" daemon, which listens for the tasks Nimbus assigns to its machine and manages worker processes accordingly; each worker process executes a subset of a topology's tasks.
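These roles map directly onto Storm's configuration. A hypothetical storm.yaml fragment (illustrative only; it mirrors the settings the Compose file below passes via -c flags) showing how daemons find Nimbus and ZooKeeper, and how many worker processes each supervisor can host:

```yaml
# Where supervisors and workers find the Nimbus daemon
nimbus.host: "nimbus"
# The ZooKeeper ensemble used for cluster coordination
storm.zookeeper.servers: ["zk1.cloud", "zk2.cloud", "zk3.cloud"]
# One worker process per listed port
supervisor.slots.ports: [6700, 6701, 6702, 6703]
```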
2.1 Building a Storm cluster with Compose
The deployment includes the following containers:
zookeeper: a three-node Apache ZooKeeper deployment;
nimbus: Storm Nimbus;
ui: Storm UI;
supervisor: Storm Supervisor (one or more);
topology: topology deployment tool; the sample application is built from the official storm-starter example code.
2.2 Download the code
git clone https://github.com/denverdino/docker-storm.git
2.3 The docker-compose.yml file describes a typical Storm application architecture
version: '2'
services:
  zookeeper1:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk1.cloud
    environment:
      - SERVER_ID=1
      - ADDITIONAL_ZOOKEEPER_1=server.1=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper2:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk2.cloud
    environment:
      - SERVER_ID=2
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper3:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk3.cloud
    environment:
      - SERVER_ID=3
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=0.0.0.0:2888:3888
  ui:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: ui -c nimbus.host=nimbus
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    restart: always
    container_name: ui
    ports:
      - 8080:8080
    depends_on:
      - nimbus
  nimbus:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: nimbus -c nimbus.host=nimbus
    restart: always
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    container_name: nimbus
    ports:
      - 6627:6627
  supervisor:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: supervisor -c nimbus.host=nimbus -c supervisor.slots.ports=[6700,6701,6702,6703]
    restart: always
    environment:
      - affinity:role!=supervisor
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    depends_on:
      - nimbus
  topology:
    build: ../storm-starter
    command: -c nimbus.host=nimbus jar /topology.jar org.apache.storm.starter.RollingTopWords production-topology remote
    depends_on:
      - nimbus
networks:
  default:
    external:
      name: test-storm
2.4 Build the test image
docker-compose build
2.5 Deploy
The compose file declares the external network test-storm, which must exist before the services are started:
docker network create test-storm
docker-compose up -d
2.6 Check the deployment
docker-compose ps
2.7 Scale the supervisor instances to 3
docker-compose scale supervisor=3
2.8 Verify the topology is running
docker-compose logs topology
3. Elasticsearch
Elasticsearch provides real-time distributed data storage, search, and analytics; it scales easily to hundreds of servers and can handle petabytes of structured or unstructured data.
3.1 Based on the official image
docker run -d -e "discovery.type=single-node" elasticsearch:7.6.1
docker run -d -e "discovery.type=single-node" -e "node.name=TestNode" elasticsearch:7.6.1
(In the 7.x images, a standalone container needs discovery.type=single-node to bootstrap, and settings such as node.name are passed as environment variables; the old -Des.node.name flag is 2.x syntax.)
3.2 Use a custom configuration
docker run -d -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch:7.6.1
3.3 Data persistence requires a data volume
docker run -d -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch:7.6.1
3.4 Building Elasticsearch with docker-compose
version: '3.1'
services:
  elasticsearch:
    image: elasticsearch
  kibana:
    image: kibana
    ports:
      - 5601:5601
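The untagged elasticsearch and kibana images above may fail to pull, since these Docker Hub repositories stopped publishing a latest tag. A hedged variant pinned to the 7.6.1 tags used earlier, with single-node discovery enabled and Kibana pointed at the Elasticsearch service by its Compose service name:

```yaml
version: '3.1'
services:
  elasticsearch:
    image: elasticsearch:7.6.1
    environment:
      # Required for a standalone 7.x node
      - discovery.type=single-node
    ports:
      - 9200:9200
  kibana:
    image: kibana:7.6.1
    environment:
      # Kibana reaches Elasticsearch via the service name on the Compose network
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
```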