Distributed Processing and Big Data Platforms

1. Hadoop

    Hadoop is an open-source distributed computing platform under the Apache Software Foundation.

    It consists of three core subsystems: HDFS, YARN, and MapReduce. HDFS is a distributed file system; YARN is the resource management system; and MapReduce is an application that runs on YARN and handles distributed processing.

HDFS: a highly fault-tolerant distributed file system, designed to run on large numbers of inexpensive machines and to provide high-throughput data access.

YARN (Yet Another Resource Negotiator): the resource manager, which provides unified resource management and scheduling for the applications above it and is compatible with multiple computing frameworks.

MapReduce: a distributed programming model that distributes (Map) the processing of a large data set to many nodes on the network, then collects the intermediate results and combines them (Reduce); a single-machine analogy is sketched below.

    The wider ecosystem includes HBase (a column-oriented database), Cassandra (a distributed database), Hive (a data warehouse supporting SQL queries), Pig (a high-level data-flow language and execution framework), and ZooKeeper (a coordination service for distributed applications).
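To make the Map and Reduce phases concrete, here is a single-machine analogy (an illustration only, not how Hadoop jobs are invoked): a Unix word-count pipeline in which tr plays the role of Map, sort plays the shuffle that groups identical keys, and uniq -c plays Reduce. The file name input.txt is hypothetical.

# Map: tr emits one word per line; Shuffle: sort groups identical words; Reduce: uniq -c counts each group
cat input.txt | tr -s ' ' '\n' | sort | uniq -c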

1.1 Using the official image

docker pull sequenceiq/hadoop-docker
docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
Starting sshd:                                             [  OK  ]
Starting namenodes on [9e6e76143d3d]
9e6e76143d3d: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-9e6e76143d3d.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-9e6e76143d3d.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-9e6e76143d3d.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-9e6e76143d3d.out

bash-4.1# cat /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out 
ulimit -a for user root
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 6945
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Inside the container, run the bundled MapReduce example job:

cd $HADOOP_PREFIX

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

bin/hdfs dfs -cat output/*
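The job above reads from the input directory that the image preloads into HDFS. To run the same example against your own data, load it into HDFS first; the directory names myinput and myoutput below are hypothetical:

bin/hdfs dfs -mkdir -p myinput
bin/hdfs dfs -put /etc/hosts myinput/
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep myinput myoutput 'localhost'
bin/hdfs dfs -cat myoutput/*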


2. Storm

    Storm is a real-time computation framework. A Storm cluster contains two kinds of nodes: a master node and worker nodes. The master node runs a daemon called "Nimbus", similar to Hadoop's JobTracker: Nimbus distributes code across the cluster, assigns tasks to machines, and monitors for failures. Each worker node runs a "Supervisor" daemon, which listens for the tasks Nimbus assigns to its machine and manages worker processes accordingly; each worker process executes a subset of a topology's tasks.

2.1 Building a Storm cluster with Compose

The cluster contains the following containers:

zookeeper: a three-node Apache ZooKeeper deployment;

nimbus: Storm Nimbus;

ui: Storm UI;

supervisor: Storm Supervisor (one or more);

topology: the topology deployment tool; the sample application is built from the official storm-starter example code.

2.2 Download the code

git clone https://github.com/denverdino/docker-storm.git

2.3 docker-compose.yml for a typical Storm application architecture

version: '2'
services:
  zookeeper1:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk1.cloud
    environment:
      - SERVER_ID=1
      - ADDITIONAL_ZOOKEEPER_1=server.1=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper2:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk2.cloud
    environment:
      - SERVER_ID=2
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=0.0.0.0:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper3:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk3.cloud
    environment:
      - SERVER_ID=3
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=0.0.0.0:2888:3888
  ui:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: ui -c nimbus.host=nimbus
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    restart: always
    container_name: ui
    ports:
      - 8080:8080
    depends_on:
      - nimbus
  nimbus:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: nimbus -c nimbus.host=nimbus
    restart: always
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    container_name: nimbus
    ports:
      - 6627:6627
  supervisor:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: supervisor -c nimbus.host=nimbus -c supervisor.slots.ports=[6700,6701,6702,6703]
    restart: always
    environment:
      - affinity:role!=supervisor
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    depends_on:
      - nimbus
  topology:
    build: ../storm-starter
    command: -c nimbus.host=nimbus jar /topology.jar org.apache.storm.starter.RollingTopWords production-topology remote
    depends_on:
      - nimbus
networks:
  default:
    external: 
      name: test-storm
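Note that the file references an external network named test-storm, which must exist before the stack is deployed; create it once:

docker network create test-storm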

2.4 Build the test image

docker-compose build

2.5 Deploy

docker-compose up -d

2.6 Check the deployment

docker-compose ps 

2.7 Scale the supervisor service to three instances

docker-compose scale supervisor=3

2.8 Confirm everything is running

Open the Storm UI at http://localhost:8080 (published by the ui service above), or follow the logs:

docker-compose logs -f nimbus
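If the storm CLI is on the PATH inside the nimbus container (an assumption about this baqend-storm-based image), active topologies can also be listed directly:

docker-compose exec nimbus storm list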


3. Elasticsearch

     Elasticsearch provides real-time distributed data storage, analytics, and search; it scales out easily to hundreds of servers and can handle petabytes of structured or unstructured data.

3.1 Using the official image

docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.1
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "node.name=TestNode" elasticsearch:7.6.1
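Startup can take tens of seconds; once the node is up, the REST API answers on the published port:

curl http://localhost:9200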

3.2 Using a custom configuration

docker run -d -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch:7.6.1
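The mounted directory must supply the files the image expects, at minimum elasticsearch.yml. A minimal sketch with illustrative values:

# create a minimal elasticsearch.yml; the values shown are illustrative
mkdir -p config
cat > config/elasticsearch.yml <<'EOF'
cluster.name: docker-test
network.host: 0.0.0.0
discovery.type: single-node
EOF
# 7.x images also expect log4j2.properties in the config dir; copy the default from the image
docker run --rm elasticsearch:7.6.1 cat /usr/share/elasticsearch/config/log4j2.properties > config/log4j2.properties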

3.3 Persisting data with a data volume

docker run -d -p 9200:9200 -e "discovery.type=single-node" -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch:7.6.1
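The image also has a documented host prerequisite: vm.max_map_count must be at least 262144. If the container exits during startup, raise it on the Docker host:

sysctl -w vm.max_map_count=262144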

3.4 Using docker-compose to set up Elasticsearch

version: '3.1'
services:
  elasticsearch:
    image: elasticsearch:7.6.1
    environment:
      - discovery.type=single-node
  kibana:
    image: kibana:7.6.1
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
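Bring the stack up and browse Kibana on the published port:

docker-compose up -d
# Kibana UI: http://localhost:5601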

