SkyWalking High-Availability Cluster Installation and Deployment (with Nacos Dynamic Configuration)

This article covers only installing and deploying SkyWalking on Linux virtual machines. For installing JDK 8, Elasticsearch, and Nacos, refer to other articles.

1. Introduction

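SkyWalking is an APM (application performance monitoring) system designed for microservice and cloud-native architectures. It collects metrics automatically through probes (agents) and performs distributed tracing. From these call chains and metrics, SkyWalking APM infers the relationships between applications and between services, and computes the corresponding statistics.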

Official site: http://skywalking.apache.org/zh/

GitHub: https://github.com/apache/skywalking

2. Preparation

2.1 Cluster plan

Cluster         Alias            IP             Version   Official site
elasticsearch   elasticsearch1   10.0.8.115     7.6.2     https://www.elastic.co/cn/downloads/elasticsearch
                elasticsearch2   10.0.8.116
                elasticsearch3   10.0.8.117
skywalking      skywalking1      10.0.36.211    6.6.0     https://skywalking.apache.org/downloads/
                skywalking2      10.0.36.229
                skywalking3      10.0.36.197
nacos           nacos1           10.0.31.131    1.1.3     https://nacos.io/zh-cn/docs/quick-start.html
                nacos2           10.0.31.71
                nacos3           10.0.31.119

3. Distribution package layout

3.1 Directory overview

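apache-skywalking-apm-bin-es7/

├── agent                    # SkyWalking agent directory, used to deploy the agent
├── bin                      # startup scripts
│    ├── oapService.bat              # OAP initialize-and-start script (Windows)
│    ├── oapServiceInit.bat          # OAP initialization script (Windows)
│    ├── oapServiceInit.sh           # OAP initialization script (Linux)
│    ├── oapServiceNoInit.bat        # OAP start script without initialization (Windows)
│    ├── oapServiceNoInit.sh         # OAP start script without initialization (Linux)
│    ├── oapService.sh               # OAP initialize-and-start script (Linux)
│    ├── startup.bat                 # SkyWalking (OAP + UI) start script (Windows)
│    ├── startup.sh                  # SkyWalking (OAP + UI) start script (Linux)
│    ├── webappService.bat           # UI start script (Windows)
│    └── webappService.sh            # UI start script (Linux)
├── config                   # configuration directory
│    ├── alarm-settings-sample.yml       # sample alarm settings (not loaded)
│    ├── alarm-settings.yml              # alarm settings
│    ├── application.yml                 # OAP server configuration
│    ├── component-libraries.yml         # component libraries: defines every component library used by monitored applications
│    ├── gateways.yml                    # gateway configuration
│    ├── log4j2.xml                      # logging configuration
│    ├── official_analysis.oal           # metric analysis definitions (OAL)
│    └── service-apdex-threshold.yml     # Apdex threshold configuration
├── LICENSE
├── licenses
├── NOTICE
├── oap-libs                 # OAP dependencies (not listed here)
├── README.txt
└── webapp                   # UI
      ├── skywalking-webapp.jar            # UI jar
      └── webapp.yml                       # UI configuration file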

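4. Cluster architecture

4.1 Architecture overview

This architecture uses an Elasticsearch 7 cluster for data storage and a Nacos cluster as both the service registry and the configuration center.

Advantage: the cluster is highly available. If any single node fails, the cluster keeps running, so one failure does not stop the service.

Disadvantage: if an Elasticsearch node fails while the data volume is large, fully recovering that node takes a long time.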

5. Cluster deployment

5.1 Elasticsearch installation (omitted)

5.2 Nacos installation (omitted)

5.3 SkyWalking cluster deployment

(1) Download the package

Our Elasticsearch cluster runs 7.6.2, so download the package built for Elasticsearch 7 (the es7 package).

1. Download
# wget https://mirror.bit.edu.cn/apache/skywalking/6.6.0/apache-skywalking-apm-es7-6.6.0.tar.gz

2. Extract
# tar -xvzf apache-skywalking-apm-es7-6.6.0.tar.gz

3. Move to the install directory
# mkdir -p /data/skywalking
# mv apache-skywalking-apm-bin-es7/* /data/skywalking
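The three steps above can be wrapped in a small script. This is a sketch and not part of the original procedure. It assumes the mirror.bit.edu.cn mirror may no longer be reachable, so it downloads the package from the Apache release archive (https://archive.apache.org/dist/skywalking/), which keeps old versions. The helper names sw_pkg_url and install_sw are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download, extract and move the SkyWalking ES7 distribution.
# Assumption: old releases are served from archive.apache.org/dist/skywalking/.

SW_VERSION="${SW_VERSION:-6.6.0}"
SW_HOME="${SW_HOME:-/data/skywalking}"

# sw_pkg_url VERSION: print the archive URL of the es7 package (hypothetical helper)
sw_pkg_url() {
  local version="$1"
  echo "https://archive.apache.org/dist/skywalking/${version}/apache-skywalking-apm-es7-${version}.tar.gz"
}

# install_sw: download, unpack and move the files into SW_HOME (hypothetical helper)
install_sw() {
  local url pkg
  url="$(sw_pkg_url "$SW_VERSION")"
  pkg="${url##*/}"                      # file name part of the URL
  wget -c "$url" || return 1            # -c resumes a partial download
  tar -xzf "$pkg" || return 1
  mkdir -p "$SW_HOME"
  # The tarball unpacks into apache-skywalking-apm-bin-es7/
  mv apache-skywalking-apm-bin-es7/* "$SW_HOME"/
}
```

Load the file with source and then call install_sw. Nothing is downloaded until the function runs.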

(2) Configure JDK 8 (omitted)

(3) Edit the hosts file (on every SkyWalking node)

# vim /etc/hosts

10.0.8.115     elasticsearch1
10.0.8.116     elasticsearch2
10.0.8.117     elasticsearch3
10.0.36.211    skywalking1
10.0.36.229    skywalking2
10.0.36.197    skywalking3
10.0.31.131    nacos1
10.0.31.71     nacos2
10.0.31.119    nacos3
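If the same entries must be added on several machines, a small idempotent script avoids duplicate lines when it is run twice. This is a minimal sketch and not from the original article. The helper names has_host and add_hosts are hypothetical. The host list is copied from the cluster plan.

```shell
#!/usr/bin/env bash
# Sketch: append the cluster's host entries to a hosts file, skipping names it already maps.

# ip/name pairs taken from the cluster plan
CLUSTER_HOSTS="
10.0.8.115 elasticsearch1
10.0.8.116 elasticsearch2
10.0.8.117 elasticsearch3
10.0.36.211 skywalking1
10.0.36.229 skywalking2
10.0.36.197 skywalking3
10.0.31.131 nacos1
10.0.31.71 nacos2
10.0.31.119 nacos3
"

# has_host FILE NAME: succeed if a non-comment line of FILE lists NAME as a hostname
has_host() {
  awk -v n="$2" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                 END { exit !found }' "$1"
}

# add_hosts FILE: append every missing "ip name" pair to FILE (hypothetical helper)
add_hosts() {
  local file="$1" ip name
  while read -r ip name; do
    [ -z "$ip" ] && continue              # skip the blank lines in CLUSTER_HOSTS
    has_host "$file" "$name" || printf '%s    %s\n' "$ip" "$name" >> "$file"
  done <<< "$CLUSTER_HOSTS"
}
```

On each node, run add_hosts /etc/hosts as root.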

(4) Configure OAP

I. Edit the configuration

# vim /data/skywalking/config/application.yml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cluster:
####### standalone is single-node mode; comment it out
#  standalone:   
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
#  zookeeper:
#    nameSpace: ${SW_NAMESPACE:""}
#    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181}
#    #Retry Policy
#    baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
#    maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
#    # Enable ACL
#    enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
#    schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
#    expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
#  kubernetes:
#    watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:60}
#    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
#    labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
#    uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
#  consul:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#     Consul cluster nodes, example: 10.0.0.1:8500,10.0.0.2:8500,10.0.0.3:8500
#    hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:8500}
##### Enable the Nacos cluster coordinator
  nacos:
    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
##### Without a domain name, set this to "nacos1:8848,nacos2:8848,nacos3:8848"
    hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:nacos4.abc.com:80}
#  # Nacos Configuration namespace
    namespace: 'public'
#  etcd:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#     etcd cluster nodes, example: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379
#    hostPort: ${SW_CLUSTER_ETCD_HOST_PORT:localhost:2379}
core:
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
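###### Each node sets its own hostname here (this copy is from skywalking2); other OAP nodes reach it through this address
    gRPCHost: ${SW_CORE_GRPC_HOST:skywalking2}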
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}
    downsampling:
      - Hour
      - Day
      - Month
    # Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs periodically, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Cache metric data for 1 minute to reduce database queries, and if the OAP cluster changes within that minute,
    # the metrics may not be accurate within that minute.
    enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
    topNReportPeriod: ${SW_CORE_TOPN_REPORT_PERIOD:10} # top_n record worker report cycle, unit is minute
storage:
#  elasticsearch:
#    nameSpace: ${SW_NAMESPACE:""}
#    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
#    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
#    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
#    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
#    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
#    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
#    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
#    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
#    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
#    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
###### Enable the elasticsearch7 storage
  elasticsearch7:
    nameSpace: ${SW_NAMESPACE:"skywalking_prod"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:elasticsearch1:9200,elasticsearch2:9200,elasticsearch3:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:1}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:1000}
    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
######## h2 is enabled by default; comment it out
#  h2:
#    driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
#    url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
#    user: ${SW_STORAGE_H2_USER:sa}
#    metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000}
#  mysql:
#    properties:
#      jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
#      dataSource.user: ${SW_DATA_SOURCE_USER:root}
#      dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root@1234}
#      dataSource.cachePrepStmts: ${SW_DATA_SOURCE_CACHE_PREP_STMTS:true}
#      dataSource.prepStmtCacheSize: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_SIZE:250}
#      dataSource.prepStmtCacheSqlLimit: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_LIMIT:2048}
#      dataSource.useServerPrepStmts: ${SW_DATA_SOURCE_USE_SERVER_PREP_STMTS:true}
#    metadataQueryMaxSize: ${SW_STORAGE_MYSQL_QUERY_MAX_SIZE:5000}
receiver-sharing-server:
  default:
receiver-register:
  default:
receiver-trace:
  default:
###### Use an absolute path; otherwise the directory is created relative to the directory the service is started from
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:/data/skywalking/trace-buffer/}  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:1000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms.
receiver-jvm:
  default:
receiver-clr:
  default:
service-mesh:
  default:
###### Use an absolute path; otherwise the directory is created relative to the directory the service is started from
    bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:/data/skywalking/mesh-buffer/}  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
istio-telemetry:
  default:
envoy-metric:
  default:
#    alsHTTPAnalysis: ${SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS:k8s-mesh}
#receiver_zipkin:
#  default:
#    host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
#    port: ${SW_RECEIVER_ZIPKIN_PORT:9411}
#    contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
query:
  graphql:
    path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
  default:
telemetry:
  none:
configuration:
##### Select the dynamic configuration source: use nacos and comment out none
#  none:
#  apollo:
#    apolloMeta: http://106.12.25.204:8080
#    apolloCluster: default
#    # apolloEnv: # defaults to null
#    appId: skywalking
#    period: 5

###### Enable Nacos dynamic configuration
  nacos:
    # Nacos Server Host
    serverAddr: nacos1,nacos2,nacos3
    # Nacos Server Port
    port: 8848
    # Nacos Configuration Group
    group: 'skywalking'
    # Nacos Configuration namespace
    namespace: ''
    # Unit seconds, sync period. Default fetch every 60 seconds.
    period : 60
    # the name of current cluster, set the name if you want to upstream system known.
    clusterName: "default"
#  zookeeper:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    nameSpace: /default
#    hostPort: localhost:2181
#    #Retry Policy
#    baseSleepTimeMs: 1000 # initial amount of time to wait between retries
#    maxRetries: 3 # max number of times to retry
#  etcd:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    group :  'skywalking'
#    serverAddr: localhost:2379
#    clusterName: "default"
#  consul:
#    # Consul host and ports, separated by comma, e.g. 1.2.3.4:8500,2.3.4.5:8500
#    hostAndPorts: ${consul.address}
#    # Sync period in seconds. Defaults to 60 seconds.
#    period: 1

#exporter:
#  grpc:
#    targetHost: ${SW_EXPORTER_GRPC_HOST:127.0.0.1}
#    targetPort: ${SW_EXPORTER_GRPC_PORT:9870}
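Every ${NAME:default} placeholder in application.yml is resolved from a system property or environment variable of the same name first, and from the default written in the file only when neither is set. This lets all three nodes share one file. The sketch below uses that mechanism for two values:

- SW_CORE_GRPC_HOST, so that each node registers its own address in Nacos instead of skywalking2 everywhere.
- SW_STORAGE_ES_INDEX_REPLICAS_NUMBER. With 0 replicas, shards stored on a failed Elasticsearch node are unavailable until that node returns. The value 1 is an assumption here; pick it to fit your disk budget. It affects indices created afterwards.

The helper name oap_env is hypothetical. Using $(hostname) assumes each machine's hostname matches its alias (skywalking1/2/3). If it does not, pass the alias explicitly.

```shell
#!/usr/bin/env bash
# Sketch: per-node environment overrides for ${NAME:default} placeholders in application.yml.

# oap_env NODE: print export lines for one OAP node (hypothetical helper)
oap_env() {
  local node="$1"
  # Address this node registers with the Nacos cluster coordinator
  echo "export SW_CORE_GRPC_HOST=${node}"
  # Assumption: keep one replica so indices survive the loss of one ES node
  echo "export SW_STORAGE_ES_INDEX_REPLICAS_NUMBER=1"
}

# Usage on each OAP node, in the shell that then runs the bin/ scripts:
#   eval "$(oap_env "$(hostname)")"
```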

II. Add the configuration items in Nacos

Configuration rules:

receiver-trace.default.slowDBAccessThreshold
    Thresholds for slow database statements; overrides receiver-trace/default/slowDBAccessThreshold in application.yml.
    Example: default:200,mongodb:50

receiver-trace.default.uninstrumentedGateways
    Gateways that are not instrumented; overrides gateways.yml.
    Format: same as gateways.yml

alarm.default.alarm-settings
    Alarm settings; overrides alarm-settings.yml.
    Format: same as alarm-settings.yml

core.default.apdexThreshold
    Apdex threshold settings; overrides service-apdex-threshold.yml.
    Format: same as service-apdex-threshold.yml

Note: create receiver-trace.default.slowDBAccessThreshold in TEXT format and the other items in YAML format.
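Instead of the Nacos console, the items can also be created through the Nacos v1 Open API (POST /nacos/v1/cs/configs with dataId, group and content). This endpoint returns true on success. The sketch assumes, as SkyWalking's Nacos configuration module does, that the Data ID is the config key and that the group is the group value from application.yml (skywalking). It also assumes the default public namespace, so no tenant parameter is sent. nacos_publish is a hypothetical helper, and nacos1:8848 comes from the cluster plan.

```shell
#!/usr/bin/env bash
# Sketch: publish a SkyWalking dynamic-config item through the Nacos v1 Open API.

NACOS_ADDR="${NACOS_ADDR:-nacos1:8848}"
NACOS_GROUP="${NACOS_GROUP:-skywalking}"   # must match configuration.nacos.group

# nacos_publish DATA_ID CONTENT: create or overwrite one config item (hypothetical helper)
nacos_publish() {
  local data_id="$1" content="$2"
  # --data-urlencode keeps commas, colons and newlines in CONTENT intact
  curl -s -X POST "http://${NACOS_ADDR}/nacos/v1/cs/configs" \
    --data-urlencode "dataId=${data_id}" \
    --data-urlencode "group=${NACOS_GROUP}" \
    --data-urlencode "content=${content}"
}

# Usage: plain-text value for the slow-SQL thresholds
#   nacos_publish receiver-trace.default.slowDBAccessThreshold 'default:200,mongodb:50'
# YAML values can be read from the matching file, for example
#   nacos_publish alarm.default.alarm-settings "$(cat config/alarm-settings.yml)"
```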

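III. Run the initialization script

# Run this on one server only
# sh /data/skywalking/bin/oapServiceInit.sh

IV. Watch the initialization progress

# tail -f /data/skywalking/logs/skywalking-oap-server.log

[Screenshot omitted: log output showing that OAP initialized successfully]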

(5) Configure the UI

# vim /data/skywalking/webapp/webapp.yml

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

server:
  port: 8080

collector:
  path: /graphql
  ribbon:
    ReadTimeout: 10000
    # Point to all backend's restHost:restPort, split by ,
    # OAP cluster nodes
    listOfServers:  skywalking1:12800,skywalking2:12800,skywalking3:12800
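Before starting the UI, it can help to confirm that every address in listOfServers is reachable from the UI host. This is a minimal sketch. It assumes bash, because it uses bash's /dev/tcp redirection, and timeout from GNU coreutils. check_servers is a hypothetical helper. It only tests that the TCP port accepts connections, not that the GraphQL endpoint answers.

```shell
#!/usr/bin/env bash
# Sketch: check that each host:port in collector.ribbon.listOfServers accepts TCP connections.

LIST_OF_SERVERS="skywalking1:12800,skywalking2:12800,skywalking3:12800"

# check_servers LIST: print "host:port up|down" for each comma-separated entry and
# return non-zero if any entry is down (hypothetical helper)
check_servers() {
  local list="$1" entry host port rc=0
  local -a entries
  IFS=',' read -ra entries <<< "$list"
  for entry in "${entries[@]}"; do
    host="${entry%%:*}"
    port="${entry##*:}"
    # Opening fd 3 on /dev/tcp/HOST/PORT fails if the port refuses or times out
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "${entry} up"
    else
      echo "${entry} down"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, once the OAP nodes are running: check_servers "$LIST_OF_SERVERS"
```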

(6) Adjust the JVM options

Edit the following startup scripts: bin/oapServiceInit.sh, bin/oapServiceNoInit.sh, bin/oapService.sh

Set JAVA_OPTS=" -Xms2048M -Xmx4096M"

Note: adjust JAVA_OPTS in bin/webappService.sh as needed.
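The same edit can be applied to all three scripts with sed. This is a sketch that assumes each script defines JAVA_OPTS on a single line starting with JAVA_OPTS=. Check that first with grep -n '^JAVA_OPTS=' bin/*.sh. set_java_opts is a hypothetical helper, and it keeps a .bak copy of every file it edits.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the JAVA_OPTS line in the OAP startup scripts.

OAP_HEAP_OPTS=' -Xms2048M -Xmx4096M'

# set_java_opts FILE...: replace the line beginning with JAVA_OPTS= in each file,
# keeping FILE.bak as a backup (hypothetical helper)
set_java_opts() {
  local f
  for f in "$@"; do
    sed -i.bak "s|^JAVA_OPTS=.*|JAVA_OPTS=\"${OAP_HEAP_OPTS}\"|" "$f"
  done
}

# Usage, from /data/skywalking:
#   set_java_opts bin/oapServiceInit.sh bin/oapServiceNoInit.sh bin/oapService.sh
```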

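(7) Start (run from /data/skywalking)

# Start the OAP service (on every SkyWalking node)
# sh bin/oapServiceNoInit.sh

# Start the UI service
# sh bin/webappService.sh

(8) Verify

Open http://{skywalking}:8080/ in a browser, where {skywalking} is the hostname or IP of a node running the UI.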
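Besides the browser check, the UI can be probed from the command line. This is a minimal sketch. It assumes curl is installed and that the UI listens on port 8080, as configured in webapp.yml. check_ui and ui_status are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: check that the UI answers HTTP 200 on port 8080 of each given host.

# ui_status HOST: print the HTTP status of http://HOST:8080/ ("000" if unreachable)
ui_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:8080/"
}

# check_ui HOST...: print "host status" per host; fail if any status is not 200 (hypothetical helper)
check_ui() {
  local host code rc=0
  for host in "$@"; do
    code="$(ui_status "$host")"
    echo "${host} ${code}"
    [ "$code" = "200" ] || rc=1
  done
  return "$rc"
}

# Usage: check_ui skywalking1 skywalking2 skywalking3
```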
