Preface
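My company recently needed performance monitoring for its servers. After going through a lot of material, I will show step by step, from scratch, how to build a monitoring platform on CentOS.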
1. node_exporter
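To monitor CPU, memory, disk, I/O and other metrics on production servers, we use node_exporter to collect this machine information. In this section you will learn how to:
- set up node_exporter locally
- use the node_exporter instance together with Prometheus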
You can find the node_exporter version you need on GitHub and download it.
The download command is:
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
Extract and install:
mkdir -p /soft/service
tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz -C /soft/service
mv /soft/service/node_exporter-0.18.1.linux-amd64 /soft/service/node_exporter
Enter the node_exporter directory and start the service:
cd /soft/service/node_exporter
./node_exporter
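After startup, the console output shows that the service is bound to port 9100.
Open this port in a browser to check the result; the collected metrics are served under the /metrics path (open port 9100 in the firewall first, or turn the firewall off).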
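What node_exporter serves on port 9100 is plain text in Prometheus's exposition format: optional `# HELP` and `# TYPE` comment lines, then one sample per line made of a metric name, optional `{label="value"}` pairs and a value. To get a feel for what Prometheus will scrape, here is a minimal Python parser sketch. It only covers the common cases (it does not unescape label values and it discards the optional timestamp), so treat it as an illustration rather than a replacement for an official client library.

```python
import re
from typing import Dict, List, Tuple

# One parsed sample: (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]

# metric name, optional {labels}, value, optional timestamp in milliseconds
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label="value" pairs; escaped characters inside values are kept as-is
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Sample]:
    """Parse the plain-text exposition format served at /metrics."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        name, label_text, value = match.groups()
        labels = dict(_LABEL_RE.findall(label_text or ""))
        # float() also accepts "NaN", "+Inf" and "-Inf" as written by exporters.
        samples.append((name, labels, float(value)))
    return samples
```

Against the exporter started above, `parse_metrics(urllib.request.urlopen("http://localhost:9100/metrics").read().decode())` should return every sample, assuming the port is reachable from where you run it.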
2. prometheus
Prometheus is an open-source systems monitoring and alerting toolkit. It is written in Go and was inspired by Google's Borgmon monitoring system.
The official documentation includes a diagram of the Prometheus architecture.
Official download page: https://prometheus.io/download/
Download the Prometheus package:
wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
Extract and install:
tar -zxvf prometheus-2.18.1.linux-amd64.tar.gz -C /soft/service
mv /soft/service/prometheus-2.18.1.linux-amd64 /soft/service/prometheus
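Both download URLs follow the same naming pattern, and the directory that tar creates is the archive name without `.tar.gz`, which is what the mv commands rename. A small helper sketch for picking other versions, assuming future releases keep this layout (`release_artifact` is a hypothetical name; check the releases page before relying on it):

```python
from typing import Tuple


def release_artifact(project: str, version: str,
                     arch: str = "linux-amd64") -> Tuple[str, str]:
    """Return (download URL, extracted directory name) for a release.

    Assumes the github.com/prometheus/<project> naming used by
    node_exporter and prometheus: v<version>/<project>-<version>.<arch>.tar.gz
    """
    folder = f"{project}-{version}.{arch}"
    url = (f"https://github.com/prometheus/{project}/releases/download/"
           f"v{version}/{folder}.tar.gz")
    return url, folder
```

For example, `release_artifact("node_exporter", "0.18.1")` yields the same URL as the wget command above, plus `node_exporter-0.18.1.linux-amd64` as the folder to rename.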
Configure the Prometheus scrape targets:
cd /soft/service/prometheus
vim prometheus.yml
Add the following job under the existing scrape_configs section of the configuration file:
  - job_name: 'export_test2'
    static_configs:
      - targets: ['10.211.55.5:9100']
        labels:
          instance: 'node2'
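Every server you want to monitor needs its own target running node_exporter. When more machines are added, the entry above can be generated rather than edited by hand. The sketch below (`render_scrape_job` is a hypothetical helper, not part of Prometheus) emits one `static_configs` item per node so each keeps its own `instance` label, indented to sit under the `scrape_configs:` key. It simply rejects values containing single quotes instead of escaping them.

```python
from typing import List, Tuple


def render_scrape_job(job_name: str, nodes: List[Tuple[str, str]]) -> str:
    """Render one scrape_configs entry.

    nodes is a list of (host:port target, instance label) pairs.
    """
    for value in [job_name] + [v for node in nodes for v in node]:
        if "'" in value:
            raise ValueError(f"single quotes are not supported: {value!r}")
    lines = [f"  - job_name: '{job_name}'", "    static_configs:"]
    for target, instance in nodes:
        # One static_configs item per node, each with its own instance label.
        lines.append(f"      - targets: ['{target}']")
        lines.append("        labels:")
        lines.append(f"          instance: '{instance}'")
    return "\n".join(lines) + "\n"
```

`render_scrape_job("export_test2", [("10.211.55.5:9100", "node2")])` reproduces the job shown above; append further pairs to cover more servers.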
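Start Prometheus with this configuration file:
./prometheus --config.file=prometheus.yml
Then open port 9090, the default port of the Prometheus web UI, in a browser. If the Prometheus UI appears, Prometheus has been installed successfully.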
Next, enter the expression node_cpu_seconds_total
and click Execute to check whether any data is returned.
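The same check can be scripted against Prometheus's HTTP API: the instant-query endpoint `/api/v1/query` takes the expression in a `query` parameter and returns JSON with `status` and `data.result` fields. A minimal standard-library sketch, assuming Prometheus listens on localhost:9090 (the function names are hypothetical):

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus' instant-query endpoint /api/v1/query."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each "value" is [unix timestamp, sample value encoded as a string].
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]
```

Usage: `parse_vector(urllib.request.urlopen(instant_query_url("http://localhost:9090", "node_cpu_seconds_total")).read().decode())`. An empty list means the query ran but Prometheus has no matching data yet.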
Notes
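- By default, Prometheus stores data on the local disk. Local storage is easy to operate, but it cannot persist massive volumes of metrics and carries a risk of data loss: a write may corrupt the WAL (write-ahead log) files, after which scraped data can no longer be written.
- To work around the limits of single-node storage, Prometheus does not implement clustered storage itself. Instead, it provides remote read and write interfaces so that users can choose a suitable time-series database to scale Prometheus.
- Through these interfaces, Prometheus can write data to a third-party storage system and read back the data stored there.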
Next, we will persist the data that Prometheus collects from node_exporter into an InfluxDB database.
3. influxdb
A common use case for InfluxDB (a time-series database) is server monitoring: metrics are collected and aggregated, then visualized with Grafana.
Official download page: https://portal.influxdata.com/downloads/
Download the package:
wget https://dl.influxdata.com/influxdb/releases/influxdb-1.5.2.x86_64.rpm
Install the RPM package:
sudo yum localinstall influxdb-1.5.2.x86_64.rpm
Start the service and enable it at boot:
# Start the InfluxDB service and enable it at boot
systemctl start influxdb
systemctl enable influxdb
Once installation is complete, run influx to open the InfluxDB command-line shell.
Next, create the database and the user:
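# Create a database named prometheus
create database prometheus
# Switch to the prometheus database
use prometheus
# Create a user whose username and password are both node; the password must be a single-quoted string, otherwise InfluxDB reports an error
create user "node" with password 'node'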
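The same statements can also be sent over InfluxDB 1.x's HTTP API, which accepts InfluxQL in a `q` form field on the `/query` endpoint; statements that change the server, such as CREATE, must be sent as POST. The sketch below (hypothetical helper names) builds those requests and applies the quoting rule above: identifiers in double quotes, the password as a single-quoted string. To stay simple it rejects names containing quotes or backslashes instead of escaping them.

```python
from typing import Tuple
from urllib.parse import urlencode


def create_database_stmt(name: str) -> str:
    """InfluxQL statement creating a database (identifier double-quoted)."""
    if '"' in name:
        raise ValueError("double quotes in names are not handled in this sketch")
    return f'CREATE DATABASE "{name}"'


def create_user_stmt(user: str, password: str) -> str:
    """InfluxQL statement creating a user; the password is single-quoted."""
    if '"' in user or "'" in password or "\\" in password:
        raise ValueError("quotes/backslashes are not handled in this sketch")
    return f"CREATE USER \"{user}\" WITH PASSWORD '{password}'"


def query_request(statement: str,
                  base_url: str = "http://localhost:8086") -> Tuple[str, bytes]:
    """URL and form-encoded body for InfluxDB 1.x's /query endpoint."""
    return f"{base_url.rstrip('/')}/query", urlencode({"q": statement}).encode()
```

To send one: `url, body = query_request(create_user_stmt("node", "node"))`, then `urllib.request.urlopen(urllib.request.Request(url, data=body))`; passing `data` makes urllib issue a POST with a form-encoded body.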
4. Remote storage adapter
This is the write adapter provided officially by Prometheus. It receives samples through Prometheus's remote write protocol and stores them in Graphite, InfluxDB, or OpenTSDB.
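Conceptually, the adapter turns every Prometheus sample into an InfluxDB point. As far as I can tell from the adapter's InfluxDB client, the metric name becomes the measurement, the remaining labels become tags, and the sample goes into a field named `value`. This is consistent with the `show measurements` output later in this article, but treat the exact mapping as an assumption. The sketch below renders one sample in InfluxDB line protocol under that assumption. It converts Prometheus millisecond timestamps to the nanoseconds InfluxDB expects by default and skips NaN/Inf values, which InfluxDB float fields cannot hold.

```python
import math
from typing import Dict, Optional


def _escape(text: str, special: str) -> str:
    """Backslash-escape each character of `special` found in `text`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(labels: Dict[str, str], value: float,
                     ts_ms: int) -> Optional[str]:
    """Render one Prometheus sample as an InfluxDB line-protocol point.

    Assumed mapping: __name__ -> measurement, other labels -> tags,
    sample value -> field "value". Returns None for NaN/Inf samples.
    """
    if math.isnan(value) or math.isinf(value):
        return None  # InfluxDB float fields cannot store NaN or Inf
    measurement = _escape(labels["__name__"], ", ")
    tags = ",".join(
        f"{_escape(k, ',= ')}={_escape(v, ',= ')}"
        for k, v in sorted(labels.items())
        if k != "__name__" and v != ""  # empty tag values are not allowed
    )
    head = f"{measurement},{tags}" if tags else measurement
    # float() + repr() always yields a float literal such as "1.0".
    # Prometheus timestamps are milliseconds; InfluxDB defaults to nanoseconds.
    return f"{head} value={float(value)!r} {ts_ms * 1_000_000}"
```

Writing the value with `repr(float(...))` keeps the field typed as a float even for whole numbers, so a point written as `1.0` does not clash with later fractional values.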
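Building this adapter requires a Go environment on the machine so that you can compile remote_storage_adapter yourself. Setting up Go is not covered here; many blog posts online explain it.
Once compiled, the adapter can be run as follows. If you have no Go environment and do not want to compile it, you can also download a precompiled remote_storage_adapter binary instead. By default the adapter listens on port 9201: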
./remote_storage_adapter --influxdb-url=http://127.0.0.1:8086/ --influxdb.database="prometheus" --influxdb.retention-policy=autogen
Configure prometheus.yml:
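# Remote write configuration
remote_write:
  - url: "http://localhost:9201/write"
    # Credentials of the InfluxDB user created above; Prometheus sends them to the adapter endpoint
    # (the adapter itself logs in to InfluxDB via its --influxdb.username flag and INFLUXDB_PW environment variable)
    basic_auth:
      username: node
      password: node
# Remote read configuration
remote_read:
  - url: "http://localhost:9201/read"
    basic_auth:
      username: node
      password: node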
Then restart Prometheus and check its startup log to confirm that it starts without errors.
Next, open the influx shell again and check whether data has arrived:
[root@donniegao prometheus-2.17.1.linux-amd64]# influx
Connected to http://localhost:8086 version 1.5.2
InfluxDB shell version: 1.5.2
> use prometheus
Using database prometheus
> show measurements
name: measurements
name
----
_
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_info
go_memstats_alloc_bytes
go_memstats_alloc_bytes_total
go_memstats_buck_hash_sys_bytes
go_memstats_frees_total
go_memstats_gc_cpu_fraction
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_objects
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_last_gc_time_seconds
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_other_sys_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
net_conntrack_dialer_conn_attempted_total
net_conntrack_dialer_conn_closed_total
net_conntrack_dialer_conn_established_total
net_conntrack_dialer_conn_failed_total
net_conntrack_listener_conn_accepted_total
net_conntrack_listener_conn_closed_total
node_arp_entries
node_boot_time_seconds
node_context_switches_total
node_cpu_guest_seconds_total
node_cpu_seconds_total
node_disk_io_now
node_disk_io_time_seconds_total
node_disk_io_time_weighted_seconds_total
node_disk_read_bytes_total
node_disk_read_time_seconds_total
node_disk_reads_completed_total
node_disk_reads_merged_total
node_disk_write_time_seconds_total
node_disk_writes_completed_total
node_disk_writes_merged_total
node_disk_written_bytes_total
node_entropy_available_bits
node_exporter_build_info
node_filefd_allocated
node_filefd_maximum
node_filesystem_avail_bytes
node_filesystem_device_error
node_filesystem_files
node_filesystem_files_free
node_filesystem_free_bytes
node_filesystem_readonly
node_filesystem_size_bytes
node_forks_total
node_hwmon_chip_names
node_hwmon_sensor_label
node_hwmon_temp_celsius
node_hwmon_temp_crit_alarm_celsius
node_hwmon_temp_crit_celsius
node_hwmon_temp_max_celsius
node_intr_total
node_load1
node_load15
node_load5
node_memory_Active_anon_bytes
node_memory_Active_bytes
node_memory_Active_file_bytes
node_memory_AnonHugePages_bytes
node_memory_AnonPages_bytes
node_memory_Bounce_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
node_memory_CmaFree_bytes
node_memory_CmaTotal_bytes
node_memory_CommitLimit_bytes
node_memory_Committed_AS_bytes
node_memory_DirectMap2M_bytes
node_memory_DirectMap4k_bytes
node_memory_Dirty_bytes
node_memory_HardwareCorrupted_bytes
node_memory_HugePages_Free
node_memory_HugePages_Rsvd
node_memory_HugePages_Surp
node_memory_HugePages_Total
node_memory_Hugepagesize_bytes
node_memory_Inactive_anon_bytes
node_memory_Inactive_bytes
node_memory_Inactive_file_bytes
node_memory_KernelStack_bytes
node_memory_Mapped_bytes
node_memory_MemAvailable_bytes
node_memory_MemFree_bytes
node_memory_MemTotal_bytes
node_memory_Mlocked_bytes
node_memory_NFS_Unstable_bytes
node_memory_PageTables_bytes
node_memory_SReclaimable_bytes
node_memory_SUnreclaim_bytes
node_memory_Shmem_bytes
node_memory_Slab_bytes
node_memory_SwapCached_bytes
node_memory_SwapFree_bytes
node_memory_SwapTotal_bytes
node_memory_Unevictable_bytes
node_memory_VmallocChunk_bytes
node_memory_VmallocTotal_bytes
node_memory_VmallocUsed_bytes
node_memory_WritebackTmp_bytes
node_memory_Writeback_bytes
node_netstat_Icmp6_InErrors
node_netstat_Icmp6_InMsgs
node_netstat_Icmp6_OutMsgs
node_netstat_Icmp_InErrors
node_netstat_Icmp_InMsgs
node_netstat_Icmp_OutMsgs
node_netstat_Ip6_InOctets
node_netstat_Ip6_OutOctets
node_netstat_IpExt_InOctets
node_netstat_IpExt_OutOctets
node_netstat_Ip_Forwarding
node_netstat_TcpExt_ListenDrops
node_netstat_TcpExt_ListenOverflows
node_netstat_TcpExt_SyncookiesFailed
node_netstat_TcpExt_SyncookiesRecv
node_netstat_TcpExt_SyncookiesSent
node_netstat_TcpExt_TCPSynRetrans
node_netstat_Tcp_ActiveOpens
node_netstat_Tcp_CurrEstab
node_netstat_Tcp_InErrs
node_netstat_Tcp_InSegs
node_netstat_Tcp_OutSegs
node_netstat_Tcp_PassiveOpens
node_netstat_Tcp_RetransSegs
node_netstat_Udp6_InDatagrams
node_netstat_Udp6_InErrors
node_netstat_Udp6_NoPorts
node_netstat_Udp6_OutDatagrams
node_netstat_UdpLite6_InErrors
node_netstat_UdpLite_InErrors
node_netstat_Udp_InDatagrams
node_netstat_Udp_InErrors
node_netstat_Udp_NoPorts
node_netstat_Udp_OutDatagrams
node_network_address_assign_type
node_network_carrier
node_network_carrier_changes_total
node_network_device_id
node_network_dormant
node_network_flags
node_network_iface_id
node_network_iface_link
node_network_iface_link_mode
node_network_info
node_network_mtu_bytes
node_network_net_dev_group
node_network_protocol_type
node_network_receive_bytes_total
node_network_receive_compressed_total
node_network_receive_drop_total
node_network_receive_errs_total
node_network_receive_fifo_total
node_network_receive_frame_total
node_network_receive_multicast_total
node_network_receive_packets_total
node_network_transmit_bytes_total
node_network_transmit_carrier_total
node_network_transmit_colls_total
node_network_transmit_compressed_total
node_network_transmit_drop_total
node_network_transmit_errs_total
node_network_transmit_fifo_total
node_network_transmit_packets_total
node_network_transmit_queue_length
node_network_up
node_procs_blocked
node_procs_running
node_scrape_collector_duration_seconds
node_scrape_collector_success
node_sockstat_FRAG_inuse
node_sockstat_FRAG_memory
node_sockstat_RAW_inuse
node_sockstat_TCP_alloc
node_sockstat_TCP_inuse
node_sockstat_TCP_mem
node_sockstat_TCP_mem_bytes
node_sockstat_TCP_orphan
node_sockstat_TCP_tw
node_sockstat_UDPLITE_inuse
node_sockstat_UDP_inuse
node_sockstat_UDP_mem
node_sockstat_UDP_mem_bytes
node_sockstat_sockets_used
node_textfile_scrape_error
node_time_seconds
node_timex_estimated_error_seconds
node_timex_frequency_adjustment_ratio
node_timex_loop_time_constant
node_timex_maxerror_seconds
node_timex_offset_seconds
node_timex_pps_calibration_total
node_timex_pps_error_total
node_timex_pps_frequency_hertz
node_timex_pps_jitter_seconds
node_timex_pps_jitter_total
node_timex_pps_shift_seconds
node_timex_pps_stability_exceeded_total
node_timex_pps_stability_hertz
node_timex_status
node_timex_sync_status
node_timex_tai_offset_seconds
node_timex_tick_seconds
node_uname_info
node_vmstat_pgfault
node_vmstat_pgmajfault
node_vmstat_pgpgin
node_vmstat_pgpgout
node_vmstat_pswpin
node_vmstat_pswpout
node_xfs_allocation_btree_compares_total
node_xfs_allocation_btree_lookups_total
node_xfs_allocation_btree_records_deleted_total
node_xfs_allocation_btree_records_inserted_total
node_xfs_block_map_btree_compares_total
node_xfs_block_map_btree_lookups_total
node_xfs_block_map_btree_records_deleted_total
node_xfs_block_map_btree_records_inserted_total
node_xfs_block_mapping_extent_list_compares_total
node_xfs_block_mapping_extent_list_deletions_total
node_xfs_block_mapping_extent_list_insertions_total
node_xfs_block_mapping_extent_list_lookups_total
node_xfs_block_mapping_reads_total
node_xfs_block_mapping_unmaps_total
node_xfs_block_mapping_writes_total
node_xfs_extent_allocation_blocks_allocated_total
node_xfs_extent_allocation_blocks_freed_total
node_xfs_extent_allocation_extents_allocated_total
node_xfs_extent_allocation_extents_freed_total
process_cpu_seconds_total
process_max_fds
process_open_fds
process_resident_memory_bytes
process_start_time_seconds
process_virtual_memory_bytes
process_virtual_memory_max_bytes
prometheus_api_remote_read_queries
prometheus_build_info
prometheus_config_last_reload_success_timestamp_seconds
prometheus_config_last_reload_successful
prometheus_engine_queries
prometheus_engine_queries_concurrent_max
prometheus_engine_query_duration_seconds
prometheus_engine_query_duration_seconds_count
prometheus_engine_query_duration_seconds_sum
prometheus_engine_query_log_enabled
prometheus_engine_query_log_failures_total
prometheus_http_request_duration_seconds_bucket
prometheus_http_request_duration_seconds_count
prometheus_http_request_duration_seconds_sum
prometheus_http_requests_total
prometheus_http_response_size_bytes_bucket
prometheus_http_response_size_bytes_count
prometheus_http_response_size_bytes_sum
prometheus_notifications_alertmanagers_discovered
prometheus_notifications_dropped_total
prometheus_notifications_queue_capacity
prometheus_notifications_queue_length
prometheus_remote_storage_dropped_samples_total
prometheus_remote_storage_enqueue_retries_total
prometheus_remote_storage_failed_samples_total
prometheus_remote_storage_highest_timestamp_in_seconds
prometheus_remote_storage_pending_samples
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
prometheus_remote_storage_remote_read_queries
prometheus_remote_storage_retried_samples_total
prometheus_remote_storage_samples_in_total
prometheus_remote_storage_sent_batch_duration_seconds_bucket
prometheus_remote_storage_sent_batch_duration_seconds_count
prometheus_remote_storage_sent_batch_duration_seconds_sum
prometheus_remote_storage_sent_bytes_total
prometheus_remote_storage_shard_capacity
prometheus_remote_storage_shards
prometheus_remote_storage_shards_desired
prometheus_remote_storage_shards_max
prometheus_remote_storage_shards_min
prometheus_remote_storage_string_interner_zero_reference_releases_total
prometheus_remote_storage_succeeded_samples_total
prometheus_rule_evaluation_duration_seconds_count
prometheus_rule_evaluation_duration_seconds_sum
prometheus_rule_evaluation_failures_total
prometheus_rule_evaluations_total
prometheus_rule_group_duration_seconds_count
prometheus_rule_group_duration_seconds_sum
prometheus_rule_group_iterations_missed_total
prometheus_rule_group_iterations_total
prometheus_sd_consul_rpc_duration_seconds_count
prometheus_sd_consul_rpc_duration_seconds_sum
prometheus_sd_consul_rpc_failures_total
prometheus_sd_discovered_targets
prometheus_sd_dns_lookup_failures_total
prometheus_sd_dns_lookups_total
prometheus_sd_failed_configs
prometheus_sd_file_read_errors_total
prometheus_sd_file_scan_duration_seconds_count
prometheus_sd_file_scan_duration_seconds_sum
prometheus_sd_kubernetes_events_total
prometheus_sd_received_updates_total
prometheus_sd_updates_total
prometheus_target_interval_length_seconds
prometheus_target_interval_length_seconds_count
prometheus_target_interval_length_seconds_sum
prometheus_target_metadata_cache_bytes
prometheus_target_metadata_cache_entries
prometheus_target_scrape_pool_reloads_failed_total
prometheus_target_scrape_pool_reloads_total
prometheus_target_scrape_pool_sync_total
prometheus_target_scrape_pools_failed_total
prometheus_target_scrape_pools_total
prometheus_target_scrapes_cache_flush_forced_total
prometheus_target_scrapes_exceeded_sample_limit_total
prometheus_target_scrapes_sample_duplicate_timestamp_total
prometheus_target_scrapes_sample_out_of_bounds_total
prometheus_target_scrapes_sample_out_of_order_total
prometheus_target_sync_length_seconds
prometheus_target_sync_length_seconds_count
prometheus_target_sync_length_seconds_sum
prometheus_template_text_expansion_failures_total
prometheus_template_text_expansions_total
prometheus_treecache_watcher_goroutines
prometheus_treecache_zookeeper_failures_total
prometheus_tsdb_blocks_loaded
prometheus_tsdb_checkpoint_creations_failed_total
prometheus_tsdb_checkpoint_creations_total
prometheus_tsdb_checkpoint_deletions_failed_total
prometheus_tsdb_checkpoint_deletions_total
prometheus_tsdb_compaction_chunk_range_seconds_bucket
prometheus_tsdb_compaction_chunk_range_seconds_count
prometheus_tsdb_compaction_chunk_range_seconds_sum
prometheus_tsdb_compaction_chunk_samples_bucket
prometheus_tsdb_compaction_chunk_samples_count
prometheus_tsdb_compaction_chunk_samples_sum
prometheus_tsdb_compaction_chunk_size_bytes_bucket
prometheus_tsdb_compaction_chunk_size_bytes_count
prometheus_tsdb_compaction_chunk_size_bytes_sum
prometheus_tsdb_compaction_duration_seconds_bucket
prometheus_tsdb_compaction_duration_seconds_count
prometheus_tsdb_compaction_duration_seconds_sum
prometheus_tsdb_compaction_populating_block
prometheus_tsdb_compactions_failed_total
prometheus_tsdb_compactions_skipped_total
prometheus_tsdb_compactions_total
prometheus_tsdb_compactions_triggered_total
prometheus_tsdb_head_active_appenders
prometheus_tsdb_head_chunks
prometheus_tsdb_head_chunks_created_total
prometheus_tsdb_head_chunks_removed_total
prometheus_tsdb_head_gc_duration_seconds_count
prometheus_tsdb_head_gc_duration_seconds_sum
prometheus_tsdb_head_max_time
prometheus_tsdb_head_max_time_seconds
prometheus_tsdb_head_min_time
prometheus_tsdb_head_min_time_seconds
prometheus_tsdb_head_samples_appended_total
prometheus_tsdb_head_series
prometheus_tsdb_head_series_created_total
prometheus_tsdb_head_series_not_found_total
prometheus_tsdb_head_series_removed_total
prometheus_tsdb_head_truncations_failed_total
prometheus_tsdb_head_truncations_total
prometheus_tsdb_isolation_high_watermark
prometheus_tsdb_isolation_low_watermark
prometheus_tsdb_lowest_timestamp
prometheus_tsdb_lowest_timestamp_seconds
prometheus_tsdb_reloads_failures_total
prometheus_tsdb_reloads_total
prometheus_tsdb_retention_limit_bytes
prometheus_tsdb_size_retentions_total
prometheus_tsdb_storage_blocks_bytes
prometheus_tsdb_symbol_table_size_bytes
prometheus_tsdb_time_retentions_total
prometheus_tsdb_tombstone_cleanup_seconds_bucket
prometheus_tsdb_tombstone_cleanup_seconds_count
prometheus_tsdb_tombstone_cleanup_seconds_sum
prometheus_tsdb_vertical_compactions_total
prometheus_tsdb_wal_completed_pages_total
prometheus_tsdb_wal_corruptions_total
prometheus_tsdb_wal_fsync_duration_seconds
prometheus_tsdb_wal_fsync_duration_seconds_count
prometheus_tsdb_wal_fsync_duration_seconds_sum
prometheus_tsdb_wal_page_flushes_total
prometheus_tsdb_wal_segment_current
prometheus_tsdb_wal_truncate_duration_seconds_count
prometheus_tsdb_wal_truncate_duration_seconds_sum
prometheus_tsdb_wal_truncations_failed_total
prometheus_tsdb_wal_truncations_total
prometheus_tsdb_wal_writes_failed_total
prometheus_wal_watcher_current_segment
prometheus_wal_watcher_record_decode_failures_total
prometheus_wal_watcher_records_read_total
prometheus_wal_watcher_samples_sent_pre_tailing_total
promhttp_metric_handler_requests_in_flight
promhttp_metric_handler_requests_total
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
up
>
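The same check can be done over HTTP. InfluxDB 1.x answers `SHOW MEASUREMENTS` on `GET /query`, returning JSON nested as results, then series, then values. This sketch (hypothetical helper names) builds the request URL, where `u` and `p` are InfluxDB's optional credential parameters needed only when authentication is enabled, and extracts the measurement names from a response body:

```python
import json
from typing import List, Optional
from urllib.parse import urlencode


def show_measurements_url(base_url: str = "http://localhost:8086",
                          db: str = "prometheus",
                          user: Optional[str] = None,
                          password: Optional[str] = None) -> str:
    """GET URL listing all measurements in `db`."""
    params = {"db": db, "q": "SHOW MEASUREMENTS"}
    if user is not None:
        params.update(u=user, p=password or "")
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def measurement_names(body: str) -> List[str]:
    """Collect measurement names from a /query JSON response."""
    doc = json.loads(body)
    names: List[str] = []
    for result in doc.get("results", []):
        if "error" in result:
            raise RuntimeError(result["error"])
        # An empty database returns a result without a "series" key.
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names
```

If `measurement_names(urllib.request.urlopen(show_measurements_url()).read().decode())` returns names such as `node_load1`, the remote write path is working.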
If your InfluxDB database contains data, you can move on to installing Grafana.
5. grafana
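Grafana is an open-source application whose backend is written in Go. It is mainly used to visualize large-scale metric data and is one of the most popular tools for displaying time-series data in network infrastructure and application analytics.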
Official download page: https://grafana.com/grafana/download
Download the package:
wget https://dl.grafana.com/oss/release/grafana-6.7.2-1.x86_64.rpm
Install Grafana:
yum localinstall -y grafana-6.7.2-1.x86_64.rpm
Start the service and enable it at boot:
systemctl start grafana-server
systemctl enable grafana-server.service
Open port 3000 in a browser to reach the Grafana web UI (the default login is admin / admin).
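With Grafana running, every component of the stack should now be listening. A quick reachability sketch over the default ports used in this walkthrough (9100 node_exporter, 9090 Prometheus, 9201 the adapter, 8086 InfluxDB, 3000 Grafana; adjust the mapping if you changed any of them). It only tests that a TCP connection opens, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Default ports of the components installed in this walkthrough.
STACK_PORTS: Dict[str, int] = {
    "node_exporter": 9100,
    "prometheus": 9090,
    "remote_storage_adapter": 9201,
    "influxdb": 8086,
    "grafana": 3000,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def check_stack(host: str = "localhost",
                ports: Dict[str, int] = STACK_PORTS) -> Dict[str, bool]:
    """Map each component name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Running `check_stack()` on the server, or `check_stack("<server address>")` from another machine, quickly shows whether a port is blocked by the firewall mentioned earlier.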
Next, configure a Prometheus data source in Grafana, pointing it at the Prometheus server (port 9090 by default).
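Then configure InfluxDB: choose InfluxDB as the data source type and set its connection properties to match the setup above (URL http://localhost:8086, database prometheus, user node).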
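Data sources can also be created through Grafana's HTTP API (`POST /api/datasources` with a JSON body, authenticated as an admin user). The sketch below only builds that body; the field names follow Grafana 6.x as far as I know, and the top-level `password` field in particular is an assumption to verify, since newer Grafana versions expect it under `secureJsonData` instead. The helper names and the data source names in the usage note are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def datasource_payload(kind: str, name: str, url: str,
                       database: Optional[str] = None,
                       user: Optional[str] = None,
                       password: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body for Grafana's POST /api/datasources."""
    if kind not in ("prometheus", "influxdb"):
        raise ValueError(f"unsupported data source type: {kind}")
    # "proxy" access: the Grafana server, not the browser, talks to the source.
    payload: Dict[str, Any] = {"name": name, "type": kind,
                               "url": url, "access": "proxy"}
    if kind == "influxdb":
        if not database:
            raise ValueError("an InfluxDB data source needs a database name")
        payload["database"] = database
        if user:
            payload["user"] = user
        if password:
            # Assumption: Grafana 6.x accepts a top-level password; newer
            # versions expect {"secureJsonData": {"password": ...}} instead.
            payload["password"] = password
    return payload


def to_body(payload: Dict[str, Any]) -> bytes:
    """Serialize the payload as a UTF-8 JSON request body."""
    return json.dumps(payload).encode("utf-8")
```

For example, POST `to_body(datasource_payload("influxdb", "InfluxDB", "http://localhost:8086", database="prometheus", user="node", password="node"))` to `http://localhost:3000/api/datasources` with `Content-Type: application/json`.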
Next, visit https://grafana.com/grafana/dashboards and download the dashboard template you need.
Here we pick Node Exporter for Prometheus Dashboard CN v20191102
and download that template.
Next, import the template: on Grafana's import page, choose to upload a JSON file,
then select the template JSON file we just downloaded.
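Dashboards published on grafana.com are usually exported with data-source placeholders listed under a top-level `__inputs` key, and the import dialog asks you to map each placeholder to one of the data sources configured above. A small sketch that lists them before uploading, assuming the downloaded file follows that export format (placeholder names vary from template to template; `required_datasources` is a hypothetical helper):

```python
import json
from typing import List, Tuple


def required_datasources(dashboard_json: str) -> List[Tuple[str, str]]:
    """List (placeholder name, plugin id) pairs declared in __inputs."""
    doc = json.loads(dashboard_json)
    return [
        (item["name"], item.get("pluginId", ""))
        for item in doc.get("__inputs", [])
        # __inputs may also hold constants; keep only data source inputs.
        if item.get("type") == "datasource"
    ]
```

Call it on the file's contents, e.g. `required_datasources(open(path, encoding="utf-8").read())` where `path` is wherever you saved the template, to see which data source types the import will ask for.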
After the import, the dashboard displays the collected server metrics.
At this point, the complete monitoring platform is up and running.