Big Data Knowledge Summary

Overview of all the architectures
过往记忆 (iteblog): https://www.iteblog.com/archives/2607.html

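Firewall
1. Check the firewall state:
firewall-cmd --state
2. Start the firewall:
systemctl start firewalld
3. Stop the firewall:
systemctl stop firewalld
4. List the permanently opened ports:
firewall-cmd --permanent --zone=public --list-ports
5. Open a new port (a permanent rule; it takes effect after the reload in step 6):
firewall-cmd --zone=public --add-port=8080/tcp --permanent
6. Reload the firewall rules:
firewall-cmd --reload
7. Verify that the new port is open:
firewall-cmd --zone=public --query-port=8080/tcp
8. Start the firewall at boot:
systemctl enable firewalld.service
9. Close a previously opened port (reload afterwards, as in step 6):
firewall-cmd --zone=public --remove-port=9200/tcp --permanent
yum repository
https://www.jianshu.com/p/0f09204cb9de
Repository source used here:
http://192.168.9.124:8081/repository/yum-group/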

maven
https://www.cnblogs.com/jtnote/p/9982185.html

Google installation
https://www.jianshu.com/p/39d0b8f578d9

Distribution script
#!/bin/bash
# 1. Get the number of arguments; exit if none were given
pcount=$#
if ((pcount == 0)); then
    echo "no args"
    exit 1
fi
# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"
# 3. Resolve the parent directory to an absolute path
pdir=$(cd -P "$(dirname "$p1")" && pwd)
echo "pdir=$pdir"
# 4. Get the current user name
user=$(whoami)
# 5. Loop over the hosts hadoop128 to hadoop130
for ((host = 128; host < 131; host++)); do
    echo "------------------------------- hadoop$host --------------------------"
    rsync -rvl "$pdir/$fname" "$user@hadoop$host:$pdir"
done

ArrayBlockingQueue reference templates
DemoOne
package com.mongo.until;

import java.util.concurrent.*;

public class Test {
    static ArrayBlockingQueue<String> arrayBlockingQueue = new ArrayBlockingQueue<String>(10, true);
    // Bounded work queue for the thread pool
    static BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<Runnable>(10, true);
    // DiscardPolicy silently drops rejected tasks; AbortPolicy is the one that throws RejectedExecutionException
    static RejectedExecutionHandler discardPolicyHandler = new ThreadPoolExecutor.DiscardPolicy();
    static ThreadPoolExecutor threadPool = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS, workQueue, discardPolicyHandler);

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        System.out.println("work queue size before submitting: " + workQueue.size());
        for (int i = 0; i < 10; i++) {
            // Submit one task per iteration
            threadPool.execute(new MyTask2());
            System.out.println("core pool size: " + threadPool.getCorePoolSize());
            System.out.println("current pool size: " + threadPool.getPoolSize());
            System.out.println("queued tasks: " + threadPool.getQueue().size());
        }
        System.out.println("work queue size after submitting: " + workQueue.size());
        // Tasks may still be running here, so the queue content can be incomplete
        System.out.println(arrayBlockingQueue.toString());
        System.out.println(System.currentTimeMillis() - start); // elapsed time in ms
        threadPool.shutdown();
        try {
            // awaitTermination() blocks until the submitted tasks finish or the timeout elapses.
            // Wait up to 6 seconds and force a shutdown only if tasks are still running.
            if (!threadPool.awaitTermination(6, TimeUnit.SECONDS)) {
                threadPool.shutdownNow();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    static class MyTask2 implements Runnable {

        @Override
        public void run() {
            try {
                // add() throws IllegalStateException if the queue is full
                arrayBlockingQueue.add(str());
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println(Thread.currentThread().getName() + ", hello");
        }

        public static String str() {
            return "this is test data";
        }
    }
}
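
DemoOne configures DiscardPolicy, so it helps to see how that handler differs from AbortPolicy once the pool is saturated. The following is only a minimal sketch under stated assumptions: the class name RejectionPolicyDemo and the helper saturatedSubmitThrows are hypothetical, and the pool is deliberately tiny (one thread, one queue slot) so that the third task is always rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the original notes.
class RejectionPolicyDemo {

    // Returns true if submitting to a saturated pool throws RejectedExecutionException.
    static boolean saturatedSubmitThrows(RejectedExecutionHandler handler) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), handler);
        // Task that blocks until the latch is released, keeping the pool busy.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // occupies the single worker thread
            pool.execute(blocker); // fills the single queue slot
            pool.execute(blocker); // pool is saturated: the handler decides what happens
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("DiscardPolicy throws: "
                + saturatedSubmitThrows(new ThreadPoolExecutor.DiscardPolicy()));
        System.out.println("AbortPolicy throws: "
                + saturatedSubmitThrows(new ThreadPoolExecutor.AbortPolicy()));
    }
}
```

With DiscardPolicy the rejected task simply disappears, which is easy to miss in a load test; AbortPolicy makes the loss visible to the submitting thread.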

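DemoTwo
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.bson.Document;

// DataUntil, MongoDBUntil and Common are project classes that are not shown here
public class ArrayBlockQueueMain {

    static ArrayBlockingQueue<List<Document>> arrayBlockingQueue = new ArrayBlockingQueue<List<Document>>(1200, true);
    // Thread pool for the consumers (take)
    static ExecutorService executor = Executors.newFixedThreadPool(16);
    // Thread pool for the producers (add)
    static ExecutorService executorPool = Executors.newFixedThreadPool(30);
    // Data generator
    static DataUntil dataUntil = new DataUntil();
    static MongoDBUntil mongoDBUntil = new MongoDBUntil();

    public static void main(String[] args) {
        mongoDBUntil.createOnce();
        long start = System.currentTimeMillis();
        System.out.println("start: " + Common.getData());
        for (int i = 0; i < 1200; i++) {
            // Each task generates one batch and enqueues it;
            // add() never overflows here because the capacity equals the number of batches
            executorPool.submit(() -> {
                arrayBlockingQueue.add(dataUntil.createData());
            });
        }
        System.out.println("start consuming, queue size: " + arrayBlockingQueue.size());
        for (int i = 0; i < 1200; i++) {
            executor.submit(() -> {
                List<Document> documentList = null;
                try {
                    // take() blocks until a batch is available
                    documentList = arrayBlockingQueue.take();
                    mongoDBUntil.addCllection(documentList);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        executorPool.shutdown();
        executor.shutdown();
        try {
            // Wait (effectively without limit) for all consumers to finish
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            mongoDBUntil.mgdbClose();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("all worker threads have finished: " + (System.currentTimeMillis() - start) + " ms");
    }
}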
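
DemoTwo relies on add() on the producer side and take() on the consumer side. The sketch below is a self-contained illustration of how those calls behave on a small bounded queue. It makes no claim about the project's DataUntil or MongoDBUntil classes; BoundedQueueDemo and its method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class, not part of the original notes.
class BoundedQueueDemo {

    // add() throws IllegalStateException on a full queue; offer() returns false instead.
    static boolean addThrowsWhenFull() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);
        queue.add(1);
        boolean offered = queue.offer(2); // false: there is no free slot
        try {
            queue.add(2);
            return false;
        } catch (IllegalStateException e) {
            return !offered;
        }
    }

    // Moves `items` integers through a queue of the given capacity and returns their sum.
    static long transfer(int items, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < items; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("add throws when full: " + addThrowsWhenFull());
        System.out.println("sum through a 2-slot queue: " + transfer(1000, 2));
    }
}
```

Using put() instead of add() lets the queue be much smaller than the number of batches, because producers then wait for consumers instead of failing.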

Reflection
// Imports needed by this method (createData is the DataUntil method called in DemoTwo)
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;

// Generates test data only
public List<Document> createData() {
    System.out.println("thread id is: " + Thread.currentThread().getId() + " data generation started at " + Common.getData());
    // Collection of documents to be written in one batch
    //List<ProgramLogsPO> list = new ArrayList<ProgramLogsPO>();
    List<Document> documents = new ArrayList<>();
    // Program library: log table
    //ProgramLogsPO programLogsPO = new ProgramLogsPO();
    Document document = new Document();
    for (int i = 0; i < 5000; i++) {
        //Common.increment();
        //document.put("_id",Thread.currentThread().getId()+"_"+ Common.getDataStr()+"_"+ Common.countNum.getAndIncrement());
        // Copy the previous document so that every list entry is a separate object
        Document tmp = new Document();
        tmp.putAll(document);
        document = tmp;
        // Fill the fields once, on the first iteration; later documents copy these values
        if (i == 0) {
            // Get all declared fields of the class through reflection
            Field[] declaredFields = ProgramLogsPO.class.getDeclaredFields();
            for (Field field : declaredFields) {
                // Allow access to private fields
                field.setAccessible(true);
                // Field type as a string
                String type = field.getGenericType().toString();
                Method method = null;
                // Field name
                String name = field.getName();
                if (name.equals("c_createtime")) {
                    long createtime = System.currentTimeMillis();
                    document.append(name, createtime);
                    continue;
                }
                // Capitalize the first letter of the field name (only needed for the setter variant)
                //name = name.replaceFirst(name.substring(0, 1), name.substring(0, 1).toUpperCase());
                if (type.equals("class java.lang.String")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, String.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createUUID());
                    document.append(name, Common.createUUID());
                    continue;
                }
                if (type.equals("int")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, int.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createInt());
                    document.append(name, Common.createInt());
                    continue;
                }
                if (type.equals("float")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, float.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createFloat());
                    document.append(name, Common.createFloat());
                    continue;
                }
                if (type.equals("long")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, long.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createLong());
                    document.append(name, Common.createLong());
                    continue;
                }
                if (type.equals("java.util.List<java.lang.String>")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, List.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createListUUID());
                    document.append(name, Common.createListUUID());
                    continue;
                }
                if (type.equals("java.util.List<java.lang.Integer>")) {
                    //method = programLogsPO.getClass().getMethod("set" + name, List.class);
                    // Set this property on the object
                    //method.invoke(programLogsPO, Common.createListInt());
                    document.append(name, Common.createListInt());
                    continue;
                }
            }
        }
        //list.add(programLogsPO);
        documents.add(document);
    }
    System.out.println("thread id is: " + Thread.currentThread().getId() + " data generation finished at " + Common.getData());
    System.out.println("batches generated: " + count.getAndIncrement());
    return documents;
}
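
The same idea can be tried without the project classes. The sketch below is only an illustration under stated assumptions: SamplePO is a hypothetical stand-in for ProgramLogsPO, a plain Map replaces org.bson.Document, and the generated values are placeholders rather than the output of the Common helpers. It compares Class objects instead of type-name strings, which avoids typos in the string literals.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical demo class, not part of the original notes.
class ReflectionFillDemo {

    // Hypothetical stand-in for ProgramLogsPO.
    static class SamplePO {
        private String c_name;
        private int c_count;
        private long c_createtime;
    }

    // Builds one row by walking the declared fields and choosing a value by field type.
    static Map<String, Object> fill(Class<?> type) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            String name = field.getName();
            Class<?> fieldType = field.getType();
            if (name.equals("c_createtime")) {
                row.put(name, System.currentTimeMillis());
            } else if (fieldType == String.class) {
                row.put(name, UUID.randomUUID().toString());
            } else if (fieldType == int.class) {
                row.put(name, 1); // placeholder value
            } else if (fieldType == long.class) {
                row.put(name, 1L); // placeholder value
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(fill(SamplePO.class));
    }
}
```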

AtomicInteger atomic variables
DemoOne
// Atomic int variable
AtomicInteger count = new AtomicInteger();

DemoTwo

public static AtomicInteger countNum = new AtomicInteger(0);

// Atomically increments the shared counter
public static void increment() {
    countNum.getAndIncrement();
}
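
To see why an atomic variable is used for the shared counter, here is a minimal sketch in which several threads increment one AtomicInteger. The class name AtomicCounterDemo and the method countWithThreads are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original notes.
class AtomicCounterDemo {

    // Runs `threads` workers that each increment a shared counter `incrementsPerThread` times.
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.getAndIncrement(); // atomic read-modify-write, no lock needed
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(8, 10_000));
    }
}
```

A plain int incremented with ++ from several threads can lose updates; with AtomicInteger the result is always threads × incrementsPerThread.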

mongo
Cluster installation tutorials
Replica set
https://www.jianshu.com/p/dbffdb466534

Sharding
https://blog.csdn.net/liver_life/article/details/100562949
Starting and stopping
// Start
[root@master mongodb-4.2.2]# bin/mongod -f mongodb.conf

// Stop
> use admin;
switched to db admin
> db.shutdownServer();
server should be down...
Election mechanism
Original article: https://www.cnblogs.com/zyfd/p/9811528.html
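The Primary is the most important role in a MongoDB replica set: it is the node that accepts write requests from clients and drivers (reads are also routed to the Primary by default). Alongside the Primary, a replica set has Secondary nodes, which can serve reads, and Arbiter nodes, which only vote when a new Primary is elected. Replica sets are MongoDB's high-availability mechanism and can also be used for read/write splitting. They provide automatic failover (among other features not covered here): the set detects that the Primary is down, elects a new Primary, and makes the data in the set consistent again through catch-up or rollback. This article uses the log viewer of the 蜂巢 MongoDB cloud service to walk through a Primary election.
MongoDB has a powerful system log module. Compared with MySQL, its runtime logs are more helpful and make it possible to trace how MongoDB carries out each internal operation. The log excerpts referred to below were taken from the log viewer of the 蜂巢 MongoDB cloud service and show a complete, easy-to-follow sequence of election messages.

1. When is an election started?
The log shows that this node ("I") found no Primary in the replica set during the past 10 s.
How do I know there was no Primary during that time? I send a heartbeat to every other member every 2 s,

and some members do not answer.

I keep sending until the timeout (10 s by default) expires.
Besides heartbeats I send other commands, and I also replicate from the Primary's oplog; but I find that I can no longer replicate from it and cannot find another node to replicate from.

So there is no Primary...
2. Can I be elected Primary?
First I ask, tentatively, whether the others would accept me as Primary by starting a "dry election". To my delight, another node agrees :). The replica set has 3 members, so with my own vote plus one other I hold a majority (2 of 3) and am eligible to become Primary. Note that the term is not incremented at this point and is still 0 (see the first log excerpt), because this is not a real election.

3. In that case, I start the real election
The result is all but certain. Why run the dry election first? To improve the chance that the real election succeeds: compared with the real election, the dry run checks fewer things and is cheaper. By now the term has proudly been incremented to 1.

4. As expected, I am elected Primary
It feels great to have everything under control!
5. So I switch my role to Primary
But wait: I cannot accept client writes immediately, because I first have to check whether my data is the newest. How? By comparing the optime in the oplog with the state of the other members (how fresh their data is).

I wait for the others to reply:

Node 202 answers (it sends me its own rs.status(), that is, its view of the replica set); node 200 is unreachable. From this I learn that my data is the newest, and node 202 confirms that node 200 really is down.
6. Since my data is the newest, I do not need to copy data from other nodes
This differs from Raft. According to the Raft paper, a node can be elected leader only if its log is up to date. The Primary that MongoDB elects does not need to have the newest data; it only has to satisfy an agreed condition (its oplog is no more than 10 s behind). If my data lags behind one or more live members (this usually happens when my priority is higher than that of the node holding newer data), I fetch the missing entries from the other nodes and apply them before I start serving writes. I keep this bounded, though: give me 2 s (catchUpTimeoutMillis) and I catch up as far as I can. If I have not fully caught up when the time is up, nothing more can be done, and those nodes roll back the entries I did not get.
7. Now my data is up to date and I start serving writes as Primary. Send me your write requests!
In other words, a node does not serve writes the moment it becomes Primary; there is a catch-up phase first. I think this can easily cause problems if it is misunderstood. For example, with write concern majority, a write that has not yet reached a majority of nodes may be reported as failed because the old Primary went down during a failover, even though the data ends up persisted on the new Primary. Strict Raft does not behave this way.
The first log excerpt gives the overall election flow. For the step-by-step details, I raised MongoDB's system log level from 0 to 2 with db.setLogLevel() and replayed the election so that more detail is visible.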
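
The vote counting in steps 2 and 3 comes down to a majority rule: a candidate needs more than half of the voting members, counting its own vote. The sketch below only illustrates that arithmetic; it is not MongoDB's implementation, and the class and method names are hypothetical.

```java
// Hypothetical demo class, not part of the original notes or of MongoDB.
class ElectionQuorum {

    // Smallest number of votes that is strictly more than half of the voting members.
    static int majority(int votingMembers) {
        if (votingMembers < 1) {
            throw new IllegalArgumentException("a replica set needs at least one voting member");
        }
        return votingMembers / 2 + 1;
    }

    // True if the candidate (whose own vote is included in votesReceived) wins.
    static boolean wins(int votesReceived, int votingMembers) {
        return votesReceived >= majority(votingMembers);
    }

    public static void main(String[] args) {
        // The situation described above: 3 members, my own vote plus node 202.
        System.out.println("majority of 3: " + majority(3));
        System.out.println("2 of 3 wins: " + wins(2, 3));
        System.out.println("1 of 3 wins: " + wins(1, 3));
    }
}
```

This is also why a 3-member set survives the loss of one node (200 in the walkthrough) but not two.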
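Configuration file
https://www.cnblogs.com/danhuangpai/p/10571158.html
Compression theory
MongoDB 3.0 introduced WiredTiger, a new storage engine that supports compression. WiredTiger manages disk I/O in pages, and each page holds many BSON documents. Pages are compressed by default when they are written to disk and decompressed when they are read from disk into the cache.

A basic idea behind compression is that repeated values (identical values and patterns) can be stored once in compressed form, which reduces the total space used. Larger units of data tend to compress better because they contain more repetition. By compressing at the page level, usually called block compression, WiredTiger can compress data more effectively.

WiredTiger supports several compression libraries, and you can choose the most suitable option per collection. This is an important choice, because access patterns and data can differ widely between collections. For example, if you use GridFS to store large files such as images and videos, MongoDB automatically splits each file into many smaller chunks and reassembles them when needed. GridFS maintains two collections: fs.files, which holds the metadata of each large file and of its chunks, and fs.chunks, which holds the file content split into 255 KB chunks. For images and videos, compression may help the fs.files collection, but the data in fs.chunks is probably already compressed, so compression may need to be disabled for that collection.

Compression options in MongoDB 3.0
In MongoDB 3.0, WiredTiger offers three compression options for collections:
No compression
Snappy (enabled by default): good compression with efficient use of resources
zlib (similar to gzip): excellent compression, but uses more resources

There are two compression options for indexes:
No compression
Prefix (enabled by default): good compression with efficient use of resources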
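Adding indexes (ensureIndex is a deprecated alias of createIndex since MongoDB 3.0)
db.test.createIndex({"c_name":1}) // single-field index
db.user.createIndex({"name":1},{"name":"IX_name"}) // index with an explicit name (IX_name)
db.user.createIndex({"name":1},{"unique":true}) // unique index

List all databases
show dbs

Switch to a database
use mgdb

Drop the current database
db.dropDatabase()

List all collections in the database
show tables

Delete all documents in the test collection
db.test.remove({})

Show the documents in the test collection
db.test.find().pretty()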

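Disk I/O testing
dd command
The copy function of dd can be used to test the I/O performance of a disk. Note that dd gives only a rough estimate of disk I/O performance, not a precise measurement.

Write test:
$ time dd if=/dev/zero of=test.file bs=1G count=2 oflag=direct
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 13.5487 s, 159 MB/s
real 0m13.556s
user 0m0.000s
sys 0m0.888s
The write throughput of this partition is 159 MB/s. In the command:
/dev/zero is a pseudo-device that produces a stream of null bytes; reading from it causes no disk I/O.
if specifies the file that dd reads from.
of specifies the file that dd writes to.
bs is the size of each block.
count is the number of blocks to write.
oflag=direct must be given when testing I/O: it writes directly to the disk and bypasses the cache.

Read test:
$ dd if=test.file of=/dev/null iflag=direct
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB) copied, 4.87976 s, 440 MB/s
The read throughput of this partition is 440 MB/s. No bs is given here, so dd uses its default 512-byte blocks, which is why it reports 4194304 records.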
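
The MB/s figure that dd prints is simply bytes divided by elapsed seconds, with 1 MB taken as 1,000,000 bytes. A small sketch that recomputes the two results above (the class and method names are hypothetical):

```java
// Hypothetical demo class, not part of the original notes.
class DdThroughput {

    // Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes, as dd reports it).
    static double megabytesPerSecond(long bytes, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return bytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        long bytes = 2L * 1024 * 1024 * 1024; // bs=1G count=2
        System.out.println("write: " + Math.round(megabytesPerSecond(bytes, 13.5487)) + " MB/s");
        System.out.println("read:  " + Math.round(megabytesPerSecond(bytes, 4.87976)) + " MB/s");
    }
}
```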

sar command
sar is a powerful tool for finding system bottlenecks; it reports CPU, memory, disk, network and other performance data.

To view current disk performance with sar:

$ sar -d -p 1 2
Linux 3.10.0-693.5.2.el7.x86_64 (server-68) 03/11/2019 _x86_64_ (64 CPU)

02:28:54 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
02:28:55 PM sda 1.00 0.00 3.00 3.00 0.01 9.00 9.00 0.90
02:28:55 PM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:55 PM polex_pv-rootvol 1.00 0.00 3.00 3.00 0.01 9.00 9.00 0.90
02:28:55 PM polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:55 PM polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

02:28:55 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
02:28:56 PM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:56 PM sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:56 PM polex_pv-rootvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:56 PM polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:28:56 PM polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Average: DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
Average: sda 0.50 0.00 1.50 3.00 0.00 9.00 9.00 0.45
Average: sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: polex_pv-rootvol 0.50 0.00 1.50 3.00 0.00 9.00 9.00 0.45
Average: polex_pv-varvol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: polex_pv-homevol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
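Here "-d" reports disk (block device) activity, "-p" prints devices under names such as sda and sdb, "1" samples every 1 s, and "2" takes 2 samples in total.

await: average wait time of each device I/O operation, in milliseconds.

svctm: average service time of each device I/O operation, in milliseconds.

%util: percentage of each second that is spent on I/O operations.

Disk I/O performance is generally judged as follows:

Normally svctm is smaller than await. svctm depends on disk performance; CPU and memory load also affect it, and too many requests indirectly increase it.

await generally depends on svctm, the I/O queue length and the I/O request pattern. If svctm is close to await, there is almost no I/O wait and the disk performs well. If await is much higher than svctm, the I/O queue is too long and applications running on the system slow down; a faster disk can solve this.

%util is another important indicator of disk I/O. If %util is close to 100%, the disk is receiving too many I/O requests, the I/O system is running at full load, and the disk may be a bottleneck. Over time this is bound to hurt system performance; the fix is to optimize the application or to switch to a faster, higher-performance disk.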

ES
https://blog.csdn.net/liuyanglglg/article/details/94367160

Username/password access for Elasticsearch
https://www.cnblogs.com/snail90/p/11444393.html
# Settings for user management (security)
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true

Username/password access for elasticsearch-head

https://blog.csdn.net/vah101/article/details/81335951
# Lets elasticsearch-head query a password-protected cluster
http.cors.allow-headers: Authorization

Access URL: http://192.168.121.128:9100/?auth_user=elastic&auth_password=123456
Installation and deployment
Single-node installation and deployment
Create the ela group and user
   groupadd ela
   useradd -g ela ela
   passwd ela
Enter the password when prompted
Installation
Preparation
[root@ localhost elasticsearch-6.1.1]# cp /etc/security/limits.conf /etc/security/limits.conf.bak

[root@ localhost elasticsearch-6.1.1]# cp /etc/sysctl.conf /etc/sysctl.conf.bak
vi /etc/security/limits.conf
Append the following to the end of the file:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096

vi /etc/sysctl.conf
Add the following setting:
vm.max_map_count=655360

Run sysctl -p to apply the change:
[root@ localhost elasticsearch-6.1.1]# sysctl -p
vm.max_map_count = 655360

Install
tar -zxvf elasticsearch-7.5.1-linux-x86_64.tar.gz -C /opt/module/
[root@localhost module]# cd elasticsearch-7.5.1/config
[root@node2 config]# cat elasticsearch.yml
# Cluster name
cluster.name: ela
# Node name
node.name: hadoop130
# Data path
path.data: /opt/module/elasticsearch-7.5.1/data
# Log path
path.logs: /opt/module/elasticsearch-7.5.1/logs
# Memory locking (disabled here)
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
# Listen address; 0.0.0.0 accepts connections from other hosts (or set the host's own IP)
network.host: 0.0.0.0
# HTTP port
http.port: 9200
# TCP transport port
transport.tcp.port: 9300
discovery.seed_hosts: ["hadoop130"]
cluster.initial_master_nodes: ["hadoop130"]
http.cors.enabled: true
http.cors.allow-origin: "*"

Start and verify
[root@localhost elasticsearch-7.5.1]# bin/elasticsearch
curl http://192.168.9.130:9200
(The sample response below reports version 6.1.1 and the node name node-1, so it does not come from the 7.5.1 configuration above; a 7.5.1 node returns the same structure with its own name and version.)
{
"name" : "node-1",
"cluster_name" : "ela",
"cluster_uuid" : "gv8DzvZeR1W-aOM_UTEdVA",
"version" : {
"number" : "6.1.1",
"build_hash" : "bd92e7f",
"build_date" : "2017-12-17T20:23:25.338Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
Verify in a browser:

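Distributed installation and deployment
ES version: 7.5.1
Three servers
192.168.9.130
192.168.9.131
192.168.9.162

Prerequisite: Java must be installed

Deploy the ES cluster; perform the same steps on all three machines
1. Add a regular user to run ES
useradd <username>

2. Install ES
1) Unpack
tar xf elasticsearch-7.5.1.tar.gz -C /opt/module/
2) Create the directories that the configuration below points to
mkdir /opt/module/elasticsearch-7.5.1/data
mkdir /opt/module/elasticsearch-7.5.1/logs
3. Edit elasticsearch.yml

cp /opt/module/elasticsearch-7.5.1/config/elasticsearch.yml /opt/module/elasticsearch-7.5.1/config/elasticsearch.yml.bak

vim elasticsearch.yml

cluster.name: ela
node.name: hadoop130
node.master: true
node.data: true
path.data: /opt/module/elasticsearch-7.5.1/data
path.logs: /opt/module/elasticsearch-7.5.1/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
http.port: 9200
transport.tcp.port: 9300
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 3s
discovery.zen.ping.unicast.hosts: ["192.168.9.130:9300","192.168.9.131:9300","192.168.9.162:9300"]
The only setting that differs between the three machines:

node.name: hadoop130  ==> 192.168.9.130
node.name: hadoop131  ==> 192.168.9.131
node.name: hadoop162  ==> 192.168.9.162
4. Key configuration parameters
(1) cluster.name
The cluster name; it must be identical on all three nodes.

(2) node.name
The node name; it must be different on each of the three ES nodes.

(3) discovery.zen.minimum_master_nodes: 2
The minimum number of master-eligible nodes; if fewer are available, the cluster cannot start. The recommended value is (number of master-eligible nodes / 2) + 1. With three ES servers here, at least two master-eligible nodes are needed for the cluster to run. Note that this is a pre-7.0 setting: Elasticsearch 7.x accepts it but ignores it and manages the voting quorum itself.

(4) node.master
Whether this node is eligible to be elected master. With a minimum of two master-eligible nodes (minimum_master_nodes: 2), at least two ES servers must be configured with node.master: true. If only two are configured and the master goes down, the whole cluster becomes unavailable, so with three servers set node.master: true on all three: if one master-eligible node fails, the remaining two elect a new master. The cluster stays available as long as the number of running servers with node.master: true is not below minimum_master_nodes. Here all three ES nodes have node.master: true. The master manages the cluster state and handles metadata such as creating and deleting indices and allocating shards; on a dedicated master, data storage and queries do not pass through it, so its load is light and its JVM heap can be smaller.

(5) node.data
Whether the node stores index data; set it to true on all three.

(6) bootstrap.memory_lock: false
With false, physical memory is not locked and the heap may be swapped out. On hosts that have swap, this can be set to true to lock the memory; that requires the memlock limits from step 7.

(7) discovery.zen.ping_timeout: 3s
Timeout for pinging other nodes during discovery.

(8) discovery.zen.ping.unicast.hosts: ["192.168.9.130:9300","192.168.9.131:9300","192.168.9.162:9300"]
The initial list of cluster nodes; the nodes talk to each other on port 9300. In 7.x this setting is named discovery.seed_hosts (the old name is deprecated), and a brand-new 7.x cluster also needs cluster.initial_master_nodes, as in the single-node configuration above.

5. JVM tuning
vim /opt/module/elasticsearch-7.5.1/config/jvm.options

-Xms1g  change to ==> -Xms2g
-Xmx1g  change to ==> -Xmx2g
Half of the physical memory works best; adjust according to the memory of the server.

6. Set ownership
chown -R ysj: /opt/module/elasticsearch-7.5.1
(ysj is the user that runs ES; replace it with the user created in step 1.)

7. Operating system tuning (required, otherwise ES does not start)
[1] Kernel parameters
Add the following to /etc/sysctl.conf

fs.file-max=655360
vm.max_map_count=655360
Run sysctl -p to apply.

Explanation:
(1) fs.file-max=655360
Maximum number of file descriptors that can be open system-wide.

(2) vm.max_map_count=655360
Maximum number of virtual memory areas (memory maps) that a single process may have.

[2] Edit /etc/security/limits.conf

* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
Explanation:
(nofile) maximum number of open file descriptors
(nproc) maximum number of user processes
(memlock) maximum locked-in-memory address space

[3] Edit /etc/security/limits.d/90-nproc.conf (on CentOS 7 the file is named 20-nproc.conf)
Change 1024 to 65536

* soft nproc 1024   (before)
* soft nproc 65536   (after)

Start ES: bin/elasticsearch

Check the cluster state: 192.168.9.*:9200/?pretty

Notes:
1. The firewall must be stopped on all cluster nodes (or ports 9200 and 9300 opened between them).
2. ES must be started by a non-root user.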

Kibana
Installation and deployment
https://blog.csdn.net/zou79189747/article/details/81118915
https://blog.csdn.net/zou79189747/article/details/80111219
https://blog.csdn.net/niuchenliang524/article/details/82868221
https://www.jianshu.com/p/4d65ed957e62

Create, read, update and delete (CRUD)
https://blog.csdn.net/baidu_24545901/article/details/79031291

Chinese word segmentation (IK analyzer) plugin
https://github.com/medcl/elasticsearch-analysis-ik/releases
https://www.cnblogs.com/zzming/p/11733378.html

Java API
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.5/_changing_the_client_8217_s_initialization_code.html
