Using the AWS CLI to Quickly Work with Amazon S3, EMR, ES, and Other Services

Installing the AWS CLI

Requirements: Python 2 version 2.7+ or Python 3 version 3.4+

Command to install the AWS CLI tools:

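```
pip3 install -U --user awscli aws_role_credentials oktaauth
# -U (--upgrade) upgrades the listed packages to their latest versions
# --user installs into your user directory, e.g. ~/.local (on Linux, make sure ~/.local/bin is on your PATH)
# On a slow network in mainland China, add -i https://pypi.tuna.tsinghua.edu.cn/simple before the package names to use the Tsinghua mirror

# Verify the installation
aws --version
```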
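As a quick sanity check, the sketch below tests an interpreter version against the requirement stated above (2.7+ or 3.4+) and extracts the CLI version from the aws --version banner. It assumes the banner starts with aws-cli/X.Y.Z, as current releases print it; both function names are hypothetical.

```python
import re
import sys


def python_meets_requirement(version_info=sys.version_info):
    """True for Python 2.7+ or 3.4+, the interpreter versions listed above."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major == 3 and minor >= 4) or major > 3


def parse_cli_version(banner):
    """Extract 'X.Y.Z' from a banner such as 'aws-cli/1.18.69 Python/3.8.2 ...'."""
    match = re.match(r"aws-cli/(\d+\.\d+\.\d+)", banner.strip())
    return match.group(1) if match else None
```

When feeding it real output, capture both stdout and stderr of aws --version, since the stream the banner is written to may vary between releases.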

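The official docs set up credentials with aws configure, but you can also obtain CLI credentials through Okta (skip this step if you don't use Okta):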

```
oktaauth \
  --username your.name@yourcompany.com(replace this) \
  --server yourcompany.okta.com(replace this) \
  --apptype amazon_aws \
  --appid exxxxaaaWewefw(replace this) | \
aws_role_credentials saml --profile profile_tw(replace this)
```
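aws_role_credentials stores the temporary keys under the profile you pass with --profile. Assuming they land in the standard shared credentials file (~/.aws/credentials, an INI file with one [section] per profile), this hypothetical helper lists the profiles and flags any that are missing the key fields.

```python
import configparser
import os

# Fields every usable profile needs; session tokens are optional here.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def profiles_in(path=os.path.expanduser("~/.aws/credentials")):
    """Map each profile in the credentials file to True if it has both key fields."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    return {
        name: all(parser.has_option(name, key) for key in REQUIRED_KEYS)
        for name in parser.sections()
    }
```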

For more details, see the installation guide in the official AWS documentation.

Creating an S3 Bucket (Simple Storage Service)

Create an S3 bucket with the AWS CLI:

```
aws s3api create-bucket \
     --bucket my-second-emr-bucket \
     --region us-east-2 \
     --create-bucket-configuration LocationConstraint=us-east-2
```

Before you start using the S3 commands, it helps to get familiar with their built-in help: aws s3 help and aws s3api help.

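In short, there are two command sets. s3api is lower level and gives developers richer, finer-grained control, while s3 offers easier-to-use high-level wrappers. Pick whichever suits the task.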

Create a bucket with s3api (see the create-bucket reference for the other parameters):

```
aws s3api create-bucket \
     --bucket your-bucket-name(replace this) \
     --region ap-southeast-1 \
     --create-bucket-configuration LocationConstraint=ap-southeast-1

# --bucket: name of the bucket to create
# --region ap-southeast-1: AWS region in which the bucket is created
# --create-bucket-configuration: bucket configuration given as Key=Value pairs;
#   LocationConstraint should match --region (omit it entirely for us-east-1)
```
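The region rule is easy to get wrong, so here is a sketch that assembles the create-bucket arguments: it adds LocationConstraint for every region except us-east-1 (which rejects an explicit constraint) and checks the name against a simplified subset of the S3 naming rules. create_bucket_argv is a hypothetical helper, not part of any AWS tool.

```python
import re

# Simplified subset of the S3 bucket naming rules: 3-63 characters of lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def create_bucket_argv(bucket, region):
    """Build the argument list for `aws s3api create-bucket`."""
    if not BUCKET_NAME.match(bucket):
        raise ValueError("invalid bucket name: %r" % bucket)
    argv = ["aws", "s3api", "create-bucket", "--bucket", bucket, "--region", region]
    # us-east-1 is the default location and must not carry a LocationConstraint
    if region != "us-east-1":
        argv += ["--create-bucket-configuration", "LocationConstraint=" + region]
    return argv
```

The list can be run with subprocess.run(create_bucket_argv("my-second-emr-bucket", "us-east-2"), check=True).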

If you authenticate through oktaauth, append --profile your-profile-name to every command you run.
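Besides the --profile flag, the AWS CLI also honors the AWS_PROFILE environment variable. A small sketch of both approaches when scripting CLI calls; the helper names are hypothetical, and run_aws needs the aws CLI on PATH.

```python
import os
import subprocess


def with_profile(argv, profile):
    """Return a new AWS CLI argument list with --profile appended."""
    return list(argv) + ["--profile", profile]


def profile_env(profile, base=None):
    """Return an environment in which every aws call uses `profile` via AWS_PROFILE."""
    env = dict(os.environ if base is None else base)
    env["AWS_PROFILE"] = profile
    return env


def run_aws(argv, profile):
    """Run an AWS CLI command under the given profile and return the completed process."""
    return subprocess.run(with_profile(argv, profile), check=True,
                          capture_output=True, text=True)
```

In an interactive shell, export AWS_PROFILE=profile_tw achieves the same as profile_env for the rest of the session.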

Use aws s3 sync to upload files to S3 (for example, to sync the Spark jar you want to run):
```
aws s3 sync local-or-s3-source-dir/ s3://your-bucket-name/destination
```
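The destination must use the s3:// scheme with two slashes. This hypothetical helper splits an S3 URI into bucket and key and rejects malformed targets such as s3:/bucket before they reach aws s3 sync.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, key); reject malformed URIs."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("S3 URIs must start with s3:// : %r" % uri)
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: %r" % uri)
    return bucket, key
```

Running aws s3 sync with --dryrun first shows what would be copied without transferring anything.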

Creating an EMR Cluster

How to create an EMR cluster running Spark with the AWS CLI

Create the default roles that an EMR cluster needs; if the roles already exist, the command returns []:
```
aws emr create-default-roles
```

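To launch the cluster into an existing EC2 subnet, add the subnet option to create-cluster (--ec2-attributes SubnetId=subnet-xxxx). To list the subnets in an Availability Zone (a zone name such as ap-southeast-1a, not the region name):

```
aws ec2 describe-subnets \
     --filters "Name=availabilityZone,Values=ap-southeast-1a"
```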
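describe-subnets returns JSON with a Subnets array whose entries carry SubnetId and AvailableIpAddressCount. As a sketch, the hypothetical helper below picks the subnet with the most free addresses (an arbitrary policy, chosen only for illustration) and formats it for --ec2-attributes.

```python
import json


def pick_subnet(describe_subnets_json):
    """Return the SubnetId with the most free IP addresses, or None if there are none."""
    subnets = json.loads(describe_subnets_json).get("Subnets", [])
    if not subnets:
        return None
    best = max(subnets, key=lambda s: s.get("AvailableIpAddressCount", 0))
    return best["SubnetId"]


def ec2_attributes(subnet_id):
    """Value for `aws emr create-cluster --ec2-attributes`."""
    return "SubnetId=" + subnet_id
```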

Create the cluster and submit the Spark job:

```
aws emr create-cluster \
  --name your-cluster-name \
  --release-label emr-5.29.0 \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --applications Name=Spark \
  --log-uri s3://your-bucket-name/logs \
  --steps '[{"Name":"your-project-name","Type":"Spark","Args":["--deploy-mode","cluster","--class","top.ilovestudy.data.GdeltProcessor","--conf","spark.es.nodes.discovery=false","--conf", "spark.es.nodes=https://search-your-data-project-amazon-es-endpoint.ap-southeast-1.es.amazonaws.com","--conf", "spark.es.port=443","s3://your-bucket-name/data-project/libs/processer-0.0.1-all.jar","s3://your-bucket-name/data/update/2020-02-21T00:00:00+00:00/"],"ActionOnFailure":"CONTINUE"}]' \
  --auto-terminate \
  --region ap-southeast-1
```
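Hand-writing the --steps JSON inline is error-prone. This sketch builds the same kind of Spark step from Python data and serializes it with json.dumps; it reproduces only a subset of the --conf settings above, and spark_step is a hypothetical helper.

```python
import json


def spark_step(name, main_class, jar, args=(), conf=None, action_on_failure="CONTINUE"):
    """Build one EMR step of Type Spark; every flag and value is its own Args element."""
    step_args = ["--deploy-mode", "cluster", "--class", main_class]
    for key, value in (conf or {}).items():
        step_args += ["--conf", "%s=%s" % (key, value)]
    step_args.append(jar)
    step_args.extend(args)  # arguments passed on to the application's main()
    return {"Name": name, "Type": "Spark", "Args": step_args,
            "ActionOnFailure": action_on_failure}


steps = [spark_step(
    "your-project-name",
    "top.ilovestudy.data.GdeltProcessor",
    "s3://your-bucket-name/data-project/libs/processer-0.0.1-all.jar",
    args=["s3://your-bucket-name/data/update/2020-02-21T00:00:00+00:00/"],
    conf={"spark.es.nodes.discovery": "false", "spark.es.port": "443"},
)]
steps_json = json.dumps(steps)
```

The resulting string can be passed to --steps directly, or written to steps.json and passed as --steps file://steps.json.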

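For the full parameter list, see the create-cluster reference. A note on --steps: it lists the operations to run after the EMR cluster has been created, and its value is an array of key-value objects. The general shape looks like this (| separates alternative values, so this is a template rather than valid JSON):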

```
[
    {
        "Name": "Micro-Project(replace this)",
        "Type": "CUSTOM_JAR" | "STREAMING" | "HIVE" | "PIG" | "IMPALA" | "Spark",  // pick one
        "Args": [  // each flag and its value are separate elements
            "--deploy-mode", "cluster",
            "--master", "yarn",
            "--class", "top.ilovestudy.data.GdeltProcessor",
            "--conf", "spark.es.index.auto.create=true"
        ],
        "Jar": "s3://jinghui-s3/data-project/libs/processer-0.0.1-all.jar",
        "ActionOnFailure": "TERMINATE_CLUSTER" | "CANCEL_AND_WAIT" | "CONTINUE"  // pick one
    }
]
```

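I could not find a detailed explanation of the Step parameters in the official docs. Experimenting shows that when Type is CUSTOM_JAR, the Args are assembled into this command:

```
hadoop jar your-jar your-args
# For example
hadoop jar /mnt/var/lib/hadoop/steps/s-3NK85TMMYQGT/processer-0.0.1-all.jar /home/hadoop/spark/bin/spark-submit
```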

When Type is Spark, the command actually executed looks like this (back to the familiar spark-submit script):

```
spark-submit your-args
```
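To make the observation above concrete, here is a small model of how a step's Args turn into a command line. It encodes only the behaviour observed in the experiments described above, not an official specification; effective_command is hypothetical. Quoting each element with shlex.quote also shows why "--deploy-mode cluster" written as a single element arrives as one argument.

```python
import shlex


def effective_command(step):
    """Model the command EMR runs for a step, per the experiment described above."""
    step_type = step["Type"].upper()
    args = list(step.get("Args", []))
    if step_type == "CUSTOM_JAR":
        cmd = ["hadoop", "jar", step["Jar"]] + args
    elif step_type == "SPARK":
        cmd = ["spark-submit"] + args
    else:
        raise ValueError("not modelled here: %s" % step["Type"])
    # Quote each element so argument boundaries stay visible
    return " ".join(shlex.quote(part) for part in cmd)
```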

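Every operation can be done through the web console, the AWS CLI, an SDK, and so on. I recommend doing each one by hand in the console first; it makes the meaning of every AWS CLI parameter much clearer.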

Using Amazon Elasticsearch Service

Create an Amazon ES domain:

```
aws es create-elasticsearch-domain \
  --domain-name your-domain-name \
  --elasticsearch-version 7.1 \
  --elasticsearch-cluster-config InstanceType=t2.small.elasticsearch,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=standard,VolumeSize=10 \
  --access-policies '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"*"},"Action":["es:*"],"Condition":{"IpAddress":{"aws:SourceIp":["your_ip_address"]}}}]}'
```

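Remember to replace the domain name and the IP address in aws:SourceIp. The IP must be the public address you use to reach the internet, not your machine's local 192.x.x.x address; searching for "what is my IP" will show it.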

You can also get your public IP directly by running curl https://checkip.amazonaws.com.
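The policy above can also be generated instead of edited by hand. This sketch fetches the public IP from checkip.amazonaws.com (a network call, not exercised in the checks) and builds the same IP-restricted policy, writing the address as a /32 CIDR and refusing private addresses such as 192.168.x.x. Both helpers are hypothetical and IPv4-only.

```python
import ipaddress
import json
import urllib.request


def my_public_ip(url="https://checkip.amazonaws.com"):
    """Fetch your public IP; the service answers with the address and a newline."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("ascii").strip()


def es_access_policy(ip):
    """Access policy allowing es:* only from a single public IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    if addr.is_private:
        raise ValueError("use your public IP, not a private one like %s" % ip)
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": ["es:*"],
            "Condition": {"IpAddress": {"aws:SourceIp": [str(addr) + "/32"]}},
        }],
    })
```

The string returned by es_access_policy(my_public_ip()) can be passed as the --access-policies value.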

Check the status of the newly created ES domain (initialization is slow, roughly 10 minutes):

```
aws es describe-elasticsearch-domain --domain-name your-domain-name
# Or list all ES domains in a given region
aws es list-domain-names --region ap-southeast-1
```
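Instead of re-running the describe command by hand, a script can poll until the domain is ready. This sketch assumes the output shape DomainStatus.Processing / DomainStatus.Endpoint; VPC-based domains report their address under Endpoints instead, which it does not handle. The helper names and poll limits are hypothetical.

```python
import json
import subprocess
import time


def domain_ready(describe_output):
    """Return (ready, endpoint) from describe-elasticsearch-domain JSON output.

    Ready means Processing is false and a public Endpoint has been assigned.
    """
    status = json.loads(describe_output)["DomainStatus"]
    endpoint = status.get("Endpoint")
    ready = status.get("Processing") is False and endpoint is not None
    return ready, endpoint


def wait_for_domain(name, poll_seconds=60, max_polls=20):
    """Poll the domain via the aws CLI until it is ready; return its endpoint."""
    for _ in range(max_polls):
        out = subprocess.run(
            ["aws", "es", "describe-elasticsearch-domain", "--domain-name", name],
            check=True, capture_output=True, text=True,
        ).stdout
        ready, endpoint = domain_ready(out)
        if ready:
            return endpoint
        time.sleep(poll_seconds)
    raise TimeoutError("domain %s not ready after %d polls" % (name, max_polls))
```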

Upload a document to the Amazon ES domain:

```
curl -XPUT elasticsearch_domain_endpoint/movies/_doc/1 -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}' -H 'Content-Type: application/json'
```

You can also upload documents in bulk, for example from a file named bulk_movies.json:

```
curl -XPOST elasticsearch_domain_endpoint/_bulk --data-binary @bulk_movies.json -H 'Content-Type: application/json'
```
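The bulk file is newline-delimited JSON: an action line followed by the document source, repeated, with a trailing newline. A sketch that generates such a file; bulk_body is hypothetical and the movie entries in the check are sample data only.

```python
import json


def bulk_body(index, docs):
    """Build an _bulk request body from (doc_id, document) pairs.

    Each document becomes an action line plus a source line; the bulk API
    requires newline-delimited JSON ending with a newline.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

Write the result to bulk_movies.json and upload it with --data-binary as above; curl's plain -d option strips newlines from file data, which breaks the format.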

Search for documents in the Amazon ES domain:
```
curl -XGET 'elasticsearch_domain_endpoint/movies/_search?q=mars'
```
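When scripting searches, the query string should be URL-encoded, and the matches come back under hits.hits with each document in _source. A sketch of both steps; the helper names are hypothetical and the title field matches the sample movie documents.

```python
import json
from urllib.parse import urlencode


def search_url(endpoint, index, query):
    """URL for a URI search such as <endpoint>/movies/_search?q=mars."""
    return "%s/%s/_search?%s" % (endpoint.rstrip("/"), index, urlencode({"q": query}))


def hit_titles(response_json):
    """Pull the title field from each hit in a _search response."""
    hits = json.loads(response_json)["hits"]["hits"]
    return [hit["_source"].get("title") for hit in hits]
```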
Amazon ES also provides a Kibana plugin, which you can open from the web console.

Delete the Amazon ES domain:
```
aws es delete-elasticsearch-domain --domain-name your-domain-name
```
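Because you are billed for every hour the domain runs, remember to delete it when you are done. Really, do not forget! If you need the ES cluster back later, you can restore it from a snapshot.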

Some EC2 Command-Line Operations

The goal here is to deploy Airflow on an EC2 instance.

Create a new key pair for connecting to EC2:

```
# Create a new key pair and save the private key to MyKeyPair.pem
aws ec2 create-key-pair --key-name MyKeyPair --query 'KeyMaterial' --output text > MyKeyPair.pem
# Show the key pair you created
aws ec2 describe-key-pairs --key-names MyKeyPair
# Delete the key pair
aws ec2 delete-key-pair --key-name MyKeyPair
```
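ssh refuses a private key whose file permissions are too open, so the usual next step after saving MyKeyPair.pem is chmod 400. The Python equivalent, as a small hypothetical helper:

```python
import os
import stat


def secure_key_file(path):
    """Restrict a private key to owner read-only (chmod 400); return the new mode."""
    os.chmod(path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

After that, connect with ssh -i MyKeyPair.pem plus the AMI's default user and the instance's public DNS name (for example ec2-user on Amazon Linux, ubuntu on Ubuntu).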

EC2 Security Groups

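Create a new security group to control inbound and outbound traffic for EC2. The following example creates a security group for a given VPC:

```
aws ec2 create-security-group --group-name my-sg --description "My security group" --vpc-id vpc-1a2b3c4d
```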

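Likewise, you can inspect the new group with describe-security-groups. Look it up by its security group ID (GroupId, e.g. sg-903004f8) rather than by its name; create-security-group returns this ID when it creates the group.

```
aws ec2 describe-security-groups --group-ids sg-903004f8
```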
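A new security group allows no inbound traffic, so reaching the instance over SSH needs an ingress rule. This sketch builds the JSON for aws ec2 authorize-security-group-ingress --group-id sg-903004f8 --ip-permissions '...', opening one TCP port to a single IPv4 address; tcp_ingress is hypothetical, and port 8080 (Airflow's default webserver port) is shown only as another example.

```python
import ipaddress
import json


def tcp_ingress(ip, port=22):
    """--ip-permissions value that opens one TCP port to a single IPv4 address."""
    cidr = str(ipaddress.IPv4Network(ip + "/32"))  # validates the address
    return json.dumps([{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }])
```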

Launching an Instance

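Choose an operating-system template from an AMI (Amazon Machine Image), and specify the key pair and security group created earlier. One more thing to watch is networking: with run-instances, the instance's VPC is determined by the subnet you pass (--subnet-id), and the security group must belong to that same VPC. If you omit the subnet, EC2 uses a default subnet of the default VPC, if your account has one.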
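Putting the pieces together, this sketch assembles an aws ec2 run-instances call from the key pair and security group created above. The AMI ID and the t2.micro default are placeholders (an Airflow host may need a larger type), and run_instances_argv is a hypothetical helper.

```python
def run_instances_argv(image_id, key_name, security_group_id,
                       instance_type="t2.micro", subnet_id=None, count=1):
    """Argument list for `aws ec2 run-instances`."""
    argv = ["aws", "ec2", "run-instances",
            "--image-id", image_id,
            "--count", str(count),
            "--instance-type", instance_type,
            "--key-name", key_name,
            "--security-group-ids", security_group_id]
    if subnet_id:
        # The subnet decides which VPC (and Availability Zone) the instance
        # lands in; the security group must belong to that same VPC.
        argv += ["--subnet-id", subnet_id]
    return argv
```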
