Hadoop Cluster Troubleshooting Notes

1. bigdata is not allowed to impersonate xxx

Cause: the proxy user configuration has not taken effect. Check that core-site.xml is configured correctly.

<property>
  <name>hadoop.proxyuser.bigdata.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.bigdata.groups</name>
  <value>*</value>
</property>

Note: the XXX in hadoop.proxyuser.XXX.hosts and hadoop.proxyuser.XXX.groups is the user name that follows User: in the exception message.
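The mapping from the error message to the property names can be sketched in a few lines of Python. This is only an illustration: the helper name `proxyuser_properties` is hypothetical, and it assumes the message has the form `User: <name> is not allowed to impersonate <other>`.

```python
import re


def proxyuser_properties(error_message):
    """Return the two core-site.xml property names for the user named in the error."""
    # Assumed message form: "User: <name> is not allowed to impersonate <other>"
    match = re.search(r"User:\s*(\S+) is not allowed to impersonate", error_message)
    if match is None:
        raise ValueError("not an impersonation error: " + error_message)
    user = match.group(1)
    return [
        f"hadoop.proxyuser.{user}.hosts",
        f"hadoop.proxyuser.{user}.groups",
    ]


if __name__ == "__main__":
    for name in proxyuser_properties("User: bigdata is not allowed to impersonate xxx"):
        print(name)
```

The printed names are the ones to add to core-site.xml for that user.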

<property>
    <name>hadoop.proxyuser.bigdata.hosts</name>
    <value>*</value>
    <description>The superuser bigdata can connect from any host to impersonate a user</description>
</property>
<property>
    <name>hadoop.proxyuser.bigdata.groups</name>
    <value>*</value>
    <description>Allow the superuser bigdata to impersonate members of any group</description>
</property>

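After adding the configuration above, there is no need to restart the cluster. On the NameNode, reload these two properties with an administrator account: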

$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
Refresh super user groups configuration successful

$ yarn rmadmin -refreshSuperUserGroupsConfiguration 
19/01/16 15:02:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033

If the cluster is configured for HA, run the following command so that every NameNode reloads the configuration:

# hadoop dfsadmin -fs hdfs://ns -refreshSuperUserGroupsConfiguration
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Refresh super user groups configuration successful for master/192.168.99.219:9000
Refresh super user groups configuration successful for node01/192.168.99.173:9000

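2. org.apache.hadoop.hbase.exceptions.ConnectionClosingException

Symptom: when HiveServer2 is called through beeline, JDBC, or Python, operations on HBase-backed tables (queries, table creation, and so on) fail. The setting involved is hive.server2.enable.doAs in hive-site.xml, shown here set to false: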

<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
  <description>
    Setting this property to true will have HiveServer2 execute
    Hive operations as the user making the calls to it.
  </description>
</property>

Creating an HBase-backed table in Hive

-- Table name in Hive: test_tb
CREATE TABLE test_tb(key int, value string)
-- Specify the storage handler
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- Declare the column family and column name mapping
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
-- hbase.table.name declares the HBase table name; it is optional and defaults to the Hive table name
-- hbase.mapred.output.outputtable names the table written to on insert; set it if data will be inserted into this table later
TBLPROPERTIES ("hbase.table.name" = "test_tb", "hbase.mapred.output.outputtable" = "test_tb");

Periodic cleanup of the Spark work directory

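When jobs run in Spark standalone mode, every submitted job creates a folder named app-xxxxxxx-xxxx under the work directory on each node. The folder holds the resource files the program needs, which each node downloads from the master when the job is submitted. These directories are created on every run and are not cleaned up automatically, so after enough jobs they fill up the disk.

Each application directory contains the dependency packages that the Spark job needs to run. Automatic cleanup can be enabled through SPARK_WORKER_OPTS: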

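# spark.worker.cleanup.enabled: whether to enable automatic cleanup
# spark.worker.cleanup.interval: how often cleanup runs, in seconds
# spark.worker.cleanup.appDataTtl: how long application data is retained, in seconds
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
-Dspark.worker.cleanup.interval=1800 \
-Dspark.worker.cleanup.appDataTtl=3600"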
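To see which directories a retention window of this kind would affect, a small script can list the app-* folders that have passed the TTL. This is a rough sketch and not Spark's own cleanup logic: the function name `expired_app_dirs` is hypothetical, and it uses each directory's own modification time as a simplification. It only lists candidates and deletes nothing.

```python
import os
import tempfile
import time


def expired_app_dirs(work_dir, ttl_seconds, now=None):
    """List app-* directories under work_dir whose mtime is older than ttl_seconds."""
    now = time.time() if now is None else now
    expired = []
    for name in sorted(os.listdir(work_dir)):
        path = os.path.join(work_dir, name)
        # Only consider application folders; skip anything else in the work directory.
        if name.startswith("app-") and os.path.isdir(path):
            if now - os.path.getmtime(path) > ttl_seconds:
                expired.append(path)
    return expired


if __name__ == "__main__":
    # Demo on a throwaway directory; point work_dir at a real worker's work directory instead.
    with tempfile.TemporaryDirectory() as work_dir:
        old_dir = os.path.join(work_dir, "app-demo-0001")
        os.mkdir(old_dir)
        two_hours_ago = time.time() - 7200
        os.utime(old_dir, (two_hours_ago, two_hours_ago))
        print(expired_app_dirs(work_dir, ttl_seconds=3600))
```

Running it with the same TTL as appDataTtl gives a quick check of how much would be reclaimed before turning cleanup on.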

Too many ZooKeeper connections prevent HBase and Hive from connecting

2019-01-25 03:26:41,627 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /172.17.0.1 - max is 60
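To find out which clients are hitting the limit, warnings like the one above can be tallied per client address. The following is a minimal sketch, assuming the log lines follow the format of the sample shown; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the tail of the warning: "Too many connections from /<client> - max is <limit>"
PATTERN = re.compile(r"Too many connections from /(?P<client>\S+) - max is (?P<limit>\d+)")


def parse_too_many_connections(line):
    """Return (client_address, limit) for a matching warning line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("client"), int(match.group("limit"))


def count_rejected_clients(lines):
    """Count how many rejection warnings each client address produced."""
    counts = Counter()
    for line in lines:
        parsed = parse_too_many_connections(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "2019-01-25 03:26:41,627 [myid:] - WARN  "
        "[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] "
        "- Too many connections from /172.17.0.1 - max is 60"
    )
    print(parse_too_many_connections(sample))
```

The address that dominates the tally is the client whose connection usage needs attention.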

Adjust the ZooKeeper connection settings of HBase and Hive to suit the production environment

hbase-site.xml

hbase.zookeeper.property.maxClientCnxns

hive-site.xml

hive.server2.thrift.min.worker.threads
hive.server2.thrift.max.worker.threads
hive.zookeeper.session.timeout

zoo.cfg

# Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble
maxClientCnxns=200
# The minimum session timeout in milliseconds that the server will allow the client to negotiate
minSessionTimeout=1000
# The maximum session timeout in milliseconds that the server will allow the client to negotiate
maxSessionTimeout=60000

To be continued...
