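1. Background
During a routine database inspection, we found that one node of an Oracle RAC cluster had very high memory usage: on this 42-core host the 1-, 5- and 15-minute load averages had reached 219, 167 and 131, and free swap had dropped to 0. The host has a large amount of memory (more than 230 GB), so once memory is exhausted the kernel spends much of its time swapping and reclaiming pages. This keeps the CPUs busy and explains the high load averages above.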
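2. Introduction to HugePages
HugePages is a Linux kernel feature that manages memory in larger pages instead of the standard 4 KB pages. For a detailed introduction, see Doc ID 361323.1.
If you have a large amount of RAM and a large SGA, HugePages is essential for Oracle Database performance on Linux. Configure HugePages when the combined SGA of the databases is large (for example, more than 8 GB); it can matter even for smaller databases. Note that it is the size of the SGA that matters. The advantages of HugePages are: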
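- Larger page size and fewer pages: the default page size is 4 KB, while the HugeTLB page size is 2048 KB, so the system has to handle 512 times fewer pages.
- Fewer page-table walks: a HugePage covers a much larger contiguous virtual address range than a regular page, so each TLB entry is more likely to produce a TLB hit with HugePages than with regular pages. This reduces the number of times the page tables must be walked to translate a virtual address into a physical one.
- Lower overhead for memory operations: on a virtual memory system (any modern OS), each memory access is effectively two abstract memory operations, one to translate the address and one to access the data. With HugePages there are far fewer pages to handle, which avoids a potential bottleneck in page-table access.
- Less memory usage: from the Oracle Database point of view, with HugePages the Linux kernel uses less memory for the page tables that hold the virtual-to-physical mappings of the SGA address range than it would with regular pages. This leaves more memory for process-private computation or PGA.
- No swapping: swapping should be avoided altogether on Linux (see Doc ID 1295478.1). HugePages cannot be swapped (regular pages can), so there is no page-replacement overhead. HugePages are generally regarded as pinned.
- No kswapd activity: if the area to be paged is very large (for example, 13 million page-table entries for 50 GB of memory), kswapd becomes very busy and consumes a lot of CPU. kswapd is not involved in managing HugePages. See also Doc ID 361670.1.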
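To make the page counts above concrete, the following bash sketch computes how many page mappings a memory region needs with 4 KB pages and with 2048 KB HugePages. The function name pages_for_region is hypothetical, and the sketch treats 1 GB as 1024 × 1024 KB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of pages (one page-table entry each) needed to map
# a region of size_gib GiB with pages of page_kb KB.
pages_for_region() {
  local size_gib=$1 page_kb=$2
  echo $(( size_gib * 1024 * 1024 / page_kb ))
}

# The 50 GB example from the list above.
regular=$(pages_for_region 50 4)     # regular 4 KB pages
huge=$(pages_for_region 50 2048)     # 2048 KB HugePages
echo "4 KB pages: $regular, 2 MB pages: $huge, ratio: $(( regular / huge ))"
```

It prints "4 KB pages: 13107200, 2 MB pages: 25600, ratio: 512", which matches both the "13 million page-table entries" figure and the 512-fold reduction quoted above.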
3. Procedure
3.1 Prerequisite checks
3.1.1 Check whether Transparent HugePages is disabled in the OS
root@rac1[/root]#cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
root@rac1[/root]#cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
root@rac1[/root]#
root@rac1[/root]#grep HugePages /proc/meminfo
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
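If both files show [never], Transparent HugePages is disabled in the OS and the prerequisite is met. AnonHugePages: 0 kB also confirms that no transparent huge pages are currently in use.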
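The [never] check can also be scripted. This is a minimal bash sketch, assuming the sysfs files use the format shown above with the active mode in square brackets; the function names thp_mode and thp_disabled are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active Transparent HugePages mode, i.e. the word
# in [brackets] in /sys/kernel/mm/transparent_hugepage/enabled or .../defrag.
thp_mode() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p' <<<"$1"
}

# Succeed only if both the "enabled" and the "defrag" lines report [never].
thp_disabled() {
  [ "$(thp_mode "$1")" = "never" ] && [ "$(thp_mode "$2")" = "never" ]
}

# Usage on the host:
#   thp_disabled "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" \
#                "$(cat /sys/kernel/mm/transparent_hugepage/defrag)" && echo "THP is disabled"
```

Parsing the bracketed word, rather than grepping for "never", matters because every mode name is always listed and only the bracketed one is active.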
3.1.2 Disable Transparent HugePages in the OS
If they are not, disable Transparent HugePages as follows:
Back up the boot configuration files:
# cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
# cp /etc/default/grub /etc/default/grub.bak
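Edit the GRUB defaults file and append transparent_hugepage=never to the existing GRUB_CMDLINE_LINUX options: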
vi /etc/default/grub
GRUB_CMDLINE_LINUX="crashkernel=128M rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet transparent_hugepage=never"
Regenerate the GRUB configuration file:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Confirm that the regenerated file contains the transparent_hugepage=never option:
# grep transparent_hugepage=never /boot/grub2/grub.cfg
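If several hosts need the same change, the edit to /etc/default/grub can be made idempotent. The sketch below is a hypothetical helper (add_thp_never is not a standard tool); it assumes GNU sed and a single-line GRUB_CMDLINE_LINUX="..." entry like the one above, and the GRUB configuration still has to be regenerated with grub2-mkconfig afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append transparent_hugepage=never to GRUB_CMDLINE_LINUX
# in the given file (normally /etc/default/grub) unless the option is already set.
# Assumes GNU sed (-i) and a one-line GRUB_CMDLINE_LINUX="..." entry.
add_thp_never() {
  local file=$1
  if grep -q '^GRUB_CMDLINE_LINUX=.*transparent_hugepage=' "$file"; then
    echo "transparent_hugepage is already set in $file"
    return 0
  fi
  # Insert the option just before the closing double quote of the line.
  sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 transparent_hugepage=never"/' "$file"
}

# Usage as root, after taking the backups:
#   add_thp_never /etc/default/grub && grub2-mkconfig -o /boot/grub2/grub.cfg
```

Running it twice leaves a single transparent_hugepage=never on the line, which makes it safe to include in a provisioning script.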
Also disable Transparent HugePages at boot time through rc.local:
# vi /etc/rc.d/rc.local
Add the following lines:
# Disable Transparent HugePages
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
Make the file executable:
# chmod +x /etc/rc.d/rc.local
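Reboot the VM, stopping this node's database instances and the clusterware first:
# su - grid
$ srvctl stop instance -d cc -n rac1
$ srvctl stop instance -d rb -n rac1
$ exit
# locate crsctl
# /u01/app/grid/product/bin/crsctl stop crs
# reboot
After the reboot, repeat the checks in 3.1.1 to confirm that Transparent HugePages is disabled.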
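After the reboot, /proc/cmdline shows the parameters the running kernel was actually booted with, which confirms that the GRUB change took effect (the rc.local lines set the same value at runtime either way). A minimal bash sketch; booted_with_thp_never is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given kernel command line contains the
# exact word transparent_hugepage=never.
booted_with_thp_never() {
  local -a words
  local w
  read -ra words <<<"$1"          # split on whitespace without glob expansion
  for w in "${words[@]}"; do
    [ "$w" = "transparent_hugepage=never" ] && return 0
  done
  return 1
}

# Usage on the host:
#   booted_with_thp_never "$(cat /proc/cmdline)" && echo "booted with THP disabled"
```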
3.1.3 Check the memory_target parameters
oraclerac@rac1[/home/oraclerac]$export ORACLE_SID=test1
oraclerac@rac1[/home/oraclerac]$sqlplus / as sysdba
SQL> show parameter MEMORY_TARGET
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
memory_target big integer 0
SQL> show parameter MEMORY_MAX_TARGET
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
memory_max_target big integer 0
SQL>
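If both are 0, Automatic Memory Management (AMM) is not enabled and the prerequisite is met; AMM cannot be used together with HugePages.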
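When many instances have to be checked, the two show parameter rows can be evaluated in a script instead of by eye. This bash sketch assumes the VALUE column is the last whitespace-separated field of each row, as in the output above; param_value and amm_disabled are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the VALUE column (last field) from one row of
# SQL*Plus "show parameter" output, e.g. "memory_target big integer 0" -> "0".
param_value() {
  awk '{ print $NF }' <<<"$1"
}

# AMM is off when both memory_target and memory_max_target are 0.
amm_disabled() {
  [ "$(param_value "$1")" = "0" ] && [ "$(param_value "$2")" = "0" ]
}

# Usage: pass the two rows printed by "show parameter MEMORY_TARGET" and
# "show parameter MEMORY_MAX_TARGET".
```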
3.1.4 Check the use_large_pages parameter
SQL> show parameter USE_LARGE_PAGES
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
use_large_pages string TRUE
SQL>
If the value is TRUE (the default), the instance uses HugePages when they are available, so the prerequisite is met.
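The same kind of check for use_large_pages is sketched below. It treats TRUE, ONLY and AUTO as values that let the SGA use HugePages and FALSE as one that does not; any other value is reported rather than guessed, since the accepted values depend on the database release. large_pages_usable is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a use_large_pages value.
# Returns 0 if HugePages can be used, 1 for FALSE, 2 for an unrecognised value.
large_pages_usable() {
  case "${1^^}" in                 # ${1^^} upper-cases the argument (bash 4+)
    TRUE|ONLY|AUTO) return 0 ;;
    FALSE) return 1 ;;
    *) echo "unrecognised use_large_pages value: $1" >&2; return 2 ;;
  esac
}
```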
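3.2 Configure OS resource limits
Set memlock slightly below the installed physical memory, roughly 90% of it. This example assumes 230 GB of physical memory; the unit is KB.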
vi /etc/security/limits.conf
* soft memlock 217055232
* hard memlock 217055232
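The memlock value above is 90% of 230 GB expressed in KB. This bash sketch reproduces the arithmetic; memlock_kb and its default of 90 percent are illustrative assumptions, not an Oracle-provided tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memlock limit in KB (the unit limits.conf expects) as a
# percentage of installed physical memory given in GB.
memlock_kb() {
  local phys_gb=$1 percent=${2:-90}
  echo $(( phys_gb * 1024 * 1024 * percent / 100 ))
}

memlock_kb 230   # prints 217055232, the value used above

# To base it on the running system instead (MemTotal in /proc/meminfo is in kB):
#   awk '/^MemTotal:/ { printf "%d\n", $2 * 0.9 }' /proc/meminfo
```

MemTotal is slightly smaller than the installed memory because the kernel reserves some of it, so the second form gives a marginally lower limit.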
3.3 Log in again as the oracle user and check the limit
oracle@rac1[/home/oracle]$ulimit -l
217055232
The output should be the value you just set.
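A scripted version of this check is sketched below, assuming ulimit -l prints either a number of KB or the word unlimited; memlock_at_least is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the memlock limit reported by "ulimit -l"
# (KB, or "unlimited") is at least the expected number of KB.
memlock_at_least() {
  local actual=$1 expected_kb=$2
  [ "$actual" = "unlimited" ] && return 0
  [ "$actual" -ge "$expected_kb" ]
}

# Usage as oracle, in a new login session:
#   memlock_at_least "$(ulimit -l)" 217055232 && echo "memlock OK"
```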
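3.4 Check all database instances
Make sure that all database instances (including the ASM instances) are started and running as they normally would in production, because the script in 3.5 sizes HugePages from the shared memory segments that exist when it runs.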
root@rac1[/root]#su - grid
grid@rac1[/home/grid]$crsctl stat res -t
…
ora.asm
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE ONLINE rac2 Started,STABLE
3 OFFLINE OFFLINE STABLE
…
ora.joyce.db
2 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/12.1.0
,STABLE
…
ora.test.db
1 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/12.1.0
,STABLE
2 ONLINE ONLINE rac2 Open,HOME=/u01/app/o
racle/product/12.1.0
,STABLE
Confirm that all Oracle database instances and ASM instances are up and running.
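Rather than reading the whole listing, the crsctl output can be filtered for resources that are supposed to be ONLINE but are not. This is a minimal awk-in-bash sketch that assumes the column layout shown above (resource name on its own line, followed by numbered rows with TARGET and STATE); find_not_online is a hypothetical name, and rows without an instance number are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl stat res -t" output on stdin and print every
# numbered row whose TARGET is ONLINE but whose STATE is not.
# OFFLINE/OFFLINE rows (such as ora.asm instance 3 above) are intentional and skipped.
find_not_online() {
  awk '
    /^ora\./ { res = $1; next }                    # resource name line
    $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
      print res, "instance", $1, "is", $3
    }'
}

# Usage as grid:
#   crsctl stat res -t | find_not_online
```

Empty output means every resource that should be running is running.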
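3.5 Calculate the HugePages kernel parameter
Use the script hugepages_settings.sh (from Doc ID 401749.1) to calculate the vm.nr_hugepages kernel parameter. The script is as follows: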
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
# on Oracle Linux
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support
# http://support.oracle.com
# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments on Oracle Linux. Before proceeding with the execution please note following:
* For ASM instances, configure ASMM instead of AMM.
* The 'pga_aggregate_target' is outside the SGA and
you should accommodate this while calculating the overall size.
* If you change the DB SGA size, the new SGA will not fit
in the previous HugePages configuration, so disable HugePages
entirely, start the DB with the new SGA size and run the script again.
And make sure that:
* Oracle Database instance(s) are up and running
* Oracle Database 11g Automatic Memory Management (AMM) is not setup
(See Doc ID 749851.1)
* The shared memory segments can be listed by command:
# ipcs -m
Press Enter to proceed..."
read
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ];then
echo "The hugepages may not be supported in the system where the script is being executed."
exit 1
fi
# Initialize the counter
NUM_PG=0
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
if [ $MIN_PG -gt 0 ]; then
NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
fi
done
RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`
# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
echo "***********"
echo "** ERROR **"
echo "***********"
echo "Sorry! The total size of the allocated shared memory segments is not enough for a
HugePages configuration. HugePages can only be used for shared memory segments
that you can list by command:
# ipcs -m
of a size that can match an Oracle Database SGA. Please make sure that:
* Oracle Database instance is up and running
* Oracle Database 11g Automatic Memory Management (AMM) is not configured"
exit 1
fi
# Finish with results
case $KERN in
'2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
'2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
'3.8') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
'3.10') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
'4.1') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
'4.14') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
*) echo "Kernel version $KERN is not supported by this script (yet). Exiting." ;;
esac
# End
Make the script executable and run it:
root@rac1[/root]#chmod 755 hugepages_settings.sh
root@rac1[/root]#./hugepages_settings.sh
Press Enter to proceed...
Recommended setting: vm.nr_hugepages = 4325
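The core of hugepages_settings.sh is one formula: each shared memory segment of at least one HugePage contributes floor(bytes / HugePage size) + 1 pages. The bash sketch below applies that formula to sizes passed as arguments instead of parsing ipcs -m, which makes it easy to sanity-check the script's recommendation by hand; hugepages_for_segments is a hypothetical name and not part of Doc ID 401749.1.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the page count in hugepages_settings.sh.
# Arguments: HugePage size in KB, then one or more segment sizes in bytes.
hugepages_for_segments() {
  local hpg_kb=$1; shift
  local page_bytes=$(( hpg_kb * 1024 )) total=0 seg min
  for seg in "$@"; do
    min=$(( seg / page_bytes ))          # whole HugePages in this segment
    if [ "$min" -gt 0 ]; then            # segments below one HugePage are skipped
      total=$(( total + min + 1 ))       # +1 covers the partial last page
    fi
  done
  echo "$total"
}
```

For example, hugepages_for_segments 2048 8388608 2097152 1000 prints 7: five pages for the 8 MiB segment, two for the 2 MiB segment and none for the 1000-byte segment.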
3.6 Add the kernel parameter
Add the vm.nr_hugepages kernel parameter with the value calculated above:
root@rac1[/root]#vi /etc/sysctl.conf
vm.nr_hugepages = 4325
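Pages in the HugePages pool are set aside as soon as the pool is allocated and cannot be used for ordinary allocations, even when no instance is using them, so it is worth converting the page count into memory before applying it. A minimal sketch assuming the 2048 KB page size quoted in section 2; hugepages_reserved_mb is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory in MB taken by nr_pages HugePages of hpg_kb KB each.
hugepages_reserved_mb() {
  local nr_pages=$1 hpg_kb=$2
  echo $(( nr_pages * hpg_kb / 1024 ))
}

hugepages_reserved_mb 4325 2048   # prints 8650 (about 8.4 GB)
```

The parameter can also be loaded without a reboot with sysctl -p, but on a host that has been running for a while, memory fragmentation may prevent the kernel from allocating the whole pool, which is one reason to reboot as in the next step.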
3.7 Reboot the VM
# su - grid
$ srvctl stop instance -d cc -n rac1
$ srvctl stop instance -d rb -n rac1
$ exit
# locate crsctl
# /u01/app/grid/product/bin/crsctl stop crs
# reboot
3.8 Check HugePages
root@rac1[/root]#grep HugePages /proc/meminfo
AnonHugePages: 0 kB
HugePages_Total: 4325
HugePages_Free: 4304
HugePages_Rsvd: 71
HugePages_Surp: 0
3.9 Start the database instances
# locate crsctl
# /u01/app/grid/product/bin/crsctl start crs
# su - grid
$ srvctl start instance -d cc -n rac1
$ srvctl start instance -d rb -n rac1
$ crsctl stat res -t
Make sure that all database instances and ASM instances are up.
3.10 Check HugePages again
root@rac1[/root]#grep HugePages /proc/meminfo
AnonHugePages: 0 kB
HugePages_Total: 4325
HugePages_Free: 1272
HugePages_Rsvd: 72
HugePages_Surp: 0
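The four HugePages_ counters are easier to interpret as totals. In the kernel's accounting, HugePages_Free still includes pages that are reserved (HugePages_Rsvd) for segments that have not yet touched them, so Free minus Rsvd is what remains for new segments. The bash sketch below summarises this; hugepages_summary is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise HugePages usage from /proc/meminfo-style text on stdin.
#   in_use    = Total - Free         (pages already faulted in)
#   committed = Total - Free + Rsvd  (pages promised to existing segments)
#   available = Free - Rsvd          (pages a new segment could still get)
hugepages_summary() {
  awk '
    /^HugePages_Total:/ { t = $2 }
    /^HugePages_Free:/  { f = $2 }
    /^HugePages_Rsvd:/  { r = $2 }
    END { printf "in_use=%d committed=%d available=%d\n", t - f, t - f + r, f - r }'
}

# Usage: hugepages_summary < /proc/meminfo
```

For the readings in 3.8 it gives in_use=21 committed=92 available=4233, and for the readings above it gives in_use=3053 committed=3125 available=1200.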
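4. Impact
Because this is an Oracle 12.2 RAC environment, we performed the change as a rolling operation across the two nodes. Only one node was down at any time while the other continued to serve requests, so the business experienced no downtime.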