Process Binding Methods for Common MPI Implementations

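1. Introduction
Binding processes to CPUs (binding, or affinity) is a common way to improve the performance of MPI programs. Binding avoids the overhead of processes migrating between CPU cores and reduces cache contention. In particular, when the number of processes is about half the total number of CPU cores, test results are sometimes unstable, good on one run and poor on the next. Process migration is a likely cause, and in that case CPU binding is worth trying.
CPU binding for MPI programs can be achieved in two ways: with the binding features built into the MPI distribution, or with operating-system tools such as numactl. The first is simpler to use; the second is more reliable. This article briefly summarizes how to bind processes with several common MPI distributions: OpenMPI, MPICH2, MVAPICH, MVAPICH2, Intel MPI, and HP MPI. No binding method has been found for MPICH so far.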
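Whichever method is used, it helps to confirm which cores the operating system actually allows a process to run on. The following is a minimal sketch, assuming a Linux system where /proc/self/status exposes the Cpus_allowed_list field; the function name allowed_cpus is made up for this illustration.

```shell
#!/bin/sh
# allowed_cpus: print the list of CPUs the calling process may run on.
# Assumption: Linux, with the Cpus_allowed_list field in /proc/self/status.
allowed_cpus() {
    [ -r /proc/self/status ] || return 1
    # awk inherits the affinity mask of the shell that starts it,
    # so its own status file reflects the binding of the caller.
    awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
}
```

Run from a process launched under one of the binding methods described in this article, it should report only the bound core or cores; an unbound process typically reports every core on the node (for example 0-15).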
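2. OpenMPI
OpenMPI offers many ways to bind processes to CPUs. As a rule, the more involved the method, the more reliable the binding, so choose according to your situation.
2.1. Enable implicit (automatic) CPU binding with the MCA parameter "mpi_paffinity_alone"
mpirun -np <N> -machinefile <machinefile> --mca mpi_paffinity_alone 1 <executable>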
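2.2. Select a binding policy with the mpirun options "--bind-to-core" or "--bind-to-socket"
This method applies to version 1.4 and later. (Later OpenMPI releases replaced these options with "--bind-to <core|socket>" and "--map-by <core|socket>"; check mpirun --help for your version.)
mpirun -np <N> -machinefile <machinefile> --bind-to-core --bycore <executable>
Binds each process to a CPU core, assigning cores consecutively.
mpirun -np <N> -machinefile <machinefile> --bind-to-core --bysocket <executable>
Binds each process to a CPU core, but distributes the processes across the physical CPUs (sockets). This layout reduces cache contention.
mpirun -np <N> -machinefile <machinefile> --bind-to-socket --bysocket <executable>
Binds each process only to a socket, that is, to a physical CPU.
In addition, passing "--report-bindings" to mpirun prints the resulting bindings.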
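The difference between the consecutive and the per-socket ordering is easier to see with numbers. The sketch below only models the two orderings; it does not call OpenMPI. It assumes that core IDs are numbered contiguously within each socket (socket 0 holds cores 0..C-1, and so on), which is not true on every machine, and the function name placement is hypothetical.

```shell
#!/bin/sh
# placement POLICY NPROCS SOCKETS CORES_PER_SOCKET
# Prints the core ID each process would get, in rank order.
# Assumption: cores are numbered contiguously within each socket.
placement() {
    case "$1" in
        bycore|bysocket) ;;
        *) echo "unknown policy: $1" >&2; return 1 ;;
    esac
    awk -v policy="$1" -v n="$2" -v s="$3" -v c="$4" 'BEGIN {
        total = s * c
        for (r = 0; r < n && r < total; r++) {
            if (policy == "bycore")
                core = r                          # fill cores in order
            else
                core = (r % s) * c + int(r / s)   # rotate over sockets
            printf "%s%d", (r ? " " : ""), core
        }
        print ""
    }'
}
```

For a node with 4 sockets of 4 cores each, placement bycore 4 4 4 gives cores 0 1 2 3 (all on one socket), while placement bysocket 4 4 4 gives 0 4 8 12 (one per socket). The real assignment should always be confirmed with "--report-bindings".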

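2.3. Bind to specific CPU IDs with the mpirun option "--slot-list"
mpirun -np 4 --slot-list 0,4,8,12 <executable>
In this example, 4 processes run on a machine with 4 sockets of 4 cores each, and each process is bound to the first core of one physical CPU.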
2.4. Specify the binding of each process with the mpirun option "--rankfile"
mpirun -np 4 --rankfile <myrank> <executable>
cat myrank
rank 0=node1 slot=0
rank 1=node2 slot=4
rank 2=node3 slot=4-7
rank 3=node4 slot=0:0,1
In this example, the first process is bound to the first core of node1; the second to core 4 of node2; the third to cores 4 through 7 of node3; and the fourth to cores 0 and 1 of socket 0 on node4 (equivalent to the first two cores of node4).
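For larger jobs a rankfile is tedious to write by hand. The following is a minimal sketch of generating one, assuming one rank per host and the same slot specification on every host. The function name gen_rankfile and its argument order are invented here; the output lines follow the "rank N=host slot=S" format of the example above.

```shell
#!/bin/sh
# gen_rankfile SLOT HOST...
# Emits one rankfile line per host, numbering ranks from 0.
# Assumption: one rank per host, identical slot spec everywhere.
gen_rankfile() {
    slot=$1
    shift
    rank=0
    for host in "$@"; do
        printf 'rank %d=%s slot=%s\n' "$rank" "$host" "$slot"
        rank=$((rank + 1))
    done
}
```

For example, gen_rankfile 0 node1 node2 > myrank writes a two-line file that binds rank 0 to core 0 of node1 and rank 1 to core 0 of node2.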
2.5. Launch the MPI program through the operating system's numactl for more reliable binding
This uses mpirun's appfile feature. As an example, suppose the MPI program /home/test/a.out is to run as 4 processes on two nodes, bound to CPU cores 0 and 4 of each node.
mpirun --app <myapp>
cat myapp
-np 1 -host node1 /home/test/1.sh
-np 1 -host node1 /home/test/2.sh
-np 1 -host node2 /home/test/3.sh
-np 1 -host node2 /home/test/4.sh
or
mpirun -np 1 -host node1 /home/test/1.sh : -np 1 -host node1 /home/test/2.sh : \
    -np 1 -host node2 /home/test/3.sh : -np 1 -host node2 /home/test/4.sh
The scripts 1.sh through 4.sh contain:
chmod +x 1.sh; cat 1.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 2.sh; cat 2.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
chmod +x 3.sh; cat 3.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 4.sh; cat 4.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
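The four wrapper scripts differ only in the core number, so they could be collapsed into one if the wrapper can learn its node-local rank. The following is a minimal sketch, assuming the launcher exports OMPI_COMM_WORLD_LOCAL_RANK to each process (check that your OpenMPI version sets it). The function name cpu_for_local_rank is hypothetical.

```shell
#!/bin/sh
# cpu_for_local_rank LOCAL_RANK CPU_LIST
# Prints the LOCAL_RANK-th entry (counting from 0) of a comma-separated
# CPU list, or fails if the list is too short.
cpu_for_local_rank() {
    rank=$1
    old_ifs=$IFS
    IFS=,
    set -- $2            # split the list into positional parameters
    IFS=$old_ifs
    if [ "$rank" -ge "$#" ]; then
        echo "no CPU listed for local rank $rank" >&2
        return 1
    fi
    shift "$rank"
    echo "$1"
}

# Assumed use inside a single wrapper script started by mpirun:
#   cpu=$(cpu_for_local_rank "$OMPI_COMM_WORLD_LOCAL_RANK" 0,4) || exit 1
#   exec numactl --localalloc --physcpubind "$cpu" /home/test/a.out
```

With such a wrapper, every appfile line can name the same script, and the core list is kept in one place.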
3. MPICH2
MPICH2 itself does not support CPU binding, but binding can be achieved by calling the system's numactl. (This reflects the releases available when this was written; later Hydra-based versions of mpiexec provide their own binding options.)
As an example, suppose the MPI program /home/test/a.out is to run as 4 processes on two nodes, bound to CPU cores 0 and 4 of each node.
mpiexec -configfile <myconfigfile>
cat myconfigfile
-n 1 -host node1 /home/test/1.sh
-n 1 -host node1 /home/test/2.sh
-n 1 -host node2 /home/test/3.sh
-n 1 -host node2 /home/test/4.sh
or
mpiexec -n 1 -host node1 /home/test/1.sh : -n 1 -host node1 /home/test/2.sh : \
    -n 1 -host node2 /home/test/3.sh : -n 1 -host node2 /home/test/4.sh
The scripts 1.sh through 4.sh contain:
chmod +x 1.sh; cat 1.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 2.sh; cat 2.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
chmod +x 3.sh; cat 3.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 4.sh; cat 4.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
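The config file grows with the node count, so it may be easier to generate than to type. The following is a minimal sketch, assuming the wrapper scripts are named 1.sh, 2.sh, and so on in a single directory, numbered in host order as in the example above. The function name gen_configfile and its arguments are hypothetical.

```shell
#!/bin/sh
# gen_configfile PROCS_PER_NODE SCRIPT_DIR HOST...
# Emits one "-n 1 -host <host> <dir>/<k>.sh" line per process, with the
# script number k increasing across all hosts.
gen_configfile() {
    per_node=$1
    dir=$2
    shift 2
    k=1
    for host in "$@"; do
        n=0
        while [ "$n" -lt "$per_node" ]; do
            printf '%s\n' "-n 1 -host $host $dir/$k.sh"
            k=$((k + 1))
            n=$((n + 1))
        done
    done
}
```

Calling gen_configfile 2 /home/test node1 node2 > myconfigfile reproduces the four lines of the example.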
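4. MVAPICH
MVAPICH enables implicit CPU binding by default. This default can be controlled with the environment variable VIADEV_USE_AFFINITY:
export VIADEV_USE_AFFINITY=1   # enabled (the default)
export VIADEV_USE_AFFINITY=0   # disabled; needed for MPI+OpenMP programs, otherwise all threads are bound to the same CPU core
To bind to explicitly chosen CPU cores, use the environment variable VIADEV_CPU_MAPPING. With mpirun_rsh, the variable assignment goes before the executable.
mpirun_rsh -ssh -np <N> -hostfile <hostfile> VIADEV_CPU_MAPPING=0,4,8,12 <executable>
Processes are bound in order to CPU cores 0, 4, 8, and 12 on each node. When there are more processes than listed cores, binding proceeds round-robin.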
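The round-robin rule can be made concrete with a small model. This sketch assumes that "round-robin" means the node-local process index taken modulo the length of the core list; that is a reading of the description above, not a statement about MVAPICH internals. The function name show_round_robin is hypothetical.

```shell
#!/bin/sh
# show_round_robin NPROCS CPU_LIST
# Prints which core each of NPROCS node-local processes would get when
# the comma-separated CPU_LIST is reused cyclically.
show_round_robin() {
    awk -v n="$1" -v list="$2" 'BEGIN {
        k = split(list, cpu, ",")
        for (r = 0; r < n; r++)
            printf "rank %d -> core %s\n", r, cpu[r % k + 1]
    }'
}
```

With show_round_robin 6 0,4,8,12, the fifth and sixth processes wrap around to cores 0 and 4, so those two cores each carry two processes.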
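5. MVAPICH2
MVAPICH2 enables implicit CPU binding by default. This default can be controlled with the environment variable MV2_ENABLE_AFFINITY:
export MV2_ENABLE_AFFINITY=1   # enabled (the default)
export MV2_ENABLE_AFFINITY=0   # disabled; needed for MPI+OpenMP programs, otherwise all threads are bound to the same CPU core
5.1. Bind with the environment variable MV2_CPU_MAPPING
To bind to specific CPU cores, use the environment variable MV2_CPU_MAPPING. Note that mpiexec's -genv option takes the variable name and its value as two separate arguments.
mpiexec -genv MV2_CPU_MAPPING 0:4:8:12 -n <N> <executable>
Processes are bound in order to CPU cores 0, 4, 8, and 12 on each node. When there are more processes than listed cores, binding proceeds round-robin.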
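The MVAPICH example in section 4 writes the core list with commas, while MV2_CPU_MAPPING here separates cores with colons. When the same core list is shared between scripts for both libraries, a small helper can do the conversion. This is only a sketch; the function name to_mv2_mapping is hypothetical.

```shell
#!/bin/sh
# to_mv2_mapping CPU_LIST
# Converts a comma-separated core list (0,4,8,12) into the
# colon-separated form used in the MV2_CPU_MAPPING example (0:4:8:12).
to_mv2_mapping() {
    printf '%s\n' "$1" | tr ',' ':'
}
```

It could be used as mpiexec -genv MV2_CPU_MAPPING "$(to_mv2_mapping 0,4,8,12)" -n <N> <executable>.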
5.2. Launch the MPI program through the operating system's numactl for more reliable binding
The procedure is exactly the same as for MPICH2; see section 3.
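6. Intel MPI
Intel MPI was developed on the basis of MVAPICH2 and enables implicit CPU binding by default. This default can be controlled with the environment variable I_MPI_PIN:
export I_MPI_PIN=1   # enabled (the default)
export I_MPI_PIN=0   # disabled
For MPI+OpenMP programs the default binding must be overridden, otherwise all threads of a process are bound to the same CPU core. With Intel MPI this can be done by setting I_MPI_PIN_DOMAIN=omp, which overrides the default process binding.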
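When planning a hybrid MPI+OpenMP run, it is useful to check how many processes fit on a node for a given thread count. The sketch below assumes that with I_MPI_PIN_DOMAIN=omp each process receives a group of cores whose size equals OMP_NUM_THREADS; that is my understanding of the Intel MPI documentation and should be verified for your version. The function name domains_per_node is hypothetical.

```shell
#!/bin/sh
# domains_per_node CORES_PER_NODE OMP_THREADS
# Prints how many MPI processes fit on one node when each process owns
# OMP_THREADS cores. Fails if the cores do not divide evenly.
domains_per_node() {
    cores=$1
    threads=$2
    if [ "$threads" -le 0 ] || [ $((cores % threads)) -ne 0 ]; then
        echo "cores ($cores) not divisible by threads ($threads)" >&2
        return 1
    fi
    echo $((cores / threads))
}
```

For a 16-core node and OMP_NUM_THREADS=4, domains_per_node 16 4 prints 4, suggesting 4 MPI processes per node.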
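6.1. Bind with the environment variable I_MPI_PIN_PROCESSOR_LIST
mpirun -r ssh -f <machinefile> -genv I_MPI_PIN_PROCESSOR_LIST 0,4,8,12 -n <N> <executable>
Processes are bound in order to CPU cores 0, 4, 8, and 12 on each node. When there are more processes than listed cores, binding proceeds round-robin.
mpirun -r ssh -f <machinefile> -genv I_MPI_PIN_PROCESSOR_LIST bunch -n <N> <executable>
Binding order: place processes on the same physical CPU as far as possible.
mpirun -r ssh -f <machinefile> -genv I_MPI_PIN_PROCESSOR_LIST scatter -n <N> <executable>
Binding order: spread processes across different physical CPUs as far as possible.
In addition, passing "-genv I_MPI_DEBUG 4" prints the binding information.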
6.2. Launch the MPI program through the operating system's numactl for more reliable binding
This uses the "configfile" feature, in the same way as for MPICH2 and MVAPICH2.
As an example, suppose the MPI program /home/test/a.out is to run as 4 processes on two nodes, bound to CPU cores 0 and 4 of each node.
mpirun -r ssh -f <machinefile> -configfile <myconfigfile>
cat myconfigfile
-n 1 -host node1 /home/test/1.sh
-n 1 -host node1 /home/test/2.sh
-n 1 -host node2 /home/test/3.sh
-n 1 -host node2 /home/test/4.sh
or
mpirun -r ssh -f <machinefile> -n 1 -host node1 /home/test/1.sh : -n 1 -host node1 /home/test/2.sh : \
    -n 1 -host node2 /home/test/3.sh : -n 1 -host node2 /home/test/4.sh
The scripts 1.sh through 4.sh contain:
chmod +x 1.sh; cat 1.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 2.sh; cat 2.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
chmod +x 3.sh; cat 3.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 4.sh; cat 4.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
7. HP MPI
7.1. Binding with HP MPI's built-in feature
HP MPI can bind processes to CPUs in either of the following ways.
mpirun -np <N> -hostfile <hostfile> -cpu_bind=v -cpu_bind=MAP_CPU:0,4,8,12 <executable>
or
mpirun -np <N> -hostfile <hostfile> -cpu_bind=v -e MPI_BIND_MAP=0,4,8,12 <executable>
Here, "-cpu_bind=v" prints the binding information.
7.2. Launch the MPI program through the operating system's numactl for more reliable binding
This uses mpirun's "appfile" feature. As an example, suppose the MPI program /home/test/a.out is to run as 4 processes on two nodes, bound to CPU cores 0 and 4 of each node.
mpirun -f <myapp>
cat myapp
-np 1 -h node1 /home/test/1.sh
-np 1 -h node1 /home/test/2.sh
-np 1 -h node2 /home/test/3.sh
-np 1 -h node2 /home/test/4.sh
The scripts 1.sh through 4.sh contain:
chmod +x 1.sh; cat 1.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 2.sh; cat 2.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out
chmod +x 3.sh; cat 3.sh
#!/bin/sh
numactl --localalloc --physcpubind 0 /home/test/a.out
chmod +x 4.sh; cat 4.sh
#!/bin/sh
numactl --localalloc --physcpubind 4 /home/test/a.out