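Question: a job can be submitted and enters the queue, but it stays in the Q state and is never scheduled.
The first error behind jobs sitting in the Q state was that the server's scheduling attribute had not been set. This attribute is configured inside qmgr with the command: set server scheduling=true. Running that command, however, produced the following error: qmgr obj= svr=default: Illegal attribute or resource value for scheduling. The attribute could not be set no matter what I tried, so I wiped the existing queues with the command: pbs_server -t create. After that I configured the queue attributes again: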
Qmgr: create queue myque queue_type=execution
Qmgr: set server default_queue=myque
Qmgr: set queue myque started=true
Qmgr: set queue myque enabled=true
Qmgr: set server scheduling=true
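The same settings can also be fed to qmgr non-interactively. The following is only a sketch: the function name queue_setup_commands is made up for illustration, and it assumes that qmgr reads directives from standard input and that the commands are run by a Torque manager account on the server host.

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr directives from the session above
# for a given queue name, so they can be reviewed before being applied.
queue_setup_commands() {
    queue=$1
    printf 'create queue %s queue_type=execution\n' "$queue"
    printf 'set server default_queue=%s\n' "$queue"
    printf 'set queue %s started=true\n' "$queue"
    printf 'set queue %s enabled=true\n' "$queue"
    printf 'set server scheduling=true\n'
}

# Review the directives first:
#   queue_setup_commands myque
# Then (assumption about the local setup) apply them on the server:
#   queue_setup_commands myque | qmgr
```

Printing the directives instead of running them directly makes it easy to spot a missing underscore in attribute names such as queue_type or default_queue before anything is changed.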
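After this configuration, submitted jobs still sat in the Q state, and qstat -f showed that no execution node had been assigned to them. Forcing a job with qrun assigns it to a node that is currently in the free state, but the job still does not run.
qstat then shows: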
1.node90 STDIN admin 0 Q myque
3.node90 testpbs freeman 0 Q myque
After qrun 1.node90, qstat -f shows:
Job Id: 1.node90
Job_Name = STDIN
Job_Owner = admin@node90
job_state = Q
queue = myque
server = node90
Checkpoint = u
ctime = Sat Jun 7 21:29:40 2014
Error_Path = node90:/var/spool/torque/STDIN.e32
exec_host = nodelhj/0
exec_port = 15003
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Sat Jun 7 21:44:48 2014
Output_Path = node90:/var/spool/torque/STDIN.o32
Priority = 0
qtime = Sat Jun 7 21:29:40 2014
Rerunable = True
substate = 10
Variable_List = PBS_O_QUEUE=myque,PBS_O_HOME=/home/admin,
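The remaining, less important fields are omitted because the output is too long. The job forced with qrun was assigned an exec_host but still did not run, while the job that was not forced with qrun still had no execution node.
tracejob 1.node90 then shows: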
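When only one or two attributes matter, such as job_state and exec_host here, the long qstat -f listing can be filtered. This is a minimal sketch; job_field is a hypothetical helper name, and it assumes attribute lines have the "name = value" shape seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` output on standard input and print
# the value of the attribute named in $1 (lines look like "name = value").
# Only the first line of a wrapped value (e.g. Variable_List) is printed.
job_field() {
    sed -n "s/^[[:space:]]*$1 = //p"
}

# Usage (on a host where qstat is available):
#   qstat -f 1.node90 | job_field job_state
#   qstat -f 1.node90 | job_field exec_host
```

For the job shown above, this would make the contradiction easy to see: exec_host is set, yet job_state is still Q.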
06/08/2014 10:35:18 S enqueuing into myque, state 1 hop 1
06/08/2014 10:35:18 A queue=myque
06/08/2014 10:45:47 S enqueuing into myque, state 1 hop 1
06/08/2014 10:45:47 S Requeueing job, substate: 10 Requeued in queue: myque
06/08/2014 10:51:55 S enqueuing into myque, state 1 hop 1
06/08/2014 10:51:55 S Requeueing job, substate: 10 Requeued in queue: myque
06/08/2014 10:52:38 S Job Run at request of root@node90
06/08/2014 10:52:38 S unable to run job, MOM rejected/rc=-1
06/08/2014 10:52:38 S unable to run job, send to MOM '168036859' failed
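The number 168036859 in the last line looks like an IPv4 address packed into a single 32-bit integer. That reading is an assumption rather than something the log states, but decoding it gives 10.4.9.251, which is the same address that appears in the server log for the MOM. A small sketch of the decoding (int_to_ip is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: split a 32-bit integer into four bytes,
# most significant first, and print them as a dotted-quad address.
int_to_ip() {
    n=$1
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 ))  $(( n & 255 ))
}

int_to_ip 168036859   # prints 10.4.9.251
```

Knowing which host the number refers to tells you which MOM to inspect when the job cannot be sent.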
Looking at server_logs then reveals the following errors:
06/08/2014 16:27:36;0001;PBS_Server.31118;Svr;PBS_Server;LOG_ERROR::Operation now in progress (115) in tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = -2] [addr = 10.4.9.251:15003]
06/08/2014 16:27:36;0001;PBS_Server.31118;Svr;PBS_Server;LOG_ERROR::send_hierarchy, Could not send mom hierarchy to host nodelhj:15003
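To see quickly which MOM addresses the server fails to reach, the address can be pulled out of these tcp_connect_sockaddr lines. This is only a sketch: extract_mom_addr is a hypothetical helper, it assumes the "[addr = host:port]" suffix shown above, and the log path in the usage comment is an assumption about a default Torque spool layout.

```shell
#!/bin/sh
# Hypothetical helper: read server log lines on standard input and print
# the "ip:port" found inside a trailing "[addr = ip:port]" marker.
# Lines without such a marker produce no output.
extract_mom_addr() {
    sed -n 's/.*\[addr = \([0-9.]*:[0-9]*\)\].*/\1/p'
}

# Usage (assumed log location, adjust to the local installation):
#   extract_mom_addr < /var/spool/torque/server_logs/20140608 | sort -u
```

The resulting address and port can then be checked from the server host with whatever connectivity tool is installed, to confirm that pbs_mom is listening and that no firewall sits in between.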
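This is the compute node rejecting the job. Someone later suggested that the cause might be that passwordless SSH login had not been set up, so I configured passwordless SSH login. For that configuration see: http://blog.csdn.net/leexide/article/details/17252369
The problem was still not solved. I then found that the clock on the MOM node was set to a US time zone. After I changed the time zone, jobs started with qrun ran correctly. The time zone was changed as follows: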
[root@nodelhj torque]# date
Thu Jun 5 06:01:59 PDT 2014
[root@nodelhj torque]# set date
[root@nodelhj torque]# tzselect
Please identify a location so that time zone rules can be set correctly.
Please select a continent or ocean.
1) Africa
2) Americas
3) Antarctica
4) Arctic Ocean
5) Asia
6) Atlantic Ocean
7) Australia
8) Europe
9) Indian Ocean
10) Pacific Ocean
11) none - I want to specify the time zone using the Posix TZ format.
#? 5
Please select a country.
 1) Afghanistan        18) Israel            35) Palestine
 2) Armenia            19) Japan             36) Philippines
 3) Azerbaijan         20) Jordan            37) Qatar
 4) Bahrain            21) Kazakhstan        38) Russia
 5) Bangladesh         22) Korea (North)     39) Saudi Arabia
 6) Bhutan             23) Korea (South)     40) Singapore
 7) Brunei             24) Kuwait            41) Sri Lanka
 8) Cambodia           25) Kyrgyzstan        42) Syria
 9) China              26) Laos              43) Taiwan
10) Cyprus             27) Lebanon           44) Tajikistan
11) East Timor         28) Macau             45) Thailand
12) Georgia            29) Malaysia          46) Turkmenistan
13) Hong Kong          30) Mongolia          47) United Arab Emirates
14) India              31) Myanmar (Burma)   48) Uzbekistan
15) Indonesia          32) Nepal             49) Vietnam
16) Iran               33) Oman              50) Yemen
17) Iraq               34) Pakistan
#? 9
Please select one of the following time zone regions.
1) east China - Beijing, Guangdong, Shanghai, etc.
2) Heilongjiang (except Mohe), Jilin
3) central China - Sichuan, Yunnan, Guangxi, Shaanxi, Guizhou, etc.
4) most of Tibet & Xinjiang
5) west Tibet & Xinjiang
#? 1
The following information has been given:
China
east China - Beijing, Guangdong, Shanghai, etc.
Therefore TZ='Asia/Shanghai' will be used.
Local time is now: Thu Jun 5 21:04:44 CST 2014.
Universal Time is now: Thu Jun 5 13:04:44 UTC 2014.
Is the above information OK?
1) Yes
2) No
#? 1
You can make this change permanent for yourself by appending the line
TZ='Asia/Shanghai'; export TZ
to the file '.profile' in your home directory; then log out and log in again.
Here is that TZ value again, this time on standard output so that you
can use the /usr/bin/tzselect command in shell scripts:
Asia/Shanghai
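After rebooting the machine, however, the time zone change had not persisted, so I changed it by hand (edit the clock settings and replace the localtime file; the change takes effect as soon as it is saved):
vi /etc/sysconfig/clock
ZONE=Asia/Shanghai (look up the name among the files under /usr/share/zoneinfo)
UTC=false
ARC=false
rm /etc/localtime
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime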
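To confirm that the server and the MOM node now agree, their UTC offsets can be compared. The sketch below assumes that date +%z prints an offset such as +0800 on both hosts; offset_to_minutes and tz_gap_minutes are hypothetical helper names. With the values from the transcript above (PDT is -0700, China Standard Time is +0800) the gap before the fix was 900 minutes, that is, 15 hours.

```shell
#!/bin/sh
# Hypothetical helper: convert an offset like +0800 or -0700 to minutes.
offset_to_minutes() {
    off=$1
    sign=1
    case $off in -*) sign=-1 ;; esac
    hh=$(printf '%s' "$off" | cut -c2-3)
    mm=$(printf '%s' "$off" | cut -c4-5)
    # Strip one leading zero so values like 08 are not read as octal.
    echo $(( sign * (${hh#0} * 60 + ${mm#0}) ))
}

# Hypothetical helper: difference between two offsets, in minutes.
tz_gap_minutes() {
    echo $(( $(offset_to_minutes "$1") - $(offset_to_minutes "$2") ))
}

tz_gap_minutes +0800 -0700   # prints 900

# Usage across hosts (assumes passwordless SSH to the MOM node):
#   tz_gap_minutes "$(date +%z)" "$(ssh nodelhj date +%z)"
```

A result of 0 means both hosts report the same time zone offset.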
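With that problem solved, every job could be run with the qrun command, but jobs were still not scheduled automatically.
The scheduler log under sched_logs finally contained entries, yet no scheduling took place. So I started installing maui, hoping that maui would be able to schedule and run the jobs.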