Auto-Sync tips


Use the following guide when troubleshooting auto-sync failures.

1. nmc@nza:/$ show appliance version (from both nodes) 
If you are not at v3.1.3 on both nodes, you should upgrade; rrdaemon had stability fixes that went into 3.1.3. If you can't upgrade, then both nodes need to run the same version. 
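For example, from each node (host names here are placeholders):

ssh admin@nza-src
nmc@nza-src:/$ show appliance version

ssh admin@nza-dst
nmc@nza-dst:/$ show appliance version

Both outputs should report the same version, ideally v3.1.3.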

2. Whenever you see a dataset lock and no data is actually being transferred: 
nmc@nza:/$ show appliance nms locks -v (to check locks on both nodes) 
If this shows locks, run the following from bash --> # svcadm restart nms (no output from the locks command means no locks, which is good) 
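A typical check-and-clear sequence, run on each node (restart nms only if locks were shown):

nmc@nza:/$ show appliance nms locks -v
nmc# option expert_mode=1
nmc# !bash
y
# svcadm restart nms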

3. Must for v3.1.x - nmc@nza:/$ setup trigger autosync-check disable (on source) 

4. Must for v3.1.x - nmc@nza:/$ setup auto-sync serialize-all (on source) 

5. nmc@nza:/$ setup auto-sync :job_name property force (on source) 

and pick Use_any_snapshots, rollback, and destroy snapshots as the options 
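For example, with a hypothetical job named :data-projects (substitute your real job name):

nmc@nza:/$ setup auto-sync :data-projects property force

then answer the prompts with Use_any_snapshots, rollback, and destroy snapshots.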

6. Must for v3.1.x - 
nmc@nza:/$ show auto-sync :service-name property zip_level (look for the zip_level entry at the end of the output) 
This zip_level should be 0; compression in RR doesn't work efficiently or consistently. 
(Disable auto-sync, change the zip_level property to 0 with setup auto-sync :service-name property zip_level, enable auto-sync, then run auto-sync -- see the sketch below.) 
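A sketch of that sequence, assuming a hypothetical job named :data-projects and that the disable/enable/run actions follow the same ':job-name action' form as the other commands in this guide:

nmc@nza:/$ setup auto-sync :data-projects disable
nmc@nza:/$ setup auto-sync :data-projects property zip_level    (enter 0 at the prompt)
nmc@nza:/$ setup auto-sync :data-projects enable
nmc@nza:/$ setup auto-sync :data-projects run
nmc@nza:/$ show auto-sync :data-projects property zip_level     (verify it now shows 0)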

7. Check source and destination can communicate using rr protocol from bash 
nmc# option expert_mode=1 
nmc# !bash 
y 

rrmgr -x 'destination_host' ping (on source) 
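For example, with a hypothetical destination host named nza-dst:

# rrmgr -x 'nza-dst' ping

If the ping fails, fix name resolution and connectivity between the nodes before rerunning the job.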

8. Restart the rrdaemon service; once the service is restarted, try running the auto-sync job again. 

nmc# option expert_mode=1 
nmc# !bash 
y 

svcs -v | grep rrdaemon 
svcadm restart rrdaemon 
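After the restart, confirm the service is back online before rerunning the job:

svcs -v | grep rrdaemon

The rrdaemon entry should be in the 'online' state; if it is in 'maintenance', run svcs -x to see why and where its log file is.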


----------------------------------------------------

Have you recreated the jobs without the /*? You have to destroy the jobs, kill any that are still running, and recreate the jobs without the /*. 
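A sketch of the destroy step, assuming the destroy action follows the same ':job-name action' form used elsewhere in this guide, with a placeholder job name:

nmc@nza:/$ setup auto-sync :data-projects destroy

Then recreate the job through the usual setup auto-sync dialog, specifying the dataset path without a trailing /*.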

Have you done the following (on the source)? 
setup auto-sync serialize-all 
setup trigger autosync-check disable 

To kill any running auto-sync jobs 

ssh admin@server_name 
su 
<root's password> 
ps -ef | grep auto- 

pkill <auto-sync_job_process> 
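For example (the pattern is illustrative; match it against what ps actually shows):

ps -ef | grep auto-
pkill -f 'auto-sync'

pkill -f matches against the full command line rather than just the process name, which helps when the job name only appears as an argument.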

--------------------------------------------------------------------------------

v3.1.5 bugs --- SUP-799, hint: "snapshot is cloned"

The workaround for this issue is the following:

From BASH:
1. Determine clone names on the destination host:

# zdb -d <poolname> | grep %

OR

## For an HA cluster, use:

# zdb -U /opt/HAC/RSF-1/etc/volume-cache/data.cache -d data | grep %

2. Destroy identified clones:

# zfs destroy <clone-with-%-in-the-name>

It will complain that 'dataset does not exist', but you can check again (see step 1).

3. Destroy snapshot(s) that could not be destroyed previously
# zfs destroy <snapshot-name>

4. Rerun the auto-sync job
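Putting the workaround together, a sketch using a hypothetical pool named data with placeholder clone and snapshot names:

# zdb -d data | grep %
# zfs destroy data/projects/%recv-1234        (repeat for each clone listed)
# zdb -d data | grep %                        (re-check; no output means all clones are gone)
# zfs destroy data/projects@autosync-snap-1   (the snapshot that previously failed to destroy)

Then rerun the auto-sync job.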
