Auto-Sync tips


Use the following guide when troubleshooting auto-sync failures.

1. nmc@nza:/$ show appliance version (from both nodes) 
If you are not at v3.1.3 on both nodes, you should upgrade; rrdaemon had stability fixes that went into 3.1.3. If you can't upgrade, then both nodes need to run the same version. 
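For example, from each node (host names here are placeholders):

ssh admin@nza-src
nmc@nza-src:/$ show appliance version

ssh admin@nza-dst
nmc@nza-dst:/$ show appliance version

Both outputs should report the same version, ideally v3.1.3.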

2. Whenever you see a dataset lock and no data is actually being transferred: 
nmc@nza:/$ show appliance nms locks -v (to check locks on both nodes) 
If this shows locks, run the following from bash --> # svcadm restart nms (no output from the locks command means no locks, which is good) 
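A typical check-and-clear sequence, run on each node (restart nms only if locks were shown):

nmc@nza:/$ show appliance nms locks -v
nmc# option expert_mode=1
nmc# !bash
y
# svcadm restart nms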

3. Must for v3.1.x - nmc@nza:/$ setup trigger autosync-check disable (on source) 

4. Must for v3.1.x - nmc@nza:/$ setup auto-sync serialize-all (on source) 

5. nmc@nza:/$ setup auto-sync :job_name property force (on source) 

and pick Use_any_snapshots, rollback, and destroy snapshots as the options 
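For example, with a hypothetical job named :data-projects (substitute your real job name):

nmc@nza:/$ setup auto-sync :data-projects property force

then answer the prompts with Use_any_snapshots, rollback, and destroy snapshots.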

6. Must for v3.1.x - 
nmc@nza:/$ show auto-sync :service-name property zip_level (look for the zip_level entry at the end of the output) 
This zip_level should be 0; compression in RR doesn't work efficiently or consistently. 
(Disable auto-sync, change the zip_level property to 0 with setup auto-sync :service-name property zip_level, enable auto-sync, then run auto-sync -- see the sketch below.) 
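A sketch of that sequence, assuming a hypothetical job named :data-projects and that the disable/enable/run actions follow the same ':job-name action' form as the other commands in this guide:

nmc@nza:/$ setup auto-sync :data-projects disable
nmc@nza:/$ setup auto-sync :data-projects property zip_level    (enter 0 at the prompt)
nmc@nza:/$ setup auto-sync :data-projects enable
nmc@nza:/$ setup auto-sync :data-projects run
nmc@nza:/$ show auto-sync :data-projects property zip_level     (verify it now shows 0)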

7. Check source and destination can communicate using rr protocol from bash 
nmc# option expert_mode=1 
nmc# !bash 
y 

rrmgr -x 'destination_host' ping (on source) 
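For example, with a hypothetical destination host named nza-dst:

# rrmgr -x 'nza-dst' ping

If the ping fails, fix name resolution and connectivity between the nodes before rerunning the job.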

8. Restart the rrdaemon service; once the service is restarted, try running the auto-sync job again. 

nmc# option expert_mode=1 
nmc# !bash 
y 

svcs -v | grep rrdaemon 
svcadm restart rrdaemon 
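After the restart, confirm the service is back online before rerunning the job:

svcs -v | grep rrdaemon

The rrdaemon entry should be in the 'online' state; if it is in 'maintenance', run svcs -x to see why and where its log file is.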


----------------------------------------------------

Have you recreated the jobs without the /*? You have to destroy the jobs, kill any that are still running, and recreate the jobs without the /*. 
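A sketch of the destroy step, assuming the destroy action follows the same ':job-name action' form used elsewhere in this guide, with a placeholder job name:

nmc@nza:/$ setup auto-sync :data-projects destroy

Then recreate the job through the usual setup auto-sync dialog, specifying the dataset path without a trailing /*.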

Have you done the following (on the source)? 
setup auto-sync serialize-all 
setup trigger autosync-check disable 

To kill any running auto-sync jobs 

ssh admin@server_name 
su 
<root's password> 
ps -ef | grep auto- 

pkill <auto-sync_job_process> 
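For example (the pattern is illustrative; match it against what ps actually shows):

ps -ef | grep auto-
pkill -f 'auto-sync'

pkill -f matches against the full command line rather than just the process name, which helps when the job name only appears as an argument.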

--------------------------------------------------------------------------------

v3.1.5 bugs --- SUP-799, hint: "snapshot is cloned"

The workaround for this issue is the following:

From BASH:
1. Determine clone names on the destination host:

# zdb -d <poolname> | grep %

OR

## For an HA cluster, use:

# zdb -U /opt/HAC/RSF-1/etc/volume-cache/data.cache -d data | grep %

2. Destroy identified clones:

# zfs destroy <clone-with-%-in-the-name>

It will complain that 'dataset does not exist', but you can check again (see step 1).

3. Destroy snapshot(s) that could not be destroyed previously
# zfs destroy <snapshot-name>

4. Rerun the auto-sync job
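Putting the workaround together, a sketch using a hypothetical pool named data with placeholder clone and snapshot names:

# zdb -d data | grep %
# zfs destroy data/projects/%recv-1234        (repeat for each clone listed)
# zdb -d data | grep %                        (re-check; no output means all clones are gone)
# zfs destroy data/projects@autosync-snap-1   (the snapshot that previously failed to destroy)

Then rerun the auto-sync job.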
