Tuesday, February 19, 2013

Data Guard Catch-Up via RMAN using TSM

Many a times, we DBA's are faced with an issue where the DG has been out of synch with the Production DB for quite a long time.
Certain examples are we ran an overnight rman duplicate script to create a standy database and by the time we got to look at the status of the restore, hundreds of logs had been generated in the primary .
In all probability, in a typical production environment, a hourly RMAN archivelog backup and delete script would have run and deleted the archivelogs (after backing it up, obviously).

So to help the standby get in synch with the primary , we have to either put the standby in recover mode and restore the archivelogs on production hoping the FAL services would pick it up.

Once we realize that the archivelogs number in 100's or 1000's, this task becomes nearly impossible with the risk of endangering the archive destination on the primary getting filled up.

I have expalined a method below uses the same RMAN technique to get the standby in synch with the primary.
The example uses TSM (Tivoli) as the storage medium and please change accordingly as per your environment's tape set up.

For TSM, we need to copy over the primary node's TSM password file to the standby to ensure that the auxiliary channel can access the tape backups created from the primary database's node.Along with it , we need to ensure that both primary and standby are talking to the same TSM server.

At the Standby, Cancel the Managed recover process if running and ensure that the database is in mount mode.

alter database recover managed standby database cancel;


1)   Do the following in Primary.

create a file called recoverystandby.cmd, with the following contents


#++++++++++++++++++++++++++++++++++
#++RECOVERYSTANDBY.CMD++++++++++
#++++++++++++++++++++++++++++++++++
backup archivelog all not backed up;
run {
#Adjust it as per the archivelogs already present in standby
set until time 'sysdate-(1/24)';
#Make sure below /tmp/tdpo.opt file is available in standby database.
allocate auxiliary channel d2 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/tmp/tdpo.opt)';
recover standby clone database delete archivelog;
}

2)Do the following in Standby.

This is to make sure that the standby can access the tape backups.
******These steps are specific to TSM******
Make sure /tmp/tdpo.opt file is available. This TDPO file is used by auxiliary channel in step 1.

> cat /tmp/tdpo.opt
DSMI_LOG                /tmp
DSMI_ORC_CONFIG         /tmp/dsm.opt
TDPO_FS                 xxx
TDPO_NUM_BUFFERS        10
TDPO_NODE               prmyserver
TDPO_OWNER              oracle
TDPO_PSWDPATH           /tmp

Make sure /tmp/dsm.opt file is present referred above as DSMI_ORC_CONFIG


> cat /tmp/dsm.opt
SERVERNAME <tsm server>

Make sure you have physically copied the primary node's TSM password file. The name of the password file would be TDPO.<tdpo_node> i.e. TDPO.prmyserver and should be available under
TDPO_PSWDPATH path.





3) Do the following in Primary

Set environmental variables to connect to local primary database
export ORACLE_HOME='<path of ORACLE_HOME>'
export ORACLE_SID='<SID of Primary Database>'
export PATH=$ORACLE_HOME/bin:$PATH

Run the above command via RMAN as

rman target / auxiliary sys/manager@<standby database> catalog rman/rman@<rman db> cmdfile=recoverstandby.cmd log=catchup_stdby.log

Please allow the above script to run and this script should allow the standby to catch up with the primary without the need to restore the files into primary.

Once it has caught up, you can put the primary back into managed recovery process.

Please note that there is one more way to achieve the same thing using backup as incremental via rman like 
backup incremental from scn <scn number> database . You can look at other blogs to read more about this and decide for yourself which one to use. If there is a missing archivelog , then you cannot use my method and will be better off using the other approach I just mentioned above.