Friday, 8 June 2018

RMAN-3008/RMAN-3009 and RMAN-20095

Today I had the following problem:
  • initially the backup failed with
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03008: error while performing automatic resync of recovery catalog
    RMAN-20095: invalid backup/copy control file checkpoint SCN
    
  • every subsequent backup failed with
    RMAN-03014: implicit resync of recovery catalog failed
    RMAN-03009: failure of partial resync command on default channel at 06/07/2018 04:18:50
    RMAN-20095: invalid backup/copy control file checkpoint SCN
    
This is reported on Metalink as "Bug 19209117 RMAN-3008 RMAN-20095 while performing Automatic Resync"

How to move forward?
I found the first backup that failed, located in its log the place where the error first occurred, and started to uncatalog the backup pieces from that place upward. Please note that the uncataloging has to be done while connected only to the target database (i.e. while working only with the backup entries stored in the control file). Otherwise RMAN will first try to implicitly resync the catalog, which leads to the same error over and over again.
change backuppiece '/path_to_backup/cf_YYYYY_c-283029066-20180607-02' uncatalog;
The first call, which uncataloged the controlfile backup, was enough.
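The search for the first failing backup log can be sketched in a few lines of Python. This is only an illustration: the log names are hypothetical, the function name first_failure is mine, and the "(no errors)" text is a placeholder, not real RMAN output. Only the two RMAN- lines are taken from the error stack quoted above.

```python
from typing import Iterable, Optional, Tuple

ERROR_CODE = "RMAN-20095"


def first_failure(logs: Iterable[Tuple[str, str]]) -> Optional[Tuple[str, int]]:
    """Return (log name, 1-based line number) of the first RMAN-20095 hit.

    `logs` is a sequence of (name, text) pairs that must already be
    sorted oldest first; the function does not sort them itself.
    """
    for name, text in logs:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ERROR_CODE in line:
                return name, lineno
    return None


# Hypothetical log names; "(no errors)" is a placeholder for a clean log.
logs = [
    ("backup_20180606.log", "(no errors)\n"),
    (
        "backup_20180607_a.log",
        "RMAN-03008: error while performing automatic resync of recovery catalog\n"
        "RMAN-20095: invalid backup/copy control file checkpoint SCN\n",
    ),
    (
        "backup_20180607_b.log",
        "RMAN-03014: implicit resync of recovery catalog failed\n"
        "RMAN-20095: invalid backup/copy control file checkpoint SCN\n",
    ),
]

print(first_failure(logs))
```

The backup pieces created from that point of the log onward are then the candidates for uncataloging.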
The debug output of a resync catalog call made while the problem was present contains the following section:
DBGRESYNC:     channel default:   file# 0 [11:30:20.805] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 0
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
DBGRESYNC:     channel default: Calling checkBackupDataFile for set_stamp 978149312 set_count 8542 recid 6038 [11:30:20.807] (resync)
DBGRESYNC:     channel default:   file# 0 [11:30:20.807] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=20095 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 129
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
   DBGRCVCAT: addBackupControlfile - ckp_scn 3201066972364 ckp_time 07-JUN-18
   DBGRCVCAT: addBackupControlfile - lckp_scn 3201066972565 lckp_time 07-JUN-18
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CANCELCKPT excl: 0
   DBGRCVCAT: cancelCkpt - rollback, released all locks

-- here is the moment the error 20095 is caught
DBGPLSQL:     EXITED resync with status ORA--20095 [11:30:20.992]
DBGRPC:       krmxr - channel default returned from peicnt
DBGMISC:      ENTERED krmstrim [11:30:20.992]
DBGMISC:       Trimming message: ORA-06512: at line 3401 [11:30:20.992] (krmstrim)
DBGMISC:        (24) (krmstrim)
DBGMISC:      EXITED krmstrim with status 24 [11:30:20.992] elapsed time [00:00:00:00.000]
DBGRPC:       krmxr - channel default got execution errors (step_60)
DBGRPC:       krmxr - exiting with 1
DBGMISC:      krmqexe: unhandled exception on channel default [11:30:20.992]
DBGMISC:     EXITED krmiexe with status 1 [11:30:20.992] elapsed time [00:00:01:21.889]
[..]
DBGMISC:     error recovery releasing channel resources [11:30:20.992]
DBGRPC:      krmxcr - channel default resetted
DBGMISC:     ENTERED krmice [11:30:20.993]
DBGMISC:      command to be compiled and executed is: cleanup  [11:30:20.993] (krmice)
DBGMISC:      command after this command is: NONE  [11:30:20.993] (krmice)
[..]
DBGMISC:     EXITED krmice [11:30:21.017] elapsed time [00:00:00:00.024]
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 06/08/2018 11:30:20
RMAN-20095: invalid backup/copy control file checkpoint SCN
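
The trace above already shows which record the catalog rejected: the checkBackupDataFile call that directly precedes the err=20095 line. A minimal Python sketch for pulling that record out of a longer trace is below. It assumes the trace has exactly the line shapes shown in this post; the regular expressions and the function name find_failing_record are my own and not part of any Oracle tool.

```python
import re
from typing import Dict, Optional

# Patterns modelled only on the trace lines quoted in this post.
CALL_RE = re.compile(
    r"Calling checkBackupDataFile for set_stamp (\d+) set_count (\d+) recid (\d+)"
)
ERR_RE = re.compile(r"kpurpc2 err=(\d+)")
SCN_RE = re.compile(r"\b(l?ckp_scn) (\d+)")

TRACE = """\
DBGRESYNC:     channel default:   file# 0 [11:30:20.805] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 0
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
DBGRESYNC:     channel default: Calling checkBackupDataFile for set_stamp 978149312 set_count 8542 recid 6038 [11:30:20.807] (resync)
DBGRESYNC:     channel default:   file# 0 [11:30:20.807] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=20095 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 129
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
   DBGRCVCAT: addBackupControlfile - ckp_scn 3201066972364 ckp_time 07-JUN-18
   DBGRCVCAT: addBackupControlfile - lckp_scn 3201066972565 lckp_time 07-JUN-18
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CANCELCKPT excl: 0
"""


def find_failing_record(trace: str, err_code: int = 20095) -> Optional[Dict[str, int]]:
    """Return the record ids and SCNs around the first failing catalog call."""
    last_call: Optional[Dict[str, int]] = None
    result: Optional[Dict[str, int]] = None
    for line in trace.splitlines():
        call = CALL_RE.search(line)
        if call:
            if result is not None:
                break  # the next record starts, the failing one is complete
            keys = ("set_stamp", "set_count", "recid")
            last_call = dict(zip(keys, map(int, call.groups())))
            continue
        err = ERR_RE.search(line)
        if err and int(err.group(1)) == err_code and last_call and result is None:
            result = dict(last_call)  # remember the call that just failed
            continue
        if result is not None:
            scn = SCN_RE.search(line)
            if scn:
                result[scn.group(1)] = int(scn.group(2))
    return result


record = find_failing_record(TRACE)
print(record)
```

The set_stamp, set_count and recid values identify the backup set record on which the resync stopped, and the two SCNs are the ones printed by addBackupControlfile right after the failure.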