Friday 11 December 2015

A note about Data pump behaviour while blocked

Lately I manage several db environments, on which I perform frequent loads with the Data Pump tools. As those loads are sometimes quite large (up to 0.5T), I constantly meet the situation, when the space designated for archivelogs is filled completely, which in turn causes archival process to stop.
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance YYYYYY - Archival Error
ORA-16038: log 1 sequence# 3208 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+RECO/YYYYYY/onlinelog/group_1.257.895415461'
ORA-00312: online log 1 thread 1: '+DATA/YYYYYY/onlinelog/group_1.257.895415461'
Fri Dec 11 11:03:35 2015
Archiver process freed from errors. No longer stopped
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance YYYYYY - Archival Error
ORA-16014: log 1 sequence# 3208 not archived, no available destinations
I used to load data to databases in the NOARCHIVELOG mode, where there is no such issue at all. In many cases the archival break means that a Data Pump job stops for a time, until the space for archivelogs is freed and available for future archivelogs. Thus it does not matter when the space will be freed (by backup).
But there are scenarios, when I have faced the following behaviour. When a Data Pump job meets the archival error, it fails with the following:
ORA-39065: unexpected master process exception in RECEIVE
ORA-39078: unable to dequeue message for agent MCP from 
  queue "KUPC$C_1_20151210094644"
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPC$QUE_INT", line 610
ORA-04021: timeout occurred while waiting to lock object
Job "ZZZZZZ"."SYS_IMPORT_SCHEMA_01" stopped due to fatal error 
  at Thu Dec 10 11:54:39 2015 elapsed 0 02:07:55
It seems to me that such situations are possible, when parallel job slaves are run. They die, waiting for some object to lock and possibly it is simply the problem with communication between the slaves and the master, when db freeze disables this communication. However this is not a final error. Facing this one may attach to the Data Pump session. It seems it freezes and does not recover automatically, when the work is possible again. In such situation the user has to call CONTINUE command - then the job recreates the slaves and the job steps further.
Job SYS_IMPORT_SCHEMA_01 has been reopened at Thu Dec 10 12:23:51 2015
Restarting "ZZZZZZ"."SYS_IMPORT_SCHEMA_01":  ZZZZZZ/********
  parfile=ZZZZZZ.par