Tuesday 6 December 2011

BACKUP_TAPE_IO_SLAVES

Initial steps

As of today we have started enabling the BACKUP_TAPE_IO_SLAVES parameter.
What has changed:
  • while RMAN runs, the RDBMS launches an additional process, which reports itself in alert.log, for example:
    Running KSFV I/O slave I601 os pid=8282
    KSFV I/O slave I601 dp=f2e24c020 exiting
  • this process uses memory allocated from LARGE_POOL (or from SHARED_POOL if the former is not configured) instead of the private memory of the RMAN session process
  • from then on all RMAN IO operations to the SBT device are seen as asynchronous (of course this is a kind of "fake" asynchronous IO, similar in shape to the architecture with DBWR_IO_SLAVES set to a non-zero value)
  • thus the IO operations are reported in V$BACKUP_ASYNC_IO (previously in V$BACKUP_SYNC_IO)
  • the following wait events show up:
    • os thread startup (almost certainly the startup of the KSFV IO slave)
    • imm op (an event specific to RMAN with IO slaves in use)
    • io done (as far as I could find, this one is specific to synchronous IO operations, which may mean that it is reported by the KSFV IO slave itself, and that is why the asynchronous IO is "fake")
    • control file sequential read
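The wait events listed above can be checked per session. The following is only a minimal sketch: it assumes that the RMAN channel sessions are identifiable by a PROGRAM value starting with 'rman' (the same filter used later in this post). The Ixxx slave processes themselves may be registered under a different program name, in which case this query would not show their waits.

```sql
-- Sketch: cumulative waits of the currently connected RMAN sessions.
-- Assumption: RMAN sessions have PROGRAM like 'rman%'.
select s.sid,
       e.event,
       e.total_waits,
       e.time_waited          -- hundredths of a second
from   v$session s
       join v$session_event e on e.sid = s.sid
where  s.program like 'rman%'
and    e.event in ('os thread startup', 'imm op', 'io done',
                   'control file sequential read')
order  by s.sid, e.time_waited desc;
```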
In both modes 4 buffers of 256K each are in use. For relatively short operations (for example frequent archivelog backups) no difference can be seen.
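The buffer figures and the asynchronous reporting can be looked up in the view itself. A minimal sketch, assuming the view name as documented by Oracle (V$BACKUP_ASYNC_IO) and its standard columns; with the parameter switched off, the same query against V$BACKUP_SYNC_IO should apply, minus the columns that exist only in the asynchronous view.

```sql
-- Sketch: per-file IO of recent RMAN operations, with buffer settings.
-- Rows of TYPE 'AGGREGATE' are totals and are skipped here.
select device_type,
       type,                          -- INPUT or OUTPUT
       status,
       buffer_size,                   -- bytes per buffer
       buffer_count,                  -- number of buffers
       total_bytes,
       effective_bytes_per_second,
       open_time,
       close_time
from   v$backup_async_io
where  type <> 'AGGREGATE'
order  by open_time;
```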

Update (after 3 years of usage)

This time we go the other way round and switch this parameter off. The main reason is that the Ixxx processes are not stable. We had it enabled on 2 operating systems, AIX and Solaris. On Solaris we switched it off after a crash, and today (2014.02.12) we experienced a crash on AIX, so we are finally disabling it on all our machines.
Some details of today's crash:
  • we got a series of ORA-07445 errors
  • ORA-07445: exception encountered: core dump [PC:0x900000002A3993C] [SIGSEGV] [ADDR:0x100A10000FDDE0] [PC:0x900000002A3993C] [Address not mapped to object] []
  • call stack:
    skdstdst()+40 -> ksedst1()+104 -> ksedst()+40 -> dbkedDefDump()+2828 -> ksedmp()+76 -> ssexhd()+2448 -> 48bc
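Switching the parameter off could look roughly like the sketch below. To my knowledge the parameter can only be changed with the DEFERRED keyword, so the new value applies to sessions started afterwards; the ISSYS_MODIFIABLE column of the first query shows how a given release treats it.

```sql
-- Current value and how it may be modified
select name, value, issys_modifiable
from   v$parameter
where  name = 'backup_tape_io_slaves';

-- Switch the tape IO slaves off for sessions started from now on
alter system set backup_tape_io_slaves = false deferred;
```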


Update

Today a colleague of mine found a system where the backup did not work properly. Starting the backup returned:
ORA-17619: max number of processes using I/O slaves in a instance reached
The docs say it happens when more than 35 processes are controlling IO slaves at the same time. In this particular case there were some backup processes which had been running for a few months already and had never completed.
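Such never-ending backups can be spotted in V$RMAN_STATUS. This is only a sketch, assuming that the hanging operations are still recorded there with a RUNNING status:

```sql
-- Sketch: RMAN operations still marked as running, oldest first
select sid,
       operation,
       status,
       start_time,
       round(sysdate - start_time) as days_running
from   v$rman_status
where  status like 'RUNNING%'
order  by start_time;
```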

Update

Today (18 Sep 2018) I ran into ORA-17619 again on some database. As this article is already a few years old, I had forgotten about it. I searched the net as well as Metalink without finding many suggestions; finally I found this article, checked for open orphaned RMAN sessions, and that was it.
I had been thinking about too many parallel slaves, while in fact this is simply the number of all sessions opened by RMAN. Somewhere there is a counter which counts the processes started by RMAN, but they do not need to run in parallel (as in PX queries); they simply have to exist at the same time.
In this particular case the sessions had been opened at different times over a period of 2 months.
The select below may help to pick out the old sessions. Please note that I had to kill them at the OS level, looking up the SPID in v$process (a simple session kill only switched the sessions to the KILLED status, but they did not go away).
select paddr||' '||logon_time||' '||module||' '||status
  ||' alter system kill session '''||sid||','||serial#||''' immediate;' 
from v$session where program like 'rman%' order by logon_time;
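-- the PID of the underlying OS process can then be obtained from the PADDR value returned above
-- (&paddr is a SQL*Plus substitution variable; paste the PADDR hex string when prompted)
select spid from v$process where addr = hextoraw('&paddr');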
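The two lookups can also be combined into one. A minimal sketch, relying on the same assumption as above that RMAN sessions have a PROGRAM starting with 'rman':

```sql
-- Sketch: RMAN sessions, oldest first, with the OS pid of each server process
select s.sid,
       s.serial#,
       s.status,
       s.logon_time,
       s.module,
       p.spid              -- OS process id, the one to kill at the OS level
from   v$session s
       join v$process p on p.addr = s.paddr
where  s.program like 'rman%'
order  by s.logon_time;
```

Sessions already in the KILLED status still appear here as long as their server process exists, which is what makes the SPID useful.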