Wednesday, 8 August 2012

Automation while suspending the db work

From some time the Oracle RDBMS supports a situation when due to some work lacks the space within db tablespaces. The AFTER SUSPEND trigger is called, so there is a place for kind of callback, one may set the RESUMABLE_TIMEOUT parameter or there is the DBA_RESUMABLE table with entries inserted after supend events.

If I would imagine itself the best way to solve the suspend event it would be to run within the AFTER SUSPEND trigger a code which would add a new file to the tablespace, on which there is a lack of place. There are was even examples with such solution for 9i version, but it is not now at least with 11.2 version (not sure about 10g). So what one may do?

I meet this problem mostly on test/dev environments, so the solution do not need to be a complex one and I do not need to think about any special cases - all added files are of UNLIMITED size, so always the solution is to add a new file (unless the storage is fully used).
  1. I have met with a solution based upon the AFTER SUSPEND trigger. In 11.2 we get ORA-30511 if trying to run some DDL (and adding a file is of such type). The practical usage now I see in the following - setting RESUMABLE_TIMEOUT to some higher value in order to allow a DBA to make a room for the operation in suspend state or/and inform such DBA by SMS or an alert generation
  2. use DBMS_SCHEDULER job to run from time to time a piece of code, which would add a new file
In my case the second solution has much more sense and is a real automation. Because we mostly use Data Pump utilities to load test/dev databases, it usually enables 2h long suspend (or so) on the operation. In such case the best way is to create a procedure like this:
CREATE OR REPLACE
PROCEDURE expand_resumable_tblspace
AS
  l_tblspace VARCHAR2(80);
  l_count pls_integer := 0;
BEGIN
  SELECT
    COUNT(*)
  INTO
    l_count
  FROM
    dba_resumable
  WHERE
    1               =1
  AND SUSPEND_TIME IS NOT NULL
  AND resume_time  IS NULL
  AND status        ='SUSPENDED' ;
  IF l_count        > 0 THEN
    SELECT
      SUBSTR(regexp_substr(ERROR_MSG, 'tablespace [^[:blank:]]*', 1, 1, 'im'),
      12)
    INTO
      l_tblspace
    FROM
      dba_resumable
    WHERE
      1                       =1
    AND SUSPEND_TIME         IS NOT NULL
    AND resume_time          IS NULL
    AND status                ='SUSPENDED'
    AND rownum                <2 ;
    IF SUBSTR(l_tblspace,1,4)!='TEMP' THEN
      EXECUTE immediate 'alter tablespace '||l_tblspace||
      ' add datafile size 1m autoextend on next 8m';
    ELSE
      EXECUTE immediate 'alter tablespace '||l_tblspace||
      ' add tempfile size 128m autoextend on next 128m';
    END IF;
  END IF;
EXCEPTION
WHEN OTHERS THEN
  SYS.DBMS_SYSTEM.KSDDDT;
  SYS.DBMS_SYSTEM.KSDWRT(2, SQLERRM);
  SYS.DBMS_SYSTEM.KSDFLS;
END;
/
and to run a job every let's say 5 minutes, which would call such procedure.

Why take only one row from DBA_RESUMABLE? I assume there is not too much such events at a moment and the second one would be serviced with next job run.

Why cut a tablespace name from ERROR_MSG and not from ERROR_PARAMETERx? I found that in some circumstances (different db version for example) the ERROR_PARAMETERx are set differently or not at all. For 11.2 IIRC the tablespace name was set in ERROR_PARAMETER4.

No comments: