Wednesday, 8 August 2012

Automation while suspending the db work

For some time the Oracle RDBMS has supported resumable space allocation: when some work lacks space within a db tablespace, the operation is suspended instead of failing immediately. The AFTER SUSPEND trigger fires at that point, so there is a place for a kind of callback; one may also set the RESUMABLE_TIMEOUT parameter, and the DBA_RESUMABLE view gets entries inserted after suspend events.

If I were to imagine the ideal way to handle a suspend event, it would be to run, within the AFTER SUSPEND trigger, code which adds a new file to the tablespace that lacks space. There were even published examples of such a solution for the 9i version, but it no longer works, at least with version 11.2 (not sure about 10g). So what can one do?

I meet this problem mostly on test/dev environments, so the solution does not need to be a complex one and I do not need to think about any special cases - all added files are of UNLIMITED size, so the solution is always to add a new file (unless the storage is fully used).
  1. I have met a solution based upon the AFTER SUSPEND trigger. In 11.2 we get ORA-30511 when trying to run DDL from it (and adding a file is of that type). The practical usage I see now is the following - setting RESUMABLE_TIMEOUT to some higher value in order to give a DBA time to make room for the operation while it is suspended, and/or informing such a DBA by SMS or an alert generation
  2. use a DBMS_SCHEDULER job to run, from time to time, a piece of code which adds a new file
In my case the second solution makes much more sense and is real automation. Because we mostly use Data Pump utilities to load test/dev databases, a suspend of 2 hours or so on the operation is usually acceptable. In such a case the best way is to create a procedure like this:
CREATE OR REPLACE PROCEDURE expand_resumable_tblspace AS
  l_tblspace VARCHAR2(80);
  l_count    PLS_INTEGER := 0;
BEGIN
  -- check whether any operation is currently suspended
  SELECT COUNT(*)
    INTO l_count
    FROM dba_resumable
   WHERE suspend_time IS NOT NULL
     AND resume_time  IS NULL
     AND status       = 'SUSPENDED';
  IF l_count > 0 THEN
    -- take the first suspended entry and cut the tablespace name
    -- out of the error message (skip the leading 'tablespace ')
    SELECT SUBSTR(REGEXP_SUBSTR(error_msg, 'tablespace [^[:blank:]]*', 1, 1, 'im'), 12)
      INTO l_tblspace
      FROM dba_resumable
     WHERE suspend_time IS NOT NULL
       AND resume_time  IS NULL
       AND status       = 'SUSPENDED'
       AND ROWNUM       < 2;
    IF SUBSTR(l_tblspace, 1, 4) != 'TEMP' THEN
      EXECUTE IMMEDIATE 'alter tablespace ' || l_tblspace ||
        ' add datafile size 1m autoextend on next 8m';
    ELSE
      EXECUTE IMMEDIATE 'alter tablespace ' || l_tblspace ||
        ' add tempfile size 128m autoextend on next 128m';
    END IF;
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    -- write a timestamped entry with the error to the alert log
    SYS.DBMS_SYSTEM.KSDDDT;
    SYS.DBMS_SYSTEM.KSDWRT(2, SQLERRM);
    SYS.DBMS_SYSTEM.KSDFLS;
END;
/
and to schedule a job which calls this procedure every, let's say, 5 minutes.
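Such a job could be created with DBMS_SCHEDULER along these lines (a sketch; the job name and the 5-minute interval are my own choices):

```sql
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'EXPAND_RESUMABLE_TBLSPACE_JOB', -- hypothetical name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'EXPAND_RESUMABLE_TBLSPACE',
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
    enabled         => TRUE);
END;
/
```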

Why take only one row from DBA_RESUMABLE? I assume there are not too many such events at a moment, and a second one would be serviced by the next job run.

Why cut the tablespace name from ERROR_MSG and not from ERROR_PARAMETERx? I found that in some circumstances (a different db version, for example) the ERROR_PARAMETERx columns are set differently or not at all. For 11.2, IIRC, the tablespace name was set in ERROR_PARAMETER4.

Tuesday, 7 August 2012

Recycle bin and dropping tablespaces

Today I hit an interesting issue. First, the instance had not shut down within an hour. Second, I could not drop a schema. The reason seems to be an entry in the recycle bin left after some table, which was bound to a tablespace which in turn had apparently been dropped some time ago. So there was no tablespace, while the entry remained, and dropping the schema or purging the recycle bin failed with an error in recursive SQL.

The solution is quite simple - create a new tablespace with the old name, then purge the recycle bin again.
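In SQL the workaround looks roughly like this (the tablespace name and datafile size are placeholders, not the real ones from my case):

```sql
-- recreate a tablespace under the old name so the recycle bin
-- entry can be resolved, then purge it and drop the helper again
CREATE TABLESPACE old_tblspace_name DATAFILE SIZE 10M;
PURGE DBA_RECYCLEBIN;
DROP TABLESPACE old_tblspace_name INCLUDING CONTENTS AND DATAFILES;
```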

Thursday, 19 July 2012

Log rotation in 11g

Until version 11g, listener log rotation could be done easily with lsnrctl by setting log_file to a new name. With the advent of ADR, if it is used, this way is blocked.
An excellent article on the theme, IMHO: http://msutic.blogspot.com/2009/06/truncating-rotating-flushing.html.
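For reference, the classic pre-ADR rotation went roughly like this (a sketch; the log path depends on the environment and is an example here):

```shell
# stop listener logging, rotate the file at the OS level, resume logging
lsnrctl set log_status off
mv $ORACLE_HOME/network/log/listener.log \
   $ORACLE_HOME/network/log/listener_$(date +%Y%m%d).log
lsnrctl set log_status on
```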

Wednesday, 18 July 2012

Su-like technology for Oracle db

From time to time I meet the situation when many changes must be performed on many db schemas, provided as a set of scripts.
  • The simplest way to do this is to log on sequentially to each chosen schema and perform there all the necessary changes. This approach has some drawbacks:
    1. one needs to know the password to every schema visited
    2. one needs to authenticate to the db many times
    3. one needs to run many scripts
  • the first improvement is to create a master script and use the CONNECT <<user>> command of sqlplus - but this still does not resolve providing the passwords
  • so we may create a more powerful account, gathering all the needed privileges, and use
    ALTER SESSION SET CURRENT_SCHEMA=<<user>>; instead of CONNECT - this is a step in the right direction, and as long as the scripts use simple objects and functionalities, this way addresses all the inconveniences. Another possibility is to prefix all the used objects with the schema name.
  • still we are far from perfection - there are some functionalities which work wrongly; the simplest example here is the usage of private db links. Here we may use so-called proxy users, available through sqlplus starting from 10gR2. We don't need a powerful user. We return to CONNECT, this time slightly modified. First we need an account with the CREATE SESSION privilege. Then we need to alter all schemas which we plan to make available:
    ALTER USER <<user>> GRANT CONNECT THROUGH <<proxy user>> (we may revoke it with REVOKE CONNECT THROUGH). This time we connect to any so-configured user through
    CONNECT <<proxy user>>[<<user>>]/<<proxy user password>>. With this, addressing all the problems is easy - we need to authenticate as only one user, so even though reconnecting many times, we may use sqlplus variable substitution in order to provide the password only once.
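Put together, a session might look like this (a sketch; the user names and the password are placeholders):

```sql
-- one-time setup, as a DBA: a minimal proxy account,
-- then allow it to connect through to the target schemas
CREATE USER proxy_usr IDENTIFIED BY secret;
GRANT CREATE SESSION TO proxy_usr;
ALTER USER app_schema1 GRANT CONNECT THROUGH proxy_usr;
ALTER USER app_schema2 GRANT CONNECT THROUGH proxy_usr;

-- afterwards, inside sqlplus, switch schemas with one password:
-- CONNECT proxy_usr[app_schema1]/secret
-- CONNECT proxy_usr[app_schema2]/secret
```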
There exists the special BECOME USER privilege, which apparently allows for similar behavior; however, it is used for internal processing bound with the Data Pump technology and is rather not meant for "end users".

Tuesday, 5 June 2012

Enabling Real Time Apply on open standby (with VPD)

Actually nothing special. One needs to stop the recovery, open the standby in READ ONLY mode, and once again start the recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
There is however one trick - one needs to do this rather quickly. It seems (at least to me) that being even one archivelog behind the most current one means the managed replication will not start, apparently waiting on some flag indicating the end of the previously written log. Thus copying the needed logs to the standby does not mean the standby will apply them. The condition seems to be that the archivelog on the standby should be "in transition", with the same log being current on the primary.

UPDATE from 2013-06-20
It seems I missed some apparent (and specific to this configuration) facts. First of all, the database in question, despite having its standby as Active Data Guard, also implements a VPD policy, which corresponds to a trigger fired on logon.

The starting position is the following:
  • primary database up and running, log_archive_dest_state_2 set to DEFER
  • standby database shut down, a few archivelogs behind the primary, FAL server indicated

When the standby is mounted or started, it tries to connect to the FAL server. In my case this connection fails due to the trigger mentioned above, as it throws an exception for some reason. If the standby is mounted, then enabling log_archive_dest_2 (i.e. setting log_archive_dest_state_2 to ENABLE) on the primary makes the primary connect to the standby. Those connections succeed. Now, no matter what I do (open the standby in READ ONLY mode and then start Real Time Apply, or start it on the mounted standby), the managed recovery will succeed - after the recovery starts, the ARCH processes provide the not yet transferred archivelogs, the recovery process (MRP0) applies them, and then (assuming the USING CURRENT LOGFILE clause) switches to communication with a primary LNS process in order to apply directly from memory, not waiting for the logfile switch.
But if I open the standby directly, or do not enable log_archive_dest_2 while the standby stays in MOUNTED mode, then all the connections from the primary fail, as does the FAL connection from the standby to the primary. In such a case it is impossible to open the communication and successfully enable the replication - the MRP0 process will start without problems, yet the archivelogs will not be transferred to the standby.
The only solution I've found is to shut the standby down and mount it again, ensure the communication with the primary is enabled (simply switching log_archive_dest_state_2 to DEFER, then to ENABLE again, resets the connections, and the messages in the standby alert log will tell whether it works - no messages or error messages signal a problem), then (optionally) open it and start the managed recovery.
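The working sequence can be sketched as follows (standard commands; which database each one is run on is noted in the comments):

```sql
-- on the standby: restart into MOUNT state
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;

-- on the primary: reset the connections to the standby
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER;
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE;

-- on the standby: optionally open, then start Real Time Apply
ALTER DATABASE OPEN;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT;
```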

All my previous hypotheses were apparently wrong. There is no need to be at the same sequence on the primary and the standby, so the time after which the recovery is started is irrelevant in this context. The only important thing is to make the connections during the mount phase (as they are impossible to acquire when either database is in open mode).

Monday, 4 June 2012

Error ORA-01031/ORA-00604 on communication with standby

When I hit the ORA-01031 error on version 10g, the cause was usually boring - a badly configured primary<->standby connection, a lacking password file, improper permissions on such a file, etc. With version 11g the trivial error became more enigmatic. I checked the connections, copied the password file from the primary, and still got ORA-01031 on connection from the primary to the standby (and ORA-00604 on connection from the standby to the primary). Finally, under the link https://forums.oracle.com/forums/thread.jspa?messageID=10350931 I found quite a good explanation of my problem.
With the advent of version 11g the DEFAULT profile got some limitations - among others, password expiry. Due to this, passwords go through a grace period during which a "normal" user sees a message about it. But the automatically established connections to/from a standby get lost with this behavior, while the returned errors do not indicate the right solution, which is of course simple. One may
  • change a password (to solve temporarily)
  • or change the DEFAULT profile
  • or change SYS profile to less restrictive
One must remember that after any of those operations the current logfile needs to be applied on the standby in order to see a positive change. After that, the automatic redo shipping started to work properly.
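For example, relaxing the password lifetime in the DEFAULT profile and clearing the expired state could look like this (the password is obviously a placeholder):

```sql
-- disable password expiry for accounts using the DEFAULT profile
ALTER PROFILE DEFAULT LIMIT PASSWORD_LIFE_TIME UNLIMITED;
-- and, if a password already expired, reset it to clear the state
ALTER USER sys IDENTIFIED BY "same_or_new_password";
```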

UPDATE on 2013-06-20
It seems that, despite possible problems with passwords (which may give similar symptoms), in my particular case the true reason behind the ORA-1031/ORA-604 errors was the local configuration - VPD is enabled on the database, together with an ON LOGON trigger enforcing the required context settings. The connections are successful only if made from the primary to a mounted standby. More on the subject in another article.

Friday, 25 May 2012

Installation of Oracle RDBMS on Oracle Linux 6

We usually install 2 Oracle products on database environments - the RDBMS itself and the Grid Control agent. The RDBMS binaries installation actually brings no problems - of course one needs an X installation, which may be more than a cautious sysadmin would like to install. The Grid agent is another thing - there is a number of additional packages to install in order to make the installation process smooth. The best way on Oracle Linux 5 was to install the package oracle-validated, which depends on a few other packages that fully cover the agent's needs. On Oracle Linux 6 the package which seems to serve that purpose is named oracle-rdbms-server-11gR2-preinstall (R must be in uppercase). However, it seems it is not enough - additionally:
  • yum install libXp.so.6
  • yum install libXt.so.6
  • yum install libXtst.so.6
Despite that, there was another problem: I tried to install the agent in version 10.2.0.5, so I had to use the -ignoreSysPrereqs option. And then failure - some errors in linking. I will switch to 11.1.0.1.