Thursday, 7 May 2015

Source of great tech papers (external link)

Cary Millsap's company, Method R, has published some really interesting technical papers.

Wednesday, 6 May 2015

Asynch descriptor resize (external link)

An excellent analysis by Tanel Poder of cases where this wait event shows up high in query time.
We have experienced this behaviour on some databases, but it was usually bound to a particular session, so we used to kill the session and repeat the action.

Thursday, 2 April 2015

Zabbix and Oracle DRCP

Here is a short note on the possible usage of the Database Resident Connection Pooling (DRCP) feature with Zabbix. This feature pools database server processes, so they stay ready for reuse, in contrast to standard dedicated processes, which are created when a connection starts and destroyed when it ends. While at first it seemed an interesting option, it turns out to have limited impact on Zabbix performance.
The main reason is that the Zabbix server keeps persistent connections to the database - in such a case DRCP would be a source of problems rather than a help. Thus only the GUI part is a candidate for such a configuration. Furthermore, in future versions of Zabbix the database logic is planned to be moved into the API, which in turn will be part of the Zabbix server as well (the GUI will call the API), and with such an architecture DRCP will no longer be an option.
Anyway, this is still an interesting feature, assuming one makes extensive use of the GUI on a version below 3.x.
-- this call creates a pool on the database side - the first argument will
-- become a pool name in the future (currently there is only one pool)
begin
  dbms_connection_pool.configure_pool(
    null, minsize => 5, maxsize => 20, inactivity_timeout => 300, max_think_time => 600);
end;
/
-- this call starts the pool
exec dbms_connection_pool.start_pool();
-- this call stops the pool
exec dbms_connection_pool.stop_pool();
-- calls like the one below allow changing pool parameters
begin
  dbms_connection_pool.alter_param(
    null, param_name => 'session_cached_cursors', param_value => '50');
end;
/
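To actually make use of the pool, a client (here, the Zabbix GUI) has to request a pooled server in its connect string. A minimal sketch - the host, port and service name below are placeholders, not values from this post:

```
-- EZConnect form: append :POOLED to the service
dbhost:1521/zabbix_svc:POOLED

-- or the equivalent tnsnames.ora entry with SERVER = POOLED
ZABBIX_POOLED =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = zabbix_svc)(SERVER = POOLED)))
```

A connection made without the POOLED attribute still gets a standard dedicated (or shared) server, so the GUI and the Zabbix server can keep using different connect strings.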

Monday, 9 March 2015

Jammed AQ notification

Problem definition

One of the activities I perform frequently in my job is preparing a test environment based on production data. This is quite a lengthy task, as the main goal is to create a database as close to the production one as possible. There are of course some differences - a cheaper platform is one of them, another is the limited data volume, yet another the limited number of external systems - and these in turn imply further differences.
Thus the most effective way to achieve this goal is to leverage the Data Pump technology plus a number of home-made scripts for the partially "manual" steps.
From time to time, while creating such an environment, a problem shows up with the automatic AQ notification feature, which we use quite extensively, so we have to get it working properly. One scenario leading to an AQ notification jam is the following:
  • we try to check the AQ notification health
  • we run a script, which creates a demo queue and put there a message
  • however during load we implicitly get the situation described here
  • the notification job fails
  • we (i.e. AQ) are stuck
After such a sequence of events, further enqueues to the AQ queue, which should trigger the automatic callback, fail. I am unable to provide the exact reason, as there may be plenty of them - old messages left in the notification queue (SYS.AQ_SRVNTFN_TABLE_1), error messages in the corresponding error queue (SYS.AQ$_AQ_SRVNTFN_TABLE_1_E), an automatic AQ callback job failure due to ORA-28031 and some internal leftovers after it, and so on - but here is a sequence of activities to fix it.

Solution

The main goal of the solution is to get rid of the old messages in the queue. First, one may look for a post like this one, which explains what to do with messages that have been moved to the error queue. In general, it is necessary to enable the error queue for dequeue and drain all the messages from it.
The next step is to run a very similar piece of code to dequeue "manually" all the messages from the notification queue - after that, AQ notification works like new. Additionally I restarted the database, but I am not sure whether that was important. You may also run into a locking problem when dealing with the queues - in my case there was a job fired on the notification queue and I was unable, for example, to enable the error queue for dequeue - I dealt with it by killing the offending session.
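A minimal sketch of such a drain loop, assuming the notification queue table SYS.AQ_SRVNTFN_TABLE_1 mentioned above - the queue name and payload type are assumptions here and should be verified against your dictionary (DBA_QUEUES, DBA_QUEUE_TABLES) before running:

```sql
-- drain all messages from the AQ notification queue; run as a suitably
-- privileged user, after the queue has been enabled for dequeue
declare
  dq_opts  dbms_aq.dequeue_options_t;
  msg_prop dbms_aq.message_properties_t;
  msg      sys.aq$_srvntfn_message;  -- payload type: verify in DBA_QUEUE_TABLES
  msg_id   raw(16);
begin
  dq_opts.wait       := dbms_aq.no_wait;
  dq_opts.navigation := dbms_aq.first_message;
  loop
    dbms_aq.dequeue(
      queue_name         => 'SYS.AQ_SRVNTFN_TABLE_Q_1',  -- verify in DBA_QUEUES
      dequeue_options    => dq_opts,
      message_properties => msg_prop,
      payload            => msg,
      msgid              => msg_id);
  end loop;
exception
  when others then
    -- ORA-25228 is raised once the queue is empty; keep what we dequeued
    commit;
end;
/
```

The same loop, pointed at the error queue (after ALTER QUEUE ... ENABLE for dequeue), covers the first step as well.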

Friday, 20 February 2015

Diag policies (external link)

Link

Some time ago I found a great article by Gavin Soorma (http://gavinsoorma.com/2010/09/purging-trace-and-dump-files-with-11g-adrci). It highlights the existence of purge policies built into the ADR for the contents of the diagnostic_dest location.

Example usage

#!/bin/bash
## The script sets trace/diagnostics purge policies for test environments
SHORT_POLICY="SHORTP_POLICY=336"  # in hours (14 days)
LONG_POLICY="LONGP_POLICY=2400"   # in hours (100 days)

if [ ! "$ORACLE_BASE" ]; then
  oratab='/etc/oratab'
  os=`uname -s`
  if [ "$os" = 'SunOS' ]; then
      oratab='/var/opt/oracle/oratab'
  fi
  ## or rather read only the interesting lines
  readarray -t oratab_entries < "$oratab"
  for entry in "${oratab_entries[@]}";
  do
    if [ "${entry:0:1}" != '#' ]; then
      ## set sid
      export ORACLE_SID=${entry%%:*}
      ## get the diag dest
      diag_dest=`sqlplus -S / as sysdba <<-CMD1
set feed off head off
select value from v\$parameter where name='diagnostic_dest';
CMD1`
      if [ "$diag_dest" ]; then
        break;
      fi
    fi
  done;
else
  diag_dest="$ORACLE_BASE/diag"
fi
set -x
echo "$diag_dest"
cd "$diag_dest"
for h in `adrci exec="show homes" | grep -v "ADR Homes:"`;
do
  adrci exec="set home $h; set control ($SHORT_POLICY, $LONG_POLICY)"
done;
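To verify that the policies took effect, the control settings can be listed per home with the same kind of loop - a sketch, assuming adrci picks up the ADR base from the environment:

```
# show the current purge policies for every ADR home
adrci exec="show homes" | grep -v "ADR Homes:" | while read h; do
  adrci exec="set home $h; show control"
done
```

The SHORTP_POLICY and LONGP_POLICY columns in the output should show the values set above.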

Tuesday, 13 January 2015

PSU installation issues - example on AIX

Here is a short report on today's problems with a PSU installation. The installation actually consists of three commands. First I usually run
opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./
to find any problems opatch is able to figure out without actually applying the patch. The next step, assuming all is OK, is to apply the patch and, quite frequently, to update the database dictionaries:
opatch apply
# then within the SQL*Plus run the patched database and call e.g.
@?/rdbms/admin/catbundle.sql psu apply
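Afterwards it is worth confirming that the bundle was registered in the dictionary - a quick check against the standard history view (available in 11g):

```sql
-- list patch/bundle actions recorded in the data dictionary
select action_time, action, version, id, comments
from dba_registry_history
order by action_time;
```

The last row should reflect the PSU apply performed by catbundle.sql.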


OPatch cannot find the patch files

OPatch failed with error code 73 (which is quite generic) and provided the details presented below:
Failed to load the patch object. Possible causes are:
  The specified path is not an interim Patch shiphome
  Meta-data files are missing from the patch area
  Patch location = /oracle/dist/17082367
  Details = PatchObject constructor: Input file "/oracle/dist/17082367/etc/config/actions" or "/oracle/dist/17082367/etc/config/inventory" does not exist.
ApplySession failed: PatchObject constructor: Input file "/oracle/dist/17082367/etc/config/actions" or "/oracle/dist/17082367/etc/config/inventory" does not exist.
System intact, OPatch will not attempt to restore the system
The reason for this problem is the following: patch 17082367 is in fact a PSU, so all the patch files and structures are placed in subdirectories, while this version of OPatch looks for them in the parent directory. The solution in my case was to download a newer OPatch version.

OPatch cannot copy a library file (AIX)

Another problem arose with the message:
Copy failed from '/oracle/dist/17082367/13696224/files/lib/libnnz11.so' to '$ORACLE_HOME/lib/libnnz11.so'...
The important thing here is that this is the AIX operating system, and a frequent issue on AIX is that libraries stay locked while they are loaded in memory. The solution is to run slibclean as root.

Too little storage space

It is quite important to check the available storage space beforehand. Fortunately the version of OPatch I used checked that for me. The solution was to clean up until something like ~500G was free.

Warnings (AIX)

A common thing on AIX when installing patches is to get a number of linker warnings, e.g.:
OPatch found the word "warning" in the stderr of the make command.
Please look at this stderr. You can re-run this make command.
Stderr output:
ld: 0711-773 WARNING: Object /oracle/app/11.2.0.2/lib//libgeneric11.a[sdbgrfu.o], imported symbol timezone
Symbol was expected to be local. Extra instructions are being generated to reference the symbol.
We actually ignore them.

Friday, 9 January 2015

How to reuse already restored datafiles when recovering database with datafiles rename

Problem

A few times a year it happens that I have to prepare a test environment which is a complete copy of a production database. Because in such a situation everything is compliant with the production environment, I do not have to deal with any architecture change, so the best way is simply to make use of the RMAN backups stored on tapes within a backup system.

The task is then quite easy. I produce something like the following:
CONNECT target /
CONNECT catalog rman/xxx@rman_catalog
SPOOL LOG TO restore_db_20141202.log
RUN {
ALLOCATE CHANNEL t1 TYPE 'SBT_TAPE' FORMAT '<backup file name format>' 
PARMS 'ENV=(<backup system specific options>)';
SET UNTIL TIME "to_date('2014-11-30 00:11:00','yyyy-mm-dd hh24:mi:ss')";
RESTORE CONTROLFILE;
ALTER DATABASE MOUNT;
sql "ALTER DATABASE DISABLE BLOCK CHANGE TRACKING";
SET NEWNAME FOR DATABASE TO NEW;
RESTORE DATABASE;
SWITCH DATAFILE ALL;
SWITCH TEMPFILE ALL;
RECOVER DATABASE DELETE ARCHIVELOG MAXSIZE 90G;
RELEASE CHANNEL t1;
}
As you may see, I rename all the database files to something new. I do not do this every time, but sometimes the new environment has differently named mount points, or the production database has some history so that not all files are kept under the same mount point, while on a test environment it is convenient to calculate the whole database size, allocate that amount as one resource and mount it under one location.
In this script that location is not specified explicitly - I prefer to set the db_create_file_dest instance parameter, so that any newly created files are also placed on the same path.

Then I run the script, and very often I do not have to do anything until the process ends successfully. Yet from time to time shit happens :-( and in 9 out of 10 cases in our shop it is a temporary problem with the backup system, or at least with the communication with it. For small databases this would not be a big deal - but first, I have never hit it there, and second, even starting from scratch for a 50G database costs only a fraction of an hour.
The real problem is with quite big databases (a few TB), and we have one case where it is almost frequent, where the connection to the backup system goes through the LAN (as opposed to the SAN). Such an operation takes at least a few days, and after a few TB restored, while still in the restore phase, we hit a crash.
Some of the datafiles are then fully restored, some partially, some not at all - and yet v$datafile contains only the old file names, so at first sight there is no mapping between the newly restored files and the metadata stored within the controlfile.
Below is a short description of how to deal with this situation.

Solution

First of all, you have to look into the views v$datafile_header and v$datafile_copy. The v$datafile_copy view should keep the names of the newly created files (name is not null and completion_time > <some date before running the restore and recover operation>) - then it is easy to create a script which, instead of switching to a new name, switches to the already restored files.
select
  'set newname for datafile '||x.file#||' to '||x.descr||';' rman_cmd
from (
  select d.file#, 'new' descr from v$datafile d
  minus
  select c.file#, 'new' descr
  from v$datafile_copy c where c.completion_time > sysdate-5
  union all
  select c.file#, ''''||c.name||'''' descr
  from v$datafile_copy c where c.completion_time > sysdate-5
) x
order by x.file#;
Now we take the above output, paste it into the previous script in place of the SET NEWNAME command, remove the RESTORE CONTROLFILE and ALTER DATABASE MOUNT commands, and run it - RMAN will check the status of the restored datafiles, keep the fully restored ones intact and restore only the fuzzy or not-yet-restored ones. And that's it.
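Before re-running the restore, it may also be worth checking what state the already restored files are in - v$datafile_header shows which ones are complete and which are fuzzy (column names as in 11g):

```sql
-- files with FUZZY = 'YES' or a non-null ERROR still need restore/recovery
select file#, status, error, fuzzy, checkpoint_time
from v$datafile_header
order by file#;
```

This gives a quick sanity check that the file list produced by the query above matches what RMAN will actually find on disk.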