Thursday 4 December 2014

Problem with OPatch

Today I hit a problem with the installation of a patchset. Details below:
[oracle@yyy]$ opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./19121548
Oracle Interim Patch Installer version 11.2.0.3.5
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /oracle/product/11.2.0.3
Central Inventory : /oracle/oraInventory
   from           : /oracle/product/11.2.0.3/oraInst.loc
OPatch version    : 11.2.0.3.5
OUI version       : 11.2.0.3.0
Log file location : /oracle/product/11.2.0.3/cfgtoollogs/opatch/opatch2014-08-08_13-19-37PM_1.log

Invoking prereq "checkconflictagainstohwithdetail"
List of Homes on this system:

  Home name= OraDb10g_home1, Location= "/oracle/product/102"
  Home name= OraDb11g_home1, Location= "/oracle/product/11.2.0.2"
  Home name= agent10g1, Location= "/oracle/oagent/product/10.2.0/agent10g"
Prereq "checkConflictAgainstOHWithDetail"not executed 
Unable to create Patch Object.
Exception occured : null

OPatch succeeded.


Colleagues of mine suggested rebuilding the Oracle inventory.
[oracle@yyy bin]$ ./runInstaller -silent -invPtrLoc $ORACLE_HOME/oraInst.loc -attachHome ORACLE_HOME="/oracle/product/11.2.0.3" ORACLE_HOME_NAME="Ora11gR2home"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 1239 MB    Passed
'AttachHome' failed.
'AttachHome' failed.


The solution is quite simple. First, one has to remove any remnants of the current Oracle inventory. Then the attachHome operation starts to work properly, and with the inventory rebuilt, OPatch also starts to work properly.

Thursday 6 November 2014

ORA-4030 and huge PL/SQL block

Lately we got a quite huge anonymous PL/SQL block (~10M) to execute. When it went for execution it failed with ORA-4030. There are limits to the size of a PL/SQL block; they may be raised at the OS level or in the database (see "ORA-4030 (PLSQL Opt Pool,pdziM01_Create: New Set), ORA-4030 (PLS CGA hp,pdzgM64_New_Link)" (Doc ID 1551115.1) on Metalink).

For example, on the OS level the /proc/sys/vm/max_map_count parameter sets the maximum number of memory map areas a process may have on Linux.
But before you mess with those settings, rethink whether it is really necessary - most probably there is another way to run the equivalent code which makes better use of the memory at its disposal.
In this very case the 10M block consisted of about 1 million calls to a procedure with different parameters provided as literals. A much better way would be to load all the parameter values into a table (with sqlldr into a persistent table, or with simple inserts in one session into a global temporary table with rows preserved on commit) and then call the procedure from a much, much smaller script iterating over a cursor, or to get rid of the PL/SQL block altogether, turn the procedure into a function and call it in a simple SELECT.
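A minimal sketch of the staging-table variant, assuming a hypothetical procedure SOME_PROC(p1, p2) and a global temporary table created just for this load (all names here are illustrative):
create global temporary table params_stage (p1 number, p2 varchar2(100))
  on commit preserve rows;

-- load the ~1 mln parameter sets with sqlldr or plain inserts, then:
begin
  for r in (select p1, p2 from params_stage) loop
    some_proc(r.p1, r.p2);  -- the call that was repeated ~1 mln times with literals
  end loop;
end;
/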

Friday 24 October 2014

Hash (#) sign and block comment in SQL*Plus

Today I hit a weird (or, on second thought, actually completely normal) behaviour of the SQL*Plus utility in a rather specific sequence of events. I got a package for review and here is an excerpt which caught my attention:
create or replace package body SOME_PKG
[..]
/*####################################################
some comment
###################################################### */

procedure PROC_NAME(arg1 VarChar2) IS
[..]
When I ran it to create the package in a test environment, I got:
SP2-0734: unknown command beginning "##########..." - rest of line ignored.

As this file was created in a Windows environment, I initially thought some hidden characters or EOL signs were the obstacle. But no. It appears that when one works in the editor mode of SQL*Plus, the hash sign (#), if placed as the first character in a line, is interpreted as the start of a SQL*Plus command and everything after it is immediately executed. E.g.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> create or replace procedure test1 as
  2  /*
  3  # select 1 from dual;

         1
----------
         1

SQL> */
SP2-0042: unknown command "*/" - rest of line ignored.
SQL>
With a block comment it turns into a deadly combo ;-) - one may think it is absolutely ok (and it works with, let's say, SQL Developer) but with SQL*Plus it fails (the closing of the block comment may be omitted, as in the example above) and then random errors appear.
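As far as I know, this behaviour is governed by the SQL*Plus prefix character (the SQLPREFIX setting), which defaults to #. A possible workaround - besides simply not starting comment lines with # - is to switch the prefix character to another non-alphanumeric character for the duration of the offending script (the script name below is made up):
SQL> show sqlprefix
sqlprefix "#" (hex 23)
SQL> set sqlprefix "~"
SQL> @package_with_hash_banners.sql
SQL> set sqlprefix "#"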

Monday 20 October 2014

ORA-600 [ktbdchk1: bad dscn]

This weekend I was responsible for a switchover of one of our databases. All went fine until the rman backup check - I discovered then that I had again hit the bug described here. As I knew it already, I was confident I would soon go to sleep, but then I once again looked at the alert log and found there
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []


This is a database on version 11.2.0.2, which is very important here. At first sight I found on Metalink a bug (Bug 8895202: ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switchover (Doc ID 1608167.1)), which occurs after a switchover, but version 11.2.0.2 already includes the patch for this one. And there are no other descriptions for that version, so there are only some suggestions on what to do to make possible bug identification easier.
However, when I read everything, I found that the fix is disabled by default. So I enabled it according to the article above and, even though I was not sure this was it (the trace messages were different from the ones provided in Metalink), it helped.
Please keep in mind the fix may blow up some queries, so one has to enable also the fix for another bug (Bug 13513004 - ORA-600 [ktbgcl1_KTUCLOMINSCN_1] on blocks fixed up by 8895202 (Doc ID 13513004.8)).

Friday 26 September 2014

TCP connections and ORA-30678

Just a while ago we had an issue with ORA-30678 on one of our test databases.

Scenario

The scenario looks like the following:
  • there exists a script which reacts to certain events by connecting to the database and calling a procedure, which among other things also sends a notification email
  • the script keeps a persistent connection to the database
  • on some occasion the script received wrong parameters
  • it made a call to the procedure, which opened an SMTP connection
  • due to the wrong parameter value the procedure threw an exception, which apparently was caught at some level (maybe at the db level, but I suppose it simply returned to the script level, which ended one loop turn and started another with the advent of a new triggering event)
  • it seems there is a built-in limit of 16 TCP connections per session in Oracle - apparently 15 more event loop turns were performed
  • on every subsequent loop turn ORA-30678 was thrown

What was wrong?

Apparently the code was wrong. It worked well, but only for good parameters. What was lacking was exception handling, which has to close any TCP connection at the end of the run no matter the communication outcome. Of course a good programmer will also log the exception circumstances, etc., but the cleanup is crucial here. It is important for all the packages which use TCP connections: UTL_SMTP, UTL_HTTP, UTL_TCP.
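A minimal sketch of the missing part, here for UTL_SMTP (the host and the rest of the conversation are illustrative):
declare
  l_conn utl_smtp.connection;
begin
  l_conn := utl_smtp.open_connection('smtp.example.com', 25);
  utl_smtp.helo(l_conn, 'example.com');
  -- ... utl_smtp.mail / rcpt / data calls built from the provided parameters ...
  utl_smtp.quit(l_conn);
exception
  when others then
    -- log the circumstances here, then release the TCP connection no matter what
    begin
      utl_smtp.quit(l_conn);
    exception
      when others then null;  -- the connection may already be broken
    end;
    raise;
end;
/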

Wednesday 24 September 2014

ACLs modification

In fact this is a very tiny observation.
When one deals with DBMS_NETWORK_ACL_ADMIN it is necessary to call COMMIT to confirm the changes - rather obvious, but one gets used to operations that end with an implicit commit.
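For example (11g-style calls; the ACL name, principal and host are illustrative):
begin
  dbms_network_acl_admin.create_acl(
    acl         => 'smtp_acl.xml',
    description => 'allow SMTP for APP_USER',
    principal   => 'APP_USER',
    is_grant    => true,
    privilege   => 'connect');
  dbms_network_acl_admin.assign_acl(
    acl        => 'smtp_acl.xml',
    host       => 'smtp.example.com',
    lower_port => 25,
    upper_port => 25);
  commit;  -- without it the ACL changes are not confirmed
end;
/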

Wednesday 16 July 2014

GLOBAL NAME summary

I have always had trouble understanding this stuff with global names. Even if I dig something up, I forget it shortly after use. So here is a summary.

GLOBAL NAME definition

It is taken entirely from the Oracle documentation (Understanding How Global Database Names Are Formed).
A global database name is formed from two components: a database name and a domain. The database name and the domain name are determined by the following initialization parameters at database creation:
  • DB_NAME
  • DB_DOMAIN
The DB_DOMAIN initialization parameter is only important at database creation time when it is used, together with the DB_NAME parameter, to form the database global name. At this point, the database global name is stored in the data dictionary. You must change the global name using an ALTER DATABASE statement, not by altering the DB_DOMAIN parameter in the initialization parameter file. It is good practice, however, to change the DB_DOMAIN parameter to reflect the change in the domain name before the next database startup.
The place where the GLOBAL NAME is stored is the table SYS.PROPS$. You may also display it with a call to the view GLOBAL_NAME (there is also a public synonym for this view with the same name).
One more note - in commands where the db link or global name is used as an identifier and not as a value (like CREATE DATABASE LINK, ALTER DATABASE RENAME GLOBAL_NAME TO, etc.), one should use double quotes. Somewhat surprisingly, though, the name is case-insensitive, i.e. one may use "ORCL.WORLD@ANOTHER_USER" as well as "orcl.WORLD@ANOTHER_USER" - the call will work.

SQL> SELECT * FROM props$ WHERE name='GLOBAL_DB_NAME';

NAME            VALUE$   COMMENT$
--------------- -------- --------------------
GLOBAL_DB_NAME  EXAMPLE  Global database name

SQL> SELECT * FROM global_name;

GLOBAL_NAME
---------------------------------------------
EXAMPLE

GLOBAL NAME change

I spotted 2 ways to achieve it (DbForums and others):
  1. ALTER DATABASE RENAME GLOBAL_NAME TO "[new global name]";
    IIRC this way has some limitations - e.g. if a domain was once defined, it will always be appended, thus there is no way to remove it completely
  2. because this parameter is stored in a table, it is possible to update it with simple SQL
    UPDATE sys.props$ SET value$='[new global name]' WHERE name='GLOBAL_DB_NAME';
    -- update on view also works IIRC
    -- UPDATE global_name SET global_name='[new global name]';
    COMMIT;
    
    This way we may set the GLOBAL NAME to anything we want. However, this is messing with the db dictionary, so it is a little frightening, and there is a Metalink note (id 1018063.102) for when the change hangs. In short, once one starts to update the GLOBAL NAME manually, one has to keep doing it this way.

GLOBAL_NAMES parameter

The GLOBAL_NAMES parameter enforces that a db link is named after the GLOBAL NAME of the database it connects to. The parameter is of Boolean type and may be set globally or in a session.
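A quick illustration of checking and changing the setting:
show parameter global_names
alter system set global_names = true;
-- or only for the current session
alter session set global_names = false;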

With GLOBAL_NAMES enabled one may still create many links to the same database (for example differing in credentials; AskTom.oracle.com) by using the @ sign, e.g.
-- most probably You need here double quotes around the db link name
-- at least I have got ORA-00933: SQL command not properly ended if not use them
create database link "ORCL.WORLD@SCOTT_USER"
connect to scott identified by tiger
using 'orcl.world';

create database link "ORCL.WORLD@ANOTHER_USER"
connect to another identified by lion
using 'orcl.world';

select 1 from dual@"ORCL.WORLD@ANOTHER_USER";

Monday 7 July 2014

Locating data files on standby

Problem

The db_create_file_dest parameter indicates the default directory in which an Oracle RDBMS instance creates new files according to the OMF rules.
The scenario considered is this:
  • we want to add new files
  • besides the default location we also have an additional one, built on cheaper storage and designated for archival purposes
  • there is also a standby configured with OMF, and the file paths differ between the two instances
The problem is how to place the files in the archival location on the standby with the least effort?

Solution 1

One simply:
  • adds data files on the primary with explicit specification of the file path - the files are added on the primary to the right location, but on the standby to the default one (instead of the archival one)
  • lists the identifiers of the recently added files - this may be a mundane task, especially if one adds data files to already existing tablespaces or is unable to provide a simple condition on tablespace names - in general this may be done on the dba_data_files or v$datafile view.
  • next one creates a script with the move operations
    -- first part is to move data files at the OS level
    select 
      'mv '||name||' [archival path]'||substr(name, instr(name, '/', -1))||';' bash1 
    from v$datafile where file# in ([ids list]);
    
    -- second part is to rename files at the db level
    select 
      'alter database rename datafile '''||name||''' to ''[archival path]'||substr(name, instr(name, '/', -1))||''';' sql1 
    from v$datafile where file# in ([ids list]);
    
    The best way of course is to format and spool the output of those queries to scripts (or one script)
  • now one has to switch off the managed standby mode (alter database recover managed standby database cancel;)
  • the next step is to switch standby_file_management to MANUAL (alter system set standby_file_management='MANUAL';)
  • further one has to perform the core commands - run the scripts generated earlier, i.e. move the files to the final location and rename the paths at the database level
  • the last steps are switching back to AUTO mode (alter system set standby_file_management='AUTO';) and starting the managed replication again (alter database recover managed standby database using current logfile disconnect;)


Solution 2

I suppose this one is much cleverer than the first. On the standby one simply changes temporarily the default OMF path to the archival location - of course there are elements we have no influence on, like the db_unique_name or the datafile literal within the path, but these are of small importance and we may be prepared for them if we assume such a scenario earlier. Then it is enough to add the data files - the standby will automatically place them in the location of choice - of course we have to be sure no one adds other files at that moment. After the operation it is enough to set db_create_file_dest back to the previous setting.
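A rough sketch of the operation, with illustrative paths:
-- on the standby: temporarily point OMF at the archival storage
alter system set db_create_file_dest='/archival_storage/oradata';

-- on the primary: add the data files / tablespaces as usual;
-- with standby_file_management=AUTO the standby creates its copies under the new path

-- on the standby: once the files are created, switch back
alter system set db_create_file_dest='/fast_storage/oradata';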

Thursday 26 June 2014

How to handle a subset of data with Data Pump (external link)

Lately I found a great article on the topic: http://dataworks-inc.com/import-a-subset-of-table-columns-using-oracle-data-pump. It nicely complements the common-sense approaches like creating mviews (or even views when 12c is in the picture) or loading first into temporary tables and then copying to or exchanging with the target objects.
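For instance, a sketch of the view-based variant mentioned above (object names are mine; the VIEWS_AS_TABLES parameter requires 12c Data Pump):
-- export only a subset of columns by wrapping them in a view
create view emp_subset_v as
  select empno, ename, deptno from emp;

-- then, on 12c, something along the lines of:
-- expdp system views_as_tables=scott.emp_subset_v dumpfile=emp_subset.dmp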

Friday 13 June 2014

Fun with parameter file and VIM

Here is a small anecdote rather than something dealing with a technical problem.
I had created a parameter file in VIM (which is important) with a QUERY parameter specified. Nothing fancy. I opened the ready file in one VIM session and copied its content to another:
dumpfile=REQ_N2521_01.DMP 
logfile=REQ_N2521_01.log 
 
tables= 
[some tables]
 
query= 
[some limits]
~                                                                               
~                                                                               
~         
I've run it and got:
ORA-39001: invalid argument value
ORA-39035: Data filter SUBQUERY has already been specified.
and started to look for the error. And did it for some time, initially without success - it is simple, right?

The reason was the tilde characters at the end of the file - because this is how VIM represents non-existent lines when the file content is shorter than the terminal window, I did not spot them at once, while I had unintentionally copied them to the new file beforehand - thus it all seemed perfectly normal.

Thursday 5 June 2014

Mysterious problem with archivelog replication

The starting points to this short story are the following:
  • we have a system with active-passive configuration
  • lately one of us made a failover to another node
  • the configuration actually had not changed and had been in use for some time - anyway we checked it a few times and there were no errors
  • the standby was recreated
  • local application of archivelogs on a standby was successful

The problem was that the primary database did not send the archivelogs to the standby.
The entry in V$ARCHIVE_DEST_STATUS indicated a wrong unique name in the configuration. I am not sure if I recall correctly, but I think it was ORA-16053 listed there. We checked the configuration at least a few times with a few pairs of eyes and did not spot anything wrong.
One of the ARCH processes on the primary was hanging, trying to send an archivelog from before the failover incarnation, so we suspected that might be it, but killing it did not change anything.

However, even though the real cause remains mysterious, the solution was quite obvious and straightforward. A colleague of mine configured another archive destination - we disabled number 2 and enabled number 3 - and the system returned to proper work.
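Roughly what was done - the service and unique name below are placeholders:
-- defer the stuck destination and configure a fresh one pointing at the same standby
alter system set log_archive_dest_state_2 = 'DEFER';
alter system set log_archive_dest_3 =
  'SERVICE=standby_tns ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_uname';
alter system set log_archive_dest_state_3 = 'ENABLE';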

Thursday 29 May 2014

Zabbix 2.2.3 and ZBX-7373


Migration from 2.0.2 to 2.2.3

Lately we have upgraded the Zabbix installation from version 2.0.2 to 2.2.3. The upgrade ran quite smoothly (the minor issues were all bound to our own optimizations) - we did it in around half an hour. So far so good, but when we launched it, we faced high IO load, growing at a steady pace until the whole system was actually out of service.

Diagnostics

As we had already upgraded the database, a return to the previous version would take us several hours, during which we would be quite blind with regard to production operations. So we started to look for the reasons.
  • in the Oracle db there is a concept of services. A service is a name registered with the listener and used for identification of a particular database instance. For version 2.0.2 we ran the system on 2 service names: zbx_gui and zbx_server. This is easy to do for Zabbix, as it actually consists of a few components, among which there are the gui, the server and others.
    We spotted that all the IO-intensive operations came from the gui part of Zabbix, so we disabled the gui to confirm this observation. The whole system immediately returned to proper work. I may add that it even does better than with the previous version, and I assume this is due to the use of bind variables - it takes less CPU (through setting CURSOR_SHARING to FORCE)
  • fortunately the problem was located in the part of Zabbix written in PHP, so the fix is not as troublesome as it would be for the part written in C (no recompilation, etc). From the monitoring we knew that the most troublesome query was one we later found in the api part. It is run against the history tables, which are the biggest entities in the Zabbix database.
    class CHistoryManager {
    
     /**
      * Returns the last $limit history objects for 
      * the given items.
      *
      * @param array $items  an array of items with 
      * the 'itemid' and 'value_type' properties
      * @param int $limit
      *
      * @return array    an array with items IDs as keys 
      * and arrays of history objects as values
      */
     public function getLast(array $items, $limit = 1) {
      $rs = array();
      foreach ($items as $item) {
       $table = self::getTableName($item['value_type']);
       $query = DBselect(
        'SELECT *'.
        ' FROM '.$table.' h'.
        ' WHERE h.itemid='.zbx_dbstr($item['itemid']).
        ' ORDER BY h.clock DESC',
        $limit
       );
       while ($history = DBfetch($query)) {
        $rs[$history['itemid']][] = $history;
       }
      }
    
      return $rs;
     }
    

    My colleagues also found that a similar bug was registered as ZBX-7373 in the Zabbix bug repository.
  • we started to look into the details of this query in order to find how to improve its performance; here are the details of the analysis:
    • we checked the following query
      select * from 
        (select * from ZABBIX.HISTORY H where H.ITEMID=:Z 
           order by H.CLOCK desc) 
      where rownum<2;
      
    • here is the plan for the query
      Plan hash value: 168015457
       OPERATION                      OBJ NAME      COST    CR BUFFER GETS  PARTITION START  PARTITION STOP
       SELECT STATEMENT                             64694
        COUNT                                               10049
         filter: ROWNUM<:SYS_B_0
         PARTITION RANGE                            64694   10049           1048575          1
          VIEW                                      64694   10049
           SORT                                     64694   10049
            filter: ROWNUM<:SYS_B_0
            TABLE ACCESS              HISTORY       64277   10049           1048575          1
             INDEX                    P_HISTORY_1   433     70              1048575          1
              access: H.ITEMID=TO_NUMBER(:Z)

      Predicate Information (identified by operation id):

      1 - filter(ROWNUM<2)
      4 - filter(ROWNUM<2)
      6 - access("H"."ITEMID"=TO_NUMBER(:Z))

Solution

  • we assumed the query simply reads too many blocks, even when optimized with the index range scan. The solution is to limit the clock values from the bottom. At the start we tried a fixed value of 2 days, but this was both inflexible and wrong in certain cases. Finally we went with a limit based on the delay value from the items table.
    Here goes the query in final shape:
    select * from (
    select * from ZABBIX.HISTORY H where H.ITEMID=:Z 
    and CLOCK>(cast((systimestamp at time zone 'UTC') as date)
      -TO_DATE('1970-01-01', 'yyyy-mm-dd'))*86400
        -1*(select delay from ZABBIX.ITEMS where ITEMID=:Z)
    order by H.CLOCK desc
    ) where rownum<2;
    
    -- update 2014-05-30
    -- the query above does not take into consideration items with delay equal to 0
    -- a possible solution to this issue below
    -- Please keep in mind the query is not meant to serve every scenario - e.g. 
    -- the asynchronous ones
    select * from (
    select * from ZABBIX.HISTORY H where H.ITEMID=:Z 
    and CLOCK>(cast((systimestamp at time zone 'UTC') as date)
      -TO_DATE('1970-01-01', 'yyyy-mm-dd'))*86400
        -(select 
          case 
            -- complex schedule
            when type=0 and delay=0 then 1*50
            -- no schedule, async 
            when type=2 and delay=0 then 3*86400
            else 1*delay
          end delay from ZABBIX.ITEMS where ITEMID=:Z)
    order by H.CLOCK desc
    ) where rownum<2;
    
  • here goes the plan for the final query:

    Plan hash value: 1183558203
     OPERATION                      OBJ NAME       COST   CR BUFFER GETS  PARTITION START  PARTITION STOP
     SELECT STATEMENT                              609
      COUNT                                               8
       filter: ROWNUM<:SYS_B_5
       PARTITION RANGE                             609    8               1048575          KEY
        VIEW                                       609    5
         SORT                                      609    5
          filter: ROWNUM<:SYS_B_5
          TABLE ACCESS              HISTORY        606    5               1048575          KEY
           INDEX                    P_HISTORY_1    31     4               1048575          KEY
            access: H.ITEMID=TO_NUMBER(:Z) AND
                    CLOCK>(CAST(SYSTIMESTAMP(6) AT TIME ZONE 'UTC' AS date)-TO_DATE(:SYS_B_1,:SYS_B_2))*:SYS_B_3-:SYS_B_4*
            TABLE ACCESS            ITEMS          2      3
             INDEX                  SYS_C0018758   1      2
              access: ITEMID=TO_NUMBER(:Z)

    Predicate Information (identified by operation id):

    1 - filter(ROWNUM<2)
    4 - filter(ROWNUM<2)
    6 - access("H"."ITEMID"=TO_NUMBER(:Z)
      AND "CLOCK">(CAST(SYSTIMESTAMP(6) AT TIME ZONE 'UTC' AS date)
      -TO_DATE(' 1970-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))*86400
      -1* (SELECT "DELAY" FROM "ZABBIX"."ITEMS" "ITEMS" WHERE "ITEMID"=TO_NUMBER(:Z)))
    8 - access("ITEMID"=TO_NUMBER(:Z))

  • as the last argument let me provide a short example: for a randomly chosen item the unoptimized query takes 10049 consistent gets and subsequent executions do not change anything, while the optimized query takes 158 consistent gets on the first execution and only 11 per execution afterwards (a way to measure this is sketched after this list)
  • The solution presented here is universal assuming you use the Oracle RDBMS. To get a solution which would be portable between databases, one would have to calculate the current unix timestamp in PHP and embed it into the query instead of the part "(cast((systimestamp at time zone 'UTC') as date) - TO_DATE('1970-01-01', 'yyyy-mm-dd'))*86400". The database has to support subqueries (if not, the solution would be to call first for the delay value). At the time of our upgrade there was no solution to ZBX-7373, but by now it has been included in versions 2.3.1 and 2.2.1rc1. Anyway, I believe our solution with the subquery is more elegant than the one included in that bug.
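    A minimal sketch of how such a comparison can be made in SQL*Plus (the itemid value is just an illustration):
    set autotrace traceonly statistics
    variable Z number
    exec :Z := 12345;   -- a randomly chosen itemid
    -- run the unoptimized and the optimized query a few times each
    -- and compare the "consistent gets" statistic reported after every execution
    select * from (
      select * from ZABBIX.HISTORY H where H.ITEMID=:Z order by H.CLOCK desc
    ) where rownum<2;
    set autotrace off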

Update after migration to the 2.4.4 version

Lately we upgraded Zabbix to the newest 2.x version (2.4.4) and here are a few remarks.
There were no major issues during the upgrade procedure. AFAIK we lost some filter settings (so hundreds of unimportant alerts emerged) and, as usual, the scheduler tasks were disabled. No performance issues at this point.
Fixing the filters was quite simple - it required repeating the filter creation process.
Then, after enabling the scheduler tasks (based on the API), we spotted quite heavy I/O load. Replacing CHistoryManager with a custom version (according to the rules provided above, though we kept the new period argument usage) made things work smoothly again. Below is the new code for the getLast method - the signature is compliant with the 2.4.4 version
public function getLast(array $items, $limit = 1, $period = null) {
    $rs = array();
    $que1 = '(cast((systimestamp at time zone \'UTC\') as date)'.
        ' -TO_DATE(\'1970-01-01\', \'yyyy-mm-dd\'))*86400';
    $que2 = 'select case '.
        'when type=0 and delay=0 then '.$limit.'*60 '.
        'when type=2 and delay=0 then 3*86400 '.
        'else '.$limit.'*delay '.
        'end delay1 from ZABBIX.ITEMS where ITEMID=';
    foreach ($items as $item) {
        $values = DBfetchArray(DBselect(
            'SELECT *'.
            ' FROM '.self::getTableName($item['value_type']).' h'.
            ' WHERE h.itemid='.zbx_dbstr($item['itemid']).
            ($period ? ' AND h.clock>'.(time() - $period) : '').
            ' and h.clock >= '.$que1.' -('.$que2.zbx_dbstr($item['itemid']).')'.
            ' ORDER BY h.clock DESC',
            $limit
        ));

        if ($values) {
            $rs[$item['itemid']] = $values;
        }
    }
    return $rs;
}