Thursday 15 December 2011

NFS and Oracle UTL_FILE

Today we faced a challenge in the shape of a slow dump of rows by UTL_FILE to a file located on NFS-mounted storage. The dump runs quite fast in a test environment, but there the storage in use is an FC-based storage array.
After an initial code review we found that every UTL_FILE.PUT_LINE is followed immediately by a call to UTL_FILE.FFLUSH, so it became apparent that every single row is immediately flushed to the NFS file. An educated guess (or intuition) suggested that the code sends data in an inefficient manner.

The NFS storage was mounted with the following options: rw,bg,vers=2,proto=tcp,sec=sys. Those settings mean a write window size of 8K for version 2 of NFS. Among the many articles by Glen Fawcett I found a pointer to the 359515.1 article on Metalink (anyway, this is a blog worth looking at). In our case the recommended settings were rw,bg,hard,rsize=32768,wsize=32768,vers=3,cio,intr,timeo=600,proto=tcp, so we went with those new settings. The test showed that with the new ones it was even worse.

And now it all became clear: in the case of NFS the write buffer is sent over the network as is, so one can write as much as one wants, but the data will be divided into buffers of the configured size and pushed to the network. In our case we flushed a 231-byte portion of data through first an 8K, then a 32K write window.
After that all went smoothly. As we were not able to change the code at once, we decreased the write window size instead - first to 512B, then even to 256B. The test showed that the 512B window is better - evidently the protocol overhead plus our 231 bytes was more than 256B - so finally we stayed at wsize=512, with a goal to do something about the FFLUSH call in the future and return to the 32K window.

Long story short: do not call FFLUSH after every PUT_LINE (unless you know what you are doing, for example in order to let others read the data as quickly as you are able to write it), or even better do not call it at all (as it is called implicitly once the buffer becomes full).
And a second thought: set your rsize and wsize according to the amount of data you will read or write in a single call to your NFS storage.
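
For illustration, a minimal sketch of the dump loop (the directory object DUMP_DIR, the file name and some_table are hypothetical):

declare
  f utl_file.file_type;
begin
  -- fopen with max_linesize 32767; the internal buffer is flushed automatically
  -- when it fills up and on fclose
  f := utl_file.fopen('DUMP_DIR', 'rows.dat', 'w', 32767);
  for r in (select col1 || ';' || col2 as line from some_table) loop
    utl_file.put_line(f, r.line);
    -- utl_file.fflush(f);  -- the anti-pattern: pushes every single row
    --                         (231 bytes in our case) through the NFS write window
  end loop;
  utl_file.fclose(f);
end;
/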

Tuesday 6 December 2011

BACKUP_TAPE_IO_SLAVES

Initial steps

Starting today we are enabling the BACKUP_TAPE_IO_SLAVES parameter.
What has changed:
  • while RMAN runs, the RDBMS launches an additional process for a while, which is reported in the alert.log, for example
    Running KSFV I/O slave I601 os pid=8282
    KSFV I/O slave I601 dp=f2e24c020 exiting
  • this process uses memory allocated from the LARGE_POOL (or the SHARED_POOL if the former is not allocated) instead of the private memory of the RMAN session process
  • all the RMAN I/O operations to the SBT device are from then on seen as asynchronous (of course this is a kind of "fake" asynchronous I/O, similar in shape to the architecture used when DBWR_IO_SLAVES is set to a non-zero value)
  • thus the I/O operations are reported in V$BACKUP_ASYNC_IO (previously V$BACKUP_SYNC_IO)
  • the following waits show up here:
    • os thread startup (which is certainly the start of a KSFV I/O slave)
    • imm op (the event specific to RMAN with I/O slaves in use)
    • io done (I found that this one is specific to synchronous I/O operations, which may mean that this is the wait reported by the KSFV I/O slave and that is why the asynchronous I/O is "fake")
    • control file sequential read
In both modes 4 buffers of 256K each are in use. For relatively short operations (for example frequent backups of archivelogs) there is no visible difference.
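
A minimal sketch of enabling the parameter and checking the effect (the parameter should accept a deferred change; if not on your version, set it with scope=spfile and restart):

-- enable the slaves; DEFERRED makes the change apply to newly started sessions
alter system set backup_tape_io_slaves = true deferred;

-- after a tape backup the SBT I/O should be reported as asynchronous
select filename, type, status, buffer_size, buffer_count
  from v$backup_async_io
 where type = 'OUTPUT';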

Update (after 3 years of usage)

This time we go the other way round and switch this parameter off. The main reason is that the Ixxx processes are not stable. We had it enabled on 2 kinds of OSes - AIX and Solaris. On Solaris we switched it off after a crash, and today (2014.02.12) we experienced a crash on AIX, so eventually we disable it on our machines completely.
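
Switching it off is symmetric (again just a sketch; if a deferred change is not possible, scope=spfile plus an instance restart does the job):

alter system set backup_tape_io_slaves = false deferred;
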
And below some details concerning today's crash:
  • we experienced a number of ORA-7445 errors in a sequence
  • ORA-07445: exception encountered: core dump [PC:0x900000002A3993C] [SIGSEGV]] [ADDR:0x100A10000FDDE0] [PC:0x900000002A3993C] [Address not mapped to object] []
  • call stack:
    skdstdst()+40 -> ksedst1()+104 -> ksedst()+40 -> dbkedDefDump()+2828 -> ksedmp()+76 -> ssexhd()+2448 -> 48bc


Update

Today a colleague of mine found a system where the backup did not work properly. Launching the backup returned
ORA-17619: max number of processes using I/O slaves in a instance reached
The docs say this happens when there are more than 35 processes controlling I/O slaves. In this particular case there were some backup processes which had been running for a few months already and had never completed.

Update

Today (18 Sep 2018) on some db I found ORA-17619 again. As this article is already a few years old, I had forgotten about it. I googled the net as well as Metalink resources without many suggestions - finally I found this very article and checked for open, orphaned rman sessions - that was it.
I thought it was about too many parallel slaves, while in fact this is simply the number of all the sessions opened by rman - somewhere there exists a counter which counts processes coming from rman, but they do not need to be parallel (as in PX queries) - they simply exist at the same time.
In this particular example the sessions were opened at different times across a period of 2 months.
This select may help to find the old sessions - please note I had to kill them at the OS level by looking up the SPID in v$process (a simple session kill just switched the sessions into the KILLED status, but they did not go away):
select paddr||' '||logon_time||' '||module||' '||status
  ||' alter system kill session '''||sid||','||serial#||''' immediate;' 
from v$session where program like 'rman%' order by logon_time;
-- one may also obtain the SPID of the underlying OS process
-- by substituting the PADDR value returned by the query above
select spid from v$process where addr='';

Wednesday 30 November 2011

Queue and failed automatic callback

Recently we installed a new callback for a queue. The next day this callback failed. The side effect was that the RDBMS did not automatically call the callback for messages enqueued after the message for which the callback had failed.
We had no entry in the alert log nor in the traces, so this was quite weird.
The solution is somewhat trivial. One needs to fix the failing callback - which is obvious, because otherwise we will hit the same situation again. The second thing is to dequeue all the unconsumed messages and possibly enqueue them again (or rerun the processes which register such messages).
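
A minimal sketch of draining the stuck messages manually (the queue name MY_QUEUE and the payload type my_msg_type are hypothetical; for a multi-consumer queue consumer_name would have to be set in the dequeue options as well):

declare
  deq_opts  dbms_aq.dequeue_options_t;
  msg_props dbms_aq.message_properties_t;
  payload   my_msg_type;  -- must match the queue payload type
  msgid     raw(16);
begin
  deq_opts.wait       := dbms_aq.no_wait;
  deq_opts.navigation := dbms_aq.first_message;
  loop
    dbms_aq.dequeue(queue_name         => 'MY_QUEUE',
                    dequeue_options    => deq_opts,
                    message_properties => msg_props,
                    payload            => payload,
                    msgid              => msgid);
    -- here the message could be re-enqueued or processed manually
    commit;
    deq_opts.navigation := dbms_aq.next_message;
  end loop;
exception
  when others then
    if sqlcode = -25228 then  -- end-of-fetch: no more messages to dequeue
      null;
    else
      raise;
    end if;
end;
/
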
More details may be found in MOS note 1265138.1.

Tuesday 29 November 2011

More than 255 columns and chained rows

We have a table with around 276 columns or so. Due to this it hits the known Oracle limitation (or feature) that a row is internally divided into row pieces of at most 255 columns.
The table data is stored suboptimally, mainly due to chained rows. Theoretically the rows in that table will be divided no matter what, so one could say there is no sense in rebuilding the table (with the alter table move command).
However it is possible to locate more pieces of the chain in the same block.
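
Such a dump can be produced roughly as follows (the table name test_chain is hypothetical; the file and block numbers are the ones visible in the dump below):

-- locate the block holding the row head piece
select dbms_rowid.rowid_relative_fno(rowid) file#,
       dbms_rowid.rowid_block_number(rowid) block#
  from test_chain
 where rownum = 1;

-- write the block to disk and dump it to a trace file
alter system checkpoint;
alter system dump datafile 4 block 5694;
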
Below there is a dump of a table with one row in it (for simplicity).

Block dump from disk:
buffer tsn: 4 rdba: 0x0100163e (4/5694)
scn: 0x0009.b6833260 seq: 0x01 flg: 0x06 tail: 0x32600601
frmt: 0x02 chkval: 0xdfc2 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0xFFFFFFFF7A506A00 to 0xFFFFFFFF7A508A00
[.. dump info here ..]
Block header dump: 0x0100163e
Object id on Block? Y
seg/obj: 0x18d76 csc: 0x09.b683325f itc: 3 flg: E typ: 1 - DATA
brn: 0 bdba: 0x1001638 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0009.010.00000fc8 0x00c0008f.031b.0c --U- 2 fsc 0x0000.b6833260
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
bdba: 0x0100163e
data_block_dump,data header at 0xffffffff7a506a7c
===============
tsiz: 0x1f80
hsiz: 0x16
pbl: 0xffffffff7a506a7c
76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1b2b
avsp=0x1b15
tosp=0x1b15
0xe:pti[0] nrow=2 offs=0
0x12:pri[0] offs=0x1bba
0x14:pri[1] offs=0x1b2b
block_row_dump:
tab 0, row 0, @0x1bba
tl: 966 fb: -----L-- lb: 0x1 cc: 255
col 0: [ 2] c1 2e
col 1: [ 2] c1 2f
[.. dump info here ..]
col 253: [ 3] c2 03 63
col 254: [ 3] c2 03 64
tab 0, row 1, @0x1b2b
tl: 143 fb: --H-F--- lb: 0x1 cc: 45
nrid: 0x0100163e.0
col 0: [ 1] 80
col 1: [ 2] c1 02
[.. dump info here ..]
col 43: [ 2] c1 2c
col 44: [ 2] c1 2d
end_of_block_dump


As we can see, internally this one row is divided into 2 row pieces - 255 columns in the first one and 45 columns in the second one - and both pieces are stored in the same block. So rows chained throughout a table may, after a move, be stored with all their pieces in the same block, which certainly is a bonus.
After all, the important factor here is the row length and not the chaining caused by having more than 255 columns.
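
A possible way to check and fix this (the table big_tab and its index are hypothetical; the CHAINED_ROWS table is created by ?/rdbms/admin/utlchain.sql):

analyze table big_tab list chained rows into chained_rows;

select count(*) from chained_rows where table_name = 'BIG_TAB';

-- rebuild so that the row pieces get a chance to land in the same block;
-- indexes become UNUSABLE and must be rebuilt afterwards
alter table big_tab move;
alter index big_tab_pk rebuild;

-- this statistic grows whenever an additional row piece has to be followed
select name, value from v$sysstat where name = 'table fetch continued row';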

Tuesday 22 November 2011

PSU lists

All the PSU lists are available with Doc ID 756388.1 on Metalink.
In particular Oracle Database Recommended Patches are at Doc ID 756671.1.

Virtual Circuit Wait event

I have found a few explanations connected with this event. It is specific to the "shared server" configuration of the Oracle RDBMS.
  1. Dick Goulet suggests it is connected with periods of inaccessibility of shared servers due to easy-to-miss short bursts of load. The solution was suggested for a database in version 9.2.0.4
  2. Jonathan Lewis describes this event in a similar way and suggests checking the v$reqdist view in order to get a histogram of call durations
  3. A very detailed description is available on Igor Usoltsev's site and it is the most up to date, as it refers to the 11g version. The most important observations are:
    • the old event virtual circuit status was split into shared server idle wait (which is recognized as an idle one) and virtual circuit wait (which actually consists of 2 parts: an idle one and a non-idle network one - the event will probably be split into such 2 components in future versions of the RDBMS)
    • the virtual circuit wait event is strongly connected with the SQL*Net message from client event (which is usually recognized as an idle one), and the occurrence of one or the other depends on the size of the array used for fetching rows - the bigger the array, the more SQL*Net message from client and the less virtual circuit wait

In our case this occurred mostly on long-running queries over a db link, which is very inconvenient, because even if this event is "partially idle", for us the response time always includes it. The solution for us is to tune every query over this db link as well as possible.
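
A quick, rough look at how much time goes into these waits instance-wide:

select event, total_waits, round(time_waited_micro / 1e6) time_waited_s
  from v$system_event
 where event in ('virtual circuit wait',
                 'shared server idle wait',
                 'SQL*Net message from client')
 order by time_waited_micro desc;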

Auto-start of an Oracle database

The traditional way to automatically start/stop an Oracle RDBMS is to call the dbstart/dbshut scripts appropriately. These scripts take one optional parameter, which is the ORACLE_HOME of the listener to start. The following script does the job and should be configured to run at the proper runlevels (usually start on 3, shut down on 0 and 6).
Of course one needs to keep oratab up to date with the required configuration.


#!/bin/bash

## date : 2011-11-22
## desc : starts|stops all the instances with enabled autostart named in oratab
## desc : usually to be used on dev/test environments
## os   : Linux, possibly Solaris, other may need improvements
## loc  : /etc/init.d

CMD=$1
LOG='log/oracle_db.log'
RC=0
# make sure the log directory exists in the oracle user's home
su -l oracle -c 'if [ ! -d $HOME/log ]; then mkdir -p $HOME/log; fi;'
case $CMD in
start)
  # $ORACLE_HOME and $HOME are resolved in the oracle user's login environment,
  # while $LOG comes from this script (hence the double quotes and escaping)
  su -l oracle -c "\$ORACLE_HOME/bin/dbstart \$ORACLE_HOME > \$HOME/$LOG 2>&1"
  RC=$?
  ;;
stop)
  su -l oracle -c "\$ORACLE_HOME/bin/dbshut \$ORACLE_HOME > \$HOME/$LOG 2>&1"
  RC=$?
  ;;
*)
  echo "Usage: $0 start|stop"
  ;;
esac
if [ $RC -ne 0 ]; then
  echo "Command failed. More details in ~oracle/$LOG"
fi

exit 0


Of course Windows handles autostart in a different way - one needs to set the proper flag(s) in the registry.

Shrinking temporary tablespaces in an Oracle database

This feature, new in 11g, is very similar to resizing temporary tablespace files. The main difference is that when resizing, the RDBMS is able to cut off only unallocated space, while when shrinking it is additionally able to deallocate space which is free and subject to reuse.

So there is no analogy to shrinking tables, for example. There is no data movement. If in the area to be cut off there exists an object in use at the moment, shrinking is of course able to give back only the space between the end of the last such object and the end of the file.
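
A minimal sketch of the relevant commands (tablespace and file names are hypothetical):

-- shrink the whole temporary tablespace, optionally keeping a target size
alter tablespace temp shrink space keep 256m;

-- or shrink a single tempfile
alter tablespace temp shrink tempfile '/u01/oradata/DB/temp01.dbf' keep 128m;

-- for comparison, the pre-11g way, which can only cut off never-allocated space
alter database tempfile '/u01/oradata/DB/temp01.dbf' resize 128m;

-- allocated vs free space in temporary tablespaces
select tablespace_name, tablespace_size, allocated_space, free_space
  from dba_temp_free_space;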

Saturday 22 October 2011

GLOBAL_NAME and db links

Once we created a database for a migration from another, older system with some domain name. We set db_domain in order to be consistent with the other databases we managed. However, the database to be migrated had worked without such a setting. Thus we hit a problem when creating database links, as the new ones were created with a domain suffix even when the name was provided in double quotes, and as a result some code did not compile.

What is behind the scene?
It seems that at database creation the GLOBAL_NAME setting for the database is created as well, based on the db_name and db_domain settings. Any later changes to those base parameters are not reflected in the GLOBAL_NAME value. And it is GLOBAL_NAME which influences database link names - its domain part is appended to every database link.

How to deal with it?
http://dbatips4u.blogspot.com/2009/04/tip21-db-link-name-suffix.html provides the details.
I add here a small note to it - I was especially interested in removing the global_name domain part altogether, which is less trivial than just setting it to something new, as the simple ALTER DATABASE RENAME GLOBAL_NAME TO "{db_name}"; does not work as planned (the domain part stays in place). I followed the notes from the link above. I ran UPDATE GLOBAL_NAME SET GLOBAL_NAME='{db_name}'; but even though the GLOBAL_NAME content had changed, db link names were still created with the suffix I did not want.
The important part was a database restart, after which everything started working as planned.
Btw. our GLOBAL_NAMES parameter was set to FALSE, which is different from the settings presented in the tip from the link above - maybe that is the clue why I needed to restart.
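
The whole sequence, roughly as we applied it (the names MYDB and NEW.DOMAIN are hypothetical; the direct update follows the tip linked above):

-- the domain part of this value is appended to every new db link name
select * from global_name;

-- renaming works fine when a new domain is wanted...
alter database rename global_name to MYDB.NEW.DOMAIN;

-- ...but to drop the domain part completely we had to update the value directly
-- and then restart the instance
update global_name set global_name = 'MYDB';
commit;
shutdown immediate
startup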

Wednesday 21 September 2011

10g to 11g upgrade

I used to think that the pre-upgrade tools (i.e. utlu112i.sql) are purely for reporting purposes. I mean that this script simply displays what one needs to fix in order to prepare a database for the upgrade process.
Yesterday proved me wrong.
So here is what you should not do at home ;-)
I started by setting the compatible parameter directly to 11.2.0.0.0 (which in turn was a very, very bad move). Because we had already upgraded quite a bunch of databases, I felt quite confident. Another thing was that we had prepared an upgrade procedure (based on documents from Metalink, but shortened, as we did not need to go through all the steps), and the pre-upgrade tool was missing from it.
So I started up in upgrade mode and after a while got an error about an unknown identifier TZ_VERSION in the table REGISTRY$DATABASE. The hint in the script was simply to run utlu112i.sql - it became clear that utlu112i.sql makes some changes to the database structure.
Due to setting the compatible parameter to 11.2.0.0.0 I was unable to start this database again under the 10g version, so I could not run utlu112i.sql. @#$$$%!
Then I tried to add a column to the problematic table. The command was successful, but the structure still showed up as the old one, due to running in upgrade mode and apparently in a different edition.
Recreating the controlfiles based on the database structure, which I was still able to query, was successful, but it was not enough, because the data files had been changed as well (probably only the headers, but this solution path was thus closed).
What I did finally was a simple ugly hack.
I edited the file $ORACLE_HOME/rdbms/admin/catupstr.sql and changed the queries against REGISTRY$DATABASE, replacing the TZ_VERSION references with a literal value ('14' in my case). Then I just started catupgrd.sql again.

There were some issues with the SI_INFORMATN_SCHEMA schema, but I am not sure whether they had anything to do with skipping the pre-upgrade tool, and as this is a test environment, it is not as clean as production (some non-compiling code pieces, etc.). Eventually the problem was solved.

Concluding:
  1. do not change the compatible parameter to the newer value when upgrading - do it some time afterwards
  2. always run the pre-upgrade tool (a short sketch follows below)
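
A minimal sketch of those pre-upgrade steps (the 11.2 ORACLE_HOME path is hypothetical; utlu112i.sql comes from the new home but is run while the database still runs on the old binaries):

-- leave compatible at the old value until the upgrade has settled
show parameter compatible

spool pre_upgrade.log
@/u01/app/oracle/product/11.2.0/db_1/rdbms/admin/utlu112i.sql
spool off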