Friday 14 December 2018

ORA-07286: sksagdi: cannot obtain device information

Lately we had an issue regarding a Data Guard configuration - quite mysterious.
In dgmgrl the standard errors about broken transport and inability to resolve the gap were reported, together with ORA-16810, which is the error that just tells you something is wrong (and not what is wrong).
The behaviour was as follows - the apply processes worked fine, so if I restored the archivelogs on the standby, they were applied even up to the current log. But then nothing - so it was clear there was a problem with the transport. Resetting the connections was in vain; we also changed some parameters in the Data Guard configuration, still nothing.
Finally the right solution was to restart the standby, which was a bit counterintuitive, as the transport is mainly a feature of the primary. This was also fortunate, as there was no problem with performing that at once without service disruption.
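Before deciding on a restart, the transport side can be checked with a few standard queries - a minimal sketch, not from the original incident, assuming dest_id 2 is the standby destination:
-- on the primary
SQL> select dest_id, status, error from v$archive_dest where dest_id = 2;
-- on the standby
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;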

Tuesday 2 October 2018

Work with EMCLI

Starting the work

Initially one has to log in - without that the commands are not recognized as valid and even the help output is reduced:
[oracle@host ~]$ emcli help
Summary of commands:


  Basic Administration Verbs
    argfile    -- Execute emcli verbs from a file
    help       -- Get help for emcli verbs (Usage: emcli help [verb_name])
    invoke_ws     -- Invoke EM web service.
    login      -- Login to the EM Management Server (OMS)
    logout     -- Logout from the EM Management Server
    setup      -- Setup emcli to work with an EM Management Server
    status     -- List emcli configuration details
    sync       -- Synchronize with the EM Management Server
    version    -- List emcli verb versions or the emcli client version

  To get the help for a single verb, execute - emcli help <verb_name>
So we start:
[oracle@host ~]$ emcli login -username=sysman
Enter password 

Login successful
Now help looks way more elaborate:
[oracle@host ~]$ emcli help
Summary of commands:


  Basic Administration Verbs
    argfile    -- Execute emcli verbs from a file
    help       -- Get help for emcli verbs (Usage: emcli help [verb_name])
    invoke_ws     -- Invoke EM web service.
    login      -- Login to the EM Management Server (OMS)
    logout     -- Logout from the EM Management Server
    setup      -- Setup emcli to work with an EM Management Server
    status     -- List emcli configuration details
    sync       -- Synchronize with the EM Management Server
    version    -- List emcli verb versions or the emcli client version

  ADM Verbs
    associate_target_to_adm     -- Associate a target to an application data model.
    export_adm     -- Export Application Data Model to a specified .xml file.
    import_adm     -- Import Application Data Model from a specified .xml file.
[..]

  WebSphere Target Management Verbs
    discover_was    -- Discover a IBM WebSphere Cell or a IBM Websphere Standalone Server
    refresh_was    -- Refresh IBM WebSphere Cells
  To get the help for a single verb, execute - emcli help <verb_name>

Adding a new blackout reason

## to add:
[oracle@host ~]$ emcli add_blackout_reason -name="Target disabled intentionally"
Added Blackout Reason "Target disabled intentionally" successfully

## to list
[oracle@host ~]$ emcli get_blackout_reasons
AS: Application Server Bounce
AS: Application Server Configuration Change
AS: Application Server Emergency
[..]
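
Once a reason exists, it can be used when creating a blackout. The call below is only a sketch - the target name/type are placeholders and the -schedule syntax is written from memory, so verify it with emcli help create_blackout first:
[oracle@host ~]$ emcli create_blackout -name="Maintenance window" -add_targets="mydb:oracle_database" -reason="Target disabled intentionally" -schedule="duration:0:30"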

Tuesday 11 September 2018

ASM: moving disk between groups

SQL> select name||':'||path from v$asm_disk;

[..]
DATA_0008:/dev/oracleasm/disks/ASMDISK9

-- the operation
SQL> alter diskgroup DATA drop disk {chosen name from previous query} [rebalance power {speed between 0 and 11}];
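-- e.g. for the disk listed above (just a sketch - power 4 is an arbitrary choice)
SQL> alter diskgroup DATA drop disk DATA_0008 rebalance power 4;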

-- the operation progress
SQL> select operation, state, est_minutes from v$asm_operation;

OPERA STAT EST_MINUTES
----- ---- -----------
REBAL RUN      7
REBAL WAIT      0

SQL> select GROUP_NUMBER,name,STATE from v$asm_disk where name='DATA_0008';

GROUP_NUMBER NAME       STATE
------------ ------------------------------ --------
    1 DATA_0008       DROPPING

-- ASM alert
Tue Sep 11 10:40:51 2018
NOTE: requesting all-instance membership refresh for group=1
Tue Sep 11 10:40:51 2018
GMON updating for reconfiguration, group 1 at 14 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk DATA_0008 emptied
NOTE: erasing header (replicated) on grp 1 disk DATA_0008
NOTE: erasing header on grp 1 disk DATA_0008
NOTE: process _x000_+asm (19142) initiating offline of disk 8.3915952393 (DATA_0008) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: initiating PST update: grp 1 (DATA), dsk = 8/0xe968b109, mask = 0x6a, op = clear
Tue Sep 11 10:40:51 2018
GMON updating disk modes for group 1 at 15 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: PST update grp = 1 completed successfully 
NOTE: initiating PST update: grp 1 (DATA), dsk = 8/0xe968b109, mask = 0x7e, op = clear
Tue Sep 11 10:40:51 2018
GMON updating disk modes for group 1 at 16 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: (not open) DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: PST update grp = 1 completed successfully 
Tue Sep 11 10:40:51 2018
GMON updating for reconfiguration, group 1 at 17 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: (not open) DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: group 1 PST updated.
Tue Sep 11 10:40:51 2018
NOTE: membership refresh pending for group 1/0xb68400e (DATA)
Tue Sep 11 10:40:51 2018
GMON querying group 1 at 18 for pid 16, osid 10242
GMON querying group 1 at 19 for pid 16, osid 10242
Tue Sep 11 10:40:51 2018
NOTE: Disk DATA_0008 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xb68400e (DATA)
NOTE: stopping process ARB0
NOTE: Attempting voting file refresh on diskgroup DATA
Tue Sep 11 10:40:54 2018
SUCCESS: rebalance completed for group 1/0xb68400e (DATA)

SQL> select operation, state, est_minutes from v$asm_operation;

no rows selected

SQL> select GROUP_NUMBER,name,STATE from v$asm_disk where name='DATA_0008';

no rows selected

SQL> alter diskgroup reco  add disk '/dev/oracleasm/disks/ASMDISK9';

Diskgroup altered

-- ASM alert entry
Tue Sep 11 10:45:43 2018
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/ASMDISK9)
NOTE: requesting all-instance membership refresh for group=2
NOTE: Disk 1 in group 2 is assigned fgnum=2
NOTE: discarding redo for group 2 disk 1
NOTE: initializing header (replicated) on grp 2 disk RECO_0001
NOTE: initializing header on grp 2 disk RECO_0001
NOTE: requesting all-instance disk validation for group=2
Tue Sep 11 10:45:44 2018
NOTE: skipping rediscovery for group 2/0xb78400f (RECO) on local instance.
Tue Sep 11 10:45:44 2018
NOTE: requesting all-instance disk validation for group=2
Tue Sep 11 10:45:44 2018
NOTE: skipping rediscovery for group 2/0xb78400f (RECO) on local instance.
Tue Sep 11 10:45:44 2018
GMON updating for reconfiguration, group 2 at 20 for pid 28, osid 20206
Tue Sep 11 10:45:44 2018
NOTE: group 2 PST updated.
Tue Sep 11 10:45:44 2018
NOTE: membership refresh pending for group 2/0xb78400f (RECO)
Tue Sep 11 10:45:44 2018
GMON querying group 2 at 21 for pid 16, osid 10242
NOTE: cache opening disk 1 of grp 2: RECO_0001 path:/dev/oracleasm/disks/ASMDISK9
GMON querying group 2 at 22 for pid 16, osid 10242
Tue Sep 11 10:45:44 2018
SUCCESS: refreshed membership for 2/0xb78400f (RECO)
Tue Sep 11 10:45:44 2018
SUCCESS: alter diskgroup reco  add disk '/dev/oracleasm/disks/ASMDISK9'
NOTE: Attempting voting file refresh on diskgroup RECO
Tue Sep 11 10:45:45 2018
NOTE: starting rebalance of group 2/0xb78400f (RECO) at power 1
Starting background process ARB0
Tue Sep 11 10:45:45 2018
ARB0 started with pid=31, OS id=20232 
NOTE: assigning ARB0 to group 2/0xb78400f (RECO) with 1 parallel I/O
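
-- a quick verification sketch (not part of the original listing): confirm the disk now belongs to RECO
SQL> select group_number, name, state, path from v$asm_disk where path='/dev/oracleasm/disks/ASMDISK9';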



Wednesday 11 July 2018

Broken DG configuration

A few days ago I ran into the following situation on a DG configuration:
  • the primary shipped logs to the standby properly, but reported errors if asked for status in dgmgrl
    DGMGRL> show configuration
    
    Configuration - dbd
    
      Protection Mode: MaxPerformance
      Databases:
        db1d - Primary database
        db2d - Physical standby database
    
    Fast-Start Failover: DISABLED
    
    Configuration Status:
    ORA-16610: command "Broker operation" in progress
    DGM-17017: unable to determine configuration status
    
  • the standby initially did not process archivelogs, but after a configuration reset (disable and enable operations) it started to process them, yet it still reported errors in dgmgrl
  • it was possible to log in from the primary to the standby
  • it was possible to log in from the standby to the primary
  • I copied the password file from the primary to the standby to ensure it was the same file
  • when calling show database on the primary the errors were reported for the standby
  • when calling show database on the standby the errors were reported for the primary
What was the cause?
It turned out that the problem was generated by the TNS entries on the primary - apparently the TNS alias definitions for the primary and the standby included some blank characters, which caused the aliases to be read improperly by dgmgrl - at the same time a simple call to those aliases through sqlplus worked fine.
Once I removed all the formatting in the tnsnames.ora, the whole thing started to work properly. What is a bit interesting is that the faulty TNS entries were present only on the primary, while the dgmgrl calls malfunctioned on both ends.
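For reference, a tnsnames.ora alias stripped of all formatting boils down to a single line like this (a sketch - host, port and service name are placeholders, only the alias name matches the configuration above):
DB1D=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=primary-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db1d)))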

Role order

Lately a colleague of mine ran into a problem with too many roles - some account had 150 of them, while the maximum allowed is 148. He wanted to establish the order in which the roles are assigned to the user, as obviously only the excess over 148 was thrown away.
He found that the roles are loaded with the following query:
select /*+ connect_by_filtering */ privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0; 
Thus it seems the order may be random, though it is possible that usually the data is returned in the order of insertion.
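To check what a particular account actually receives, the same internal query can be run with that account's number - a sketch, assuming DBA access; SOME_USER is a placeholder:
select privilege#, level
  from sys.sysauth$
connect by grantee# = prior privilege# and privilege# > 0
  start with grantee# = (select user_id from dba_users where username = 'SOME_USER')
         and privilege# > 0;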

Friday 29 June 2018

ORA-01554: transaction concurrency limit reached reason:no undo segment found with available slot params:0, 0 :

Some time ago I found such an error on one of our databases. It is a bit mysterious, as there is not much about it on Metalink (a few bugs) or on the internet (at least when searching with Google). But indeed the database hit the limit of concurrent transactions in some very short period of time. On the performance chart in OEM there is a high peak with an INSERT into some custom table, with waits on TX - index contention, buffer busy, enq: SQ contention and undo segment tx slot.
The database is working with UNDO_MANAGEMENT=AUTO, which is the default setting and since 10g used very widely, so the administrator may simply forget about rollback segment management. According to the docs, with this setting some parameters relevant to rollback management are simply ignored (TRANSACTIONS, TRANSACTIONS_PER_ROLLBACK_SEGMENT), while the number of transactions is limited by the undo tablespace size.
So it seems we should not experience this error, but rather session suspension and the error about lack of space in the undo tablespace.
Yet here we are. It is clear from the numbers below that there was still space in the undo (up to 4598 undo blocks used in a single interval).
What I could check is this:
select BEGIN_TIME, UNDOTSN, UNDOBLKS, TXNCOUNT, MAXQUERYLEN, MAXQUERYID, MAXCONCURRENCY, 
ACTIVEBLKS, UNEXPIREDBLKS, EXPIREDBLKS, TUNED_UNDORETENTION 
from v$undostat 
where begin_time between to_date('2018-06-25 14', 'yyyy-mm-dd hh24') and to_date('2018-06-25 16', 'yyyy-mm-dd hh24') 
order by begin_time;

BEGIN_TIME  UNDOBLKS  TXNCOUNT  MAXQUERYLEN  MAXCONCURRENCY  ACTIVEBLKS  UNEXPIREDBLKS  EXPIREDBLKS  TUNED_UNDORETENTION
2018-06-25 14:3907816520160071682372
2018-06-25 14:4914119116532160071682373
2018-06-25 14:5945981042251653131160473658882373
2018-06-25 15:092169103321933311160665643521653
2018-06-25 15:19228217341160806428162454
Since midnight the number of undo blocks consumed was 63475, with 524416 blocks available (so ~12%), and the maximum query length was below 1h. Thus we were definitely far below the undo space limit. From these figures it is clear that it is still possible to hit an error typical for the MANUAL undo management scenario while on AUTO - as we are here.

And here are a few words of wisdom from Jonathan Lewis:
Taking a quick guess:

An (automatic) undo segment has 34 slots in the transaction table (the thing in the undo segment header).
If you end up with a small number N of undo segments then you can't have more than 34 * N concurrent transactions active, because each one needs a transaction table slot and the next attempt to start a transaction would see ORA-01554.

It seems a little unlikely that this would happen in a typical system since Oracle would usually end up stealing an extent from an existing undo segment to use it as the first extent of a new undo segment. But it wouldn't be hard to set up a mechanism that started with a small number of undo segments in a relatively small undo tablespace - started some transactions (then left them idle forever after the first change - e.g. they might have come in from a remote database) that made it impossible to free any extents, then did a lot of work to allocate all the extents (but still left a little space in allocated extents for continuing transactions), then started lots of transactions until you hit the error.

And Julian Dyke's presentation:
Undo segments are allocated at instance startup.
Undo segments can be added dynamically.

Each undo segment header contains
  • Pool of free undo extents
  • Set of undo slots

One undo slot is allocated to each transaction.
Undo slot contains list of undo extents.
Extents can migrate from one undo segment to another.
Undo slots are used cyclically:
  • remain in header as long as possible
  • reduces probability of ORA-01555: Snapshot too old

With all this in mind I would say that the roughly 1000 times higher transaction rate is, in this particular case, the main reason for the error. Apparently the db is not able to extend its concurrent transaction capability fast enough to respond to the increased demand. A possible action here could be setting _rollback_segment_count initially to a value higher than 12, which is what is set here.
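A natural first step would be to check the current value - a sketch, assuming SYSDBA access; as _rollback_segment_count is a hidden parameter, any change should be agreed with Oracle Support:
-- current value
select a.ksppinm name, b.ksppstvl value
  from x$ksppi a, x$ksppcv b
 where a.indx = b.indx
   and a.ksppinm = '_rollback_segment_count';
-- and then, if advised, something like
-- alter system set "_rollback_segment_count"=100 scope=both;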

Monday 18 June 2018

High latch free on Exadata

Today I investigated an issue with poor performance of some processing. On the production environment it usually completes in some 5 minutes, while on one of the test databases it once took 5h to complete.
As this was a RAC on an Exadata machine, the AWR report was not that much help, though it was immediately clear that the main problem was the latch free event. The question was which latch was responsible here.
The performance chart in OEM did not provide the SQL text (apparently those calls were one-time SQL statements - candidates for handling by setting cursor_sharing to FORCE).
A good step was to call the following query:
select name, (misses/decode(gets,0,1,gets) )*100 ratio, 
       (immediate_misses/decode(immediate_gets,0,1,immediate_gets))*100 immediate_ratio,
        spin_gets, wait_time
 from v$latch
where wait_time > 0
order by 2
;
While one cannot be 100% sure this query indicates the right latch, it at least gives some clues. This time the only distinctive ratio (around 40%) was for Result Cache: RC Latch. The other one, with around 10%, was the resource manager latch, but there the total number of gets and misses was fairly small, so it was not that interesting.
I looked this latch name up and found a few articles on the subject.
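If the Diagnostics Pack is licensed, ASH can narrow the latch down more directly, as for the latch free event the latch number is kept in P2 - a sketch, not from the original investigation:
select w.p2, n.name, count(*) samples
  from v$active_session_history w, v$latchname n
 where w.event = 'latch free'
   and n.latch# = w.p2
 group by w.p2, n.name
 order by samples desc;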

Friday 8 June 2018

RMAN-3008/RMAN-3009 and RMAN-20095

Today I had the following problem:
  • initially the backup failed with
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03008: error while performing automatic resync of recovery catalog
    RMAN-20095: invalid backup/copy control file checkpoint SCN
    
  • every next backup failed with
    RMAN-03014: implicit resync of recovery catalog failed
    RMAN-03009: failure of partial resync command on default channel at 06/07/2018 04:18:50
    RMAN-20095: invalid backup/copy control file checkpoint SCN
    
This is reported on Metalink as "Bug 19209117 RMAN-3008 RMAN-20095 while performing Automatic Resync"

How to move forward?
I found the first backup which failed, then in its log found the place where the error occurred for the first time, and started to uncatalog from that place onward. Please note the uncataloging has to take place when connected only to the target database (i.e. when working only with the backup entries stored within the control file) - otherwise RMAN will first try to implicitly resync the catalog, which will lead to the error over and over again.
change backuppiece '/path_to_backup/cf_YYYYY_c-283029066-20180607-02' uncatalog;
The first call to uncatalog the controlfile backup was enough.
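The whole workaround then looks roughly like this (a sketch - the backup piece name is the placeholder from above and the catalog connect string is an assumption):
rman target /
RMAN> change backuppiece '/path_to_backup/cf_YYYYY_c-283029066-20180607-02' uncatalog;
RMAN> exit
rman target / catalog rcatowner@rcvcat
RMAN> resync catalog;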
In the debug output of a call to resync catalog during the problems there is the following section:
DBGRESYNC:     channel default:   file# 0 [11:30:20.805] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 0
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
DBGRESYNC:     channel default: Calling checkBackupDataFile for set_stamp 978149312 set_count 8542 recid 6038 [11:30:20.807] (resync)
DBGRESYNC:     channel default:   file# 0 [11:30:20.807] (resync)
DBGRPC:        krmxrpc - channel default kpurpc2 err=20095 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 129
   DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
   DBGRCVCAT: addBackupControlfile - ckp_scn 3201066972364 ckp_time 07-JUN-18
   DBGRCVCAT: addBackupControlfile - lckp_scn 3201066972565 lckp_time 07-JUN-18
DBGRPC:        krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CANCELCKPT excl: 0
   DBGRCVCAT: cancelCkpt - rollback, released all locks

-- here is the moment the error 20095 is caught
DBGPLSQL:     EXITED resync with status ORA--20095 [11:30:20.992]
DBGRPC:       krmxr - channel default returned from peicnt
DBGMISC:      ENTERED krmstrim [11:30:20.992]
DBGMISC:       Trimming message: ORA-06512: at line 3401 [11:30:20.992] (krmstrim)
DBGMISC:        (24) (krmstrim)
DBGMISC:      EXITED krmstrim with status 24 [11:30:20.992] elapsed time [00:00:00:00.000]
DBGRPC:       krmxr - channel default got execution errors (step_60)
DBGRPC:       krmxr - exiting with 1
DBGMISC:      krmqexe: unhandled exception on channel default [11:30:20.992]
DBGMISC:     EXITED krmiexe with status 1 [11:30:20.992] elapsed time [00:00:01:21.889]
[..]
DBGMISC:     error recovery releasing channel resources [11:30:20.992]
DBGRPC:      krmxcr - channel default resetted
DBGMISC:     ENTERED krmice [11:30:20.993]
DBGMISC:      command to be compiled and executed is: cleanup  [11:30:20.993] (krmice)
DBGMISC:      command after this command is: NONE  [11:30:20.993] (krmice)
[..]
DBGMISC:     EXITED krmice [11:30:21.017] elapsed time [00:00:00:00.024]
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 06/08/2018 11:30:20
RMAN-20095: invalid backup/copy control file checkpoint SCN

Tuesday 22 May 2018

ORA-04030 on a process with large memory requirements

Lately I've met the following problem: a database in version 11.2.0.4, an application with an evening batch job. At some point the batch job started to fail constantly, after reaching around 4500M of allocated PGA. All the efforts to increase PGA_AGGREGATE_TARGET were in vain - we increased it first to 5G (from 3G), then to 8G, but the job still failed at pretty much the same allocation.
Top allocations look as follows:
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
70% 3204 MB, 205630 chunks: "pmuccst: adt/record       "  PL/SQL
         koh-kghu call   ds=0x7f2e2ea3c938  dsprt=0xb8c04a0
30% 1356 MB, 87052 chunks: "pl/sql vc2                "  PL/SQL
         koh-kghu call   ds=0x7f2e2ea3c938  dsprt=0xb8c04a0
I found an article on Metalink (ORA-04030 Error For Memory Allocation Type "pmuccst: adt/record" (Doc ID 1513623.1)), but it stated that this allocation is common for PL/SQL collections and it is not possible to limit it with PGA_AGGREGATE_TARGET (or _pga_max_size), as this is a different allocation, bound to the heap of the process and governed by other settings. Similarly the pl/sql vc2 allocation is common for local variables like varrays (ORA-04030 Error For Memory Allocation Type "pl/sql vc2" (Doc ID 1315884.1)).

There are a few solutions to the problem above - one is to rewrite the PL/SQL code to replace those large collections with GTTs, another is to limit the collections to fetch only a limited number of entries and process the cursors in loops.
There is also (as it finally turned out) a possibility to deal with this threshold.
The following parameters govern this allocation:
  • vm.max_map_count, which is set to 65530 by default - on OS level
  • _use_realfree_heap - hidden db parameter (in 11.2.0.4 defaults to TRUE) - this one enables the whole feature apparently
  • _realfree_heap_pagesize_hint - hidden db parameter (in 11.2.0.4 defaults to 65536) - this one sets page size, so e.g. with increase to 262144 we can increase the memory allocation to around 16G
  • _realfree_heap_pagesize (since 12c)
  • _realfree_heap_max_size - hidden db parameter (in 11.2.0.4 defaults to 32768) - it is described as "minimum max total heap size", in Kbytes - I am not sure exactly what that means and what is controlled by it
The indication that this is a problem with the realfree heap is present in the process map dump - in this case the dump showed exactly the maximal allowed number of entries.

In general it seems the memory available to the process is calculated as vm.max_map_count * _realfree_heap_pagesize_hint, thus one may affect one of the multipliers or both of them, where max_map_count controls the number of allocations while the realfree heap pagesize controls the amount of memory within one allocation.
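With the defaults this gives 65530 * 64K, roughly 4G, which matches the failures at around 4500M of PGA. Below is a sketch of both knobs (the values are purely illustrative; the underscore parameter is hidden, so involve Oracle Support before touching it):
# OS level
sysctl vm.max_map_count
sysctl -w vm.max_map_count=131070
-- DB level (takes effect after instance restart)
alter system set "_realfree_heap_pagesize_hint"=262144 scope=spfile;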

Friday 13 April 2018

ORA-12154 while duplicating from active database on RAC

A colleague of mine tried to rebuild a standby from some primary database. He prepared everything, added proper entries to tnsnames.ora on both clusters (primary and standby), then fired the duplicate command. When it came to
restore clone from service  'ORCLPRIM' standby controlfile;
the command failed with:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/13/2018 12:54:04
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-06136: ORACLE error from auxiliary database: ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
We checked tnsnames.ora files, connections 'standby to primary' and 'primary to standby' - all seemed ok.

Sometimes it is some 'invisible' character in tnsnames.ora which gets in the way, sometimes a simple lack of a definition. But not this time, so this case is a bit more interesting...
The grid binaries and database binaries have different homes here and keep different tnsnames.ora files (as there is no shared TNS_ADMIN location).
And the solution was to add the required TNS entries also to the grid home.
My supposition is as follows:
  • the point where the command is issued is the standby instance, to which the script connects through the statically registered service
  • the listener is started from the grid home (and not the db home), so its environment is derived from grid, even though in the static registration definition the db home is pointed to as ORACLE_HOME
  • thus addition of TNS definitions to the grid home tnsnames.ora helped
  • I've also found that for some static registration entries an additional ENVS clause was added (disclaimer: this parameter is not supported on Windows; any process started by the listener will simply inherit the listener's environment) in the form
    (SID_DESC =
          (SID_NAME = orclstb1)
          (ORACLE_HOME = [db home])
          (GLOBAL_DBNAME = orclstb_dgmgrl)
          (ENVS = "TNS_ADMIN=[db home]/network/admin") 
    )
    
    so it seems my supposition is correct and this is another way to get around this issue
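For completeness, the kind of entry that had to land in the grid home's tnsnames.ora is sketched below (host, port and service name are placeholders; the alias matches the one used in the restore command above):
ORCLPRIM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primary-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orclprim))
  )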

Monday 9 April 2018

Some issue with external table over preprocessor

I built an external table based on the output from a preprocessor. I got the following explanation why it did not work:
      Trim whitespace same as SQL Loader
KUP-04027: file name check failed: /u01/app/oracle/nx/datamasking/nx_masking.dat
KUP-04017: OS message: Error 0
KUP-04118: operation "lstat64", location "skudmicdl"
All seemed to be ok. The script returned the output when called manually. The returned value was 0.
So what was wrong? The file I called was in fact a link to the real script. Once I dropped the link and copied the script source to a file with the name referenced within the database, the problem vanished.
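For the record, the relevant part of such a definition looks more or less like this - a sketch with made-up directory objects, script and column names (data_dir and exec_dir are assumed to exist); only the data file name is taken from the error above:
create table nx_masking_ext (
  col1 varchar2(200),
  col2 varchar2(200)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    preprocessor exec_dir:'nx_masking.sh'
    fields terminated by ','
  )
  location ('nx_masking.dat')
);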

Monday 26 March 2018

kkzifr3g: Encountered error ORA-12057.

The solution is as follows:
-- as we initially do not know what is happening exactly
alter system set events '12057 trace name errorstack level 3';
-- and wait until the issue is hit again (note the command above dumps only the 1st occurrence in a session)

[.. this is from alert.log - this time much more info ..]
Mon Mar 26 10:16:01 2018
kkzifr3g: Encountered error ORA-12057.
Mon Mar 26 10:16:01 2018
Dumping diagnostic data in directory=[cdmp_20180326101601], requested by (instance=1, osid=60555319 (J002)), summary=[abnormal process termination].
Mon Mar 26 10:16:02 2018
Errors in file /u01/app/oracle/diag/rdbms/db1/db1/trace/db1_j005_24117326.trc:
ORA-12057: materialized view or zonemap "TEST"."MV_DATA1" is invalid and must use complete refresh
[..]

-- and even more data in trace, but we care here mostly about the mv name
-- then ORA-12057 During Fast Refresh of Aggregate Materialized View (Doc ID 1399651.1)

-- to clean up
alter system set events '12057 trace name context off';
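
-- the follow-up suggested by the error itself would be a complete refresh of the named MV
-- (a sketch - schedule it sensibly, as a complete refresh may be heavy)
exec dbms_mview.refresh('TEST.MV_DATA1', method => 'C');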

Thursday 22 March 2018

Fractured block found during backing up datafile

{hostname}, Hex dump of (file 75, block 87980) in trace file /oracle/diag/rdbms/db1/db1/trace/db1_ora_23614.trc
Corrupt block relative dba: 0x12c157ac (file 75, block 87980)
Fractured block found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x12c157ac
 last change scn: 0x02d7.7dc0c4bc seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xb1a20602
 check value in block header: 0xc892
 computed block checksum: 0x67fa
Reread of blocknum=87980, file=+DATA/db1/datafile/file1.343.956407117. found valid data : 

dbv file=+DATA/db1/datafile/file1.343.956407117 blocksize=8192 start=87980 end=87981 userid=sys/blbla

DBVERIFY: Release 11.2.0.4.0 - Production on Thu Mar 22 10:04:33 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = +DATA/db1/datafile/file1.343.956407117


DBVERIFY - Verification complete

Total Pages Examined         : 2
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 2
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 0
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 0 (0.0)
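
An alternative check, not shown above, is RMAN block validation, which also populates v$database_block_corruption - a sketch for the file/block reported in the trace:
RMAN> validate datafile 75 block 87980;
SQL> select * from v$database_block_corruption;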

Monday 5 March 2018

Restricted session

-- at startup
startup restrict;

-- on running instance
alter system enable restricted session;

System altered.

-- to disable
alter system disable restricted session;

-- to check
select logins from v$instance; 

LOGINS
----------
RESTRICTED

Thursday 22 February 2018

Insightful article on db connections terminated by firewalls (external link)

A very insightful article on why/when firewalls drop connections: https://li555li.blogspot.se/2014/09/.
In short - to prevent this one has to set SQLNET.EXPIRE_TIME to {timeout on firewall}/2 - 1, in minutes.
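For example, with an assumed firewall idle timeout of 30 minutes, sqlnet.ora on the database server would get:
# (30 / 2) - 1 = 14
SQLNET.EXPIRE_TIME = 14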

Wednesday 31 January 2018

How to find a time for the latest DML on a table

Lately I got such a question from a co-worker. I looked around a bit and here are the findings:
  • ORA_ROWSCN pseudocolumn - seems to be the most versatile solution here - the maximal value of ORA_ROWSCN definitely points to the time of the last modification for any table (see the sketch after this list)
  • VERSIONS_ENDSCN https://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS01009
  • dba_tab_modifications - this view keeps the number of DML operations since the last statistics collection. Thus it may be used to find the approximate time of the last modification, assuming it happened after the latest stats collection, but it has of course a number of limitations
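The ORA_ROWSCN approach in practice - a sketch; SOME_TABLE is a placeholder, scn_to_timestamp only works for reasonably recent SCNs, and the result is block-level unless the table was created with ROWDEPENDENCIES:
select scn_to_timestamp(max(ora_rowscn)) last_dml_time from some_table;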

Thursday 11 January 2018

Change of DB_UNIQUE_NAME in a database

Today I created a new database, using the Oracle Universal Installer graphic interface. I did not find the switch for advanced configuration, where I could change some parameters during the installation, so I was left with what the basic configuration for the Server-class database option provides. Maybe I missed some tab or checkbox, it does not matter.
Now the starting point is to have a 12c standalone database registered in clusterware.
  • the simple change in the spfile ends with
    SQL> alter system set db_unique_name='pizmak_node3' scope=spfile;
    alter system set db_unique_name='pizmak_node3' scope=spfile
    *
    ERROR at line 1:
    ORA-32017: failure in updating SPFILE
    ORA-65500: could not modify DB_UNIQUE_NAME, resource exists
    
  • so I found "How to change the DB_UNIQUE_NAME in a RAC database (Doc ID 1604421.1)", and remove the database from clusterware, but then
    SQL> startup mount
    ORA-01078: failure in processing system parameters
    ORA-01565: error in identifying file '+DATA/pizmak/spfilepizmak.ora'
    ORA-17503: ksfdopn:2 Failed to open file +DATA/pizmak/spfilepizmak.ora
    ORA-15056: additional error message
    ORA-17503: ksfdopn:2 Failed to open file +DATA/pizmak/spfilepizmak.ora
    ORA-15173: entry 'spfilepizmak.ora' does not exist in directory 'pizmak'
    ORA-06512: at line 4
    
    
    Apparently the oratab entry was removed with the srvctl remove database command
  • so I found "Alert Log Error ORA-17503 ORA-15173 After Deleting Database. (Doc ID 1900471.1)"
  • so I looked at $ORACLE_HOME/dbs and found pfile with SPFILE pointing to ASM location and put it aside
  • after some time I discovered the true reason - initially the spfile was located in +DATA/PIZMAK/ (possibly at some stage of db installation), but eventually the location was +DATA/PIZMAK/PARAMETERFILE
  • so I used the pfile in $ORACLE_BASE/admin/pizmak/pfile/, which is put there during installation
  • so I started with startup nomount pfile=my_pfile_copy - this pfile already had db_unique_name changed to the one I wanted, with no changes to db_create_file_dest (as it would still point to +DATA)
  • I learned that 12c is able to move datafiles online to another location, so
    SQL> select file_id||':'||file_name from dba_data_files;
    
    FILE_ID||':'||FILE_NAME
    --------------------------------------------------------------------------------
    6:+DATA/PIZMAK/DATAFILE/users.302.965122931
    4:+DATA/PIZMAK/DATAFILE/undotbs1.303.965122931
    3:+DATA/PIZMAK/DATAFILE/sysaux.300.965122739
    1:+DATA/PIZMAK/DATAFILE/system.314.965139835
    
    alter database move datafile 1;
    alter database move datafile 3;
    alter database move datafile 4;
    alter database move datafile 6;
    
    SQL> select file_id||':'||file_name from dba_data_files;
    
    FILE_ID||':'||FILE_NAME
    --------------------------------------------------------------------------------
    6:+DATA/PIZMAK_NODE3/DATAFILE/users.303.965140667
    3:+DATA/PIZMAK_NODE3/DATAFILE/sysaux.301.965140607
    4:+DATA/PIZMAK_NODE3/DATAFILE/undotbs1.300.965140655
    1:+DATA/PIZMAK_NODE3/DATAFILE/system.314.965139835
    
  • now logs
    SQL> select member from v$logfile;
    
    MEMBER
    --------------------------------------------------------------------------------
    +DATA/PIZMAK/ONLINELOG/group_3.310.965122991
    +DATA/PIZMAK/ONLINELOG/group_3.311.965122991
    +DATA/PIZMAK/ONLINELOG/group_2.308.965122991
    +DATA/PIZMAK/ONLINELOG/group_2.309.965122991
    +DATA/PIZMAK/ONLINELOG/group_1.306.965122991
    +DATA/PIZMAK/ONLINELOG/group_1.307.965122991
    
    SQL> select GROUP#, SEQUENCE#, MEMBERS, STATUS from v$log;
    
        GROUP#  SEQUENCE#    MEMBERS STATUS
    ---------- ---------- ---------- ------------------------------------------------
             1          7          1 INACTIVE
             2          8          2 INACTIVE
             3          9          2 CURRENT
    
    
    alter database drop logfile group 1;
    alter database add logfile group 1 size 512m;
    -- here in alert log (as this was 1st log file on REDO) an error
    -- Thu Jan 11 14:41:29 2018
    -- ERROR: failed to establish dependency between database pizmak_node3 and diskgroup resource ora.REDO.dg
    
    
    alter database drop logfile group 2;
    alter database add logfile group 2 size 512m;
    -- no more errors as the structure already exists
    
    alter system switch logfile;
    alter system switch logfile;
    
    alter database drop logfile group 3;
    alter database add logfile group 3 size 512m;
    
    SQL> select member from v$logfile;
    
    MEMBER
    ------------------------------------------------------------------------------------------------------------------------
    +REDO/PIZMAK_NODE3/ONLINELOG/group_3.275.965141729
    +REDO/PIZMAK_NODE3/ONLINELOG/group_2.274.965141165
    +REDO/PIZMAK_NODE3/ONLINELOG/group_1.256.965140887
    
    
    
     I still have to add the second members
  • now spfile
    create spfile='+DATA' from pfile='/oracle/m010162/init.ora.011201894952';
    -- and change in initpizmak.ora
    -- SPFILE='+DATA/pizmak_node3/parameterfile/spfile.311.965142657'
    !cp initpizmak.ora $ORACLE_HOME/dbs/
    
  • to move temporary tablespace I just created a new one and switched default temporary tablespace to it
    create temporary tablespace temp2 tempfile size 128m autoextend on next 128m maxsize 16g;
    alter database default temporary tablespace temp2;
    
    -- then as I wanted temporary tablespace named TEMP I did that once again
    create temporary tablespace temp tempfile size 128m autoextend on next 128m maxsize 16g;
    alter database default temporary tablespace temp;
    
    SQL> select file_name from dba_temp_files;
    
    FILE_NAME
    ------------------------------------------------------------------------------------------------------------------------
    +DATA/PIZMAK_NODE3/TEMPFILE/temp.312.965143967
    
    
  • to move control files I used asmcmd (see the sketch after this list)
    1. change control_files parameter to new location with scope spfile
    2. shutdown immediate
    3. copy or move the control files to the new location in asmcmd (with cp one has to specify the name of the target file, but without the instance/revision? number)
    4. start the db
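The asmcmd part of the control file move could look like this - a sketch with made-up file names, following the four steps above:
SQL> alter system set control_files='+DATA/PIZMAK_NODE3/CONTROLFILE/current.ctl' scope=spfile;
SQL> shutdown immediate
ASMCMD> cp +DATA/PIZMAK/CONTROLFILE/current.299.965122935 +DATA/PIZMAK_NODE3/CONTROLFILE/current.ctl
SQL> startup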

Tuesday 2 January 2018

ORA-27303: additional information: startup egid = 5003 (asmadmin), current egid = 5000 (oinstall)

I've got the following error stack on several occasions:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 5003 (asmadmin), current egid = 5000 (oinstall)
It seems the problem arises from wrong permissions on oracle binary files. More explanations here - https://asanga-pradeep.blogspot.dk/2012/06/ora-27303-additional-information.html.
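A quick check sketch (a role-separated install is assumed): the oracle binary in the db home should be owned by oracle with group asmadmin and carry the setuid/setgid bits, e.g. -rwsr-s--x; if not, the permissions need to be restored as described in the linked article.
[oracle@host ~]$ ls -l $ORACLE_HOME/bin/oracle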