Lately we had a quite mysterious issue with a Data Guard configuration.
In dgmgrl the usual symptoms were reported: broken redo transport, inability to resolve the gap, and ORA-16810, which is the error that only tells you that something is wrong (and not what is wrong).
The behaviour was as follows: the apply processes worked fine, so when I restored the archived logs on the standby they were applied, even up to the current log. Then nothing, so it was clear there was a problem with the transport. Resetting the connections was in vain, and changing some parameters in the Data Guard configuration did not help either.
Finally the right solution was to restart the standby, which was a bit counterintuitive, as the transport is mainly a feature of the primary. This was also fortunate, as the restart could be performed at once without service disruption.
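For similar cases, a quick way to see what the primary thinks about the transport before restarting anything - a hedged sketch, where '{standby}' stands for the real db_unique_name of the standby and 'StatusReport' is the broker monitorable property (verify it is available in your version):

-- on the primary: state and last error of every enabled redo destination
select dest_id, status, error from v$archive_dest_status where status <> 'INACTIVE';

-- in dgmgrl: per-database details
DGMGRL> show database verbose '{standby}';
DGMGRL> show database '{standby}' 'StatusReport';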
Friday, 14 December 2018
Thursday, 22 November 2018
Logon storms
https://www.slideshare.net/tanelp/tanel-poder-troubleshooting-complex-oracle-performance-issues-part-1
RATE_LIMIT parameter in listener.ora
_logout_storm_rate - instance parameter
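A hedged sketch of how these two knobs could be set (names and values are illustrative; the CONNECTION_RATE/RATE_LIMIT syntax is the documented listener connection rate limiter, while _logout_storm_rate is a hidden instance parameter, so treat it as an assumption and test first):

# listener.ora - allow at most 10 new connections per second on this endpoint
CONNECTION_RATE_LISTENER=10
LISTENER=
  (ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1521)(RATE_LIMIT=yes)))

-- instance side (hidden parameter, illustrative value)
alter system set "_logout_storm_rate"=100 scope=spfile;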
Tuesday, 2 October 2018
Work with EMCLI
Starting the work
Initially one has to log in - without that the commands are not recognized as valid and even the help is reduced:

[oracle@host ~]$ emcli help
Summary of commands:

  Basic Administration Verbs
    argfile   -- Execute emcli verbs from a file
    help      -- Get help for emcli verbs (Usage: emcli help [verb_name])
    invoke_ws -- Invoke EM web service.
    login     -- Login to the EM Management Server (OMS)
    logout    -- Logout from the EM Management Server
    setup     -- Setup emcli to work with an EM Management Server
    status    -- List emcli configuration details
    sync      -- Synchronize with the EM Management Server
    version   -- List emcli verb versions or the emcli client version

To get the help for a single verb, execute - emcli help [verb_name]

So we start:
[oracle@host ~]$ emcli login -username=sysman
Enter password
Login successful

Now the help looks way more elaborate:
[oracle@host ~]$ emcli help
Summary of commands:

  Basic Administration Verbs
    argfile   -- Execute emcli verbs from a file
    help      -- Get help for emcli verbs (Usage: emcli help [verb_name])
    invoke_ws -- Invoke EM web service.
    login     -- Login to the EM Management Server (OMS)
    logout    -- Logout from the EM Management Server
    setup     -- Setup emcli to work with an EM Management Server
    status    -- List emcli configuration details
    sync      -- Synchronize with the EM Management Server
    version   -- List emcli verb versions or the emcli client version

  ADM Verbs
    associate_target_to_adm -- Associate a target to an application data model.
    export_adm              -- Export Application Data Model to a specified .xml file.
    import_adm              -- Import Application Data Model from a specified .xml file.
[..]
  WebSphere Target Management Verbs
    discover_was -- Discover a IBM WebSphere Cell or a IBM Websphere Standalone Server
    refresh_was  -- Refresh IBM WebSphere Cells

To get the help for a single verb, execute - emcli help [verb_name]
Adding a new blackout reason
## to add:
[oracle@host ~]$ emcli add_blackout_reason -name="Target disabled intentionally"
Added Blackout Reason "Target disabled intentionally" successfully

## to list
[oracle@host ~]$ emcli get_blackout_reasons
AS: Application Server Bounce
AS: Application Server Configuration Change
AS: Application Server Emergency
[..]
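Since these verbs chain together nicely, the whole thing can be scripted - a sketch using only the verbs shown above (the -password switch and the environment variable are assumptions; omit -password to be prompted interactively):

#!/bin/bash
# log in, add the blackout reason, verify it is on the list, log out
emcli login -username=sysman -password="$SYSMAN_PWD"
emcli add_blackout_reason -name="Target disabled intentionally"
emcli get_blackout_reasons | grep -i "disabled intentionally"
emcli logout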
Tuesday, 11 September 2018
ASM: moving disk between groups
SQL> select name||':'||path from v$asm_disk;
[..]
DATA_0008:/dev/oracleasm/disks/ASMDISK9

-- the operation
SQL> alter diskgroup DATA drop disk {chosen name from previous query} [rebalance power {speed between 0 and 11}];

-- the operation progress
SQL> select operation, state, est_minutes from v$asm_operation;

OPERA STAT EST_MINUTES
----- ---- -----------
REBAL RUN            7
REBAL WAIT           0

SQL> select GROUP_NUMBER, name, STATE from v$asm_disk where name='DATA_0008';

GROUP_NUMBER NAME                           STATE
------------ ------------------------------ --------
           1 DATA_0008                      DROPPING

-- ASM alert
Tue Sep 11 10:40:51 2018
NOTE: requesting all-instance membership refresh for group=1
Tue Sep 11 10:40:51 2018
GMON updating for reconfiguration, group 1 at 14 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk DATA_0008 emptied
NOTE: erasing header (replicated) on grp 1 disk DATA_0008
NOTE: erasing header on grp 1 disk DATA_0008
NOTE: process _x000_+asm (19142) initiating offline of disk 8.3915952393 (DATA_0008) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: initiating PST update: grp 1 (DATA), dsk = 8/0xe968b109, mask = 0x6a, op = clear
Tue Sep 11 10:40:51 2018
GMON updating disk modes for group 1 at 15 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 8/0xe968b109, mask = 0x7e, op = clear
Tue Sep 11 10:40:51 2018
GMON updating disk modes for group 1 at 16 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: (not open) DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: PST update grp = 1 completed successfully
Tue Sep 11 10:40:51 2018
GMON updating for reconfiguration, group 1 at 17 for pid 30, osid 19142
Tue Sep 11 10:40:51 2018
NOTE: cache closing disk 8 of grp 1: (not open) DATA_0008
Tue Sep 11 10:40:51 2018
NOTE: group 1 PST updated.
Tue Sep 11 10:40:51 2018
NOTE: membership refresh pending for group 1/0xb68400e (DATA)
Tue Sep 11 10:40:51 2018
GMON querying group 1 at 18 for pid 16, osid 10242
GMON querying group 1 at 19 for pid 16, osid 10242
Tue Sep 11 10:40:51 2018
NOTE: Disk DATA_0008 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xb68400e (DATA)
NOTE: stopping process ARB0
NOTE: Attempting voting file refresh on diskgroup DATA
Tue Sep 11 10:40:54 2018
SUCCESS: rebalance completed for group 1/0xb68400e (DATA)

SQL> select operation, state, est_minutes from v$asm_operation;

no rows selected

SQL> select GROUP_NUMBER, name, STATE from v$asm_disk where name='DATA_0008';

no rows selected

SQL> alter diskgroup reco add disk '/dev/oracleasm/disks/ASMDISK9';

Diskgroup altered

-- ASM alert entry
Tue Sep 11 10:45:43 2018
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/ASMDISK9)
NOTE: requesting all-instance membership refresh for group=2
NOTE: Disk 1 in group 2 is assigned fgnum=2
NOTE: discarding redo for group 2 disk 1
NOTE: initializing header (replicated) on grp 2 disk RECO_0001
NOTE: initializing header on grp 2 disk RECO_0001
NOTE: requesting all-instance disk validation for group=2
Tue Sep 11 10:45:44 2018
NOTE: skipping rediscovery for group 2/0xb78400f (RECO) on local instance.
Tue Sep 11 10:45:44 2018
NOTE: requesting all-instance disk validation for group=2
Tue Sep 11 10:45:44 2018
NOTE: skipping rediscovery for group 2/0xb78400f (RECO) on local instance.
Tue Sep 11 10:45:44 2018
GMON updating for reconfiguration, group 2 at 20 for pid 28, osid 20206
Tue Sep 11 10:45:44 2018
NOTE: group 2 PST updated.
Tue Sep 11 10:45:44 2018
NOTE: membership refresh pending for group 2/0xb78400f (RECO)
Tue Sep 11 10:45:44 2018
GMON querying group 2 at 21 for pid 16, osid 10242
NOTE: cache opening disk 1 of grp 2: RECO_0001 path:/dev/oracleasm/disks/ASMDISK9
GMON querying group 2 at 22 for pid 16, osid 10242
Tue Sep 11 10:45:44 2018
SUCCESS: refreshed membership for 2/0xb78400f (RECO)
Tue Sep 11 10:45:44 2018
SUCCESS: alter diskgroup reco add disk '/dev/oracleasm/disks/ASMDISK9'
NOTE: Attempting voting file refresh on diskgroup RECO
Tue Sep 11 10:45:45 2018
NOTE: starting rebalance of group 2/0xb78400f (RECO) at power 1
Starting background process ARB0
Tue Sep 11 10:45:45 2018
ARB0 started with pid=31, OS id=20232
NOTE: assigning ARB0 to group 2/0xb78400f (RECO) with 1 parallel I/O
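To sum it up, the whole move boils down to the short sequence below (a sketch with the names from this case; the important point is to wait until v$asm_operation shows no rebalance and the disk no longer belongs to any group before reusing it):

-- drop from the source group, letting ASM rebalance the data off the disk
alter diskgroup DATA drop disk DATA_0008 rebalance power 8;

-- wait until the rebalance is finished and the disk is free
select operation, state, est_minutes from v$asm_operation;
select group_number, name, state from v$asm_disk where path='/dev/oracleasm/disks/ASMDISK9';

-- only then add the freed disk to the target group
alter diskgroup RECO add disk '/dev/oracleasm/disks/ASMDISK9' rebalance power 8;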
Wednesday, 11 July 2018
Broken DG configuration
A few days ago I ran into the following situation on a DG configuration:
- the primary shipped logs to the standby properly, but reported errors if asked for status in dgmgrl
DGMGRL> show configuration

Configuration - dbd

  Protection Mode: MaxPerformance
  Databases:
    db1d - Primary database
    db2d - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ORA-16610: command "Broker operation" in progress
DGM-17017: unable to determine configuration status
- the standby initially did not process archivelogs; after a configuration reset (disable and enable operations) it started to process them, but still reported errors in dgmgrl
- it was possible to log in from the primary to the standby
- it was possible to log in from the standby to the primary
- I copied password file from the primary to the standby to ensure this is the same file
- when calling show database on the primary the errors were reported for the standby
- when calling show database on the standby the errors were reported for the primary
It turned out that the problem was caused by the TNS entries on the primary - apparently the TNS alias definitions for the primary and the standby included some blank characters, which caused the aliases to be read improperly by dgmgrl, while a simple sqlplus connection through those aliases worked fine.
Once I removed all the formatting in tnsnames.ora, the whole thing started to work properly. What is a bit interesting is that the broken TNS entries were present only on the primary, while the dgmgrl calls malfunctioned on both ends.
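What finally worked was keeping the aliases as plain, single-line entries - a sketch of the shape the broker parsed without problems (hosts and service names are illustrative):

# tnsnames.ora on the primary - no leading blanks or line breaks inside the alias
DB1D=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db1d)))
DB2D=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host2)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db2d)))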
Role order
Lately a colleague of mine ran into a problem with too many roles - some accounts have 150 of them, while the maximum allowed is 148. He wanted to establish the order in which the roles are assigned to the user, as obviously only the excess over 148 was thrown away.
He has found that the roles are loaded with the following query:
select /*+ connect_by_filtering */ privilege#, level
  from sysauth$
 connect by grantee# = prior privilege# and privilege# > 0
 start with grantee# = :1 and privilege# > 0;

Thus it seems the order may be random, though it is possible that usually the data is returned in the order of insertion.
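To see that order for a particular account one can run the same query with the bind replaced by the user number - a sketch (the user name and the number 123 are illustrative; the query walks grants recursively, so it also returns roles granted through other roles):

-- find the user# first
select user# from sys.user$ where name = 'SOME_USER';

-- then run the loading query with that number in place of the bind
select /*+ connect_by_filtering */ privilege#, level
  from sys.sysauth$
 connect by grantee# = prior privilege# and privilege# > 0
 start with grantee# = 123 and privilege# > 0;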
Friday, 29 June 2018
ORA-01554: transaction concurrency limit reached reason:no undo segment found with available slot params:0, 0 :
Some time ago I found such an error on one of our databases. It is a bit mysterious, as there is not much about it on Metalink (a few bugs) or on the internet (at least when searching with Google). But indeed the database hit a limit of concurrent transactions within some very short period of time. On the performance chart in OEM there is a high peak with an INSERT into some custom table, with waits on TX - index contention, buffer busy, enq: SQ contention and undo segment tx slot.
Since midnight the number of undo blocks consumed was 63475 with 524416 blocks available (so ~12%), and the maximum query length was below 1h. Thus we were definitely far below the undo space limit. From the figures shown below it is clear that it is still possible to hit an error typical for the MANUAL undo management scenario while running with AUTO - as we do here.
The database is working with UNDO_MANAGEMENT=AUTO, which is the default setting and has been used very widely since 10g, so the administrator may simply forget about rollback segment management. According to the docs, with this setting some parameters relevant to rollback management are simply ignored (TRANSACTIONS, TRANSACTIONS_PER_ROLLBACK_SEGMENT), while the number of transactions is limited by the undo tablespace size.
So it seems we should not experience this error, but rather session suspension and an error about lack of space in the undo tablespace.
Yet here we are. It is clear from the numbers below that there was still space in the undo tablespace (at most 4598 undo blocks used within a 10-minute interval).
What I could check is this:
select BEGIN_TIME, UNDOTSN, UNDOBLKS, TXNCOUNT, MAXQUERYLEN, MAXQUERYID,
       MAXCONCURRENCY, ACTIVEBLKS, UNEXPIREDBLKS, EXPIREDBLKS, TUNED_UNDORETENTION
  from v$undostat
 where begin_time between to_date('2018-06-25 14', 'yyyy-mm-dd hh24')
                       and to_date('2018-06-25 16', 'yyyy-mm-dd hh24')
 order by begin_time;
BEGIN_TIME | UNDOBLKS | TXNCOUNT | MAXQUERYLEN | MAXCONCURRENCY | ACTIVEBLKS | UNEXPIREDBLKS | EXPIREDBLKS | TUNED_UNDORETENTION |
2018-06-25 14:39 | 0 | 78 | 1652 | 0 | 160 | 0 | 7168 | 2372 |
2018-06-25 14:49 | 1411 | 91 | 1653 | 2 | 160 | 0 | 7168 | 2373 |
2018-06-25 14:59 | 4598 | 104225 | 1653 | 131 | 160 | 4736 | 5888 | 2373 |
2018-06-25 15:09 | 2169 | 103321 | 933 | 311 | 160 | 6656 | 4352 | 1653 |
2018-06-25 15:19 | 2 | 282 | 1734 | 1 | 160 | 8064 | 2816 | 2454 |
And here are a few words of wisdom from Jonathan Lewis:
Taking a quick guess:
An (automatic) undo segment has 34 slots in the transaction table (the thing in the undo segment header).
If you end up with a small number N of undo segments then you can't have more than 34 * N concurrent transactions active, because each one needs a transaction table slot, and the next attempt to start a transaction would see ORA-01554.
It seems a little unlikely that this would happen in a typical system, since Oracle would usually end up stealing an extent from an existing undo segment to use it as the first extent of a new undo segment. But it wouldn't be hard to set up a mechanism that started with a small number of undo segments in a relatively small undo tablespace - started some transactions (then left them idle forever after the first change - e.g. they might have come in from a remote database) that made it impossible to free any extents, then did a lot of work to allocate all the extents (but still left a little space in allocated extents for continuing transactions), then started lots of transactions until you hit the error.
And Julian Dyke's presentation:
Undo segments are allocated at instance startup.
Undo segments can be added dynamically.
Each undo segment header contains
- Pool of free undo extents
- Set of undo slots
One undo slot is allocated to each transaction.
Undo slot contains list of undo extents.
Extents can migrate from one undo segment to another.
Undo slots are used cyclically:
- remain in header as long as possible
- reduces probability of ORA-01555: Snapshot too old
With all this in mind I would say that the roughly 1000-times-higher transaction rate is, in this particular case, the main reason for the error. Apparently the db is not able to extend its concurrent transaction capacity fast enough to respond to the increased demand. A possible action here could be setting _rollback_segment_count initially to a value higher than the 12 that are set here.
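A sketch of what such a change could look like, together with a quick check of how many undo segments are currently online (the value 1000 is purely illustrative, and as with any underscore parameter it should be agreed with support first):

-- how many undo segments the instance keeps online right now
select count(*) from dba_rollback_segs where status = 'ONLINE';

-- ask the instance to maintain more undo segments from the start (illustrative value)
alter system set "_rollback_segment_count"=1000 scope=spfile;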
Monday, 18 June 2018
High latch free on Exadata
Today I investigated an issue with poor performance of some processing. On the production environment it usually completes in some 5 minutes, while on one of the test databases it once took 5 hours to complete.
As this was a RAC on an Exadata machine, the AWR report was not that helpful, though it was immediately clear that the main problem was a latch free event. The question was which latch was responsible.
The performance chart in OEM did not provide the SQL text (apparently those calls were one-time SQL calls - candidates for dealing with by setting cursor_sharing to FORCE).
A good step was to call the following query:
select name,
       (misses/decode(gets,0,1,gets))*100 ratio,
       (immediate_misses/decode(immediate_gets,0,1,immediate_gets))*100 immediate_ratio,
       spin_gets,
       wait_time
  from v$latch
 where wait_time > 0
 order by 2;

While one cannot be 100% sure the query indicates the right latch, it at least gives some clues. This time the only distinctive ratio (around 40%) was Result Cache: RC Latch. The other one with around 10% was the resource manager latch, but there the total number of gets and misses was fairly small, so it was not that interesting.
I've looked for this latch name and found a few articles on the subject:
- High "Latch Free" Waits with Contention on 'Result Cache: RC Latch' when RESULT_CACHE_MODE = MANUAL on Oracle 12c (Doc ID 2002089.1) on Metalink
- https://dban00b.wordpress.com/2015/04/21/311/
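Before changing anything it is worth checking how big and how busy the result cache actually is - a sketch using the standard views and package:

-- overall result cache statistics and its configured maximum size
select name, value from v$result_cache_statistics;
show parameter result_cache_max_size

-- per-object picture: what is cached and in which state
select type, status, count(*) from v$result_cache_objects group by type, status;

-- if needed, the cache can be flushed to confirm the contention follows it
exec dbms_result_cache.flush;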
Friday, 8 June 2018
RMAN-3008/RMAN-3009 and RMAN-20095
Today I had the following problem:
- initially the backup failed with
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03008: error while performing automatic resync of recovery catalog
RMAN-20095: invalid backup/copy control file checkpoint SCN
- every next backup failed with
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 06/07/2018 04:18:50
RMAN-20095: invalid backup/copy control file checkpoint SCN
How to move forward?
I found the first backup which failed, then in its log found the place where the error occurred for the first time, and started to uncatalog from that point onward. Please note that the uncataloging has to take place when connected only to the target database (i.e. working only with the backup entries stored in the control file) - otherwise rman will, as a first step, implicitly resync the catalog, which will lead to the error over and over again.
change backuppiece '/path_to_backup/cf_YYYYY_c-283029066-20180607-02' uncatalog;

The first call to uncatalog a controlfile backup was enough.
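A sketch of the whole sequence - the key point is connecting without the catalog, listing the recent controlfile backups and uncataloging the offending piece (the 3-day window is illustrative):

$ rman target /
# connected to the target only (no catalog), so the change touches the controlfile entries
RMAN> list backup of controlfile completed after 'sysdate-3';
RMAN> change backuppiece '/path_to_backup/cf_YYYYY_c-283029066-20180607-02' uncatalog;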
In the debug of call to resync catalog during problems there is the following section:
DBGRESYNC: channel default: file# 0 [11:30:20.805] (resync)
DBGRPC: krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 0
DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
DBGRESYNC: channel default: Calling checkBackupDataFile for set_stamp 978149312 set_count 8542 recid 6038 [11:30:20.807] (resync)
DBGRESYNC: channel default: file# 0 [11:30:20.807] (resync)
DBGRPC: krmxrpc - channel default kpurpc2 err=20095 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CHECKBACKUPDATAFILE excl: 129
DBGRCVCAT: addBackupControlfile - Inside dup_val_on_index exception
DBGRCVCAT: addBackupControlfile - ckp_scn 3201066972364 ckp_time 07-JUN-18
DBGRCVCAT: addBackupControlfile - lckp_scn 3201066972565 lckp_time 07-JUN-18
DBGRPC: krmxrpc - channel default kpurpc2 err=0 db=rcvcat proc=BSPEP.DBMS_RCVCAT.CANCELCKPT excl: 0
DBGRCVCAT: cancelCkpt - rollback, released all locks
-- here the moment the error 20095 is caught
DBGPLSQL: EXITED resync with status ORA--20095 [11:30:20.992]
DBGRPC: krmxr - channel default returned from peicnt
DBGMISC: ENTERED krmstrim [11:30:20.992]
DBGMISC: Trimming message: ORA-06512: at line 3401 [11:30:20.992] (krmstrim)
DBGMISC: (24) (krmstrim)
DBGMISC: EXITED krmstrim with status 24 [11:30:20.992] elapsed time [00:00:00:00.000]
DBGRPC: krmxr - channel default got execution errors (step_60)
DBGRPC: krmxr - exiting with 1
DBGMISC: krmqexe: unhandled exception on channel default [11:30:20.992]
DBGMISC: EXITED krmiexe with status 1 [11:30:20.992] elapsed time [00:00:01:21.889]
[..]
DBGMISC: error recovery releasing channel resources [11:30:20.992]
DBGRPC: krmxcr - channel default resetted
DBGMISC: ENTERED krmice [11:30:20.993]
DBGMISC: command to be compiled and executed is: cleanup [11:30:20.993] (krmice)
DBGMISC: command after this command is: NONE [11:30:20.993] (krmice)
[..]
DBGMISC: EXITED krmice [11:30:21.017] elapsed time [00:00:00:00.024]
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 06/08/2018 11:30:20
RMAN-20095: invalid backup/copy control file checkpoint SCN
Tuesday, 22 May 2018
ORA-04030 on a process with large memory requirements
Lately I've met the following problem. The database is in version 11.2.0.4, the application has an evening batch job. At some point the batch job started to fail constantly, reaching around 4500M of allocated PGA. All efforts to increase PGA_AGGREGATE_TARGET were in vain - we increased it first to 5G (from 3G), then to 8G, but the job still failed at pretty much the same allocation.
Top allocations look as follows:
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
70% 3204 MB, 205630 chunks: "pmuccst: adt/record "  PL/SQL
    koh-kghu call   ds=0x7f2e2ea3c938  dsprt=0xb8c04a0
30% 1356 MB,  87052 chunks: "pl/sql vc2 "           PL/SQL
    koh-kghu call   ds=0x7f2e2ea3c938  dsprt=0xb8c04a0

I've found an article on Metalink (ORA-04030 Error For Memory Allocation Type "pmuccst: adt/record" (Doc ID 1513623.1)), which states that this allocation is common for PL/SQL collections and that it is not possible to limit it with PGA_AGGREGATE_TARGET (or _pga_max_size), as this is a different allocation bound to the heap of the process and governed by other settings. Similarly, the pl/sql vc2 allocation is common for local variables like varrays (ORA-04030 Error For Memory Allocation Type "pl/sql vc2" (Doc ID 1315884.1)).
There are a few solutions to the problem above - one is to rewrite the PL/SQL code to replace those large collections with GTTs; another is to limit the collections to fetch only a limited number of entries and process the cursors in loops.
There is also (as it finally turned out) a possibility to deal with this threshold.
The following parameters govern this allocation:
- vm.max_map_count, which is set to 65530 by default - on OS level
- _use_realfree_heap - hidden db parameter (in 11.2.0.4 defaults to TRUE) - this one enables the whole feature apparently
- _realfree_heap_pagesize_hint - hidden db parameter (in 11.2.0.4 defaults to 65536) - this one sets page size, so e.g. with increase to 262144 we can increase the memory allocation to around 16G
- _realfree_heap_pagesize (since 12c)
- _realfree_heap_max_size - hidden db parameter (in 11.2.0.4 defaults to 32768) - it is described as "minimum max total heap size", in Kbytes - I am not sure exactly what that means and what it controls
In general it seems the memory available to the process is calculated as vm.max_map_count * _realfree_heap_pagesize_hint, thus one may affect one or both of the multipliers, where max_map_count controls the number of allocations while the realfree heap pagesize controls the amount of memory within one allocation.
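A back-of-the-envelope check of that formula against this case, followed by a sketch of how such a change could be applied (illustrative values; underscore parameters should only be changed after consulting support, and both changes need a restart):

-- 65530 mappings * 64 KB pages  ~= 4 GB  -> matches the job failing at around 4500M of PGA
-- 65530 mappings * 256 KB pages ~= 16 GB -> the headroom mentioned above

-- db side
alter system set "_realfree_heap_pagesize_hint"=262144 scope=spfile;

# OS side, as root (persist it in /etc/sysctl.conf as well)
sysctl -w vm.max_map_count=262144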
Friday, 13 April 2018
ORA-12154 while duplicating from active database on RAC
A colleague of mine tried to rebuild a standby from some primary database. He prepared everything, added the proper entries to tnsnames.ora on both clusters (primary and standby), then fired the duplicate command. When it came to
restore clone from service 'ORCLPRIM' standby controlfile;

the command failed with:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/13/2018 12:54:04
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-06136: ORACLE error from auxiliary database: ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server

We checked tnsnames.ora files, connections 'standby to primary' and 'primary to standby' - all seemed ok.
Sometimes it is some 'invisible' character in tnsnames.ora which gets in the way, sometimes a simple lack of a definition. But not this time, so this case is a bit more interesting...
The grid binaries and database binaries have different homes here and keep different tnsnames.ora files (as there is no shared TNS_ADMIN location).
And the solution was to add the required TNS entries also to the grid home.
My supposition is as follows:
- the point where the command is issued is the standby instance, to which the script connects through statically registered interface.
- the listener is started from the grid home (and not the db home), so its environment is derived from grid, even though the static registration definition points to the db home as ORACLE_HOME
- thus addition of TNS definitions to the grid home tnsnames.ora helped
- I've found also that for some static registration entries an additional ENVS clause was added (disclaimer: this parameter is not supported on Windows, where any process started by the listener simply inherits the listener's environment), in the form
(SID_DESC =
  (SID_NAME = orclstb1)
  (ORACLE_HOME = [db home])
  (GLOBAL_DBNAME = orclstb_dgmgrl)
  (ENVS = "TNS_ADMIN=[db home]/network/admin")
)
so it seems my supposition is correct, and this is another way to get around the issue
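To check which TNS_ADMIN a running listener really uses, one can peek at its process environment on Linux - a sketch (the pgrep pattern and the placeholder pid are illustrative):

# find the listener process and dump its environment
pgrep -fl tnslsnr
cat /proc/<listener_pid>/environ | tr '\0' '\n' | grep -E 'TNS_ADMIN|ORACLE_HOME'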
Monday, 9 April 2018
Some issue with external table over preprocessor
I built an external table based on the output from a preprocessor. I got the following explanation of why it did not work:
Trim whitespace same as SQL Loader
KUP-04027: file name check failed: /u01/app/oracle/nx/datamasking/nx_masking.dat
KUP-04017: OS message: Error 0
KUP-04118: operation "lstat64", location "skudmicdl"

All seemed to be ok. The script returned the output when called manually. The returned value was 0.
So what was wrong? The file I called was in fact a link to the real script. Once I dropped the link and copied the script source to a file with the name referred to within the database, the problem vanished.
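For reference, a minimal sketch of such an external table over a preprocessor (directory names, script name and columns are made up here) - the lesson from this case being that the referenced file should be a real file rather than a symlink:

create or replace directory data_dir as '/u01/app/oracle/nx/datamasking';
create or replace directory exec_dir as '/u01/app/oracle/nx/bin';

create table ext_masking (
  col1 varchar2(100)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    preprocessor exec_dir:'nx_masking.sh'
    fields terminated by ';'
  )
  location ('nx_masking.dat')
)
reject limit unlimited;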
Monday, 26 March 2018
kkzifr3g: Encountered error ORA-12057.
The solution is as follows:
-- as we initially do not know what is happening exactly
alter system set events '12057 trace name errorstack level 3';

-- and wait until we hit the issue again (note the command above dumps only the 1st occurrence in a session)

[.. this is from alert.log - this time much more info ..]
Mon Mar 26 10:16:01 2018
kkzifr3g: Encountered error ORA-12057.
Mon Mar 26 10:16:01 2018
Dumping diagnostic data in directory=[cdmp_20180326101601], requested by (instance=1, osid=60555319 (J002)), summary=[abnormal process termination].
Mon Mar 26 10:16:02 2018
Errors in file /u01/app/oracle/diag/rdbms/db1/db1/trace/db1_j005_24117326.trc:
ORA-12057: materialized view or zonemap "TEST"."MV_DATA1" is invalid and must use complete refresh
[..]

-- and even more data in the trace, but we care here mostly about the mv name
-- then ORA-12057 During Fast Refresh of Aggregate Materialized View (Doc ID 1399651.1)

-- to clean up
alter system set events '12057 trace name context off';
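Once the offending mview is known, the fix suggested by the error text itself is a one-off complete refresh - a sketch:

-- 'C' forces a complete refresh of the named materialized view
exec dbms_mview.refresh('TEST.MV_DATA1', method => 'C');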
Thursday, 22 March 2018
Fractured block found during backing up datafile
{hostname}, Hex dump of (file 75, block 87980) in trace file /oracle/diag/rdbms/db1/db1/trace/db1_ora_23614.trc
Corrupt block relative dba: 0x12c157ac (file 75, block 87980)
Fractured block found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x12c157ac
 last change scn: 0x02d7.7dc0c4bc seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xb1a20602
 check value in block header: 0xc892
 computed block checksum: 0x67fa
Reread of blocknum=87980, file=+DATA/db1/datafile/file1.343.956407117. found valid data

dbv file=+DATA/db1/datafile/file1.343.956407117 blocksize=8192 start=87980 end=87981 userid=sys/blbla

DBVERIFY: Release 11.2.0.4.0 - Production on Thu Mar 22 10:04:33 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

DBVERIFY - Verification starting : FILE = +DATA/db1/datafile/file1.343.956407117
DBVERIFY - Verification complete

Total Pages Examined         : 2
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 2
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 0
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 0 (0.0)
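Apart from dbv, the same check can be done from RMAN, which also populates v$database_block_corruption - a sketch with the numbers from this case:

RMAN> validate datafile 75 block 87980;

SQL> select * from v$database_block_corruption;
-- expected to come back empty for a transient fractured-block warning like this one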
Monday, 5 March 2018
Restricted session
-- at startup
startup restrict;

-- on a running instance
alter system enable restricted session;

System altered.

-- to disable
alter system disable restricted session;

-- to check
select logins from v$instance;

LOGINS
----------
RESTRICTED
Friday, 23 February 2018
ROLL_INVALID_MISMATCH (external link)
Great article on one of the reasons not to share a cursor: https://magnusjohanssontuning.wordpress.com/2016/04/22/cursors-not-shared-due-to-roll_invalid_mismatch/
Password hash explained (external link)
An excellent article on password hashing in Oracle https://www.trustwave.com/Resources/SpiderLabs-Blog/Changes-in-Oracle-Database-12c-password-hashes/?page=1&year=0&month=0
Thursday, 22 February 2018
Insightful article on db connections terminated by firewalls (external link)
Very insightful article on why/when firewalls drop connections: https://li555li.blogspot.se/2014/09/.
In short - to prevent this one has to set SQLNET.EXPIRE_TIME to {timeout on firewall}/2-1 in minutes.
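For example, with a firewall that drops idle connections after 20 minutes, the formula gives 20/2-1 = 9, so on the database server side:

# sqlnet.ora on the database server
SQLNET.EXPIRE_TIME = 9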
Wednesday, 31 January 2018
How to find a time for the latest DML on a table
Lately I got such a question from a co-worker. I looked around a bit and here are the findings:
- ORA_ROWSCN pseudocolumn - seems to be the most versatile solution here - the maximal value of ORA_ROWSCN would definitely point to the time of the last modification for any table (see the sketch after this list).
- VERSIONS_ENDSCN https://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS01009
- dba_tab_modifications - this view keeps the number of DML operations since the last statistics collection. Thus it may be used to find the approximate time of the last modification, assuming it happened after the latest stats collection, but it has of course a number of limitations
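A sketch for the ORA_ROWSCN approach (the table name is illustrative; note that SCN_TO_TIMESTAMP can map an SCN back to a time only for a limited retention period, so for old changes it may raise an error):

select max(ora_rowscn) last_scn,
       scn_to_timestamp(max(ora_rowscn)) approx_last_dml
  from some_table;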
Thursday, 11 January 2018
Change of DB_UNIQUE_NAME in a database
Today I created a new database, using the Oracle Universal Installer graphical interface. I did not find the switch for advanced configuration, where I could change some parameters during the installation, so I was left with what the basic configuration for the Server-class database option provides. Maybe I missed some tab or checkbox - it does not matter.
Now the starting point is to have a 12c standalone database registered in clusterware.
- the simple change in the spfile ends with
SQL> alter system set db_unique_name='pizmak_node3' scope=spfile;
alter system set db_unique_name='pizmak_node3' scope=spfile
*
ERROR at line 1:
ORA-32017: failure in updating SPFILE
ORA-65500: could not modify DB_UNIQUE_NAME, resource exists
- so I found "How to change the DB_UNIQUE_NAME in a RAC database (Doc ID 1604421.1)", and remove the database from clusterware, but then
SQL> startup mount
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/pizmak/spfilepizmak.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/pizmak/spfilepizmak.ora
ORA-15056: additional error message
ORA-17503: ksfdopn:2 Failed to open file +DATA/pizmak/spfilepizmak.ora
ORA-15173: entry 'spfilepizmak.ora' does not exist in directory 'pizmak'
ORA-06512: at line 4
Apparently the oratab entry was removed with the srvctl remove database command - so I found "Alert Log Error ORA-17503 ORA-15173 After Deleting Database. (Doc ID 1900471.1)"
- so I looked at $ORACLE_HOME/dbs, found a pfile with SPFILE pointing to an ASM location, and put it aside
- after some time I discovered the true reason - initially the spfile was located in +DATA/PIZMAK/ (possibly at some stage of the db installation), but eventually the location was +DATA/PIZMAK/PARAMETERFILE
- so I used the pfile in $ORACLE_BASE/admin/pizmak/pfile/, which is put there during installation
- so I started with startup nomount pfile=my_pfile_copy - this pfile already had db_unique_name changed to the one I wanted, and no changes to db_create_file_dest (as it would point to
- I learned that 12c is able to move datafiles online to another location, so
SQL> select file_id||':'||file_name from dba_data_files;

FILE_ID||':'||FILE_NAME
--------------------------------------------------------------------------------
6:+DATA/PIZMAK/DATAFILE/users.302.965122931
4:+DATA/PIZMAK/DATAFILE/undotbs1.303.965122931
3:+DATA/PIZMAK/DATAFILE/sysaux.300.965122739
1:+DATA/PIZMAK/DATAFILE/system.314.965139835

alter database move datafile 1;
alter database move datafile 3;
alter database move datafile 4;
alter database move datafile 6;

SQL> select file_id||':'||file_name from dba_data_files;

FILE_ID||':'||FILE_NAME
--------------------------------------------------------------------------------
6:+DATA/PIZMAK_NODE3/DATAFILE/users.303.965140667
3:+DATA/PIZMAK_NODE3/DATAFILE/sysaux.301.965140607
4:+DATA/PIZMAK_NODE3/DATAFILE/undotbs1.300.965140655
1:+DATA/PIZMAK_NODE3/DATAFILE/system.314.965139835
- now logs
SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+DATA/PIZMAK/ONLINELOG/group_3.310.965122991
+DATA/PIZMAK/ONLINELOG/group_3.311.965122991
+DATA/PIZMAK/ONLINELOG/group_2.308.965122991
+DATA/PIZMAK/ONLINELOG/group_2.309.965122991
+DATA/PIZMAK/ONLINELOG/group_1.306.965122991
+DATA/PIZMAK/ONLINELOG/group_1.307.965122991

SQL> select GROUP#, SEQUENCE#, MEMBERS, STATUS from v$log;

    GROUP#  SEQUENCE#    MEMBERS STATUS
---------- ---------- ---------- ------------------------------------------------
         1          7          1 INACTIVE
         2          8          2 INACTIVE
         3          9          2 CURRENT

alter database drop logfile group 1;
alter database add logfile group 1 size 512m;
-- here in alert log (as this was 1st log file on REDO) an error
-- Thu Jan 11 14:41:29 2018
-- ERROR: failed to establish dependency between database pizmak_node3 and diskgroup resource ora.REDO.dg

alter database drop logfile group 2;
alter database add logfile group 2 size 512m;
-- no more errors as the structure already exists

alter system switch logfile;
alter system switch logfile;

alter database drop logfile group 3;
alter database add logfile group 3 size 512m;

SQL> select member from v$logfile;

MEMBER
------------------------------------------------------------------------------------------------------------------------
+REDO/PIZMAK_NODE3/ONLINELOG/group_3.275.965141729
+REDO/PIZMAK_NODE3/ONLINELOG/group_2.274.965141165
+REDO/PIZMAK_NODE3/ONLINELOG/group_1.256.965140887
I still have to add second members to the groups - now the spfile:
create spfile='+DATA' from pfile='/oracle/m010162/init.ora.011201894952';

-- and change in initpizmak.ora
-- SPFILE='+DATA/pizmak_node3/parameterfile/spfile.311.965142657'

!cp initpizmak.ora $ORACLE_HOME/dbs/
- to move temporary tablespace I just created a new one and switched default temporary tablespace to it
create temporary tablespace temp2 tempfile size 128m autoextend on next 128m maxsize 16g;
alter database default temporary tablespace temp2;

-- then as I wanted temporary tablespace named TEMP I did that once again
create temporary tablespace temp tempfile size 128m autoextend on next 128m maxsize 16g;
alter database default temporary tablespace temp;

SQL> select file_name from dba_temp_files;

FILE_NAME
------------------------------------------------------------------------------------------------------------------------
+DATA/PIZMAK_NODE3/TEMPFILE/temp.312.965143967
- to move control files I used asmcmd
- change control_files parameter to new location with scope spfile
- shutdown immediate
- copy or move the control files to the new location in asmcmd (with cp one has to specify the name of the target file, but without the instance/revision number - see the sketch below)
- start the db
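A sketch of that control file step (the ASM source and target names are illustrative - the exact source names come from the current control_files parameter):

SQL> alter system set control_files='+DATA/PIZMAK_NODE3/CONTROLFILE/control01.ctl','+REDO/PIZMAK_NODE3/CONTROLFILE/control02.ctl' scope=spfile;
SQL> shutdown immediate

ASMCMD> cp +DATA/PIZMAK/CONTROLFILE/current.261.965122935 +DATA/PIZMAK_NODE3/CONTROLFILE/control01.ctl
ASMCMD> cp +DATA/PIZMAK/CONTROLFILE/current.261.965122935 +REDO/PIZMAK_NODE3/CONTROLFILE/control02.ctl

SQL> startup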
Tuesday, 2 January 2018
ORA-27303: additional information: startup egid = 5003 (asmadmin), current egid = 5000 (oinstall)
I've got the following error on several occasions:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 5003 (asmadmin), current egid = 5000 (oinstall)

It seems the problem arises from wrong permissions on oracle binary files. More explanations here - https://asanga-pradeep.blogspot.dk/2012/06/ora-27303-additional-information.html.
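A quick way to check, and the usual fix on such role-separated setups - a hedged sketch (the oracle binary in the db home is expected to belong to group asmadmin with setuid/setgid bits; the setasmgidwrap call and its o= switch come from MOS notes on this error, so verify them for your version before running):

$ ls -l $ORACLE_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall ...      # wrong: group oinstall

# run from the grid home, then restart the instance
$ $GRID_HOME/bin/setasmgidwrap o=$ORACLE_HOME/bin/oracle

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin ...      # expected: group asmadmin, setuid/setgid set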