Tuesday 24 May 2016

ORA-06502: PL/SQL: numeric or value error on auto execute of job "SYS"."BSLN_MAINTAIN_STATS_JOB" (external link)

The background of this issue is frequently bound to some cloning or copying some template database to a new instance. The Oracle Support (former Metalink) for version 11.2.0.3 suggests to install patch "Patch 10110625: DBSNMP.BSLN_INTERNAL RECEIVES ORA-6502" or to recreate a DBSNMP user. However by googling I 've found more finesse solution - https://oracledbazone.com/2012/04/17/ora-12012-error-on-auto-execute-of-job-sys-bsln_maintain_stats_job-2/.

Friday 20 May 2016

A data block was corrupted in alert log on Exadata

The following excerpt in the alert log on one of instances on Exadata:
Mon May 16 08:22:08 2016
Errors in file /u01/app/oracle/diag/rdbms//4/trace/4_dbw0_103431.trc:
ORA-01186: file 4097 failed verification tests
ORA-01157: cannot identify/lock data file 4097 - see DBWR trace file
ORA-01110: data file 4097: '+DATA1'
Mon May 16 08:22:08 2016
File 4097 not verified due to error ORA-01157
Mon May 16 08:22:08 2016
Dictionary check complete
Mon May 16 08:22:08 2016
Errors in file /u01/app/oracle/diag/rdbms//4/trace/4_dbw0_103431.trc:
ORA-01186: file 4097 failed verification tests
ORA-01157: cannot identify/lock data file 4097 - see DBWR trace file
ORA-01110: data file 4097: '+DATA1'
Mon May 16 08:22:08 2016
File 4097 not verified due to error ORA-01157
Mon May 16 08:22:09 2016
Re-creating tempfile +DATA1 as +DATA1//TEMPFILE/temp.5375.911982129

While at first it looks somewhat dangerous, it is a minor issue and self-healing. First of all one have to know the database in question is a standby, and it is in open mode with log apply (so Active Data Guard option in use). The file 4097 has id just above the db_files parameter - which immediately indicates this is a temporary file and such files do not exist by default on standbys - hence the inability to identify the file.
Now look at the last line in the excerpt - recreation of the temporary file, which heals the problem.

Thus if one looks at the v$database_block_corruptions, there is nothing there. In dba_temp_files also one does not meet the re-created tempfile - while this is standby, it does not change sources for this view apparently, while the file is listed in the v$tempfile.

Wednesday 11 May 2016

No home entry in oraInventory

Today I hit the following scenario - no agent home entry in oraInventory, while I require it when running a script for checking the db installation quality. Due to this the script reported errors in several points, while it should not.
The solution is quite strightforward. In the agent home under AGENT_HOME/oui/bin there is a script attachHome.sh.
#!/bin/sh
OHOME=
OHOMENAME=agent12c1
CUR_DIR=`pwd`
cd $OHOME/oui/bin
./runInstaller -detachhome ORACLE_HOME=$OHOME ORACLE_HOME_NAME=$OHOMENAME $* > /dev/null 2>&1
./runInstaller -attachhome ORACLE_HOME=$OHOME ORACLE_HOME_NAME=$OHOMENAME $* 
cd $CUR_DIR
The script in fact reattaches the home and after the call my script worked properly again.

ORA-00600: internal error code, arguments: [ktbConvertRedo_1]

Today we experienced a terrible disaster caused by some corrupted data in a table. It started with
ORA-04031: unable to allocate 20504 bytes of shared memory 
("shared pool","unknown object","sga heap(2,1)","KTI SGA freea")
and after several occurences also
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
and then problems with SMON, after few SMON terminations also instance termination by PMON.

After some investigation we found 2 articles on Metalink (one without a clue, the 2nd one with info to set a certain event to find a corrupted object/table and to remove it).
The reason was apparently a corruption in a table - the main problem was to find the id of that object.
Finally one of us spotted the lines below:
Block recovery completed at rba 2428.149064.16, scn 31.305977921
Errors in file /oracle/diag/rdbms/sbiu_prodora03/sbiu/trace/sbiu_smon_28380.trc  (incident=230673):
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance sbiu (pid = 18) - Error 600 encountered while recovering transaction (51, 6) on object 144492.
The corrupted object was the one with id 144492. After dropping the object (remember to purge recyclebin if on) database returned to normal work.
In this case we were very,very lucky as the table we dropped was kind of log of errors and we knew it - nobody will cry over some debugging info when "forests are ablaze ".

Friday 6 May 2016

Db restore of the standby (more or less manual)

  1. remove old db files - controlfiles, log files (online and standby), data files, temp files - it is recommended to do it in the first place before any restore will start as it may be quite easy to choose wrong files for removal if the restored files are mixed with the old ones
  2. check db_create_% parameters
    SQL> alter system set db_create_online_log_dest_1='/ora_log/redolog1';
    
    System został zmieniony.
    
    SQL> alter system set db_create_online_log_dest_2='/ora_log/redolog2';
    
    System został zmieniony.
    
    SQL> show parameter db_create
    
    NAME                                 TYPE         VALUE
    ------------------------------------ ------------ ------------------------------
    db_create_file_dest                  string       /ora_db/
    db_create_online_log_dest_1          string       /ora_log/redolog1
    db_create_online_log_dest_2          string       /ora_log/redolog2
    db_create_online_log_dest_3          string
    db_create_online_log_dest_4          string
    db_create_online_log_dest_5          string
    
    
  3. (optionally) here one may create a standby controlfile on PRIMARY and copy to planned standby, then mount
    ALTER DATABASE CREATE STANDBY CONTROLFILE 
      AS '/disk1/oracle/oradata/payroll/standby/payroll2.ctl';
    
  4. or directly restore from backup (which is much more comfortable)
    connect target sys/pass1;
    connect catalog rman_catalog/pass2@rman_catalog;
    spool log to rman_db_restore_20160505.log
    run
    {
    allocate channel t1 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    allocate channel t2 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    # in case one restored controlfiles already - next 2 lines has to be removed
    restore standby controlfile; 
    alter database mount;
    SET NEWNAME FOR DATABASE TO NEW;
    restore database;
    SWITCH DATAFILE ALL;
    SWITCH TEMPFILE ALL;
    recover database DELETE ARCHIVELOG MAXSIZE 90G;
    
    release channel t1;
    release channel t2;
    }
    exit;
    

Now it is possible we have to add more archivelogs, but that would mean repeating the last line from the script above, until db is fine. Then one may want to fix some issues like paths to online and standby redo log files, etc.
From this time on the standby is ready to use - the main point at this moment is to restore the managed standby database recovery process. This task would require first checking the parameters like log_archive_config, log_archive_dest_* and log_archive_dest_state_*, fal_server and fal_client if they are properly configured. Of course in the first place the communication between primary and standby db nodes has to be checked by sqlplus for example.
Now we call
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
The output in alert logs on both nodes should confirm that the connections were built and the transfer of archivelogs has started.

Good to know

There exists bug 7553431 - RMAN restore controlfile for standby does not create correct controlfile type (Doc ID 7553431.8), which influences the role of restored database, fixed in 12c version.

Wednesday 4 May 2016

ORA-29283: invalid file operation

Today I 've got the following problem: the procedure, which was planned to write a file through UTL_FILE to a mounted resource, is unable to do so despite the proper privileges on the directory. At the same time the local OS user (db owner) was able to do it without problem. The whole thing was done on an AIX OS, and the mounted resource was presented through NFS - in general many points of problems.

The investigation

Running a test procedure produced something like:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at "TCCS.TEST_UTL_FILE", line 6
ORA-06512: at line 1
29283. 00000 -  "invalid file operation"
*Cause:    An attempt was made to read from a file or directory that does
           not exist, or file or directory access was denied by the
           operating system.
*Action:   Verify file and directory access privileges on the file system,
           and if reading, verify that the file exists.
The truss on the db process while calling the test procedure
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
kread(16, "\0 _\0\006\0\0\0\0\011 i".., 8208)   = 95
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE9140, 176, 010) Err#13 EACCES
statfs("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8D20) Err#13 EACCES
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8FF8, 176, 011) Err#13 EACCES
kopen("/home/tccs/files/out/utlout.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH) Err#13 EACCES
kwrite(16, "\0\v\0\0\f\0\0\001\001", 11)        = 11
kwrite(16, "\0\v\0\0\f\0\0\001\002", 11)        = 11
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208)   = 11
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
kwrite(16, "\0 �\0\006\0\0\0\0\00401".., 233)   = 233
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
The 1st investigation point was brought by the article at http://jakub.wartak.pl/blog/?p=272. But after a while it become clear that, though the conditions were met and we changed the ownership of the mount point, the problem has persisted.
The 2nd guess was by suggestion in the article on MOS:Session Accessing File On Disk Fail with OS File Permission Errors (Doc ID 392647.1). And simple check that the local user connected on BEQUEATH is able to run such procedure without problem proofed that this is the right way - after the listener restart the remote users also acquired the ability to run procedure without problems. The root cause was that the db owner was added to the group being the owner of the target directory long after the listener was started - hence the remote sessions did not acquire the proper permissions on the directory, while the local sessions with "fresh" permissions worked ok.

Tuesday 5 April 2016

Long DROP USER command

Lately I called a DROP USER command - as this was a slow environment (and I once called this command on the same user previously) I expected it to take long time. The working hours passed by, I was observing it - it waited on the "db file sequential read", so I did not bother to check the db health at all - I went home.
The next day I saw the command failed with error, that there are no temporary files' space.

Seems I forgot to finish this note as Ivan Korac has pointed me. I do not remember correctly now, as it was some time ago. The finish is I've found somewhere a note about that (possibly on Metalink) with supposition to drop that user objects first - and I did a test for that. It showed that time for dropping a user without preparations is almost the same as dropping objects first (this one schema has really many of them) and drop empty schema afterwards.
One difference though is that dropping one object is faster after all. The lack of temporary files space was an additional issue which went along with my command by accident IIRC.

Tuesday 1 March 2016

Using wallet

  1. wallet creation
    # I usally place it under TNS_ADMIN or one level below in a separate 
    # directory e.g. TNS_ADMIN=/oracle/admin/network/admin and then wallet
    # is in /oracle/admin/network/wallet
    # the call requires to specify a password twice, which is a global 
    # password to the wallet
    $ mkstore -wrl /oracle/admin/network/wallet/my.wallet -create
    Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
    Copyright (c) 2004, 2013, Oracle and/or its affiliates. 
    All rights reserved.
    
    Enter password:            
       
    Enter password again:            
       
    
  2. adding entries to the wallet
    • creating an entry in the tnsnames.ora - as one will connect through that entry without password
      alias_in_tns = (DESCRIPTION = 
        (ADDRESS = (PROTOCOL = TCP) (HOST = host1) (PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = service1)))
      
    • creating credentials for user1 in the wallet (db_user1 is TNS entry, user1 the schema name, pass is a password)
      [oracle@ap-murex1t admin]$ mkstore -wrl [path to wallet directory] \
       -createCredential alias_in_tnsnames user1 pass
      Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
      Copyright (c) 2004, 2013, Oracle and/or its affiliates. 
      All rights reserved.
      
      Enter wallet password:            
         
      Create credential oracle.security.client.connect_string1
      

The default privileges on the wallet directory are set only for the wallet owner, so other users would get the ORA-12578 and be asked for credentials. Of course this has a perfect sense to limit access to the wallet. However sometimes You need it on a test environment and may abuse it simply by setting permissions also for others.
The whole thing works of course only if one provide correct information in the sqlnet.ora file.
WALLET_LOCATION = (SOURCE = 
  (METHOD = FILE)(METHOD_DATA = (DIRECTORY = [path to wallet directory])))
SQLNET.WALLET_OVERRIDE = TRUE 

Friday 19 February 2016

minact-scn: useg scan erroring out with error e:12751

Lately I've met in alert.log entries the following:
minact-scn: useg scan erroring out with error e:12751
According to the Metalink it is bound with high load on the system and that an undo segment scan has failed with an ORA-12751 error.
And ORA-12751 is described as "cpu time or run time policy violation"

In my case though the load on the system is not very high and the system is actually used currently for one task only, which is a massive data pump import. The import leads to temporary lack of space for archivelogs (ORA-16038, ORA-19504, ORA-16014). Of course there is a pressure on the instance trying to save archivelogs and I suppose that this state causes such notes in alert.log.

Also emergency flushes occur then:
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 8388608 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
select total_size,awr_flush_emergency_count from v$ash_info; 

ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0

Today I run into the following error:
ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0
What was the most annoying, it was the fact, I can not start the instance due to this error occurence. In general within a screen session I opened sqlplus session and within it the vim for editing the newly generated pfile. Then I shut down the instance and further I edited some memory settings. So far, so good - I started the instance and got this error. And I did not even specified this parameter in the configuration.
Tried to find something on the internet and on the Metalink:
  1. http://askdba.org/weblog/2008/04/tweaking-_shared_pool_reserved_min_alloc-and-ora-4031/
  2. Ora-00828 With Automatic Shared Memory Management (Doc ID 1251784.1)
I have checked all the possibilities - look for mistakes and misspellings in settings specification and all for nothing. Finally the colleague of mine created a pfile, then created a spfile from that pfile (no changes at all), then started instance with force and voila, the instance worked like a charm.
Not sure what was wrong - the only clue is the possibility I introduced into pfile some hidden character, so the true reason was similar to the one described under the 1st link mentioned above.
Such theory is supported by the following facts:
  • the error message as the upper bound displays 0 (which is nonsense) - however I suppose the lower bound is simply hard coded as 4000 (or 4400) bytes and the upper bound derives its value from the calculation based on the value of shared pool size.
  • assuming something was wrong with the shared pool size specification I suppose it would be set to 0 and hence the upper bound also would be 0 (as any % of 0 is still 0 ;-))
The conclusion: if the upper bound of the ORA-00093 message is set to 0, check Your pfile and spfile byte by byte.
The update Hit this error again, and from this occurrence I draw another conclusion. This is rather not about character in the pfile or spfile. This time the version used is 12c. I set the SGA settings too low and here is the dialog with the database:
-- SGA set to 1G
SQL> startup
ORA-00821: Specified value of sga_target 1024M is too small, needs to be at least 1456M
SQL> create spfile from pfile='db4b.pfile.20170321' ;

File created.

-- SGA increased to 1600M, 1800M, finally 2G (which was the initial setting)
-- still ORA-00093
SQL> startup
ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0
[..]
SQL> startup
ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0

-- and here the only difference - force
SQL> startup force
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size      3712904 bytes
Variable Size   1577060472 bytes
Database Buffers   553648128 bytes
Redo Buffers     13062144 bytes
Database mounted.
Database opened.

As You see the startup force show up as the key step to solve the problem - apparently when I opened sqlplus some settings were established and not cleared on startup failure.

Monday 8 February 2016

Large Pages Implementation in Windows (external link)

Short note on how to enable large pages support for Oracle on Windows - link

Monday 4 January 2016

About Synergy tool

This time short article somewhat off topic - as it is about a workstation tool and not really bound with databases.

Overview

Synergy is a tool allowing to bound all the workstations one uses to one common set of input peripherals (i.e. mouse and keyboard). It services also transferring the clipboard content between workstations, so greatly enhances the productivity. It is provided for Windows, Linux and Mac OS X, so I would say it covers 99,99% of the workstations' os population.

My architecture

My architecture means 2 laptops - the 1st one run by Linux (current Ubuntu) and the 2nd one run by Windows 7. The package is called synergy in version 1.7.0. and on Windows side I have synergy in version 1.3.4.

My configuration

The synergy on Linux side is configured to be a server. The keyboard and mouse of the Linux workstation is the common input set then. On the Windows the software works as a client. As I have a community version, the communication between workstations would not be secure if I would simply rely on the Synergy itself. One may buy the PRO version to have the transfer secured by SSH and I suppose this is a very convenient solution. Here I provide a "poor-man" solution where synergy communication is transferred through SSH tunnel.

Client and server insecure configuration

For the client I check the 1st radio button (Use another computer's keyboard and mouse) here and specify the IP of the server.
For the server I choose here the Server checkbox and configure interactively - on the interactive configuration window I set the order of the workstations. I believe that the available screens are displayed by checking all the available clients, so I would start with the insecure connection. The connection would be done on the 24800 port.
As You may see this configuration is very easy.

Securing the connection

Now time to secure it. I did it by opening a SSH tunnel from the client to the server, so actually all is done on Windows side (which has some pros and cons). The only thing on the server is a SSH key generation (ssh-keygen -t rsa -b 2048).
First thing on Windows is a download of the Putty SSH client. Then I generated an SSH key (PUTTYGEN.EXE SSH2 RSA algorithm and 2048 bits for length) and save both keys (private and public) to files. Next I put the public key to the server's authorized_keys file to connect by key ($HOME/.ssh/authorized_keys - if it does not exist, just create it with Your favourite editor). To move the key one may use currently working insecure synergy installation :-).
Additional step is to build a batch file to run pageant.exe (which is SSH agent to keep the password in memory to open the key)
REM content of the batch file
start "" c:\app\putty\pageant.exe "C:\Users\{user}\Documents\win2priv.ppk"
This batch file has to be put to Startup group to run just after the Windows initialization.
Further I needed to create an executable to establish a tunnel to the server. I tried to do it with another batch calling plink.exe (another executable from Putty), but it did not work (may be I did something wrong), so I tried configuring a session in Putty - to do it one have to call Putty, create a session to the server and then in Connection->SSH->Tunnels add an entry (Source port: 24800, destination: 127.0.0.1:24800 and options Local and Auto) with the Add button.

Now we are ready to open the connection to the server (beforehand however we need to disable current synergy connection as it is established on the same port). After clicking the Open button one have to set the server IP to 127.0.0.1 and run the synergy client with the Start button.

Important notes

Due to some reason not clear to me usage of Putty failed and I had to exchange the Putty binaries with Kitty binaries (which is pretty much the same). With Kitty whole thing started to work smoothly.
Pros
The connection is secure :-)
Cons for Linux being server
  • during the Windows initialization the SSH agent will ask for the password to the SSH key
  • one have to open a tunnel manually - You may try with plink.exe (klink.exe from Kitty) - this probably may be replaced with some Powershell script, but I have no ability to manipulate the Powershell scripts here
Cons for Windows being server
  • on the Linux side all things may be nicely automated - no problem with opening a tunnel in background
  • one have to install and run an SSH server on the Windows side