Tuesday, 24 May 2016

ORA-06502: PL/SQL: numeric or value error on auto execute of job "SYS"."BSLN_MAINTAIN_STATS_JOB" (external link)

The background of this issue is frequently bound to some cloning or copying some template database to a new instance. The Oracle Support (former Metalink) for version 11.2.0.3 suggests to install patch "Patch 10110625: DBSNMP.BSLN_INTERNAL RECEIVES ORA-6502" or to recreate a DBSNMP user. However by googling I 've found more finesse solution - https://oracledbazone.com/2012/04/17/ora-12012-error-on-auto-execute-of-job-sys-bsln_maintain_stats_job-2/.

Friday, 20 May 2016

A data block was corrupted in alert log on Exadata

The following excerpt in the alert log on one of instances on Exadata:
Mon May 16 08:22:08 2016
Errors in file /u01/app/oracle/diag/rdbms//4/trace/4_dbw0_103431.trc:
ORA-01186: file 4097 failed verification tests
ORA-01157: cannot identify/lock data file 4097 - see DBWR trace file
ORA-01110: data file 4097: '+DATA1'
Mon May 16 08:22:08 2016
File 4097 not verified due to error ORA-01157
Mon May 16 08:22:08 2016
Dictionary check complete
Mon May 16 08:22:08 2016
Errors in file /u01/app/oracle/diag/rdbms//4/trace/4_dbw0_103431.trc:
ORA-01186: file 4097 failed verification tests
ORA-01157: cannot identify/lock data file 4097 - see DBWR trace file
ORA-01110: data file 4097: '+DATA1'
Mon May 16 08:22:08 2016
File 4097 not verified due to error ORA-01157
Mon May 16 08:22:09 2016
Re-creating tempfile +DATA1 as +DATA1//TEMPFILE/temp.5375.911982129

While at first it looks somewhat dangerous, it is a minor issue and self-healing. First of all one have to know the database in question is a standby, and it is in open mode with log apply (so Active Data Guard option in use). The file 4097 has id just above the db_files parameter - which immediately indicates this is a temporary file and such files do not exist by default on standbys - hence the inability to identify the file.
Now look at the last line in the excerpt - recreation of the temporary file, which heals the problem.

Thus if one looks at the v$database_block_corruptions, there is nothing there. In dba_temp_files also one does not meet the re-created tempfile - while this is standby, it does not change sources for this view apparently, while the file is listed in the v$tempfile.

Wednesday, 11 May 2016

No home entry in oraInventory

Today I hit the following scenario - no agent home entry in oraInventory, while I require it when running a script for checking the db installation quality. Due to this the script reported errors in several points, while it should not.
The solution is quite strightforward. In the agent home under AGENT_HOME/oui/bin there is a script attachHome.sh.
#!/bin/sh
OHOME=
OHOMENAME=agent12c1
CUR_DIR=`pwd`
cd $OHOME/oui/bin
./runInstaller -detachhome ORACLE_HOME=$OHOME ORACLE_HOME_NAME=$OHOMENAME $* > /dev/null 2>&1
./runInstaller -attachhome ORACLE_HOME=$OHOME ORACLE_HOME_NAME=$OHOMENAME $* 
cd $CUR_DIR
The script in fact reattaches the home and after the call my script worked properly again.

ORA-00600: internal error code, arguments: [ktbConvertRedo_1]

Today we experienced a terrible disaster caused by some corrupted data in a table. It started with
ORA-04031: unable to allocate 20504 bytes of shared memory 
("shared pool","unknown object","sga heap(2,1)","KTI SGA freea")
and after several occurences also
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
and then problems with SMON, after few SMON terminations also instance termination by PMON.

After some investigation we found 2 articles on Metalink (one without a clue, the 2nd one with info to set a certain event to find a corrupted object/table and to remove it).
The reason was apparently a corruption in a table - the main problem was to find the id of that object.
Finally one of us spotted the lines below:
Block recovery completed at rba 2428.149064.16, scn 31.305977921
Errors in file /oracle/diag/rdbms/sbiu_prodora03/sbiu/trace/sbiu_smon_28380.trc  (incident=230673):
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
ORA-00600: internal error code, arguments: [ktbConvertRedo_1], [], [], [..]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance sbiu (pid = 18) - Error 600 encountered while recovering transaction (51, 6) on object 144492.
The corrupted object was the one with id 144492. After dropping the object (remember to purge recyclebin if on) database returned to normal work.
In this case we were very,very lucky as the table we dropped was kind of log of errors and we knew it - nobody will cry over some debugging info when "forests are ablaze ".

Friday, 6 May 2016

Db restore of the standby (more or less manual)

  1. remove old db files - controlfiles, log files (online and standby), data files, temp files - it is recommended to do it in the first place before any restore will start as it may be quite easy to choose wrong files for removal if the restored files are mixed with the old ones
  2. check db_create_% parameters
    SQL> alter system set db_create_online_log_dest_1='/ora_log/redolog1';
    
    System został zmieniony.
    
    SQL> alter system set db_create_online_log_dest_2='/ora_log/redolog2';
    
    System został zmieniony.
    
    SQL> show parameter db_create
    
    NAME                                 TYPE         VALUE
    ------------------------------------ ------------ ------------------------------
    db_create_file_dest                  string       /ora_db/
    db_create_online_log_dest_1          string       /ora_log/redolog1
    db_create_online_log_dest_2          string       /ora_log/redolog2
    db_create_online_log_dest_3          string
    db_create_online_log_dest_4          string
    db_create_online_log_dest_5          string
    
    
  3. (optionally) here one may create a standby controlfile on PRIMARY and copy to planned standby, then mount
    ALTER DATABASE CREATE STANDBY CONTROLFILE 
      AS '/disk1/oracle/oradata/payroll/standby/payroll2.ctl';
    
  4. or directly restore from backup (which is much more comfortable)
    connect target sys/pass1;
    connect catalog rman_catalog/pass2@rman_catalog;
    spool log to rman_db_restore_20160505.log
    run
    {
    allocate channel t1 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    allocate channel t2 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    # in case one restored controlfiles already - next 2 lines has to be removed
    restore standby controlfile; 
    alter database mount;
    SET NEWNAME FOR DATABASE TO NEW;
    restore database;
    SWITCH DATAFILE ALL;
    SWITCH TEMPFILE ALL;
    recover database DELETE ARCHIVELOG MAXSIZE 90G;
    
    release channel t1;
    release channel t2;
    }
    exit;
    

Now it is possible we have to add more archivelogs, but that would mean repeating the last line from the script above, until db is fine. Then one may want to fix some issues like paths to online and standby redo log files, etc.
From this time on the standby is ready to use - the main point at this moment is to restore the managed standby database recovery process. This task would require first checking the parameters like log_archive_config, log_archive_dest_* and log_archive_dest_state_*, fal_server and fal_client if they are properly configured. Of course in the first place the communication between primary and standby db nodes has to be checked by sqlplus for example.
Now we call
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
The output in alert logs on both nodes should confirm that the connections were built and the transfer of archivelogs has started.

Good to know

There exists bug 7553431 - RMAN restore controlfile for standby does not create correct controlfile type (Doc ID 7553431.8), which influences the role of restored database, fixed in 12c version.

Wednesday, 4 May 2016

ORA-29283: invalid file operation

Today I 've got the following problem: the procedure, which was planned to write a file through UTL_FILE to a mounted resource, is unable to do so despite the proper privileges on the directory. At the same time the local OS user (db owner) was able to do it without problem. The whole thing was done on an AIX OS, and the mounted resource was presented through NFS - in general many points of problems.

The investigation

Running a test procedure produced something like:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at "TCCS.TEST_UTL_FILE", line 6
ORA-06512: at line 1
29283. 00000 -  "invalid file operation"
*Cause:    An attempt was made to read from a file or directory that does
           not exist, or file or directory access was denied by the
           operating system.
*Action:   Verify file and directory access privileges on the file system,
           and if reading, verify that the file exists.
The truss on the db process while calling the test procedure
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
kread(16, "\0 _\0\006\0\0\0\0\011 i".., 8208)   = 95
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE9140, 176, 010) Err#13 EACCES
statfs("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8D20) Err#13 EACCES
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8FF8, 176, 011) Err#13 EACCES
kopen("/home/tccs/files/out/utlout.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH) Err#13 EACCES
kwrite(16, "\0\v\0\0\f\0\0\001\001", 11)        = 11
kwrite(16, "\0\v\0\0\f\0\0\001\002", 11)        = 11
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208)   = 11
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
kwrite(16, "\0 �\0\006\0\0\0\0\00401".., 233)   = 233
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
The 1st investigation point was brought by the article at http://jakub.wartak.pl/blog/?p=272. But after a while it become clear that, though the conditions were met and we changed the ownership of the mount point, the problem has persisted.
The 2nd guess was by suggestion in the article on MOS:Session Accessing File On Disk Fail with OS File Permission Errors (Doc ID 392647.1). And simple check that the local user connected on BEQUEATH is able to run such procedure without problem proofed that this is the right way - after the listener restart the remote users also acquired the ability to run procedure without problems. The root cause was that the db owner was added to the group being the owner of the target directory long after the listener was started - hence the remote sessions did not acquire the proper permissions on the directory, while the local sessions with "fresh" permissions worked ok.

Tuesday, 5 April 2016

Long DROP USER command

Lately I called a DROP USER command - as this was a slow environment (and I once called this command on the same user previously) I expected it to take long time. The working hours passed by, I was observing it - it waited on the "db file sequential read", so I did not bother to check the db health at all - I went home.
The next day I saw the command failed with error, that there are no temporary files' space.

Seems I forgot to finish this note as Ivan Korac has pointed me. I do not remember correctly now, as it was some time ago. The finish is I've found somewhere a note about that (possibly on Metalink) with supposition to drop that user objects first - and I did a test for that. It showed that time for dropping a user without preparations is almost the same as dropping objects first (this one schema has really many of them) and drop empty schema afterwards.
One difference though is that dropping one object is faster after all. The lack of temporary files space was an additional issue which went along with my command by accident IIRC.