Friday 6 May 2016

Db restore of the standby (more or less manual)

  1. remove old db files - controlfiles, log files (online and standby), data files, temp files - it is recommended to do it in the first place before any restore will start as it may be quite easy to choose wrong files for removal if the restored files are mixed with the old ones
  2. check db_create_% parameters
    SQL> alter system set db_create_online_log_dest_1='/ora_log/redolog1';
    
    System został zmieniony.
    
    SQL> alter system set db_create_online_log_dest_2='/ora_log/redolog2';
    
    System został zmieniony.
    
    SQL> show parameter db_create
    
    NAME                                 TYPE         VALUE
    ------------------------------------ ------------ ------------------------------
    db_create_file_dest                  string       /ora_db/
    db_create_online_log_dest_1          string       /ora_log/redolog1
    db_create_online_log_dest_2          string       /ora_log/redolog2
    db_create_online_log_dest_3          string
    db_create_online_log_dest_4          string
    db_create_online_log_dest_5          string
    
    
  3. (optionally) here one may create a standby controlfile on PRIMARY and copy to planned standby, then mount
    ALTER DATABASE CREATE STANDBY CONTROLFILE 
      AS '/disk1/oracle/oradata/payroll/standby/payroll2.ctl';
    
  4. or directly restore from backup (which is much more comfortable)
    connect target sys/pass1;
    connect catalog rman_catalog/pass2@rman_catalog;
    spool log to rman_db_restore_20160505.log
    run
    {
    allocate channel t1 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    allocate channel t2 type 'SBT_TAPE' FORMAT 'db_%t_%s_%p' parms 
      'ENV=(TDPO_OPTFILE=/TSM/CFG/tdpo.opt)';
    # in case one restored controlfiles already - next 2 lines has to be removed
    restore standby controlfile; 
    alter database mount;
    SET NEWNAME FOR DATABASE TO NEW;
    restore database;
    SWITCH DATAFILE ALL;
    SWITCH TEMPFILE ALL;
    recover database DELETE ARCHIVELOG MAXSIZE 90G;
    
    release channel t1;
    release channel t2;
    }
    exit;
    

Now it is possible we have to add more archivelogs, but that would mean repeating the last line from the script above, until db is fine. Then one may want to fix some issues like paths to online and standby redo log files, etc.
From this time on the standby is ready to use - the main point at this moment is to restore the managed standby database recovery process. This task would require first checking the parameters like log_archive_config, log_archive_dest_* and log_archive_dest_state_*, fal_server and fal_client if they are properly configured. Of course in the first place the communication between primary and standby db nodes has to be checked by sqlplus for example.
Now we call
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
The output in alert logs on both nodes should confirm that the connections were built and the transfer of archivelogs has started.

Good to know

There exists bug 7553431 - RMAN restore controlfile for standby does not create correct controlfile type (Doc ID 7553431.8), which influences the role of restored database, fixed in 12c version.

Wednesday 4 May 2016

ORA-29283: invalid file operation

Today I 've got the following problem: the procedure, which was planned to write a file through UTL_FILE to a mounted resource, is unable to do so despite the proper privileges on the directory. At the same time the local OS user (db owner) was able to do it without problem. The whole thing was done on an AIX OS, and the mounted resource was presented through NFS - in general many points of problems.

The investigation

Running a test procedure produced something like:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at "TCCS.TEST_UTL_FILE", line 6
ORA-06512: at line 1
29283. 00000 -  "invalid file operation"
*Cause:    An attempt was made to read from a file or directory that does
           not exist, or file or directory access was denied by the
           operating system.
*Action:   Verify file and directory access privileges on the file system,
           and if reading, verify that the file exists.
The truss on the db process while calling the test procedure
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
kread(16, "\0 _\0\006\0\0\0\0\011 i".., 8208)   = 95
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE9140, 176, 010) Err#13 EACCES
statfs("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8D20) Err#13 EACCES
statx("/home/tccs/files/out/utlout.txt", 0x0FFFFFFFFFFE8FF8, 176, 011) Err#13 EACCES
kopen("/home/tccs/files/out/utlout.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH) Err#13 EACCES
kwrite(16, "\0\v\0\0\f\0\0\001\001", 11)        = 11
kwrite(16, "\0\v\0\0\f\0\0\001\002", 11)        = 11
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208)   = 11
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
lseek(5, 701440, 0)                             = 701440
kread(5, "\00F r V\0\0\0 b r W\0\0".., 512)     = 512
lseek(5, 162304, 0)                             = 162304
kread(5, "\0\n19 n\0\0\0 D19 o\0\0".., 512)     = 512
kwrite(16, "\0 �\0\006\0\0\0\0\00401".., 233)   = 233
kread(16, "\0\v\0\0\f\0\0\001\002 i".., 8208) (sleeping...)
The 1st investigation point was brought by the article at http://jakub.wartak.pl/blog/?p=272. But after a while it become clear that, though the conditions were met and we changed the ownership of the mount point, the problem has persisted.
The 2nd guess was by suggestion in the article on MOS:Session Accessing File On Disk Fail with OS File Permission Errors (Doc ID 392647.1). And simple check that the local user connected on BEQUEATH is able to run such procedure without problem proofed that this is the right way - after the listener restart the remote users also acquired the ability to run procedure without problems. The root cause was that the db owner was added to the group being the owner of the target directory long after the listener was started - hence the remote sessions did not acquire the proper permissions on the directory, while the local sessions with "fresh" permissions worked ok.