Friday 20 December 2013

My own commandline FU

Shell

# changes the ownership for a particular user
# I use it mainly when the uid/gid differs between machines although it was intended
# to be the same

find /oracle -uid 500 -exec chown oracle {} \;
find /oracle -gid 500 -exec chgrp oracle {} \;

Thursday 19 December 2013

OT: Commandline Fu (external link)

This is not exactly bound with the Oracle RDBMS, but I am impressed by the cleverness and flexibility of those one-liners - http://www.commandlinefu.com/commands/browse

Thursday 14 November 2013

The ORA-00001 while resyncing the catalog

Recently I switched a database over to the standby. All went well until I got to the RMAN configuration. Usually there are problems with archived logs that were backed up on the previous primary node, and I normally deal with that by calling CROSSCHECK ARCHIVELOG ALL;
This time, however, I got the following:


RMAN> resync catalog;

starting full resync of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 10/24/2013 22:57:29
ORA-00001: unique constraint (RMAN.TS_U3) violated

I registered an SR on MOS, and they stated I had probably hit Bug 12597985 (RMAN RESYNC CATALOG GOT ORA-1) and could either install a patch or use a workaround of the form:

unregister database;
register database;

I went on with the provided workaround and hit the following:

RMAN> unregister database;

database name is "YYYYYY" and DBID is 688900000

Do you really want to unregister the database (enter YES or NO)? YES
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: 
ORA-02292: integrity constraint (RMAN.TSATT_F2) violated - child record found

Now I was informed that this time it was Bug 9799518 (RMAN-6004 / ORA-2292 trying to unregister a database from an RMAN catalog) and again I could install a patch. I wanted to avoid the patch installation, so I did something else. The solution is below:

RMAN> unregister database;

database name is "YYYYYY" and DBID is 688900000

Do you really want to unregister the database (enter YES or NO)? YES
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: 
ORA-02292: integrity constraint (RMAN.TSATT_F2) violated - child record found

# here from sqlplus I disabled RMAN.TSATT_F2
# select table_name from dba_constraints where constraint_name='TSATT_F2';
# alter table TSATT modify constraint TSATT_F2 disable;

RMAN> unregister database;

database name is "YYYYYY" and DBID is 688900000

Do you really want to unregister the database (enter YES or NO)? YES
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: 
ORA-02292: integrity constraint (RMAN.TSATT_F3) violated - child record found

# here from sqlplus I disabled RMAN.TSATT_F3
# alter table rman.TSATT modify constraint TSATT_F3 disable;

RMAN> unregister database;

database name is "YYYYYY" and DBID is 688900000

Do you really want to unregister the database (enter YES or NO)? YES
database unregistered from the recovery catalog

# here I found the orphaned rows in the TSATT table and deleted them - as expected,
# these were the rows preventing both constraints from being re-enabled
# after the deletion I re-enabled the previously disabled constraints
# delete from RMAN.TSATT T where not exists (select 1 from RMAN.CKP C where C.CKP_KEY=T.START_CKP_KEY);
# delete from RMAN.TSATT T where not exists (select 1 from RMAN.CKP C where C.CKP_KEY=T.END_CKP_KEY);
# most probably both queries touch the same rows

RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete
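
For completeness, once the orphaned rows were gone I re-enabled the constraints from SQL*Plus - a minimal sketch, assuming the deletes above removed all the offending rows:

alter table rman.TSATT modify constraint TSATT_F2 enable validate;
alter table rman.TSATT modify constraint TSATT_F3 enable validate;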

Friday 11 October 2013

DB upgrade shortly

This is a very generic and concise description of upgrading an Oracle RDBMS environment to a higher version.

  1. run the Pre-Upgrade Tool - one may think this is an optional step, yet it seems it is not. Once I tried to run the whole operation without an earlier call to it, and here you may look at what may go wrong and how to deal with it.
    sh > sqlplus / as sysdba
    -- the Pre-Upgrade Tool
    -- one has to call it with the ORACLE_HOME set to the current binaries
    -- (in contrast to the new binaries, i.e. the ones used after the upgrade)
    -- there is no need to copy it to the current home, yet one has to provide the full path
    -- of course one may expect a change in the script name depending on the version,
    -- for example utlu121i for the version 12c
    sql> SPOOL preupg.log
    sql> @[new ORACLE_HOME]/rdbms/admin/utlu112i
    sql> SPOOL OFF
    
  2. change the ORACLE_HOME to the new home (as it is really inadvisable to install the new binaries in-place) and run the catalog upgrade
    -- usually the docs say one should go to ORACLE_HOME/rdbms/admin
    -- before running the catalog upgrade and this is then specified without a path
    -- I used to run it as shown below
    sh > sqlplus / as sysdba
    sql> STARTUP UPGRADE
    sql> SPOOL db_upgrade.log
    sql> @?/rdbms/admin/catupgrd
    -- these steps below are actually not necessary or even impossible due to 
    -- the shutdown included in catupgrd.sql
    sql> SPOOL OFF
    sql> SHUTDOWN IMMEDIATE
    
  3. now a normal startup, some additional scripts and a check that everything went well
    sh > sqlplus / as sysdba
    sql> STARTUP 
    sql> SPOOL postupg.log
    -- check if the upgrade went well
    sql> @?/rdbms/admin/utlu112s
    -- upgrade actions, which do not need UPGRADE mode
    sql> @?/rdbms/admin/catuppst.sql
    sql> SPOOL OFF
    -- compilation
    sql> @?/rdbms/admin/utlrp
    

That's all assuming everything went well ;-).
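
Besides the utlu112s summary I also like to glance at the component registry afterwards - a minimal check, assuming nothing beyond the standard DBA_REGISTRY view:

    sql> SELECT comp_id, comp_name, version, status FROM dba_registry ORDER BY comp_id;

All components should report a VALID status and the new version.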

Monday 7 October 2013

Breaking through redo corruption

Today we faced a rather unwelcome message - ORA-00333: redo log read error block 56010 count 8064. The database is in NOARCHIVELOG mode and we do not perform backups ;-).
Of course I would not smile so eagerly if the problem were serious - I must admit at once that there was a bright side to the situation: this is a test environment which is rather easy to reproduce, so take that into consideration while looking at what I did to this database.

First of all I started clearing logfiles (ALTER DATABASE CLEAR LOGFILE GROUP x;), but quickly ran out of road - the corrupted redo log was the CURRENT one, i.e. the one I cannot clear.
As we had no backup, the list of options was quite short - the next option (and one of the last resorts) is to use some hidden parameters to bypass the recovery.
The two parameters I was interested in are _allow_resetlogs_corruption and _allow_read_only_corruption. It seems to me the first one is related more to redo logs, while the second to datafiles. At the moment I assume one may try to use them together - the first one to get rid of the corrupted online redo, and the second one to open the database despite corrupted datafiles.
As I found quite a few articles on the subject (though mainly by googling and not on the MOS pages), I first assumed it would be enough to set _allow_resetlogs_corruption to TRUE in order to move forward. However, when I did this, I hit ORA-01139.
So I ran recovery (RECOVER DATABASE UNTIL CANCEL), and it asked for the archived log with sequence 518 (the sequence the corrupted online logfile had been opened with) - surprisingly that went well, so I tried to open the database with the RESETLOGS option and it failed. Then I set both _allow_resetlogs_corruption and _allow_read_only_corruption to TRUE and started the database again. This time the instance terminated with ORA-00704 bootstrap errors. As you have already figured out, I failed to open it successfully - the next try brought ORA-600 errors, and at that moment I gave up.
And finally a victory! I tried once again. This time I simply mounted the database and ran the recovery (with both parameters reset) - it failed with a message that I had to open the database, so I issued the OPEN and it opened.
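
For reference, a rough sketch of the statements mentioned above - an absolute last resort, to be tried only on a disposable database and preferably with Oracle Support involved, as the underscore parameters are undocumented (the group number is just an example):

-- clearing a corrupted, non-CURRENT group
alter database clear logfile group 2;
-- the undocumented parameters (followed by a restart)
alter system set "_allow_resetlogs_corruption"=true scope=spfile;
alter system set "_allow_read_only_corruption"=true scope=spfile;
startup force mount;
recover database until cancel;
alter database open resetlogs;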

Wednesday 2 October 2013

Why can one not connect as SYS over a db link?

You may find a helpful article under this link (can-i-connect-a-database-link-as-sysdba).
If anything is wrong with the link, try article 119885 on MOS.

Friday 27 September 2013

Cloning 11gR2

In general it seems to be a simple operation.
First, one needs to copy the ORACLE_HOME content to the new location (local or remote). The documentation suggests the zip utility, but I think rsync is also the right tool (for a local copy the switches -av, for a remote one -avz).
Second, call runInstaller:
$ORACLE_HOME/oui/bin/runInstaller -silent -clone ORACLE_HOME="[the new home]" ORACLE_HOME_NAME="[the new home id]" ORACLE_BASE="/oracle"
The docs do not mention the ORACLE_BASE specification, yet it is unavoidable, as without it the cloning process exits complaining about the missing ORACLE_BASE value.

Wednesday 18 September 2013

ORA-30372 and what to do next

The error emerges at mview creation. In my particular case it was an mview on a synonym which pointed at a remote view. The remote view is subject to an access policy.
The whole issue is described on Oracle Support in the article Ora-30372: Fine Grain Access Policy Conflicts With Materialized View (Doc ID 604046.1).

Wednesday 11 September 2013

Small tip: moving statistics

Here is a small experiment. A colleague of mine stated that loading statistics through impdp is horribly slow in comparison to a simple export and import with the DBMS_STATS package.
-- on the source (STATS_OWNER/STATS_TAB are placeholders - use your own names)
begin
  dbms_stats.create_stat_table(ownname => 'STATS_OWNER', stattab => 'STATS_TAB');
  dbms_stats.export_database_stats(stattab => 'STATS_TAB', statown => 'STATS_OWNER');
end;
/

-- on the target
-- first move the stat table itself with expdp/impdp
begin
  dbms_stats.import_database_stats(stattab => 'STATS_TAB', statown => 'STATS_OWNER');
end;
/

I gave it a try and I am very glad of the results. The main problem with exporting and importing the statistics through Data Pump is the duration - using DBMS_STATS takes a fraction of the time spent on the export and import with the Data Pump technology.

Monday 12 August 2013

11g and sys.aud$

A short and relevant article on the maintenance of the audit structures (SYS.AUD$).
Also important news on the usage of DBMS_AUDIT_MGMT, which may be used instead of the procedure from the previous link.

File watcher (external link)

A good article on the topic at Eddie Awad's Blog.

Thursday 8 August 2013

On transparent switchover (external link)

Nice article by M.Klier on transparent switchover.

SQL*Plus and useful settings

SQL*Plus has plenty of settings, but there are a few that are really useful for me (a short example script header is sketched after the lists below).
For formatting the output
  • PAGESIZE - controls the number of rows displayed between headers. Usually I set it to 1000 (which means I get the headers only once) and for some scripts to 0 (which means no headers at all)
  • LINESIZE - controls the line width, in characters, into which a row has to fit. The default of 80 matches default terminal settings. The best shape of data for me is one row per line, so I often raise this to get such a result
  • TRIMOUT - for cutting off the unnecessary trailing spaces that pad lines
  • COL ... FOR a[width] - for formatting character columns
  • COL ... FOR 999999 - for formatting number columns
  • LONG - sets how much of a LOB to show on the display - very useful especially with DBMS_METADATA.GET_DDL calls
  • HEAD - for turning the headers on and off - important for scripts when we want just some values from the database - then sqlplus -s /nolog with the CONNECT command


For testing purposes
  • FEEDBACK - for turning on and off a message at the end of display summarizing the number of affected rows or PL/SQL command status (SUCCESS|FAILURE)
  • TIME - enables time in sqlprompt, which may be used as a marker for script performance
  • TIMING - returns information about performance time for operations


For scripts creating scripts
  • SQLPROMPT - cool for identifying terminals when one works on plenty of environments (especially when some of them are production ones), but even cooler set to "---> " when it comes to generating SQL scripts - the prompt prefix makes those lines appear as comments in the spool


For executing scripts
  • TERMOUT - this works only for commands called from a script - very important for long-running scripts with plenty of output and spooling enabled, when on the terminal we need only the command and information about success or error messages
  • SPOOL - for logging
  • TRIMSPOOL - useful especially in connection with a wide LINESIZE - in order to remove unnecessary trailing spaces (with them the logs become very large)
  • SERVEROUTPUT - for turning on|off printing from PL/SQL to the screen
  • ECHO - turns on|off echoing of the commands executed from a script (for displaying the old/new lines with substituted &variables see also VERIFY)
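
The example script header mentioned above - just a sketch, the values are what I tend to use and nothing official:

SET PAGESIZE 1000 LINESIZE 300
SET TRIMOUT ON TRIMSPOOL ON
SET LONG 100000
SET FEEDBACK OFF TERMOUT OFF ECHO OFF
SET SERVEROUTPUT ON
COL object_name FOR a30
SPOOL run.log
-- ... the actual statements ...
SPOOL OFF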

FAILED_LOGIN_ATTEMPTS and DoS

From time to time I used to wonder what the purpose is of setting FAILED_LOGIN_ATTEMPTS in a profile for a schema used by an application (Login issues).
But now I am more convinced that it is good to set it for application schemas as well.

Lately I was called to look at a database which was meant to be used in a series of application tests. The main problem was that the performance was so poor it was pointless to try to run any load at all. After a longer while I logged on and the first observation was that a lot of sessions were waiting on library cache lock. There was no sql_id for those sessions, and the set of waiting sessions continuously (though slowly) changed - I mean new SIDs kept appearing. No other activity, actually.

After further investigation I realized what those sessions with no sql_id were - I assumed they were connection attempts from an application server with a wrong password set. To be sure, I set another profile for all the accounts, and after a while all the problems were gone and the database returned to a usable state.
So what really happened was the following:
  • there were a few hosts with application servers and applications on board, plus the database - all those nodes formed an integrated test environment
  • there was a request for a database refresh and new passwords were to be applied
  • the passwords were changed to the new values, but only partially - for 2 kinds of data sources (i.e. connecting to 2 schemas) the old passwords were left intact
  • when the application servers and the applications deployed on them started, connections to the database showed up
  • after a while the following situation established itself - there were so many connection attempts with the wrong password that the library cache lock was continuously held by one of those attempts, apparently in order to check the password
  • when I enabled the new profile with FAILED_LOGIN_ATTEMPTS set to 10, the 2 accounts were locked, but the rest of the database became available again
As a conclusion, it seems that connections to locked accounts do not need to acquire the library cache lock (and probably any other blocking mechanism serviced by the shared pool structures) and thus do not add much to database time, so throwing the locked-account exception is much cheaper than the wrong-login-or-password exception. So (at least for 11g) setting the FAILED_LOGIN_ATTEMPTS parameter in a profile, even though it opens a schema to a DoS-type attack, secures the rest of the database - of course there is still nothing to stop someone from locking all the accounts with the proper number of connection attempts with wrong passwords. But I suppose human error is more common than a malicious user, and a simple mistake in a password value may lead to the unavailability of all the accounts in the database, which spreads the failure to all the applications dependent on any of those accounts.
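
A minimal sketch of what I mean - the profile and user names below are just examples:

create profile app_profile limit failed_login_attempts 10;
alter user app_schema profile app_profile;
-- and to release an account hit by a wrong password:
alter user app_schema account unlock;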

Wednesday 7 August 2013

The problem with rebuilding a chain job

One of our ETL processes is based upon chain jobs and once we had a problem with it after moving to a test environment.
Part of the solution is a procedure which completely erases the whole chain and builds it from scratch, and one would think this should work no matter what. Yet we got the ORA-24155 exception (rule string.string not in rule set string.string) while trying to rebuild this chain.

Firstly I tried to add such a rule to the rule set, but then there was another exception in another step, and another exception... and so on. Besides that, the procedure worked longer and longer and at some point it simply ran forever (from the standpoint of a human being). Apparently there was something more to it.
One thing to realize is that rules are of a more general character, so they are used in a few places in the Oracle RDBMS - with a special place in the Streams technology. The packages for rule manipulation are mainly DBMS_RULE and DBMS_RULE_ADM, which may not be so obvious given that all the work for chains is done with calls to DBMS_SCHEDULER.
The relationships between the catalog views regarding rules and rule sets are also not too clear, as there are for example both DBA_RULE_SETS and DBA_RULESETS. Yet by and large it is organized in the following way:
  • for every chain there is a rule set (or ruleset ;-)) defined, which gathers all the rules for this chain (DBA_SCHEDULER_CHAINS)
  • there are also rules for which there is a defined chain (DBA_SCHEDULER_CHAIN_RULES)
  • besides that, there are also dependencies defined between rule sets and rules (visible in DBA_RULE_SET_RULES)
  • also rule sets are enlisted in DBA_RULE_SETS and DBA_RULESETS (with slightly different information)
What is crucial here is that both the rule set and the rules are bound to a chain. So what is the core of the problem? The chain drop operation. Apparently there are situations when not all the rules are dropped along with the dropped chain, while our code does exactly this - it calls DBMS_SCHEDULER.DROP_CHAIN assuming all the dependent objects will be gone.
Now, when in the next step the procedure tries to build the chain anew, it creates a new rule set bound to the chain and new rules. It seems to me, however, that in the case of rules it is acceptable for the DBMS_SCHEDULER code to pick up the already existing rules. But these belong to a different rule set than the one created anew, so we get the ORA-24155 exception.

And the solution? In this case I just removed all the rules defined by the chain, now and in the past, with calls to DBMS_RULE_ADM and ran the code again. This time all went well and the chain was created.
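
A hedged sketch of that clean-up - the owner and the rule name pattern below are assumptions, so check DBA_RULES and DBA_SCHEDULER_CHAIN_RULES on your system before running anything like this:

begin
  for r in (select rule_owner, rule_name
              from dba_rules
             where rule_owner = 'ETL_OWNER'            -- hypothetical owner
               and rule_name like 'SCHED_RULE$%') loop -- verify the naming pattern first
    dbms_rule_adm.drop_rule(rule_name => r.rule_owner || '.' || r.rule_name, force => true);
  end loop;
end;
/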

Data Pump, SYS account and non working jobs

Lately I have had an issue with a load framework which stopped working after moving the whole thing to a test environment. From the beginning it was not clear why it did not work. The reason provided in DBA_SCHEDULER_JOB_RUN_DETAILS was literally that the job was stopped because it was terminated, so not really helpful :-).
However, every trace indicated that the jobs threw the ORA-28031 exception (maximum of 148 enabled roles exceeded).

The only account with such a number of roles was SYS. Apparently a job process runs with SYS privileges during part of its lifetime. So the processes had been started and almost immediately terminated. And this actually explains everything. The only mystery left was why the SYS account got so many roles.
I am not sure of the exact scenario, but:
  • on these databases we have quite a large number of custom roles, and exactly those roles were assigned to the SYS account
  • the migration was done with the Data Pump tools
  • the import was done with SYS privileges and apparently that was when all those roles got assigned
  • there was no explicit operation of assigning those roles to the SYS schema
  • it is possible that in the impdp parameters there was an exclusion added for the default roles, which are enabled for SYS by default
The main solution was to revoke all those unnecessary roles. To make the framework work other adjustments were also required, but rather obvious ones.
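
A minimal sketch of listing the roles granted to SYS and generating the revokes - the exclusion list is an assumption, so keep whatever is default for your version (DBA at the very least):

select 'revoke "' || granted_role || '" from sys;' stmt
from dba_role_privs
where grantee = 'SYS'
  and granted_role not in ('DBA');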

The fun with names and PLS-00302

Quite often it turns out that the more mysterious the problem is, the more trivial the solution.
This is a short note about such a situation.

There is a package which prepares a report. The package has the VALID status, no problems with compilation or the like. However, a call to one of its procedures ended with the PLS-00302 exception (component zzz must be declared). The call was made from the owner schema, so no problems with the visibility of the package, and the DESCRIBE command works as expected.
To make a long story short, I found out that the exception was raised when the call was fully qualified with the schema name. Without the schema prefix the code worked properly. The reason was the existence of another package with the same name as the schema, and the problem was with name resolution - the lookup was performed within that package first and not within the schema namespace.

The workaround is simple - do not call the package prefixed with the schema name. The true solution is clear (though not necessarily simple due to possible dependencies) - change the package name to something different from the schema name, and of course avoid naming packages after schemas in the first place.
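
A hypothetical repro of the situation - here the schema APP owns a package that is also named APP:

-- connected as APP
create or replace package app as
  procedure dummy;
end;
/
create or replace package report_pkg as
  procedure run;
end;
/
-- APP.REPORT_PKG.RUN is resolved against the package APP first,
-- which has no REPORT_PKG component, hence PLS-00302
-- exec app.report_pkg.run    -- fails with PLS-00302
-- exec report_pkg.run        -- works (once the bodies exist)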

The listing of file names in a directory through PL/SQL

There is no straightforward way to list the file names in a file system directory from the PL/SQL level. The UTL_FILE package allows many operations on files, yet there is no equivalent of the ls command. There is, however, a way to do it from within the database engine.

The first idea is to use Java or extproc. Tom Kyte describes such solution on AskTom.
The second idea is to use the DBMS_BACKUP_RESTORE package. This way is a little tricky and may change between different versions of the Oracle RDBMS binaries, so one has to be careful when using it. As it is quite interesting, I place a short description here.

connect / as sysdba
set serveroutput on
declare
  p_dir varchar2(64) := '/oracle';
  p_nul varchar2(1024);
begin
  -- populates the fixed table x$krbmsft with the file names found under p_dir
  sys.dbms_backup_restore.searchFiles(p_dir, p_nul);
  dbms_output.put_line(p_nul);
  for f in (select fname_krbmsft fname from x$krbmsft) loop
    dbms_output.put_line(f.fname);
  end loop;
end;
/
I ran it on version 11.2.0.2 and the whole thing works as advertised; however, it:
  • does not follow symlinks
  • returns all the file names recursively, so it also lists the files of subdirectories
Possibly there are other quirks I did not spot in the short test.

A very good article on the subject may be found at http://www.morganslibrary.com/hci/hci002.html. The author suggests some security measures which in this case are obligatory, like wrapping x$krbmsft in a read-only view and the searchFiles call in a procedure. Of course this is not only about security - possible changes in the DBMS_BACKUP_RESTORE package may break this approach, and then it is easier to change the code in one custom procedure.
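
A rough sketch of such wrapping, created as SYS - the object and grantee names are just examples:

create or replace view dir_listing_v as
  select fname_krbmsft file_name from x$krbmsft;

create or replace procedure search_dir(p_dir in varchar2) as
  l_ns varchar2(1024);
begin
  sys.dbms_backup_restore.searchFiles(p_dir, l_ns);
end;
/
grant select on dir_listing_v to app_user;
grant execute on search_dir to app_user;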

Monday 5 August 2013

The orphaned mview registration on a mview log

Some time ago we migrated a certain application between 2 databases. The operation was quite long-term and went in a few steps. Replication based upon mviews was involved for the transition period. One of the final steps was dropping a few schemas on the source database. So far, so good...

After several months we realized that one of the mview logs on a table from the migrated schemas had become quite large despite the fact that the mview based upon it had been refreshed regularly and that was the only mview bound with this mview log.

What is behind this tale?
It is quite easy to find the solution, or at least some clues to what happened with our log, by googling - for example on the blog by A. Prakesh or the blog by D. Vadas. Finally, on MOS there are a few articles describing the whole mview refresh mechanism (236233.1, 258634.1) and dealing, by the way, with this particular issue.

In our particular case the scenario was the following. The schema drop operation on the source database also affected the mviews which were registered with the mview logs on the destination database. Such a situation is one of the few in which no explicit unregistration of the mview is performed. In the catalog of the destination database it can be seen with the following query:
select s.mowner, s.master, s.snaptime, 'exec dbms_mview.purge_mview_from_log('||s.snapid||');' fix1
from sys.slog$ s 
where not exists(select 1 from dba_registered_snapshots r where s.snapid=r.snapshot_id) 
order by mowner, master;
While there are no more entries in the dba_registered_snapshots view, there still exist entries in sys.slog$, which tell the database engine how much data has to be kept to allow a fast refresh of the registered mviews. The snaptime value indicates the last fast refresh of a registered mview. In our case we had 2 entries per mview affected by this issue. One of them was the "healthy" one, which had been refreshed frequently, while the second entry was the one without a corresponding entry in the dba_registered_snapshots view, with a snaptime far in the past (which corresponded to the oldest rows in the mview log).

So we found the culprit; what's next?
Here comes the time for the fix1 column from the query above. The call to the purge_mview_from_log procedure from the dbms_mview package purges all the entries which are held for once-existing mviews - the ones we implicitly dropped with the migrated schemas. The required argument is the snapid column value from sys.slog$.
After the purging operation we are possibly left with an enormously grown segment of the mview log, so we need 2 more operations. One is to enable row movement on the table storing the mview log entries (the system-created name follows the pattern MLOG$_{the master table name}) and the next one is of course the shrink, but called this time on the mview log object and not the table (all the usual issues with shrinking apply).
alter table MLOG$_{the master table name} enable row movement;
alter materialized view log on {the master table name} shrink space [compact];
alter materialized view log on {the master table name} shrink space; -- needed if the previous command used COMPACT

At least in our case the purge, even on small tables, caused high waits on the log file sync event (though this was not the case on the test environment), so it is important to do the purge in a period of low database load - for example at night or at the weekend. I suppose the reason behind this is that the purge is done by deleting every entry in the mlog. It seems there is one commit per purge, but at least in our case there were lots of commit cleanouts on the production site.

Monday 22 July 2013

ORADEBUG session for AQ

SQL> select owner, job_name from dba_scheduler_jobs 
SQL> where lower(job_action) like '%register_driver%'
OWNER                          JOB_NAME                     
------------------------------ ------------------------------
SYS                            AQ$_PLSQL_NTFN21               

SQL> select SESSION_ID, SLAVE_PROCESS_ID, SLAVE_OS_PROCESS_ID 
SQL> from DBA_SCHEDULER_RUNNING_JOBS where job_name='AQ$_PLSQL_NTFN21'
SESSION_ID SLAVE_PROCESS_ID SLAVE_OS_PROCESS_ID
---------- ---------------- -------------------
        16               26 26823               

SQL> select sid, serial#, paddr, program from v$session where sid=16
SID SERIAL# PADDR            PROGRAM                                        
--- ------- ---------------- ------------------------------------------------
 16       1 000000009F67D7A8 oracle@dt-bpss-db-04-2 (J000)                    

SQL> select spid from v$process where addr='000000009F67D7A8'
SPID                   
------------------------
26823                    
-------------------
SQL> oradebug setospid 26823
Oracle pid: 26, Unix process pid: 26823, image: oracle@dt-bpss-db-04-2 (J000)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug Event 10046 trace name context forever, level 12
Statement processed.
SQL> select sysdate from dual;

SYSDATE
-------------------
2013-07-18 13:57:50

SQL> oradebug Event 10046 trace name context off
Statement processed.

Wednesday 17 July 2013

The crash with 11.2.0.2.9 binaries

I started a test database and after a short time was informed it had crashed. Today's investigation shows that it is bound up with the parallel recovery done automatically during/after startup. Here is, IMHO, an excellent article on the subject.

Wednesday 26 June 2013

Short way to enable one SQL plan from many

Assuming one has a SQL cursor in a database for which many plans exist, and one wants to get rid of all of them but one, here is the quickest way I have found to do that (a quick check of the outcome is sketched after the list below):
declare
  n number;
begin
  n:=dbms_spm.load_plans_from_cursor_cache(sql_id=>'gzwhs3zc68ju5', plan_hash_value=>156135624, fixed =>'NO', enabled=>'YES');
end;
/
  • sql_id is an identifier of a SQL we want to create SQL plan baseline for
  • plan_hash_value is the hash value of the plan we want to use
  • fixed is an attribute which gives fixed plans priority over non-fixed ones and stops automatic SQL tuning from implementing new findings at once (findings are stored as non-fixed plan baselines) - I would say it is reasonable to use it, though in the example it is set to NO
  • enabled is self-explanatory
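
To verify the outcome, one may look at DBA_SQL_PLAN_BASELINES - a quick sketch:

select sql_handle, plan_name, enabled, accepted, fixed
from dba_sql_plan_baselines
order by created desc;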

Tuesday 16 April 2013

Analyzing chained and migrated rows

Two articles from the renowned authors:
  1. Analyze-this by J. Lewis
    Shortcut:
    -- gathering statistics populates among others user_tables.chain_cnt
    analyze table [tbl] compute statistics for table;
    -- check gathered data
    -- delete statistics 
    -- otherwise the optimizer will use the chain_cnt to modify 
    -- the cost of indexed access to the table
    analyze table [tbl] delete statistics; 
    exec dbms_stats.gather_table_stats([owner], [tbl]);
    
  2. Detect-chained-and-migrated-rows-in-oracle by T.Poder
    Excerpt:
    One way to estimate the impact of chained rows is to just look into the "table fetch continued row" statistic - one can run query/workload and measure this metric from v$sesstat (with snapper for example). And one more way to estimate the total number of chained pieces would be to run something like SELECT /*+ FULL(t) */ MIN(last_col) FROM t and see how much the "table fetch continued row" metric increases throughout the full table scan. The last_col would be the (physical) last column of the table. Note that if a wide row is chained into let's say 4 pieces, then you'd see the metric increase by 3 for a row where 4th row piece had to be fetched.
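
Regarding the excerpt above, a minimal sketch of reading the "table fetch continued row" statistic for the current session (run it before and after the query being measured and compare):

select sn.name, st.value
from v$mystat st
join v$statname sn on sn.statistic# = st.statistic#
where sn.name = 'table fetch continued row';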

Friday 12 April 2013

Duplicating database manually on the same host

The assumption is to create a copy of the database on the same host in order to run 2 independent database instances. It can be done in many ways:

  • with RMAN 
  • with expdp and standard dump
  • with expdp and transportable tablespaces
  • with cp command and dbnewid tool
I present here the last way.
  1. --shutdown cleanly a database to be copied
    shutdown immediate; 
    
  2. change the database directory name to something different - this is needed in order to preserve the original datafiles when we change their paths. By default the Oracle RDBMS deletes the original data files when they exist under both paths (i.e. the old one and the new one; we use OMF) - by renaming the directory we prevent this behaviour - for example the new directory name would be TEST.ORIG. After the whole operation we put the original name back.
  3. copy all the datafiles, logfiles and controlfiles to a new directory with cp command
    ## for example 
    cp -r TEST.ORIG TES2
    
  4. create a new init.ora with the new path to the controlfiles, a new unique name and any other paths which would conflict with the original database - for example create pfile='inittes2.ora' from spfile
  5. startup nomount - fix possible errors
  6. mount the database - fix possible errors
  7. change the data file paths - a query to generate the statements:
    select 'alter database rename file '''||name||''' to '''
      ||regexp_replace(name, '[old prefix]', '[new prefix]')
      ||''';' ddl1 
    from v$datafile 
    union all
    select 'alter database rename file '''||name||''' to '''
      ||regexp_replace(name, '[old prefix]', '[new prefix]')
      ||''';' ddl1 
    from v$tempfile 
    union all
    select 'alter database rename file '''||member||''' to '''
      ||regexp_replace(member, '[old prefix]', '[new prefix]')
      ||''';' ddl1 
    from v$logfile
    ;
    
    
  8. shutdown immediate
  9. ## ensure open_cursors are set to bigger number than 
    ## all files of the database
    ## (apparently the nid utility opens them all at once)
    nid target=sys dbname=tes2;
    
  10. change parameter file db_name to new name
  11. startup mount
    alter database open resetlogs;
    
  12. restore old files to the old path and startup the original database

Summarizing, there are only 2 things one must take care of - the implicit removal of the database files in the old location (which must be prevented; I am not sure whether this happens without OMF) and the open_cursors parameter, which must be set higher than the number of database files (not sure, but better to count temporary and log files as well).
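
A quick sanity check of the open_cursors side of this - just a sketch comparing the parameter with the file count:

select (select count(*) from v$datafile)
     + (select count(*) from v$tempfile)
     + (select count(*) from v$logfile) file_cnt,
       (select value from v$parameter where name = 'open_cursors') open_cursors
from dual;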

Login issues

The 11g version brought some changes to the DEFAULT profile. Previously the FAILED_LOGIN_ATTEMPTS parameter was set to UNLIMITED and now it is set to a finite value, which means the schema gets locked after that number of failed logins is exceeded.

I must say I am puzzled about this. In general I understand the reason behind FAILED_LOGIN_ATTEMPTS - it protects against brute-force password attacks. On the other hand it means that some 'lost' application host with a wrong password becomes the source of a DoS attack. Which is better (or worse) is hard to tell.
Usually a database sits behind a firewall (or two), as this is quite a deep layer in the application stack. I usually deal with databases where the schemas are application schemas, so there is a client application interface between a human and the database - passwords are stored in the application configuration and direct access to the database itself is strictly limited. On the other hand there are users (though not numerous) who are allowed to make a direct connection, and among them a 'malicious' one may be hiding.

So, how do I imagine dealing with the configuration?
I believe for applications it is better to create another profile which keeps the FAILED_LOGIN_ATTEMPTS parameter at UNLIMITED, because it is not so rare that some forgotten application or script exists which would lock the schema and thus practically disable the application. Of course there is a monitoring system, but usually the delay in the feedback to a human is ~5 minutes. Then other issues come up (multiple application hosts and only one of them with a wrong password; a few applications sharing the same schema; scripts run from cron on different shell accounts; etc.) and we get a noticeable outage for an application which was possibly meant to work in a 24x7 regime. And this may happen quite frequently, with no malice needed.
Further, it would be reasonable to move direct users to other databases and possibly connect them through a mix of additional schemas and/or database links, so that they would not be able to connect directly to the database with the application schemas.

The drawbacks?
  • the human users get a performance penalty if connected through another database, or may try to break passwords if there is no such prevention measure - so if it is a database with plenty of directly connected human users, this would not be such a great idea.
  • 11g: Multiple failed login attempts can block new application connections - in short, the 11g version has an additional security feature against brute-force attacks - if FAILED_LOGIN_ATTEMPTS is set to UNLIMITED, a delay is introduced when returning the error message for a failed login attempt after the first few attempts. Due to bug 7715339 such a delayed session keeps the library cache lock for a prolonged period (because of the delay), and new sessions wait on this lock until their number hits the sessions/processes ceiling. It is possible to disable the delay feature with event='28401 trace name context forever, level 1' (a sketch of setting it follows below)
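
A minimal sketch of setting the event mentioned above - it goes into the spfile, so an instance restart is needed afterwards:

alter system set event='28401 trace name context forever, level 1' scope=spfile;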

Wednesday 23 January 2013

An issue with the UTL_FILE and file permissions

A configuration in which only a local user is able to write to an indicated directory through UTL_FILE when running sqlplus, while for any remote access, or access by other users, the code will not work properly. Some time ago we met with a slightly weird behaviour of a test database. The code in a package did some processing and wrote to a specified directory. However, it worked only if the user was logged on to the database host as that particular OS user. Calling the code remotely, or locally as a different user (including the database owner), always ended with an error about a wrong operation on a file (the writes were issued through UTL_FILE).
The OS directory, which was aliased in the database, had mask 0777, so even though it was not owned by oracle it should have been accessible to the database.

The explanation is the following:
While the aliased directory was accessible to anybody, its parent directory (i.e. the home directory of the user) was accessible only to him and his group. Thus when any other user tried to use the code, the database could not write to the destination, as that requires at least search (x) permission on every directory level of the path. To make the code run one had to add the oracle user to the local user's group, or change the permissions on the local user's home directory, or something along those lines.
But why was the local user able to run the code successfully? This is bound up with the way the connection to the database is handed to a user client. When it is done through the listener, simplifying a bit, the listener prepares a server process/dispatcher and then forwards the connection information to the client. For a local connection it is done differently - the database process becomes a child of the local user's session and inherits the user's permissions, and thus is able to write to the otherwise inaccessible file.

Conditions in UPDATE statements

This is kind of a fix for a common developer misuse. I have noticed that sometimes the solution for matching on several columns at once looks like this:
[..] where f1||f2||f3 in (select f1||f2||f3 from t1)
While in a SELECT this is simply weird, because one may just join the 2 tables and specify conditions with f1=f1 etc., it may be considered when we play with DML. Of course this means the optimizer cannot use indexes (unless there is some rather complex function-based index), but developers forget about a very easy construct:
[..] where (f1, f2, f3) in (select f1, f2, f3 from t1)
Now the plans look much better and we can still express a condition on several fields at once.
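
A hypothetical example of the same construct in an UPDATE (the table and column names are made up):

update t2
   set status = 'MATCHED'
 where (f1, f2, f3) in (select f1, f2, f3 from t1);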

Confusion at partition maintenance

Here is an anecdote rather than a hint, but sometimes it helps ;-). I once read about our "built-in" inability to spot our own errors, and this is exactly such a case.

Lately I tried to add a new partition for the year 2013 to some table. It was subpartitioned, so the syntax is a little more complex (below - please note, it contains the error).
alter table schema1.tab1
add partition p1301 VALUES less than (201302) TABLESPACE tbs1 (
SUBPARTITION p1301_1 VALUES(1), 
SUBPARTITION p1301_2 VALUES(2),
SUBPARTITION p1301_3 VALUES(3), 
SUBPARTITION p1301_4 VALUES(4)));
I constructed the statement and tried to execute it. But over and over I got the same error:
ORA-14048: a partition maintenance operation may not be combined with other operations
And it was not so complex that I might have missed an additional operation. So I was stuck. After several minutes I asked a colleague of mine for help. He came over, I explained the issue to him, and at that very moment I realized what was wrong - an additional parenthesis.
If a parenthesis had been missing, the error message would have been the expected
ORA-00907: missing right parenthesis
but with an additional parenthesis the statement syntax suggests something completely different to the RDBMS, which thus returns a more enigmatic error, which in turn puzzled the RDBMS user - in this case me.
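
For completeness, the same statement with the extra closing parenthesis removed:

alter table schema1.tab1
add partition p1301 VALUES less than (201302) TABLESPACE tbs1 (
SUBPARTITION p1301_1 VALUES(1),
SUBPARTITION p1301_2 VALUES(2),
SUBPARTITION p1301_3 VALUES(3),
SUBPARTITION p1301_4 VALUES(4));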

Some notes on materialized views

A few loose notes on materialized views.

It is possible to build an mview on a prebuilt table. I like this feature mainly because one is able to drop such a view and recreate it with a slightly changed definition while the data is preserved, which is very valuable especially when dealing with large source tables.

Another observation is that sometimes casting to the right types is necessary. Several times we met a situation where on Linux some mview definition worked properly while on AIX it failed due to weak type casting; the solution was to use CAST with an explicit type and length (usually the core of the problem was a longer VARCHAR2 than expected).
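
A minimal sketch of both points with hypothetical names - note that with ON PREBUILT TABLE the table must bear the same name as the mview, and CAST pins down the column type and length:

create table sales_sum as
  select cast(region as varchar2(100)) region, sum(amount) total
  from sales group by region;

create materialized view sales_sum
  on prebuilt table
  refresh complete on demand
  as select cast(region as varchar2(100)) region, sum(amount) total
     from sales group by region;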