Thursday 8 August 2013

On transparent switchover (external link)

Nice article by M. Klier on transparent switchover.

SQL*Plus and useful settings

SQL*Plus has plenty of settings, but only a few are really useful to me.
For formatting the output (a combined example follows this list)
  • PAGESIZE - controls the number of rows displayed between headers. Usually I set it to 1000 (so I get the headers only once) and for some scripts to 0 (which means no headers at all)
  • LINESIZE - controls the line width, in characters, into which a row has to fit. The default of 80 matches the default terminal settings. The best shape of data for me is one row per line, so I often raise this value to get such a result
  • TRIMOUT - cuts off the unnecessary trailing spaces that pad the lines
  • COL <column> FOR a<width> - for formatting character columns
  • COL <column> FOR 999999 - for formatting number columns
  • LONG - sets how much of a LONG/LOB value is shown on the display - very useful especially with DBMS_METADATA.GET_DDL calls
  • HEAD - turns the headers on and off - important for scripts which are meant to return just some values from the database - then sqlplus -s /nolog with the CONNECT command
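A minimal prologue combining these settings might look as follows (a sketch only - the column names and formats are illustrative and should be adjusted to the query at hand):
set pagesize 1000
set linesize 200
set trimout on
set long 100000
col segment_name for a30
col bytes for 999999999999
-- e.g. a quick look at some segments, one row per line
select segment_name, bytes from user_segments where rownum <= 10;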


For testing purposes
  • FEEDBACK - turns on and off the message displayed at the end of the output, summarizing the number of affected rows or the PL/SQL command status (SUCCESS|FAILURE)
  • TIME - enables the current time in the sqlprompt, which may be used as a marker for script performance
  • TIMING - reports the elapsed time of every executed operation (see the short sketch after this list)
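For example (a trivial sketch; any statement will do):
set time on       -- the prompt now starts with the current time
set timing on     -- every statement is followed by an "Elapsed: ..." line
select count(*) from all_objects;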


For scripts creating scripts
  • SQLPROMPT - cool for identifying terminals when one works on plenty of environments (especially when some of them are production ones), but even cooler set to "---> " when it comes to generating SQL scripts - the prompt lines captured in the spool then start with "--" and are treated as comments (see the sketch below)
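A sketch of such a generator (the file name and the TMP_% filter are purely illustrative):
set sqlprompt "---> "
set pagesize 0
set heading off
set feedback off
set verify off
spool drop_tmp_tables.sql
select 'drop table ' || table_name || ' purge;' from user_tables where table_name like 'TMP_%';
spool off
-- any prompt lines captured in drop_tmp_tables.sql begin with "--", so the generated script stays runnable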


For executing scripts
  • TERMOUT - takes effect only for commands executed from a called script - very important for long-running scripts with plenty of output and spooling enabled, when on the terminal we want to see only the commands and success or error information (see the skeleton after this list)
  • SPOOL - for logging
  • TRIMSPOOL - useful especially in connection with a wide LINESIZE - it removes unnecessary trailing spaces (with them the logs become very large)
  • SERVEROUTPUT - turns on|off printing from PL/SQL to the screen
  • ECHO - turns on|off echoing of the lines executed from a script (e.g. those with substituted &variables)
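A driver skeleton putting these together (work.sql and work.log are hypothetical names):
-- run interactively; the output of work.sql goes to the spool file only
set trimspool on
set serveroutput on
spool work.log
set termout off     -- suppresses terminal output of the called script, not of typed commands
@work.sql
set termout on
spool off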

FAILED_LOGIN_ATTEMPTS and DoS

From time to time I used to wonder what the purpose is of setting FAILED_LOGIN_ATTEMPTS in a profile for a schema used by an application (Login issues).
But now I am more convinced that it is good to set it for application accounts as well.

Lately I was called to look at a database which was meant to be used in a series of application tests. The main problem was that the performance was so poor that it was ridiculous to try to run any load at all. After a longer while I logged on, and the first observation was that a lot of sessions were waiting on library cache lock. There was no sql_id for those sessions and the set of waiting sessions continuously (though slowly) changed - I mean new sids kept appearing. No other activity, actually.
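A query along these lines shows such sessions (a sketch from memory; adjust to your version):
select sid, serial#, username, program, event, seconds_in_wait, sql_id
  from v$session
 where event = 'library cache lock'
   and sql_id is null;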

After further investigation I realized what those sessions with no sql_id were - I assumed these were connection attempts from an application server with a wrong password set. To be sure I set another profile for all the accounts, and after a while all the problems were gone and the database returned to a usable state.
So what really happened was the following:
  • there were a few hosts with application servers on board, with some applications, and the database - all those nodes formed an integrated test environment
  • there was a request for a database refresh and new passwords were to be applied
  • the passwords were changed to the new values, but only partially - for 2 kinds of data sources (i.e. connecting to 2 schemas) the old passwords were left intact
  • with the start of the application servers and the applications deployed on them, the connections to the database showed up
  • after a while the following situation established itself - there were so many connection attempts with a wrong password that the library cache lock was continuously grabbed by one of those attempts, apparently in order to check the password
  • when I enabled the new profile with FAILED_LOGIN_ATTEMPTS set to 10, the 2 accounts were locked, but the rest of the database was available
As a conclusion, it seems that connections to locked accounts do not need to acquire the library cache lock (and probably any other blocking mechanism serviced by the shared pool structures) and thus do not cost the database much time, so returning the locked account exception is much lighter for the database than returning the wrong login or password exception. So (at least for 11g) setting the FAILED_LOGIN_ATTEMPTS parameter in a profile, even though it opens a schema to a DoS type of attack, secures the rest of the database - of course there is still nothing to stop someone from locking all the accounts with the proper number of connection attempts with a wrong password. But I suppose that human error is more common than a malicious user, and a simple mistake in a password value may lead to the unavailability of all the accounts on the database, which spreads the failure to all the applications dependent on any of those accounts.
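A minimal sketch of such a profile (the profile and account names are hypothetical):
create profile app_profile limit failed_login_attempts 10;
alter user app_user1 profile app_profile;
alter user app_user2 profile app_profile;
-- once the wrong passwords are fixed on the application side
alter user app_user1 account unlock;
alter user app_user2 account unlock;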

Wednesday 7 August 2013

The problem with rebuilding a chain job

One of our ETL processes is based upon chain jobs, and once we had a problem with it after moving to a test environment.
Part of the solution is a procedure which completely erases the whole chain and builds it from scratch, and one would think this should work no matter what. Yet we got the ORA-24155 exception (rule string.string not in rule set string.string) while trying to rebuild this chain.

At first I tried to add such a rule to the rule set, but then there was another exception in another step, and another exception... and so on. Moreover, the procedure ran longer and longer and at some point it simply ran forever (from the standpoint of a human being). Apparently there was something more to this.
One thing to realize is that rules are of a more general character, so they are used in a few places in the Oracle RDBMS - with a special place in the Streams technology. The packages for manipulating rules are mainly DBMS_RULE and DBMS_RULE_ADM, which may not be so obvious given that all the work for chains is done with calls to DBMS_SCHEDULER.
The relationships between the catalog views regarding rules and rule sets are also not too clear, as there are for example both DBA_RULE_SETS and DBA_RULESETS. Yet by and large this is organized in the following way (a diagnostic query sketch follows the list):
  • for every chain there is a rule set (or ruleset ;-)) defined, which gathers all the rules for this chain (DBA_SCHEDULER_CHAINS)
  • the rules defined for a chain are visible in DBA_SCHEDULER_CHAIN_RULES
  • besides that, the dependencies between rule sets and rules are also defined (visible in DBA_RULE_SET_RULES)
  • rule sets are also listed in DBA_RULE_SETS and DBA_RULESETS (with slightly different information)
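To see whether the rules defined for a chain and the rules actually attached to its rule set still match, one can compare these views. This is only a sketch - the owner and chain names are hypothetical and the column names are recalled from 11g, so verify them against your dictionary:
-- rules attached to the chain's rule set
select c.chain_name, c.rule_set_owner, c.rule_set_name, r.rule_owner, r.rule_name
  from dba_scheduler_chains c
  left join dba_rule_set_rules r
    on r.rule_set_owner = c.rule_set_owner
   and r.rule_set_name  = c.rule_set_name
 where c.owner = 'ETL_OWNER' and c.chain_name = 'LOAD_CHAIN';

-- rules defined for the chain itself
select owner, chain_name, rule_owner, rule_name
  from dba_scheduler_chain_rules
 where owner = 'ETL_OWNER' and chain_name = 'LOAD_CHAIN';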
What is crucial here is that both the rule set and the rules are bound to a chain. So what is the core of the problem? The chain drop operation. Apparently there are situations when not all the rules are dropped together with the dropped chain, while our code does exactly this - it calls DBMS_SCHEDULER.DROP_CHAIN assuming all the dependent objects will be gone.
Now, if in the next step the procedure tries to build the chain anew, it creates a new rule set bound to the chain and new rules. It seems to me, however, that in the case of rules it is acceptable for the DBMS_SCHEDULER code to pick up the already existing rules. But those belong to a different rule set than the one created anew, so we get the ORA-24155 exception.

And the solution? In this case I just removed all the rules defined by the chain, now and in the past, with calls to DBMS_RULE_ADM and ran the code again. This time all went well and the chain was created.
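A sketch of such a cleanup (the owner and the rule name filter are assumptions - adjust them so that only the leftover rules of this particular chain are matched):
begin
  for r in (select rule_owner, rule_name
              from dba_rules
             where rule_owner = 'ETL_OWNER'
               and rule_name like 'SCHED_RULE$%')   -- filter for the chain's leftover rules
  loop
    dbms_rule_adm.drop_rule(
      rule_name => r.rule_owner || '.' || r.rule_name,
      force     => true);
  end loop;
end;
/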

Data Pump, SYS account and non-working jobs

Lately I have had an issue with a load framework which stopped working after moving the whole thing to a test environment. From the beginning it was not clear why it did not work. The reason provided in DBA_SCHEDULER_JOB_RUN_DETAILS was exactly that the job was stopped because it was terminated, so not really helpful :-).
However, every trace indicated that the jobs threw the ORA-28031 exception (maximum of 148 enabled roles exceeded).

The only account with such a number of roles was SYS. Apparently a job process runs with the SYS privileges during part of its lifetime. So the processes had been started and almost immediately terminated. And this actually explains everything. The only mystery left was why the SYS account got so many roles.
I am not sure of the exact scenario, but:
  • on these databases we have quite a large number of custom roles, and exactly those roles were assigned to the SYS account
  • the migration was done with the Data Pump tools
  • the import was done with the SYS privileges and apparently that is when all those roles got assigned
  • there was no explicit operation of assigning those roles to the SYS schema
  • it is possible that the impdp parameters included an exclusion of the default roles, which are enabled for SYS by default
The main solution was to revoke all those unnecessary roles. To make the framework work, other adjustments were also required, but those were rather obvious.
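A sketch of the cleanup - the LIKE filter for the custom role names is a pure assumption and has to match your own naming convention:
-- check what is granted
select granted_role
  from dba_role_privs
 where grantee = 'SYS'
   and granted_role like 'APP\_%' escape '\';

-- generate the revoke statements
select 'revoke ' || granted_role || ' from sys;' cmd
  from dba_role_privs
 where grantee = 'SYS'
   and granted_role like 'APP\_%' escape '\';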

The fun with names and PLS-00302

Quite often it turns out that the more mysterious the problem is, the more trivial the solution.
This is a short note about such a situation.

There is a package which prepares a report. The package has the VALID status, no problems with compilation or the like. However, a call to the procedure ended with the PLS-00302 exception (component zzz must be declared). The call was made from the owner schema, so there were no problems with the visibility of the package, and the DESCRIBE command worked as expected.
To make a long story short, I found out that the exception was raised when the call was fully qualified with the schema name. If there was no schema specification, the code worked properly. The reason was the existence of another package with the same name as the schema, and the problem was with name resolution - the identifier was first resolved within that package and not within the schema namespace.

The workaround is simple - do not call the package prefixed with the schema name. The true solution is clear (though not necessarily simple due to possible dependencies) - change the package name to something different from the schema name and, of course, avoid naming packages after schemas.
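A minimal reproduction of the collision, assuming a schema called REP (all names are purely illustrative):
-- connected as REP
create or replace package rep as          -- a package named like the schema itself
  procedure dummy;
end rep;
/
create or replace package report_pkg as
  procedure run;
end report_pkg;
/
create or replace package body report_pkg as
  procedure run is begin dbms_output.put_line('report done'); end;
end report_pkg;
/
set serveroutput on
exec report_pkg.run           -- works: REPORT_PKG resolves within the schema
exec rep.report_pkg.run       -- PLS-00302: REP resolves to the package, which has no REPORT_PKG component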

The listing of file names in a directory through PL/SQL

There is no straightforward way to list the file names in a file system directory from the PL/SQL level. The UTL_FILE package allows for many operations on files, yet there is no equivalent of the ls command. There is, however, a way to do it from within the database engine.

The first idea is to use Java or extproc. Tom Kyte describes such a solution on AskTom.
The second idea is to use the DBMS_BACKUP_RESTORE package. This way is a little tricky and may change between different versions of the Oracle RDBMS binaries, so one has to be careful when using it. As this is quite interesting, I place a short description here.

connect / as sysdba
set serveroutput on
declare
  p_dir varchar2(64) := '/oracle';   -- the directory to search
  p_nul varchar2(1024);              -- namespace parameter required by searchFiles
begin
  -- populates the x$krbmsft fixed table with the file names found under p_dir
  sys.dbms_backup_restore.searchFiles(p_dir, p_nul);
  dbms_output.put_line(p_nul);
  for f in (select fname_krbmsft fname from x$krbmsft) loop
    dbms_output.put_line(f.fname);
  end loop;
end;
/
I ran it on version 11.2.0.2 and the whole thing works as advertised, however:
  • it does not follow symlinks
  • it returns all the file names recursively, so it also lists files from subdirectories
Possibly there are other quirks I did not spot in this short test.

A very good article on the subject can be found at http://www.morganslibrary.com/hci/hci002.html. The author suggests some security measures, which in this case are obligatory, like wrapping x$krbmsft in a read-only view and the searchFiles call in a procedure. Of course this is not only about security - possible changes in the DBMS_BACKUP_RESTORE package may bring an end to this approach, and then it is easier to change the code in a single custom procedure.
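A sketch of such a wrapper (the object names are my own; grants and input validation would need to be tightened for real use):
connect / as sysdba
create or replace view x_krbmsft_v as
  select fname_krbmsft fname from sys.x$krbmsft
  with read only;

create or replace procedure list_dir_files(p_dir in varchar2) as
  l_ns varchar2(1024);
begin
  sys.dbms_backup_restore.searchFiles(p_dir, l_ns);
  for f in (select fname from x_krbmsft_v) loop
    dbms_output.put_line(f.fname);
  end loop;
end;
/
-- grant execute on list_dir_files only to the accounts that really need it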

Monday 5 August 2013

The orphaned mview registration on a mview log

Some time ago we migrated a certain application between 2 databases. This operation was quite long-term and went in a few steps. Replication based upon mviews was involved for the transition period. One of the final steps was dropping a few schemas on the source database. So far, so good...

After several months we realized that one of the mview logs on a table from the migrated schemas had become quite large despite the fact that the mview based upon it had been refreshed regularly and that was the only mview bound with this mview log.

What is behind this tale?
It is quite easy to find the solution (or at least some clues to what happened with our log) by googling - for example on the blog by A. Prakesh or the blog by D. Vadas. Finally, on MOS there are a few articles describing the whole mview refresh mechanism (236233.1, 258634.1) and, by the way, dealing with this particular issue.

In our particular case the scenario is the following. The schema drop operation on the source database also affected the mviews, which were registered with the mview logs on the destination database. Such a situation is one of the few possible in which no explicit unregistration of the mview is performed. In the catalog of the destination database it can be seen with the following query:
select s.mowner, s.master, s.snaptime, 'exec dbms_mview.purge_mview_from_log('||s.snapid||');' fix1
from sys.slog$ s 
where not exists(select 1 from dba_registered_snapshots r where s.snapid=r.snapshot_id) 
order by mowner, master;
While there are no more entries in the dba_registered_snapshots view, there still exist entries in sys.slog$, which tell the database engine how much data has to be kept to allow for fast refresh of the registered mviews. The snaptime value indicates the last fast refresh of the registered mview. In our case we had 2 entries per mview log affected by this issue. One of them was the "healthy" one, which had been refreshed frequently, while the second one had no corresponding entry in the dba_registered_snapshots view and its snaptime lay far in the past (which corresponded to the oldest rows in the mview log).

So we found the culprit - what's next?
Here comes the time for the fix1 column from the query above. The call to the purge_mview_from_log procedure from the dbms_mview package purges all the entries held for the once-existing mviews, which we implicitly dropped together with the migrated schemas. The required argument is the snapid column value from sys.slog$.
After the purging operation we are possibly left with an enormously grown segment of the mview log, so we need 2 more operations. One is to enable row movement on the table storing the mview log entries (its system-created name follows the pattern MLOG$_{the master table name}) and the next one is, of course, the shrink, but called this time on the mview log object and not on the table (all the usual issues with shrinking apply).
alter table MLOG$_{the master table name} enable row movement;
alter materialized view log on {the master table name} shrink space [compact];
alter materialized view log on {the master table name} shrink space; -- only if the previous command used the compact option

At least in our case the purge, even on small tables, generated high waits on the log file sync event (though this was not the case in the test environment), so it is important to do the purge in a period of low database load - for example at night or over a weekend. I suppose the reason behind this is that the purge is done by deleting every entry in the mlog. It seems there is one commit per purge, but at least in our case there were lots of commit cleanouts on the production site.