Thursday, 19 April 2012

Automatic TSPITR and dealing with it

The automatic TSPITR is very useful when we have an independent schema, which was logically corrupted by some of our actions. Let's say we started a batch processing, which is not a single transaction able to rollback, but rather hundreds of small changes to many tables and this processing failed being inside the run.
In the 11g database it is highly automated - theoretically one needs to run one RMAN command like below:
recover tablespace until time "to_date('2012-04-16 08:15:00','yyyy-mm-dd hh24:mi:ss')" auxiliary destination '/rman/aux';
Lately we use it for exactly the same reason I mentioned above - we restored a schema from a time before processing.

In the first place of course it is best not to recover at all, for example by making UNDO tablespace bigger (and I mean here a defense against ORA-01555 rather than making FLASHBACK TABLE operations possible).
However let's assume one is already in a trouble.

The automatic instance
According to our experience the most important thing is to tune the automatically run instance by providing additional settings.
The automatic instance by default runs on the following parameters:
System parameters with non-default values:
processes = 50
sga_target = 280M
db_block_size = 8192
compatible = "11.2.0.0.0"
log_archive_dest_1 = "location=/rman/aux"
db_files = 200
db_create_file_dest = "/rman/aux"
db_name = "TEST"
db_unique_name = "xgFy_tspitr_TEST"
For us 2 things did not work well. The processes parameter was set too low - at the end of the whole TSPITR process the automatic instance failed due to exceeding the processes limit. This was probably due to some additional job processes, not sure why there were so many of them. Thus the processes value at 150 is much better setting.
Another thing to tune is the SGA_TARGET - for small transactions the default setting is probably ok - our process run smoothly through the files restore phase and majority of the recover phase, but not the whole. In our case doubling this value with setting 200M exclusively for shared pool helped, but this may vary.

What if it failed anyway
Failing the automatic/auxiliary instance is really a disaster to the TSPITR process, but we are able to run it again reusing partially the work done in the previous run. Unfortunately when failing the auxiliary instance removes all the files in auxiliary destination, which means we loose auxiliary control file and datafiles for UNDO, SYSTEM and SYSAUX.
The datafiles for recovered tablespaces are usually restored in the final production destination and they are left as of the moment of the auxiliary instance crash and this is what we could reuse, assuming the TSPITR process did not crash after open resetlogs operation. To use them we need to add to the RMAN RUN block CONFIGURE AUXNAME commands.
Depending on the TSPITR phase the whole process may fail leaving also files in the auxiliary destination, thus we are able to continue the TSPITR process manually. In such case first thing is to remove causes of the failure. One needs to create also its own pfile, where especially important is to know the value of the db_name parameter - those parameters may be taken from alert.log file of the auxiliary instance. After that we start the auxiliary instance.
There is a clone control file in use, so we need to mount database as clone database (ALTER DATABASE MOUNT CLONE DATABASE;).
Open it was a little tricky - we had to use RMAN (and not SQLPLUS) and connect as to auxiliary instance (rman auxiliary /), then simply open with resetlogs. After this moment the rest is quite simple - the best way is to follow a log from some previous TSPITR (such log may be found on Metalink for example). One needs to set the recovered files as READ-ONLY and create an alias for a dictionary to use by Data Pump (it is good to set the same dictionary as in the primary database). Then goes an expdp (expdp dumpfile= transport_tablespaces=, possibly TRANSPORT_FULL_CHECK, if needed).
Afterwards one may close the auxiliary instance, as this is not needed anymore. Then impdp to the primary database, setting tablespace to READ-WRITE again and voila - the thing is done.

Kudos to Pawel Smolarz :-)

Partitioning by interval

Few notes:
  • interval needs to be a constant - we had here a time recorded as UNIX timestamp, thus in the database it was a NUMBER column to store those values - in this case to use interval we may partition by days or weeks
  • interval constant type needs to be compatible with partition key data type (ie. number with number, date with time interval) - which is somewhat obvious
  • at least one partition needs to be created in a "static" way (with the definition of the table) - so a database will have a starting point for next partitions, despite that the last "statically" added partition can not be removed - such activity finish with error ORA-14758.
  • in order to archive/delete older partitions one may:
    • truncate the first "static" partition (ALTER TABLE blabla TRUNCATE PARTITION dont_delete DROP STORAGE UPDATE INDEXES;)
    • drop chosen partitions despite of the first one (ALTER TABLE blabla TRUNCATE PARTITION sys_536 UPDATE INDEXES;)
    • it is easy to automate dropping old partitions by dropping after every period the partition with partition_position=2 in dba_tab_partitions (or possibly an equivalent for subpartitions)