Tuesday, 20 November 2012

Distributed transactions holding locks on some resources

One of the main problems with XA transactions seems to be locking of rows by distributed transaction. While this is perfectly normal, when locking lasts a short time, it becomes a nightmare when it lasts forever. There are few scenarios, which may end up with such long-lasting lock.
Many comprehensive articles may be found on Metalink:
  1. Solving locking problems in a XA environment [ID 1248848.1] - short, thorough and I like it
  2. Manually Resolving In-Doubt Transactions: Different Scenarios [ID 126069.1] - long and boring
In many scenarios the issue should resolve automatically. If this is not the case then theoretically the distributed transaction should be recognized by the resource manager as 'in-doubt' and show up in the dataset from the dba_2pc_pending view. If so then a DBA should settle with a transaction manager administrator if the transaction 'in-doubt' should be commited or rollbacked and then on the resource manager side the DBA should call appropriately commit or rollback with force option and local transaction id as an argument. Sometimes additionally one needs to call DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('[LOCAL_TRAN_ID]');
Even easier such situation may be resolved from XA client side - we may prepare a fake client, which binds to the database as a side of the chosen distributed transaction and commit or rollback it according to the needs, while whole the cleaning will be done by standard mechanisms (which is recommended way of dealing with such issues) - such example client code may be obtained from the first mentioned article.
Yet another way of resolving the issue is the database reboot (but this is not always possible).

The distributed transaction goes through few phases.
When in 1st phase - not yet prepared - the locks are seen in v$lock view and diagnostic information is easy to obtain.
After prepare call and response those locks are no longer seen in v$lock and then the diagnostics is more hard. Yet another possibility here is the transaction process on the transaction manager side was killed by the system - then we end up with an active transaction still blocking some resources, but there is no locks seen (there is however entry in the v$locked_object view, so at least we may find objects on which there are some locked rows) and no session attached.
In this scenario we may use the excellent view from Tanel Poder changing the equal join to a left join between sys.x$k2gte2 and v$session - from this we may see that some transactions in x$k2gte2 have no corresponding row from the v$session view - these are our candidates for forced transaction finish - in x$k2gte2 we may find the global id of the transaction 'in-doubt' and the local id can be constructed from the join with sys.x$ktcxb (x$k2gte2.k2gtdxcb=x$ktcxb.ktcxbxba) on columns kxidusn, kxidslt, kxidsqn (it seems no need for left join here) so we may do something about it.
Once there was a workaround published on Metalink, which involved an insert into pending_trans$ and pending_session$, but I could not find it lately - so I suppose they recommend rather the use of fake client here. However one may still read about it after some google search - voila http://tazlambert.wordpress.com/2011/11/24/ora-01591-lock-held-by-in-doubt-distributed-transaction-string/ (look at the end of the article).
As a kind of curiosity I may add that I found even a x$k2gte2 entry, which had no corresponding entry in the sys.x$ktuxe structure or the v$transaction view. In sys.x$ktcxb there still was an entry however(that is why I assumed no left join needed here), but all the parts of local transaction id were equal to 0, so I suppose this trash has no impact on the database activity and the only way to clean it is a reboot.

No comments: