Monday 9 March 2015

Jammed AQ notification

Problem definition

One of frequently performed activities in my job is preparing a test environment based on data from production. This is quite lengthy task as the main goal is to create a database as close to a productional one as possible. There are of course some differences - cheaper platform is one of them, another one is limited data volume, yet another limited number of external systems - in turn they implicate further differences.
Thus the most effective way to achieve such goal is to leverage the Data Pump technology and number of home made scripts for partially "manual".
From time to time, while creating such environment, shows up a problem with the automatic AQ notification feature, which is used quite extensively, so we have to get it working properly. One of scenarios, when it comes to the AQ notification jam is the following:
  • we try to check the AQ notification health
  • we run a script, which creates a demo queue and put there a message
  • however during load we implicitly get the situation described here
  • the notification job fails
  • we (i.e. AQ) are stuck
After such sequence of events further enqueues to the AQ queue, which should run automatic callback, fail. I am unable to provide exact reason as there may be a plenty of them - old messages left in notification queue (SYS.AQ_SRVNTFN_TABLE_1), error messages in corresponding error queue (SYS.AQ$_AQ_SRVNTFN_TABLE_1_E), automatic AQ callback job failure due to ORA-28031 and some internal reminiscences after that and so on, but I provide here a sequence of activities to fix it.

Solution

The main goal of the solution is to get rid of the old messages in the queue. First one may look for a post like this one, which tells us, what to do with messages, which are moved to the error queue. In general it is necessary to enable the error queue for dequeue and get all the messages from there.
Next step is to run very similar piece of code to dequeue "manually" all the messages from notification queue - after that the AQ notification works like a new. Additionally I restarted the database, but not sure if that was important. It may be possible You find a locking problem, when dealing with queues - in my case there was a job, which was fired on the notification queue and I was unable e.g. to enable the error queue for dequeue - I managed with it by killing the offending session.