Wednesday, 18 January 2012

ROLL_INVALID_MISMATCH reason for not sharing cursor

Since 10g there was introduced a change in the way the cursors are invalidated. Now they are invalidated in a rolling fashion (i.e. not all at once), hence the reason ROLL_INVALID_MISMATCH.
The excellent article on this theme at The Dutch Prutser's Blog

Monday, 16 January 2012

Cursors with high VERSION_COUNT

Symptoms
Since upgrade to 11gR2 from time to time we experience problems with excessive number of cursor children per some cursors. Symptoms are common - many active sessions with waits on mutex X, mutex S, library cache latch, and few others, high VERSION_COUNT for those cursors.
What happens?
It seems those all waits are due to very slow work of shared pool for those cursors. My understanding of this issue is all the children of such cursor are sharing the same sql_id and hash_value, so eventually they end up in the same bucket of library cache and the more of them, the longer sessions need to hold the latch to library cache or mutexes in order to check all the children to choose the possibly matched for sharing.

Root causes
The root causes for multiplying the cursor children are various. Some reasons are provided by the V$SQL_SHARED_CURSOR, there is a plenty of bugs (search term high VERSION_COUNT on Metalink).
In our case the reasons are few and not sure the concretes, as few explanations are possible. The bug is one of them, however as probable is wrong use of bind variables. From V$SQL_SHARED_CURSOR we get as a reason mainly BIND_MISMATCH, with possible addition of BIND_LENGTH_UPGRADEABLE. Partially this is due to the change of a bind variable length (the buffer for bind value is allocated up to 32 bytes or up to 128 bytes or up to 2000 bytes - at least those were revealed by observing trace 10046, so I would not say how it is beyond 2000 bytes - are there other thresholds or the bind possible maximum is the limit).

Workarounds
Whatever the reasons workarounds are similar and I dare to say "officially" used. What is important this behavior is not seen on 10g (or I would rather say it is hidden). This is due to the obsoleting cursors with more than 1024 children, what is not a case with 11g. So the clue of a workaround is mimic this by:
  1. workaround in the form of purging excessive children with use of dbms_shared_pool.purge
  2. hidden parameter _cursor_obsolete_threshold and the MOS article 10187168.8
True solutions They depend on the root causes. This varies from applying patch to changing the SQL to changing bind variables (size or type) to impossible or better say independent of our doings - great example/article is the entry by Martin Klier.